date:20140723

Re: [PATCH v5 1/3] arm64: ptrace: reload a syscall number after ptrace operations

2014-07-23 Thread AKASHI Takahiro


On 07/24/2014 12:54 PM, Andy Lutomirski wrote:

On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:

Arm64 holds a syscall number in w8(x8) register. Ptrace tracer may change
its value either to:
   * any valid syscall number to alter a system call, or
   * -1 to skip a system call

This patch implements this behavior by reloading that value into syscallno
in struct pt_regs after tracehook_report_syscall_entry() or
secure_computing(). In case of '-1', a return value of system call can also
be changed by the tracer setting the value to x0 register, and so
sys_ni_nosyscall() should not be called.

See also:
 42309ab4, ARM: 8087/1: ptrace: reload syscall number after
  secure_computing() check

Signed-off-by: AKASHI Takahiro 
---
  arch/arm64/kernel/entry.S  |2 ++
  arch/arm64/kernel/ptrace.c |   13 +
  2 files changed, 15 insertions(+)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 5141e79..de8bdbc 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -628,6 +628,8 @@ ENDPROC(el0_svc)
  __sys_trace:
          mov     x0, sp
          bl      syscall_trace_enter
+         cmp     w0, #-1                 // skip syscall?
+         b.eq    ret_to_user


Does this mean that skipped syscalls will cause exit tracing to be skipped?


Yes. (and I guess yes on arm, too)

If so, then you risk (at least) introducing a nice user-triggerable
OOPS if audit is enabled.


Can you please elaborate on this?
Since I didn't find any definition of audit's behavior when a syscall is
rewritten to -1, I thought it was reasonable to skip "exit tracing" of a
"skipped" syscall.
(Otherwise, "fake" seems to be a more appropriate term :)

-Takahiro AKASHI


This bug existed for *years* on x86_32, and it amazes me that no one
ever triggered it by accident. (Grr, audit.)

--Andy
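
For readers following the thread, here is a rough user-space model of the behaviour the patch description asks for: reload the syscall number from x8 after the tracer has run, and skip dispatch when it is -1 so that x0 keeps the tracer's value. Every name here (model_regs, model_trace_enter, model_syscall and the demo tracers) is made up for illustration; this is not the arm64 entry code, and it does not model the audit/exit-tracing question raised above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for arm64's struct pt_regs. */
struct model_regs {
    uint64_t regs[31];  /* x0..x30 */
    int32_t syscallno;  /* number the dispatcher will use */
};

/* A tracer may rewrite x8 (syscall number) and x0 (return value). */
typedef void (*model_tracer_fn)(struct model_regs *regs);

/* Run the tracer, then reload syscallno from w8. Returns -1 to skip. */
static int model_trace_enter(struct model_regs *regs, model_tracer_fn tracer)
{
    if (tracer)
        tracer(regs);
    regs->syscallno = (int32_t)(uint32_t)regs->regs[8];
    return regs->syscallno;
}

/* Entry path: dispatch unless the tracer asked for a skip. */
static uint64_t model_syscall(struct model_regs *regs, model_tracer_fn tracer,
                              uint64_t (*dispatch)(int nr))
{
    int nr;

    assert(dispatch != NULL);
    nr = model_trace_enter(regs, tracer);
    if (nr != -1)
        regs->regs[0] = dispatch(nr);
    /* else: skipped, x0 keeps whatever the tracer stored there */
    return regs->regs[0];
}

/* Demo dispatcher: returns a value derived from the syscall number. */
static uint64_t model_echo_dispatch(int nr)
{
    return (uint64_t)nr + 1000;
}

/* Demo tracer: skip the call and fake an -EPERM result. */
static void model_skip_with_eperm(struct model_regs *regs)
{
    regs->regs[8] = (uint64_t)-1;
    regs->regs[0] = (uint64_t)-EPERM;
}

/* Demo tracer: redirect the call to another (arbitrary) number. */
static void model_redirect(struct model_regs *regs)
{
    regs->regs[8] = 172;
}
```

In this model a redirected call is dispatched with the new number, and a skipped call never reaches the dispatcher, which is the point of not calling sys_ni_syscall() in the '-1' case.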

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

linux-next: manual merge of the driver-core tree with the drm tree

2014-07-23 Thread Stephen Rothwell

Hi Greg,

Today's linux-next merge of the driver-core tree got a conflict in
drivers/gpu/drm/armada/armada_crtc.c between commit d8c96083cf5e
("drm/armada: permit CRTCs to be registered as separate devices") from
the drm tree and commit c9d53c0f2d23 ("devres: remove
devm_request_and_ioremap()") from the driver-core tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/gpu/drm/armada/armada_crtc.c
index 3f620e21e06b,3aedf9e993e6..
--- a/drivers/gpu/drm/armada/armada_crtc.c
+++ b/drivers/gpu/drm/armada/armada_crtc.c
@@@ -1064,11 -1039,9 +1064,9 @@@ int armada_drm_crtc_create(struct drm_d
if (ret)
return ret;
  
-   base = devm_request_and_ioremap(dev, res);
-   if (!base) {
-   DRM_ERROR("failed to ioremap register\n");
-   return -ENOMEM;
-   }
 -  base = devm_ioremap_resource(dev->dev, res);
++  base = devm_ioremap_resource(dev, res);
+   if (IS_ERR(base))
+   return PTR_ERR(base);
  
dcrtc = kzalloc(sizeof(*dcrtc), GFP_KERNEL);
if (!dcrtc) {
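
The resolution above swaps a NULL check for IS_ERR()/PTR_ERR(), because devm_ioremap_resource() reports failure as an error code encoded in the pointer rather than as NULL. The sketch below is a user-space imitation of that idiom, assuming the usual convention that the top 4095 address values are reserved for error codes. The model_* names are invented for illustration and are not the kernel helpers themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *model_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True when ptr lies in the reserved top-of-address-space error range. */
static inline int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical mapper: never returns NULL, only a buffer or an error pointer. */
static void *model_ioremap_resource(int already_claimed, size_t len)
{
    void *p;

    assert(len > 0);
    if (already_claimed)
        return model_err_ptr(-EBUSY);
    p = malloc(len);
    return p ? p : model_err_ptr(-ENOMEM);
}

/* Caller in the style of the resolved hunk: propagate the encoded error. */
static int model_probe(int already_claimed)
{
    void *base = model_ioremap_resource(already_claimed, 64);

    if (model_is_err(base))
        return (int)model_ptr_err(base);
    free(base);
    return 0;
}
```

A plain "if (!base)" test would treat the error pointer returned for the claimed region as a valid mapping, which is why the check has to change together with the call.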



Re: [dm-devel] [PATCH] md/dm-ioctl.c: optimize memory allocation in copy_params

2014-07-23 Thread Zhang, Yanmin


On 2014/7/24 1:14, Mikulas Patocka wrote:


On Wed, 23 Jul 2014, Alasdair G Kergon wrote:


On Wed, Jul 23, 2014 at 08:16:58AM -0400, Mikulas Patocka wrote:

So, it means that you do not use device mapper at all. So, why are you
trying to change memory allocation in device mapper?
  
So the *test* they run is asking device-mapper to briefly reserve a 64KB
buffer when there is no data to report:  The answer is not to run that
pointless test:)

And if a single 64KB allocation really is a big deal, then patch 'vold'
in userspace so it doesn't ask for 64KB when it clearly doesn't need it!

+ int Devmapper::dumpState(SocketClient *c) {
+char *buffer = (char *) malloc(1024 * 64);

The code has just
#define DEVMAPPER_BUFFER_SIZE 4096
for all the other dm ioctls it issues.

Only use a larger value when it is needed i.e. if DM_BUFFER_FULL_FLAG gets set.

Alasdair
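
Alasdair's suggestion (start with the 4096-byte buffer and only grow when the kernel reports that the buffer was too small) might look roughly like this in user space. This is a sketch under stated assumptions: model_query() is a made-up stand-in for the real dm ioctl, MODEL_BUFFER_FULL stands in for DM_BUFFER_FULL_FLAG, and doubling is just one possible growth policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_BUFFER_SIZE 4096u     /* mirrors DEVMAPPER_BUFFER_SIZE above */
#define MODEL_BUFFER_FULL (1u << 0) /* hypothetical stand-in for DM_BUFFER_FULL_FLAG */

/* Hypothetical device: its reply needs 'needed' bytes. */
typedef struct {
    size_t needed;
} model_dev;

/* Hypothetical query: fills buf, or reports that cap was too small. */
static unsigned model_query(const model_dev *dev, char *buf, size_t cap)
{
    if (dev->needed > cap)
        return MODEL_BUFFER_FULL;
    memset(buf, 'x', dev->needed);
    return 0;
}

/*
 * Start small and grow only when the query says the buffer was full.
 * Returns the buffer (caller frees) and stores its final capacity,
 * or returns NULL if an allocation fails.
 */
static char *model_dump_state(const model_dev *dev, size_t *cap_out)
{
    size_t cap = MODEL_BUFFER_SIZE;
    char *buf = malloc(cap);

    assert(cap_out != NULL);
    while (buf && (model_query(dev, buf, cap) & MODEL_BUFFER_FULL)) {
        char *bigger;

        cap *= 2;
        bigger = realloc(buf, cap);
        if (!bigger) {
            free(buf);
            return NULL;
        }
        buf = bigger;
    }
    if (buf)
        *cap_out = cap;
    return buf;
}
```

With no data to report, this never asks for more than 4096 bytes, which is the case the test in question was exercising.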

Device mapper shouldn't depend on allocating contiguous memory - it
will fall back to vmalloc. I still can't believe that their suggested
patch makes any difference.

This pattern is being repeated over and over in the kernel - for example:

 if (PIDLIST_TOO_LARGE(count))
 return vmalloc(count * sizeof(pid_t));
 else
 return kmalloc(count * sizeof(pid_t), GFP_KERNEL);


 if (is_vmalloc_addr(p))
 vfree(p);
 else
 kfree(p);

- I think we should make two functions that do this operation (for example
kvalloc and kvfree) and convert device mapper and other users to these
functions. Then, other kernel subsystems can start to use them to fix
memory fragmentation issues.
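
To make the proposal concrete, here is a user-space imitation of such a pair of helpers. It is only a sketch: the kernel can tell the two kinds of memory apart by address (is_vmalloc_addr()), whereas this model has to store a small header in front of each block, and it uses malloc() and mmap() as stand-ins for kmalloc() and vmalloc(). The names model_kvalloc, model_kvfree and the size limit are assumptions for illustration.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

#define MODEL_KMALLOC_LIMIT (2 * 4096u) /* arbitrary "small allocation" cutoff */

/* Header stored in front of every block so the free side knows its origin. */
struct model_hdr {
    size_t map_len; /* 0: came from malloc(); otherwise total mmap() length */
    size_t pad;     /* keeps the payload 16-byte aligned on 64-bit targets */
};

/* Try the cheap allocator for small sizes, fall back to page-backed memory. */
static void *model_kvalloc(size_t size)
{
    struct model_hdr *h;
    size_t total = size + sizeof(*h);

    if (total < size)
        return NULL; /* size overflow */
    if (size <= MODEL_KMALLOC_LIMIT) {
        h = malloc(total);
        if (h) {
            h->map_len = 0;
            return h + 1;
        }
        /* small allocation failed: fall through to the mapping path */
    }
    h = mmap(NULL, total, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (h == MAP_FAILED)
        return NULL;
    h->map_len = total;
    return h + 1;
}

/* Reports which path a block came from (the model's is_vmalloc_addr()). */
static int model_is_mapped(const void *p)
{
    assert(p != NULL);
    return ((const struct model_hdr *)p - 1)->map_len != 0;
}

/* Single free routine: the caller does not need to remember the origin. */
static void model_kvfree(void *p)
{
    struct model_hdr *h;

    if (!p)
        return;
    h = (struct model_hdr *)p - 1;
    if (h->map_len)
        munmap(h, h->map_len);
    else
        free(h);
}
```

The benefit being argued for is on the caller's side: one allocation call and one free call, with the fallback decision kept in a single place.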


Thanks, Mikulas and Alasdair. Before sending out the patch, we knew what
the result would be. :)
It's hard to balance performance and stability.

Anyway, we will try to change seq_read.

Yanmin


[PATCH] gpio: remove gpio_ensure_requested()

2014-07-23 Thread Alexandre Courbot

gpio_ensure_requested() was introduced in Feb. 2008 by commit
d2876d08d86f2 to force users of the GPIO API to explicitly request GPIOs
before using them.

Hopefully by now all GPIOs are correctly requested and this extra check
can be omitted; in any case the GPIO maintainers won't feel bad if
machines start failing after 6 years of warnings.

This patch removes that function from the dark ages.

Signed-off-by: Alexandre Courbot 
---
 drivers/gpio/gpiolib-legacy.c | 106 --
 include/asm-generic/gpio.h|  15 --
 2 files changed, 12 insertions(+), 109 deletions(-)

diff --git a/drivers/gpio/gpiolib-legacy.c b/drivers/gpio/gpiolib-legacy.c
index 0f9429b2522a..078ae6c2df79 100644
--- a/drivers/gpio/gpiolib-legacy.c
+++ b/drivers/gpio/gpiolib-legacy.c
@@ -5,64 +5,6 @@
 
 #include "gpiolib.h"
 
-/* Warn when drivers omit gpio_request() calls -- legal but ill-advised
- * when setting direction, and otherwise illegal.  Until board setup code
- * and drivers use explicit requests everywhere (which won't happen when
- * those calls have no teeth) we can't avoid autorequesting.  This nag
- * message should motivate switching to explicit requests... so should
- * the weaker cleanup after faults, compared to gpio_request().
- *
- * NOTE: the autorequest mechanism is going away; at this point it's
- * only "legal" in the sense that (old) code using it won't break yet,
- * but instead only triggers a WARN() stack dump.
- */
-static int gpio_ensure_requested(struct gpio_desc *desc)
-{
-   struct gpio_chip *chip = desc->chip;
-   unsigned long flags;
-   bool request = false;
-   int err = 0;
-
-   spin_lock_irqsave(&gpio_lock, flags);
-
-   if (WARN(test_and_set_bit(FLAG_REQUESTED, &desc->flags) == 0,
-           "autorequest GPIO-%d\n", desc_to_gpio(desc))) {
-       if (!try_module_get(chip->owner)) {
-           gpiod_err(desc, "%s: module can't be gotten\n",
-                   __func__);
-           clear_bit(FLAG_REQUESTED, &desc->flags);
-           /* lose */
-           err = -EIO;
-           goto end;
-       }
-       desc->label = "[auto]";
-       /* caller must chip->request() w/o spinlock */
-       if (chip->request)
-           request = true;
-   }
-
-end:
-   spin_unlock_irqrestore(&gpio_lock, flags);
-
-   if (request) {
-       might_sleep_if(chip->can_sleep);
-       err = chip->request(chip, gpio_chip_hwgpio(desc));
-
-       if (err < 0) {
-           gpiod_dbg(desc, "%s: chip request fail, %d\n",
-                   __func__, err);
-           spin_lock_irqsave(&gpio_lock, flags);
-
-           desc->label = NULL;
-           clear_bit(FLAG_REQUESTED, &desc->flags);
-
-           spin_unlock_irqrestore(&gpio_lock, flags);
-       }
-   }
-   }
-   }
-
-   return err;
-}
-
 void gpio_free(unsigned gpio)
 {
gpiod_free(gpio_to_desc(gpio));
@@ -158,51 +100,3 @@ void gpio_free_array(const struct gpio *array, size_t num)
gpio_free((array++)->gpio);
 }
 EXPORT_SYMBOL_GPL(gpio_free_array);
-
-int gpio_direction_input(unsigned gpio)
-{
-   struct gpio_desc *desc = gpio_to_desc(gpio);
-   int err;
-
-   if (!desc)
-   return -EINVAL;
-
-   err = gpio_ensure_requested(desc);
-   if (err < 0)
-   return err;
-
-   return gpiod_direction_input(desc);
-}
-EXPORT_SYMBOL_GPL(gpio_direction_input);
-
-int gpio_direction_output(unsigned gpio, int value)
-{
-   struct gpio_desc *desc = gpio_to_desc(gpio);
-   int err;
-
-   if (!desc)
-   return -EINVAL;
-
-   err = gpio_ensure_requested(desc);
-   if (err < 0)
-   return err;
-
-   return gpiod_direction_output_raw(desc, value);
-}
-EXPORT_SYMBOL_GPL(gpio_direction_output);
-
-int gpio_set_debounce(unsigned gpio, unsigned debounce)
-{
-   struct gpio_desc *desc = gpio_to_desc(gpio);
-   int err;
-
-   if (!desc)
-   return -EINVAL;
-
-   err = gpio_ensure_requested(desc);
-   if (err < 0)
-   return err;
-
-   return gpiod_set_debounce(desc, debounce);
-}
-EXPORT_SYMBOL_GPL(gpio_set_debounce);
diff --git a/include/asm-generic/gpio.h b/include/asm-generic/gpio.h
index 39a1d06950d9..c1d4105e1c1d 100644
--- a/include/asm-generic/gpio.h
+++ b/include/asm-generic/gpio.h
@@ -63,10 +63,19 @@ static inline struct gpio_chip *gpio_to_chip(unsigned gpio)
 extern int gpio_request(unsigned gpio, const char *label);
 extern void gpio_free(unsigned gpio);
 
-extern int gpio_direction_input(unsigned gpio);
-extern int gpio_direction_output(unsigned gpio, int value);
+static inline int gpio_direction_input(unsigned gpio)
+{
+   return gpiod_direction_input(gpio_to_desc(gpio));
+}
+static inline int

Re: [RFC PATCH] gpiolib: Provide and export gpiod_export_name

2014-07-23 Thread Alexandre Courbot

On Thu, Jul 24, 2014 at 3:12 AM, Guenter Roeck  wrote:
> gpiod_export_name is similar to gpiod_export, but lets the user
> determine the name used to export a gpio pin.
>
> Currently, the pin name is determined by the chip driver with
> the 'names' array in the gpio_chip data structure, or it is set
> to gpioX, where X is the pin number, if no name is provided by
> the chip driver.

Oh, my. I did not even know about this 'names' array. This is pretty
bad - a GPIO provider should not decide what its GPIOs are used for.

Luckily for you, this creates a precedent that makes this patch more
acceptable, in that it is not making the situation worse. Even though
I consider both solutions to be bad, I actually prefer your
gpiod_export_name() function to that 'names' array in gpio_chip...

>
> It is, however, desirable to be able to provide the pin name when
> exporting the pin, for example from platform code. In other words,
> it would be useful to move the naming decision from the pin provider
> to the pin consumer. The gpio-pca953x driver provides this capability
> as part of its platform data. Other drivers could be enhanced in a
> similar way; however, this is not always possible or easy to accomplish.
> For example, mfd client drivers such as gpio-ich already use platform
> data to pass information from the mfd master driver to the client driver.
> Overloading this platform data to also provide an array of gpio pin names
> would be a challenge if not impossible.
>
> The alternative to use gpiod_export_link is also not always desirable,
> since it only creates a named link to a different directory, meaning
> the named gpio pin is not available in /sys/class/gpio but only
> in some platform specific directory and thus not as generic as possible
> and/or useful.
>
> A specific example for a use case is a gpio pin which reports AC power
> loss to user space. Depending on the platform and platform variant,
> the pin can be provided by various gpio chip drivers and pin numbers.
> It would be very desirable to have a well defined location such as
> /sys/class/gpio/ac_power_loss for this pin, so user space knows where
> to find the attribute without knowledge of the underlying platform
> variant or other hardware details.

As I explained on the other thread, I still encourage you to have
these GPIOs under device nodes that give a hint of who is providing the
GPIO (effectively exporting the (dev, function) tuple to user-space)
instead of having them pop up under /sys/class/gpio where nobody
knows where they come from and name collisions are much more likely.

Your message sounds like you have no choice but to have the named GPIO
link under the gpiochip's device node, but this is not the case -
gpio_export_link() lets you specify the device under which the link
should appear. Make that device be your "scu" device and you can have
a consistent sysfs path to access your GPIOs.

Allowing GPIOs to pop up in the same directory with an arbitrary name
is just a recipe for a mess. But that's a recipe that is already
allowed thanks to that 'names' array. So I'm really confused about
what to do with this patch. If you can make do with gpio_export_link()
(and I have not seen evidence to the contrary), please go that way instead.

>
> Signed-off-by: Guenter Roeck 
> ---
> Applies to tip of linux-gpio/for-next.
>
>  Documentation/gpio/sysfs.txt  | 12 
>  drivers/gpio/gpiolib-sysfs.c  | 23 ---
>  include/linux/gpio/consumer.h |  9 +
>  3 files changed, 33 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/gpio/sysfs.txt b/Documentation/gpio/sysfs.txt
> index c2c3a97..8e301b2 100644
> --- a/Documentation/gpio/sysfs.txt
> +++ b/Documentation/gpio/sysfs.txt
> @@ -125,7 +125,11 @@ requested using gpio_request():
> /* export the GPIO to userspace */
> int gpiod_export(struct gpio_desc *desc, bool direction_may_change);
>
> -   /* reverse gpio_export() */
> +   /* export named GPIO to userspace */
> +   int gpiod_export_name(struct gpio_desc *desc, const char *ioname,
> +   bool direction_may_change);
> +
> +   /* reverse gpio_export() / gpiod_export_name() */
> void gpiod_unexport(struct gpio_desc *desc);
>
> /* create a sysfs link to an exported GPIO node */
> @@ -136,9 +140,9 @@ requested using gpio_request():
> int gpiod_sysfs_set_active_low(struct gpio_desc *desc, int value);
>
>  After a kernel driver requests a GPIO, it may only be made available in
> -the sysfs interface by gpiod_export(). The driver can control whether the
> -signal direction may change. This helps drivers prevent userspace code
> -from accidentally clobbering important system state.
> +the sysfs interface by gpiod_export() or gpiod_export_name(). The driver
> +can control whether the signal direction may change. This helps drivers
> +prevent userspace code from accidentally clobbering important system state.
>
>  This explicit exporting can

Re: [PATCH v5 3/3] arm64: Add seccomp support

2014-07-23 Thread AKASHI Takahiro


On 07/24/2014 12:52 PM, Andy Lutomirski wrote:

On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:

secure_computing() should always be called first in syscall_trace_enter().

If secure_computing() returns -1, we should stop further handling. Then
that system call may eventually fail with a specified return value (errno),
be trapped, or the process itself may be killed, depending on loaded rules.
In these cases, syscall_trace_enter() also returns -1, which results in
skipping normal syscall handling as well as syscall_trace_exit().

Signed-off-by: AKASHI Takahiro 
---
  arch/arm64/Kconfig   |   14 ++
  arch/arm64/include/asm/seccomp.h |   25 +
  arch/arm64/include/asm/unistd.h  |3 +++
  arch/arm64/kernel/ptrace.c   |5 +
  4 files changed, 47 insertions(+)
  create mode 100644 arch/arm64/include/asm/seccomp.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3a18571..eeac003 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -32,6 +32,7 @@ config ARM64
  select HAVE_ARCH_AUDITSYSCALL
  select HAVE_ARCH_JUMP_LABEL
  select HAVE_ARCH_KGDB
+select HAVE_ARCH_SECCOMP_FILTER
  select HAVE_ARCH_TRACEHOOK
  select HAVE_C_RECORDMCOUNT
  select HAVE_DEBUG_BUGVERBOSE
@@ -259,6 +260,19 @@ config ARCH_HAS_CACHE_LINE_SIZE

  source "mm/Kconfig"

+config SECCOMP
+bool "Enable seccomp to safely compute untrusted bytecode"
+---help---
+  This kernel feature is useful for number crunching applications
+  that may need to compute untrusted bytecode during their
+  execution. By using pipes or other transports made available to
+  the process as file descriptors supporting the read/write
+  syscalls, it's possible to isolate those applications in
+  their own address space using seccomp. Once seccomp is
+  enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
+  and the task is only allowed to execute a few safe syscalls
+  defined by each seccomp mode.
+
  config XEN_DOM0
  def_bool y
  depends on XEN
diff --git a/arch/arm64/include/asm/seccomp.h b/arch/arm64/include/asm/seccomp.h
new file mode 100644
index 000..c76fac9
--- /dev/null
+++ b/arch/arm64/include/asm/seccomp.h
@@ -0,0 +1,25 @@
+/*
+ * arch/arm64/include/asm/seccomp.h
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_SECCOMP_H
+#define _ASM_SECCOMP_H
+
+#include 
+
+#ifdef CONFIG_COMPAT
+#define __NR_seccomp_read_32__NR_compat_read
+#define __NR_seccomp_write_32__NR_compat_write
+#define __NR_seccomp_exit_32__NR_compat_exit
+#define __NR_seccomp_sigreturn_32__NR_compat_rt_sigreturn
+#endif /* CONFIG_COMPAT */
+
+#include 
+
+#endif /* _ASM_SECCOMP_H */
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index c980ab7..729c155 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -31,6 +31,9 @@
   * Compat syscall numbers used by the AArch64 kernel.
   */
  #define __NR_compat_restart_syscall0
+#define __NR_compat_exit1
+#define __NR_compat_read3
+#define __NR_compat_write4
  #define __NR_compat_sigreturn119
  #define __NR_compat_rt_sigreturn173

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 100d7d1..e477f6f 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1115,6 +1116,10 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
  saved_x0 = regs->regs[0];
  saved_x8 = regs->regs[8];

+if (secure_computing(regs->syscallno) == -1)
+/* seccomp failures shouldn't expose any additional code. */
+return -1;
+


This will conflict with the fastpath stuff in Kees' tree.  (Actually, it's
likely to apply cleanly, but fail to compile.)  The fix is trivial, but,
given that the fastpath stuff is new, can you take a look and see if arm64
can use it effectively?


I will look into the code later.


I suspect that the performance considerations are rather different on
arm64 as compared to x86 (I really hope that x86 is the only
architecture with the absurd sysret vs. iret distinction), but at least
the seccomp_data stuff ought to help anywhere.  (It looks like there's a
distinct fast path, too, so the two-phase thing might also be a fairly
large win if it's supportable.)

See:

https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/fastpath

Also, I'll ask the usual question: what are all of the factors other than
nr and args that affect syscall execution?
What are the audit arch values?  Do they match correctly?


As far as I know,


For example, it looks

Re: [PATCH 1/1] Drivers: scsi: storvsc: Add blist flags

2014-07-23 Thread Hannes Reinecke


On 07/22/2014 01:06 AM, K. Y. Srinivasan wrote:

Add blist flags to permit the reading of the VPD pages even when
the target may claim SPC-2 compliance. MSFT targets currently
claim SPC-2 compliance while they implement post SPC-2 features.
With this patch we can correctly handle WRITE_SAME_16 issues.

Signed-off-by: K. Y. Srinivasan 
---
  drivers/scsi/storvsc_drv.c |   10 ++
  1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 29d0329..15ba695 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -327,6 +327,8 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
   */
  static int storvsc_timeout = 180;

+static int msft_blist_flags = BLIST_TRY_VPD_PAGES;
+
  #define STORVSC_MAX_IO_REQUESTS   200

  static void storvsc_on_channel_callback(void *context);
@@ -1449,6 +1451,14 @@ static int storvsc_device_configure(struct scsi_device *sdevice)

sdevice->no_write_same = 1;

+   /*
+* Add blist flags to permit the reading of the VPD pages even when
+* the target may claim SPC-2 compliance. MSFT targets currently
+* claim SPC-2 compliance while they implement post SPC-2 features.
+* With this patch we can correctly handle WRITE_SAME_16 issues.
+*/
+   sdevice->sdev_bflags |= msft_blist_flags;
+
return 0;
  }



Reviewed-by: Hannes Reinecke 

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx

2014-07-23 Thread Ren, Qiaowei



On 2014-07-24, Hansen, Dave wrote:
> On 07/23/2014 05:56 PM, Ren, Qiaowei wrote:
>> On 2014-07-24, Hansen, Dave wrote:
>>> On 07/22/2014 07:35 PM, Ren, Qiaowei wrote:
>>>> The check for the MPX feature should be as follows:
>>>>
>>>> if (pcntxt_mask & XSTATE_EAGER) {
>>>>         if (eagerfpu == DISABLE) {
>>>>                 pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n",
>>>>                        pcntxt_mask & XSTATE_EAGER);
>>>>                 pcntxt_mask &= ~XSTATE_EAGER;
>>>>         } else {
>>>>                 eagerfpu = ENABLE;
>>>>         }
>>>> }
>>>>
>>>> This patch was merged into the kernel at the end of last year
>>>> (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669)
>>> 
>>> Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here?
>>> 
>>> This isn't major, but I can't _ever_ imagine a user being able to
>>> track down why MPX is not working from this message. Should we
>>> spruce it up somehow?
>> 
>> Maybe. If the error log "disabling some xstate features:" is changed
>> to "disabling MPX xstate features:", do you think it is OK?
> 
> That's better.  Is it really disabling MPX, though?
> 
> And shouldn't we clear the cpu feature bit too?

I am not sure. I am wondering whether this check should be moved before
xstate_enable().

Peter, what do you think of it?

Thanks,
Qiaowei


Re: [PATCH v5 2/3] asm-generic: Add generic seccomp.h for secure computing mode 1

2014-07-23 Thread AKASHI Takahiro


On 07/24/2014 01:41 PM, Kees Cook wrote:

On Wed, Jul 23, 2014 at 8:40 PM, Andy Lutomirski  wrote:

On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:


Those values (__NR_seccomp_*) are used solely in secure_computing()
to identify mode 1 system calls. If compat system calls have different
syscall numbers, asm/seccomp.h may override them.

Acked-by: Arnd Bergmann 
Signed-off-by: AKASHI Takahiro 
---
   include/asm-generic/seccomp.h |   28 
   1 file changed, 28 insertions(+)
   create mode 100644 include/asm-generic/seccomp.h

diff --git a/include/asm-generic/seccomp.h b/include/asm-generic/seccomp.h
new file mode 100644
index 000..5e97022
--- /dev/null
+++ b/include/asm-generic/seccomp.h
@@ -0,0 +1,28 @@
+/*
+ * include/asm-generic/seccomp.h
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_GENERIC_SECCOMP_H
+#define _ASM_GENERIC_SECCOMP_H
+
+#include 
+
+#if defined(CONFIG_COMPAT) && !defined(__NR_seccomp_read_32)
+#define __NR_seccomp_read_32   __NR_read
+#define __NR_seccomp_write_32  __NR_write
+#define __NR_seccomp_exit_32   __NR_exit
+#define __NR_seccomp_sigreturn_32  __NR_rt_sigreturn
+#endif /* CONFIG_COMPAT && ! already defined */
+
+#define __NR_seccomp_read  __NR_read
+#define __NR_seccomp_write __NR_write
+#define __NR_seccomp_exit  __NR_exit
+#define __NR_seccomp_sigreturn __NR_rt_sigreturn



I don't like these names.  __NR_seccomp_read sounds like the number of a
syscall called seccomp_read.

Also, shouldn't something be including this header?  I'm confused.


Ah! Good catch. These names are correct (see kernel/seccomp.c's
mode1_syscalls and mode1_syscalls_32 arrays), but the location of this
change was unexpected. I was expecting this file to live in
arch/*/include/asm/seccomp.h, not in include/asm-generic/seccomp.h.

However, since it's always the same list, it might make sense to
consolidate them into a single place as a default to make arch porting
easier.


Yeah, that is why I put this file under include/asm-generic.


However, I think that should be a separate patch.


Do you mean that the code for all the existing archs should also be changed
to use this (common) header?

-Takahiro AKASHI


-Kees
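
For context on how the __NR_seccomp_* values discussed here end up being used, the following is a small stand-alone model of a mode 1 lookup, assuming a zero-terminated table of allowed syscall numbers in the style of the mode1_syscalls arrays Kees refers to. The numbers are the compat values from the unistd.h hunk of patch 3/3; the model_* names are invented, and this is not the code in kernel/seccomp.c.

```c
#include <assert.h>
#include <stddef.h>

/* Compat syscall numbers as listed in the unistd.h hunk of patch 3/3. */
enum {
    MODEL_NR_EXIT = 1,
    MODEL_NR_READ = 3,
    MODEL_NR_WRITE = 4,
    MODEL_NR_RT_SIGRETURN = 173
};

/* Zero-terminated list of the only syscalls mode 1 lets through. */
static const int model_mode1_syscalls_32[] = {
    MODEL_NR_READ,
    MODEL_NR_WRITE,
    MODEL_NR_EXIT,
    MODEL_NR_RT_SIGRETURN,
    0 /* terminator */
};

/* Returns 1 when nr is in the table, 0 otherwise. */
static int model_mode1_allowed(const int *table, int nr)
{
    assert(table != NULL);
    for (; *table; table++)
        if (*table == nr)
            return 1;
    return 0;
}
```

Because the table is the same four entries everywhere and only the numbers differ per ABI, a generic header that an architecture can override fits this lookup naturally.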



Re: [PATCH] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-23 Thread Viresh Kumar

On 24 July 2014 08:32, Saravana Kannan  wrote:
> Ok, I think I've figured this out. But one question. Is it possible to
> physically remove one CPU in a bunch of "related cpus" without also
> unplugging the rest? Put another way, can you unplug one core from a
> cluster?

Are we talking about doing this here:

echo 0 > /sys/devices/system/cpu/cpuX/online  ??

If yes, then what's the confusion all about? Yes we do it all the time.

Re: [PATCH RFC v2 net-next 10/16] bpf: add eBPF verifier

2014-07-23 Thread Alexei Starovoitov

On Wed, Jul 23, 2014 at 5:48 PM, Alexei Starovoitov  wrote:
> check_reg_arg() is indeed incorrect here. Will fix. That was a good catch.
> Thank you for review!

I fixed this missing check_reg_arg() check, addressed feedback for
other patches, rebased it and pushed it:
https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/log/

I also replaced the verifier's dump into syslog with a print into a
user-supplied buffer, so that user space can see why the verifier rejected
the program.
The more I play with it, the more I like the fd-based user interface. I
cannot thank Andy enough for suggesting it :)
Currently map_id is still there as a kernel-internal id that programs
use to access maps. I'll try to replace it with a direct pointer.
That will save one idr_find() lookup in the critical path. That's a good
reason to add a new 16-byte eBPF instruction. It will be the
"load 64-bit immediate" that we've discussed before.
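
The reason a pointer needs a wider instruction is that a single 8-byte instruction only has room for a 32-bit immediate. One way a 16-byte "load 64-bit immediate" could be laid out is as two consecutive 8-byte slots, with the low half of the value in the first slot and the high half in the second. The sketch below assumes that layout; the struct, the opcode value and the helper names are illustrative guesses, not something specified in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 8-byte instruction slot: opcode, register nibbles, offset, imm. */
struct model_insn {
    uint8_t code;
    uint8_t regs; /* low nibble: dst register, high nibble: src register */
    int16_t off;
    int32_t imm;
};

#define MODEL_LD_IMM64 0x18 /* hypothetical opcode value for the wide load */

/* Emit the two-slot form: low 32 bits first, high 32 bits second. */
static void model_emit_ld_imm64(struct model_insn pair[2], uint8_t dst,
                                uint64_t value)
{
    assert(dst < 16);
    pair[0] = (struct model_insn){
        .code = MODEL_LD_IMM64,
        .regs = dst,
        .off = 0,
        .imm = (int32_t)(uint32_t)value,
    };
    pair[1] = (struct model_insn){
        .code = 0, /* second slot carries data only */
        .regs = 0,
        .off = 0,
        .imm = (int32_t)(uint32_t)(value >> 32),
    };
}

/* Reassemble the 64-bit value from the two slots. */
static uint64_t model_read_ld_imm64(const struct model_insn pair[2])
{
    return (uint64_t)(uint32_t)pair[0].imm |
           ((uint64_t)(uint32_t)pair[1].imm << 32);
}
```

With something like this, a map pointer can be embedded directly in the program instead of an id that has to be looked up on every access.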

[PATCH v5 3/5] x86,random: Add an x86 implementation of arch_get_rng_seed

2014-07-23 Thread Andy Lutomirski

This does the same thing as the generic implementation, except
that it logs how many bits of each type it collected.  I want to
know whether the initial seeding is working and, if so, whether
the RNG is fast enough.

(I know that hpa assures me that the hardware RNG is more than
 fast enough, but I'd still like a direct way to verify this.)

Arguably, arch_get_random_seed could be removed now: I'm having some
trouble imagining a sensible non-architecture-specific use of it
that wouldn't be better served by arch_get_rng_seed.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/archrandom.h |  6 +
 arch/x86/kernel/Makefile  |  2 ++
 arch/x86/kernel/archrandom.c  | 51 +++
 3 files changed, 59 insertions(+)
 create mode 100644 arch/x86/kernel/archrandom.c

diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
index 69f1366..88f9c5a 100644
--- a/arch/x86/include/asm/archrandom.h
+++ b/arch/x86/include/asm/archrandom.h
@@ -117,6 +117,12 @@ GET_SEED(arch_get_random_seed_int, unsigned int, RDSEED_INT, ASM_NOP4);
 #define arch_has_random()  static_cpu_has(X86_FEATURE_RDRAND)
 #define arch_has_random_seed() static_cpu_has(X86_FEATURE_RDSEED)
 
+#define __HAVE_ARCH_GET_RNG_SEED
+extern void arch_get_rng_seed(void *ctx,
+ void (*seed)(void *ctx, u32 data),
+ int bits_per_source,
+ const char *log_prefix);
+
 #else
 
 static inline int rdrand_long(unsigned long *v)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 047f9ff..0718bae 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -92,6 +92,8 @@ obj-$(CONFIG_PARAVIRT)        += paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
 
+obj-$(CONFIG_ARCH_RANDOM)  += archrandom.o
+
 obj-$(CONFIG_PCSPKR_PLATFORM)  += pcspeaker.o
 
 obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
new file mode 100644
index 000..47d13b0
--- /dev/null
+++ b/arch/x86/kernel/archrandom.c
@@ -0,0 +1,51 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2014 Andy Lutomirski
+ * Authors: Andy Lutomirski 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+
+void arch_get_rng_seed(void *ctx,
+  void (*seed)(void *ctx, u32 data),
+  int bits_per_source,
+  const char *log_prefix)
+{
+   int i;
+   int rdseed_bits = 0, rdrand_bits = 0;
+   char buf[128] = "";
+   char *msgptr = buf;
+
+   for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+   unsigned long rv;
+
+   if (arch_get_random_seed_long(&rv))
+   rdseed_bits += 8 * sizeof(rv);
+   else if (arch_get_random_long(&rv))
+   rdrand_bits += 8 * sizeof(rv);
+   else
+   continue;   /* Don't waste time mixing. */
+
+   seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+   seed(ctx, (u32)(rv >> 32));
+#endif
+   }
+
+   if (rdseed_bits)
+   msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
+   if (rdrand_bits)
+   msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+   if (buf[0])
+   pr_info("%s with %s\n", log_prefix, buf + 2);
+}
-- 
1.9.3
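
The callback-based seeding and the per-source bit accounting in this patch can be tried out in user space with stand-in entropy sources. The following is a sketch only: the two source functions replace RDSEED/RDRAND, words are fixed at 64 bits instead of sizeof(long), and all model_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a hardware source: returns nonzero and fills *out on success. */
typedef int (*model_source_fn)(uint64_t *out);

struct model_stats {
    int seed_bits; /* bits obtained from the preferred source */
    int rand_bits; /* bits obtained from the fallback source */
};

/* Prefer seed_src, fall back to rand_src, and feed each word as two u32s. */
static void model_get_rng_seed(void *ctx,
                               void (*seed)(void *ctx, uint32_t data),
                               int bits_per_source,
                               model_source_fn seed_src,
                               model_source_fn rand_src,
                               struct model_stats *st)
{
    int i;

    assert(seed && seed_src && rand_src && st);
    st->seed_bits = 0;
    st->rand_bits = 0;
    for (i = 0; i < bits_per_source; i += 64) {
        uint64_t rv;

        if (seed_src(&rv))
            st->seed_bits += 64;
        else if (rand_src(&rv))
            st->rand_bits += 64;
        else
            continue; /* nothing collected, nothing to mix */
        seed(ctx, (uint32_t)rv);
        seed(ctx, (uint32_t)(rv >> 32));
    }
}

/* Build the log line; returns 0 and writes nothing if no bits were collected. */
static int model_format(char *out, size_t len, const char *prefix,
                        const struct model_stats *st)
{
    char buf[128] = "";
    char *p = buf;

    if (st->seed_bits)
        p += sprintf(p, ", %d bits from RDSEED", st->seed_bits);
    if (st->rand_bits)
        p += sprintf(p, ", %d bits from RDRAND", st->rand_bits);
    if (!buf[0])
        return 0;
    return snprintf(out, len, "%s with %s", prefix, buf + 2); /* skip ", " */
}

/* Demo sources and a demo callback that just counts how often it is called. */
static int model_src_fail(uint64_t *out) { (void)out; return 0; }
static int model_src_fixed(uint64_t *out) { *out = 0x0123456789abcdefULL; return 1; }
static void model_count_seed(void *ctx, uint32_t data) { (void)data; ++*(int *)ctx; }
```

The "buf + 2" trick mirrors the patch: every fragment is written with a leading ", " and the first separator is skipped when the message is printed.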


[PATCH v5 4/5] x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed

2014-07-23 Thread Andy Lutomirski

This is a straightforward implementation: for each bit of internal
RNG state, request one bit from KVM_GET_RNG_SEED.  This is done even
if RDSEED/RDRAND worked, since KVM_GET_RNG_SEED is likely to provide
cryptographically secure output even if the CPU's RNG is weak or
compromised.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/Kconfig                 |  4 ++++
 arch/x86/include/asm/kvm_guest.h |  9 +++++++++
 arch/x86/kernel/archrandom.c     | 25 ++++++++++++++++++++++++-
 arch/x86/kernel/kvm.c            | 10 ++++++++++
 4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a8f749e..adfa09c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -593,6 +593,7 @@ config KVM_GUEST
bool "KVM Guest support (including kvmclock)"
depends on PARAVIRT
select PARAVIRT_CLOCK
+   select ARCH_RANDOM
default y
---help---
  This option enables various optimizations for running under the KVM
@@ -1507,6 +1508,9 @@ config ARCH_RANDOM
  If supported, this is a high bandwidth, cryptographically
  secure hardware random number generator.
 
+ This also enables paravirt RNGs such as KVM's if the relevant
+ PV guest support is enabled.
+
 config X86_SMAP
def_bool y
prompt "Supervisor Mode Access Prevention" if EXPERT
diff --git a/arch/x86/include/asm/kvm_guest.h b/arch/x86/include/asm/kvm_guest.h
index a92b176..8c4dbd5 100644
--- a/arch/x86/include/asm/kvm_guest.h
+++ b/arch/x86/include/asm/kvm_guest.h
@@ -3,4 +3,13 @@
 
 int kvm_setup_vsyscall_timeinfo(void);
 
+#if defined(CONFIG_KVM_GUEST) && defined(CONFIG_ARCH_RANDOM)
+extern bool kvm_get_rng_seed(u64 *rv);
+#else
+static inline bool kvm_get_rng_seed(u64 *rv)
+{
+   return false;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_GUEST_H */
diff --git a/arch/x86/kernel/archrandom.c b/arch/x86/kernel/archrandom.c
index 47d13b0..8c8d021 100644
--- a/arch/x86/kernel/archrandom.c
+++ b/arch/x86/kernel/archrandom.c
@@ -15,6 +15,7 @@
  */
 
 #include 
+#include 
 
 void arch_get_rng_seed(void *ctx,
   void (*seed)(void *ctx, u32 data),
@@ -22,7 +23,7 @@ void arch_get_rng_seed(void *ctx,
   const char *log_prefix)
 {
int i;
-   int rdseed_bits = 0, rdrand_bits = 0;
+   int rdseed_bits = 0, rdrand_bits = 0, kvm_bits = 0;
char buf[128] = "";
char *msgptr = buf;
 
@@ -42,10 +43,32 @@ void arch_get_rng_seed(void *ctx,
 #endif
}
 
+   /*
+* Use KVM_GET_RNG_SEED regardless of whether the CPU RNG
+* worked, since it incorporates entropy unavailable to the CPU,
+* and we shouldn't trust the hardware RNG more than we need to.
+* We request enough bits for the entire internal RNG state,
+* because there's no good reason not to.
+*/
+	for (i = 0; i < bits_per_source; i += 64) {
+		u64 rv;
+
+		if (kvm_get_rng_seed(&rv)) {
+			seed(ctx, (u32)rv);
+			seed(ctx, (u32)(rv >> 32));
+			kvm_bits += 8 * sizeof(rv);
+		} else {
+			break;	/* If it fails once, it will keep failing. */
+		}
+	}
+
if (rdseed_bits)
msgptr += sprintf(msgptr, ", %d bits from RDSEED", rdseed_bits);
if (rdrand_bits)
msgptr += sprintf(msgptr, ", %d bits from RDRAND", rdrand_bits);
+   if (kvm_bits)
+   msgptr += sprintf(msgptr, ", %d bits from KVM_GET_RNG_BITS",
+ kvm_bits);
if (buf[0])
pr_info("%s with %s\n", log_prefix, buf + 2);
 }
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3dd8e2c..bd8783a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -416,6 +416,16 @@ void kvm_disable_steal_time(void)
wrmsr(MSR_KVM_STEAL_TIME, 0, 0);
 }
 
+bool kvm_get_rng_seed(u64 *v)
+{
+   /*
+* Allow migration from a hypervisor with the GET_RNG_SEED
+* feature to a hypervisor without it.
+*/
+   return (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED) &&
+   rdmsrl_safe(MSR_KVM_GET_RNG_SEED, v) == 0);
+}
+
 #ifdef CONFIG_SMP
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
-- 
1.9.3


[PATCH v5 0/5] random,x86,kvm: Rework arch RNG seeds and get some from kvm

2014-07-23 Thread Andy Lutomirski

This introduces and uses a very simple synchronous mechanism to get
/dev/urandom-style bits appropriate for initial KVM PV guest RNG
seeding.

It also re-works the way that architectural random data is fed into
random.c's pools.  I added a new arch hook called arch_get_rng_seed.
The default implementation is more or less the same as the current
code, except that random_get_entropy is now called unconditionally.

x86 gets a custom arch_get_rng_seed.  It will use KVM_GET_RNG_SEED
if available, and, if it does anything, it will log the number of
bits collected from each available architectural source.  If more
paravirt seed sources show up, it will be a natural place to add
them.
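
For illustration only, the per-source logging described above can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the code from the series: append_source() and format_seed_log() are hypothetical names, snprintf() stands in for the kernel's sprintf()/pr_info(), and the source names are just strings chosen for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append ", <bits> bits from <name>" to buf when
 * bits is positive. Returns the number of bytes used so far.
 */
static size_t append_source(char *buf, size_t len, size_t used,
			    const char *name, int bits)
{
	int n;

	if (bits <= 0 || used >= len)
		return used;
	n = snprintf(buf + used, len - used, ", %d bits from %s", bits, name);
	if (n < 0)
		return used;
	used += (size_t)n;
	/* Clamp if snprintf had to truncate. */
	return used < len ? used : len - 1;
}

/*
 * Build "<prefix> with <sources>" in out.
 * Returns 1 if something was written, 0 if no source contributed.
 */
static int format_seed_log(char *out, size_t outlen, const char *prefix,
			   int rdseed_bits, int rdrand_bits, int kvm_bits)
{
	char buf[128] = "";
	size_t used = 0;

	used = append_source(buf, sizeof(buf), used, "RDSEED", rdseed_bits);
	used = append_source(buf, sizeof(buf), used, "RDRAND", rdrand_bits);
	used = append_source(buf, sizeof(buf), used, "KVM_GET_RNG_SEED", kvm_bits);
	if (!buf[0])
		return 0;
	/* buf + 2 skips the leading ", " separator. */
	snprintf(out, outlen, "%s with %s", prefix, buf + 2);
	return 1;
}
```

Every entry is written with a leading ", " and the first separator is skipped when printing, so no source needs to know whether it comes first.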

I sent the corresponding kvm-unit-tests and qemu changes separately.

Changes from v4:
 - Got rid of the RDRAND behavior change.  If this series is accepted,
   I may resend it separately, but I think it's an unrelated issue.
 - Fix up the changelog entries -- I misunderstood how the old code
   worked.
 - Avoid lots of failed attempts to use KVM_GET_RNG_SEED if it's not
   available.

Changes from v3:
 - Other than KASLR, the guest pieces are completely rewritten.
   Patches 2-4 have essentially nothing in common with v2.

Changes from v2:
 - Bisection fix (patch 2 had a misplaced brace).  The final state is
   identical to that of v2.
 - Improve the 0/5 description a little bit.

Changes from v1:
 - Split patches 2 and 3
 - Log all arch sources in init_std_data
 - Fix the 32-bit kaslr build

Andy Lutomirski (5):
  x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit
  random: Add and use arch_get_rng_seed
  x86,random: Add an x86 implementation of arch_get_rng_seed
  x86,random,kvm: Use KVM_GET_RNG_SEED in arch_get_rng_seed
  x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

 Documentation/virtual/kvm/cpuid.txt  |  3 ++
 arch/x86/Kconfig                     |  4 ++
 arch/x86/boot/compressed/aslr.c      | 27 +
 arch/x86/include/asm/archrandom.h    |  6 +++
 arch/x86/include/asm/kvm_guest.h     |  9 +
 arch/x86/include/asm/processor.h     | 21 --
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kernel/Makefile             |  2 +
 arch/x86/kernel/archrandom.c         | 74
 arch/x86/kernel/kvm.c                | 10 +
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/x86.c                   |  4 ++
 drivers/char/random.c                | 14 +--
 include/linux/random.h               | 40 +++
 14 files changed, 212 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/archrandom.c

-- 
1.9.3


[PATCH v5 5/5] x86,kaslr: Use MSR_KVM_GET_RNG_SEED for KASLR if available

2014-07-23 Thread Andy Lutomirski

MSR_KVM_GET_RNG_SEED is considerably better than any of the alternatives
on KVM.

Rather than reinventing all of the cpu feature query code, this fixes
native_cpuid to work in PIC objects.

I haven't combined it with boot/cpuflags.c's cpuid implementation:
including asm/processor.h from boot/cpuflags.c results in a flood of
unrelated errors, and fixing it might be messy.

Reviewed-by: Kees Cook 
Signed-off-by: Andy Lutomirski 
---
 arch/x86/boot/compressed/aslr.c  | 27 +++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h | 21 ++++++++++++++++++---
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/aslr.c b/arch/x86/boot/compressed/aslr.c
index fc6091a..8583f0e 100644
--- a/arch/x86/boot/compressed/aslr.c
+++ b/arch/x86/boot/compressed/aslr.c
@@ -5,6 +5,8 @@
 #include 
 #include 
 
+#include 
+
 #include 
 #include 
 #include 
@@ -15,6 +17,22 @@
 static const char build_str[] = UTS_RELEASE " (" LINUX_COMPILE_BY "@"
LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;
 
+static bool kvm_para_has_feature(unsigned int feature)
+{
+   u32 kvm_base;
+   u32 features;
+
+   if (!has_cpuflag(X86_FEATURE_HYPERVISOR))
+   return false;
+
+   kvm_base = hypervisor_cpuid_base("KVMKVMKVM\0\0\0", KVM_CPUID_FEATURES);
+   if (!kvm_base)
+   return false;
+
+   features = cpuid_eax(kvm_base | KVM_CPUID_FEATURES);
+   return features & (1UL << feature);
+}
+
 #define I8254_PORT_CONTROL	0x43
 #define I8254_PORT_COUNTER0	0x40
 #define I8254_CMD_READBACK	0xC0
@@ -81,6 +99,15 @@ static unsigned long get_random_long(void)
}
}
 
+   if (kvm_para_has_feature(KVM_FEATURE_GET_RNG_SEED)) {
+   u64 seed;
+
+   debug_putstr(" MSR_KVM_GET_RNG_SEED");
+   rdmsrl(MSR_KVM_GET_RNG_SEED, seed);
+   random ^= (unsigned long)seed;
+   use_i8254 = false;
+   }
+
if (has_cpuflag(X86_FEATURE_TSC)) {
debug_putstr(" RDTSC");
rdtscll(raw);
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a4ea023..6096f3c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,10 +189,25 @@ static inline int have_cpuid_p(void)
 static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx)
 {
-   /* ecx is often an input as well as an output. */
-   asm volatile("cpuid"
+   /*
+* This function can be used from the boot code, so it needs
+* to avoid using EBX in constraints in PIC mode.
+*
+* ecx is often an input as well as an output.
+*/
+   asm volatile(".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+"movl  %%ebx,%1\n\t"
+".endif ; .endif   \n\t"
+"cpuid \n\t"
+".ifnc %%ebx,%1 ; .ifnc %%rbx,%1   \n\t"
+"xchgl %%ebx,%1\n\t"
+".endif ; .endif"
: "=a" (*eax),
- "=b" (*ebx),
+#if defined(__i386__) && defined(__PIC__)
+ "=r" (*ebx),  /* gcc won't let us use ebx */
+#else
+ "=b" (*ebx),  /* ebx is okay */
+#endif
  "=c" (*ecx),
  "=d" (*edx)
: "0" (*eax), "2" (*ecx)
-- 
1.9.3


[PATCH v5 2/5] random: Add and use arch_get_rng_seed

2014-07-23 Thread Andy Lutomirski

Currently, init_std_data contains its own logic for using arch
random sources.  This replaces that logic with a generic function
arch_get_rng_seed that allows arch code to supply its own logic.
The default implementation tries arch_get_random_seed_long and
arch_get_random_long individually.

The only functional change here is that random_get_entropy() is used
unconditionally instead of being used only when the arch sources
fail.  This may add a tiny amount of security.
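
As a rough illustration of the hook's shape, here is a userspace sketch of a seed callback taking an opaque context. It is a sketch under stated assumptions rather than the kernel code: struct word_log, log_word() and feed_u64() are hypothetical names, and a fixed-width uint64_t is used instead of unsigned long so that the example does not depend on BITS_PER_LONG.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the seed argument of arch_get_rng_seed(). */
typedef void (*seed_fn)(void *ctx, uint32_t data);

/* Hypothetical sink: records the words handed to the callback. */
struct word_log {
	uint32_t words[8];
	size_t count;
};

static void log_word(void *ctx, uint32_t data)
{
	struct word_log *log = ctx;

	if (log->count < sizeof(log->words) / sizeof(log->words[0]))
		log->words[log->count++] = data;
}

/* Feed one 64-bit value to the callback as two 32-bit words, low word first. */
static void feed_u64(void *ctx, seed_fn seed, uint64_t rv)
{
	seed(ctx, (uint32_t)rv);
	seed(ctx, (uint32_t)(rv >> 32));
}
```

Passing the context as void * keeps the arch code unaware of what the caller does with the data; in the patch the context is the entropy store and the callback mixes each word into the pool.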

Signed-off-by: Andy Lutomirski 
---
 drivers/char/random.c  | 14 +++++++++++---
 include/linux/random.h | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0a7ac0a..be7a94e 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
 }
 EXPORT_SYMBOL(get_random_bytes_arch);
 
+static void seed_entropy_store(void *ctx, u32 data)
+{
+   mix_pool_bytes((struct entropy_store *)ctx, &data, sizeof(data), NULL);
+}
 
 /*
  * init_std_data - initialize pool with system data
@@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r)
int i;
ktime_t now = ktime_get_real();
unsigned long rv;
+   char log_prefix[128];
 
r->last_pulled = jiffies;
 	mix_pool_bytes(r, &now, sizeof(now), NULL);
 	for (i = r->poolinfo->poolbytes; i > 0; i -= sizeof(rv)) {
-		if (!arch_get_random_seed_long(&rv) &&
-		    !arch_get_random_long(&rv))
-			rv = random_get_entropy();
+		rv = random_get_entropy();
 		mix_pool_bytes(r, &rv, sizeof(rv), NULL);
 	}
+
+   sprintf(log_prefix, "random: seeded %s pool", r->name);
+   arch_get_rng_seed(r, seed_entropy_store, 8 * r->poolinfo->poolbytes,
+ log_prefix);
+
mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
 }
 
diff --git a/include/linux/random.h b/include/linux/random.h
index 57fbbff..81a6145 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
 }
 #endif
 
+#ifndef __HAVE_ARCH_GET_RNG_SEED
+
+/**
+ * arch_get_rng_seed() - get architectural rng seed data
+ * @ctx: context for the seed function
+ * @seed: function to call for each u32 obtained
+ * @bits_per_source: number of bits from each source to try to use
+ * @log_prefix: beginning of log output (may be NULL)
+ *
+ * Synchronously load some architectural entropy or other best-effort
+ * random seed data.  An arch-specific implementation should be no worse
+ * than this generic implementation.  If the arch code does something
+ * interesting, it may log something of the form "log_prefix with
+ * 8 bits of stuff".
+ *
+ * No arch-specific implementation should be any worse than the generic
+ * implementation.
+ */
+static inline void arch_get_rng_seed(void *ctx,
+void (*seed)(void *ctx, u32 data),
+int bits_per_source,
+const char *log_prefix)
+{
+   int i;
+
+	for (i = 0; i < bits_per_source; i += 8 * sizeof(long)) {
+		unsigned long rv;
+
+		if (arch_get_random_seed_long(&rv) ||
+		    arch_get_random_long(&rv)) {
+			seed(ctx, (u32)rv);
+#if BITS_PER_LONG > 32
+			seed(ctx, (u32)(rv >> 32));
+#endif
+		}
+	}
+}
+
+#endif /* __HAVE_ARCH_GET_RNG_SEED */
+
 /* Pseudo random number generator from numerical recipes. */
 static inline u32 next_pseudo_random32(u32 seed)
 {
-- 
1.9.3


[PATCH v5 1/5] x86,kvm: Add MSR_KVM_GET_RNG_SEED and a matching feature bit

2014-07-23 Thread Andy Lutomirski

This adds a simple interface to allow a guest to request 64 bits of
host nonblocking entropy.  This is independent of virtio-rng for a
couple of reasons:

 - It's intended to be usable during early boot, when a trivial
   synchronous interface is needed.

 - virtio-rng gives blocking entropy, and making guest boot wait for
   the host's /dev/random will cause problems.

MSR_KVM_GET_RNG_SEED is intended to provide 64 bits of best-effort
cryptographically secure data for use as a seed.  It provides no
guarantee that the result contains any actual entropy.
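
A guest is expected to check the feature bit before reading the MSR. The test itself is a single bit check on the EAX value of the KVM features CPUID leaf; the following is a minimal sketch, assuming the caller has already obtained that value. kvm_feature_present() is a hypothetical name used only for this example, and the bit numbers are the ones defined in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define KVM_FEATURE_PV_UNHALT		7
#define KVM_FEATURE_GET_RNG_SEED	8	/* bit number added by this patch */

/*
 * eax is the value a guest would read from the KVM features CPUID leaf.
 * Returns true if the given feature bit is set.
 */
static bool kvm_feature_present(uint32_t eax, unsigned int feature)
{
	return feature < 32 && (eax & (UINT32_C(1) << feature)) != 0;
}
```

Only when this returns true for KVM_FEATURE_GET_RNG_SEED should the guest attempt to read MSR_KVM_GET_RNG_SEED.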

Signed-off-by: Andy Lutomirski 
---
 Documentation/virtual/kvm/cpuid.txt  | 3 +++
 arch/x86/include/uapi/asm/kvm_para.h | 2 ++
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 3c65feb..0ab043b 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -54,6 +54,9 @@ KVM_FEATURE_PV_UNHALT              ||     7 || guest checks this feature bit
                                    ||       || before enabling paravirtualized
                                    ||       || spinlock support.
 ------------------------------------------------------------------------------
+KVM_FEATURE_GET_RNG_SEED           ||     8 || host provides rng seed data via
+                                   ||       || MSR_KVM_GET_RNG_SEED.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 94dc8ca..e2eaf93 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -24,6 +24,7 @@
 #define KVM_FEATURE_STEAL_TIME 5
 #define KVM_FEATURE_PV_EOI 6
 #define KVM_FEATURE_PV_UNHALT  7
+#define KVM_FEATURE_GET_RNG_SEED   8
 
 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -40,6 +41,7 @@
 #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN  0x4b564d04
+#define MSR_KVM_GET_RNG_SEED 0x4b564d05
 
 struct kvm_steal_time {
__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..40d6763 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -479,7 +479,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 (1 << KVM_FEATURE_ASYNC_PF) |
 (1 << KVM_FEATURE_PV_EOI) |
 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
-(1 << KVM_FEATURE_PV_UNHALT);
+(1 << KVM_FEATURE_PV_UNHALT) |
+(1 << KVM_FEATURE_GET_RNG_SEED);
 
if (sched_info_on())
entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f644933..4e81853 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -48,6 +48,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define CREATE_TRACE_POINTS
@@ -2480,6 +2481,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_KVM_PV_EOI_EN:
data = vcpu->arch.pv_eoi.msr_val;
break;
+   case MSR_KVM_GET_RNG_SEED:
+   get_random_bytes(&data, sizeof(data));
+   break;
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
case MSR_IA32_MCG_CAP:
-- 
1.9.3


Re: [PATCH 00/14] arm64: eBPF JIT compiler

2014-07-23 Thread Z Lim

On Wed, Jul 23, 2014 at 3:32 AM, Catalin Marinas
 wrote:
> On Mon, Jul 21, 2014 at 04:49:29PM +0100, Alexei Starovoitov wrote:
>> On Mon, Jul 21, 2014 at 2:16 AM, Will Deacon  wrote:
>> > On Fri, Jul 18, 2014 at 07:28:06PM +0100, Zi Shen Lim wrote:
[...]
>> >> This series applies against net-next and is tested working
>> >> with lib/test_bpf on ARMv8 Foundation Model.
>> >
>> > Looks like it works on my Juno board too, so:
>> >
>> >   Acked-by: Will Deacon 
>> >
>> > for the series.
>> >
>> > It's a bit late for 3.17 now, so I guess we'll queue this for 3.18 (which
>> > also means the dependency on -next isn't an issue). Perhaps you could 
>> > repost
>> > around -rc3?
>>
>> Thanks for testing! Nice to see it working on real hw.
>> I'm not sure why you're proposing a 4+ week delay. The patches
>> will rot instead of getting used and tested. Imo it makes sense to
>> get them into net-next now for 3.17.
>> JIT is disabled by sysctl by default anyway.
>
> We normally like some patches (especially new functionality) to sit in
> linux-next for a while before the merging window (ideally starting with
> -rc4 or -rc5). We are at -rc6 already, so getting close to the 3.17
> merging window.
>
> Another aspect is that the arm64/bpf branch depends on the net tree, so
> it can't easily go in via the arm64 tree for 3.17 (3.18 would not be a
> problem).

Hi Catalin, I take it you prefer this series going through arm64 tree,
targeting 3.18, is that right?

I understand your preference to have arm64 material sit in linux-next for a
longer period, so I'll repost this after 3.17 to give it more exposure in
linux-next.

BTW, are you open to this series going through net tree? I'm
(preemptively) asking because during development of this series, I've
had to rebase a couple times against net-next to handle dependencies.
Or is the general practice to handle conflicts in linux-next itself?

>
> --
> Catalin

Re: [PATCH v4 0/6] File Sealing & memfd_create()

2014-07-23 Thread Hugh Dickins

On Sun, 20 Jul 2014, David Herrmann wrote:

> Hi
> 
> This is v4 of the File-Sealing and memfd_create() patches. You can find v1
> with a longer introduction at gmane [1], there's also v2 [2] and v3 [3]
> available. See also the article about sealing on LWN [4], and a high-level
> introduction on the new API in my blog [5]. Last but not least, man-page
> proposals are available in my private repository [6].
> 
> This series introduces two new APIs:
>   memfd_create(): Think of this syscall as malloc() but it returns a
>                   file-descriptor instead of a pointer. That file-descriptor
>                   is backed by anon-memory and can be memory-mapped for access.
>   sealing: The sealing API can be used to prevent a specific set of operations
>            on a file-descriptor. You 'seal' the file and thereby guarantee
>            that those operations will be rejected from now on.
> 
> This series adds the memfd_create(2) syscall only to x86 and x86-64. Patches
> for most other architectures are available in my private repository [7].
> Missing architectures are:
>     alpha, avr32, blackfin, cris, frv, m32r, microblaze, mn10300, sh, sparc,
>     um, xtensa
> These architectures lack several newer syscalls, so those should be added
> first before adding memfd_create(2). I can provide patches for those, if
> required. However, I think it should be kept separate from this series.
> 
> Changes in v4:
>   - drop page-isolation in favor of shmem_wait_for_pins()
>   - add unlikely(info->seals) to write_begin hot-path
>   - return EPERM for F_ADD_SEALS if file is not writable
>   - moved shmem_wait_for_pins() entirely into its own commit
>   - make O_LARGEFILE mandatory part of memfd_create() ABI
>   - add lru_add_drain() to shmem_tag_pins() hot-path
>   - minor coding-style changes
> 
> Thanks
> David
> 
> 
> [1]memfd v1: http://thread.gmane.org/gmane.comp.video.dri.devel/102241
> [2]memfd v2: http://thread.gmane.org/gmane.linux.kernel.mm/115713
> [3]memfd v3: http://thread.gmane.org/gmane.linux.kernel.mm/118721
> [4] LWN article: https://lwn.net/Articles/593918/
> [5]   API Intro: http://dvdhrm.wordpress.com/2014/06/10/memfd_create2/
> [6]   Man-pages: http://cgit.freedesktop.org/~dvdhrm/man-pages/log/?h=memfd
> [7]Dev-repo: http://cgit.freedesktop.org/~dvdhrm/linux/log/?h=memfd
> 
> 
> David Herrmann (6):
>   mm: allow drivers to prevent new writable mappings
>   shm: add sealing API
>   shm: add memfd_create() syscall
>   selftests: add memfd_create() + sealing tests
>   selftests: add memfd/sealing page-pinning tests
>   shm: wait for pins to be released when sealing

Andrew, I've now given my Ack to all of these, and think they are
ready for inclusion in mmotm, if you agree with the addition of
this sealing feature and the memfd_create() syscall.

Andy Lutomirski and I agree that it's somewhat unsatisfactory that a
sealed sparse file could be passed, and inflate to something OOMing
when read by the recipient; but I think we can live with that as a
limitation of the initial implementation (the suspicious recipient
can verify non-sparseness with lseek SEEK_HOLE), and it's my job
to work on fixing that aspect - though probably not for 3.17.
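
The SEEK_HOLE check mentioned above could look roughly like the following on the recipient's side. This is a minimal sketch under stated assumptions: file_is_dense() is a hypothetical helper, the fallback value 4 for SEEK_HOLE is the Linux definition and is only used if the headers hide it, and the result is only meaningful once the file can no longer change. Note that lseek() moves the file offset, and that a filesystem without hole support reports the whole file as data.

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef SEEK_HOLE
#define SEEK_HOLE 4	/* Linux value; assumption for headers that hide it */
#endif

/*
 * Returns 1 if fd has no hole before end-of-file, 0 if it has one,
 * and -1 on error.
 */
static int file_is_dense(int fd)
{
	struct stat st;
	off_t hole;

	if (fstat(fd, &st) < 0)
		return -1;
	if (st.st_size == 0)
		return 1;

	/* With no real hole, the first "hole" found is the implicit one at EOF. */
	hole = lseek(fd, 0, SEEK_HOLE);
	if (hole < 0)
		return -1;
	return hole >= st.st_size;
}
```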

Thanks,
Hugh

> 
>  arch/x86/syscalls/syscall_32.tbl   |   1 +
>  arch/x86/syscalls/syscall_64.tbl   |   1 +
>  fs/fcntl.c |   5 +
>  fs/inode.c |   1 +
>  include/linux/fs.h |  29 +-
>  include/linux/shmem_fs.h   |  17 +
>  include/linux/syscalls.h   |   1 +
>  include/uapi/linux/fcntl.h |  15 +
>  include/uapi/linux/memfd.h |   8 +
>  kernel/fork.c  |   2 +-
>  kernel/sys_ni.c|   1 +
>  mm/mmap.c  |  30 +-
>  mm/shmem.c | 324 +
>  mm/swap_state.c|   1 +
>  tools/testing/selftests/Makefile   |   1 +
>  tools/testing/selftests/memfd/.gitignore   |   4 +
>  tools/testing/selftests/memfd/Makefile |  41 ++
>  tools/testing/selftests/memfd/fuse_mnt.c   | 110 +++
>  tools/testing/selftests/memfd/fuse_test.c  | 311 +
>  tools/testing/selftests/memfd/memfd_test.c     | 913 +
>  tools/testing/selftests/memfd/run_fuse_test.sh |  14 +
>  21 files changed, 1821 insertions(+), 9 deletions(-)
>  create mode 100644 include/uapi/linux/memfd.h
>  create mode 100644 tools/testing/selftests/memfd/.gitignore
>  create mode 100644 tools/testing/selftests/memfd/Makefile
>  create mode 100755 tools/testing/selftests/memfd/fuse_mnt.c
>  create mode 100644 tools/testing/selftests/memfd/fuse_test.c
>  create mode 100644 tools/testing/selftests/memfd/memfd_test.c
>  create mode 100755

Re: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx

2014-07-23 Thread Dave Hansen

On 07/23/2014 05:56 PM, Ren, Qiaowei wrote:
> On 2014-07-24, Hansen, Dave wrote:
>> On 07/22/2014 07:35 PM, Ren, Qiaowei wrote:
>>> The check for the MPX feature should be as follows:
>>>
>>> if (pcntxt_mask & XSTATE_EAGER) {
>>> 	if (eagerfpu == DISABLE) {
>>> 		pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n",
>>> 		       pcntxt_mask & XSTATE_EAGER);
>>> 		pcntxt_mask &= ~XSTATE_EAGER;
>>> 	} else {
>>> 		eagerfpu = ENABLE;
>>> 	}
>>> }
>>>
>>> This patch was merged into the kernel at the end of last year
>>> (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669)
>>
>> Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here?
>> 
>> This isn't major, but I can't _ever_ imagine a user being able to 
>> track down why MPX is not working from this message. Should we
>> spruce it up somehow?
> 
> Maybe. If the error log "disabling some xstate features:" is changed
> to "disabling MPX xstate features:", do you think it is OK?

That's better.  Is it really disabling MPX, though?

And shouldn't we clear the cpu feature bit too?

Re: arc: FIXME in file uwind.c

2014-07-23 Thread Nick Krause

On Thu, Jul 24, 2014 at 12:38 AM, Vineet Gupta
 wrote:
> Hi Nick,
>
> On Saturday 05 July 2014 08:05 AM, Nick Krause wrote:
>> When using cscope I get a FIX ME message in uwind.c , it seems to have
>> four NULL returns.
>> I am wondering since I am new to part of the kernel what would you
>> like me to return here?
>> Cheers Nick
>
> The issues were related to incorrect .debug_frame stuff generated by gcc at
> the time. The current unwinder works well so we've not had the need to
> revisit it.
>
> Thx,
> -Vineet
Finally, a nice comment :). I seem to be getting hate for some bad patches.
Probably should quit :(.
Nick

Re: [PATCH v5 2/3] asm-generic: Add generic seccomp.h for secure computing mode 1

2014-07-23 Thread Kees Cook

On Wed, Jul 23, 2014 at 8:40 PM, Andy Lutomirski  wrote:
> On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:
>>
>> Those values (__NR_seccomp_*) are used solely in secure_computing()
>> to identify mode 1 system calls. If compat system calls have different
>> syscall numbers, asm/seccomp.h may override them.
>>
>> Acked-by: Arnd Bergmann 
>> Signed-off-by: AKASHI Takahiro 
>> ---
>>   include/asm-generic/seccomp.h |   28 
>>   1 file changed, 28 insertions(+)
>>   create mode 100644 include/asm-generic/seccomp.h
>>
>> diff --git a/include/asm-generic/seccomp.h b/include/asm-generic/seccomp.h
>> new file mode 100644
>> index 000..5e97022
>> --- /dev/null
>> +++ b/include/asm-generic/seccomp.h
>> @@ -0,0 +1,28 @@
>> +/*
>> + * include/asm-generic/seccomp.h
>> + *
>> + * Copyright (C) 2014 Linaro Limited
>> + * Author: AKASHI Takahiro 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +#ifndef _ASM_GENERIC_SECCOMP_H
>> +#define _ASM_GENERIC_SECCOMP_H
>> +
>> +#include 
>> +
>> +#if defined(CONFIG_COMPAT) && !defined(__NR_seccomp_read_32)
>> +#define __NR_seccomp_read_32   __NR_read
>> +#define __NR_seccomp_write_32  __NR_write
>> +#define __NR_seccomp_exit_32   __NR_exit
>> +#define __NR_seccomp_sigreturn_32  __NR_rt_sigreturn
>> +#endif /* CONFIG_COMPAT && ! already defined */
>> +
>> +#define __NR_seccomp_read  __NR_read
>> +#define __NR_seccomp_write __NR_write
>> +#define __NR_seccomp_exit  __NR_exit
>> +#define __NR_seccomp_sigreturn __NR_rt_sigreturn
>
>
> I don't like these names.  __NR_seccomp_read sounds like the number of a
> syscall called seccomp_read.
>
> Also, shouldn't something be including this header?  I'm confused.

Ah! Good catch. These names are correct (see kernel/seccomp.c's
mode1_syscalls and mode1_syscalls_32 arrays), but the location of this
change was unexpected. I was expecting this file to live in
arch/*/include/asm/seccomp.h, not in include/asm-generic/seccomp.h.

However, since it's always the same list, it might make sense to
consolidate them into a single place as a default to make arch porting
easier. I think that should be a separate patch, though.
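
To make the purpose of these constants concrete, here is a userspace sketch of a mode 1 style allow-list check. It is only an illustration under stated assumptions: mode1_allowed[] and mode1_syscall_allowed() are hypothetical names, the SYS_* numbers come from <sys/syscall.h> rather than from the __NR_seccomp_* macros, and an explicit length is used instead of a zero terminator because __NR_read is 0 on some architectures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

/* The four syscalls that strict (mode 1) seccomp permits. */
static const long mode1_allowed[] = {
	SYS_read, SYS_write, SYS_exit, SYS_rt_sigreturn,
};

/* Returns true if syscall number nr is on the mode 1 allow-list. */
static bool mode1_syscall_allowed(long nr)
{
	size_t i;

	for (i = 0; i < sizeof(mode1_allowed) / sizeof(mode1_allowed[0]); i++)
		if (mode1_allowed[i] == nr)
			return true;
	return false;
}
```

Because the list is the same four calls everywhere, only the numbers differ per architecture (and per compat ABI), which is what the overridable macros are for.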

-Kees

-- 
Kees Cook
Chrome OS Security

Re: arc: FIXME in file uwind.c

2014-07-23 Thread Vineet Gupta

Hi Nick,

On Saturday 05 July 2014 08:05 AM, Nick Krause wrote:
> When using cscope I get a FIX ME message in uwind.c , it seems to have
> four NULL returns.
> I am wondering since I am new to part of the kernel what would you
> like me to return here?
> Cheers Nick

The issues were related to incorrect .debug_frame stuff generated by gcc at the
time. The current unwinder works well so we've not had the need to revisit it.

Thx,
-Vineet

Re: [PATCH v4 6/6] shm: wait for pins to be released when sealing

2014-07-23 Thread Hugh Dickins

On Sun, 20 Jul 2014, David Herrmann wrote:

> If we set SEAL_WRITE on a file, we must make sure there cannot be any
> ongoing write-operations on the file. For write() calls, we simply lock
> the inode mutex, for mmap() we simply verify there're no writable
> mappings. However, there might be pages pinned by AIO, Direct-IO and
> similar operations via GUP. We must make sure those do not write to the
> memfd file after we set SEAL_WRITE.
> 
> As there is no way to notify GUP users to drop pages or to wait for them
> to be done, we implement the wait ourselves: When setting SEAL_WRITE, we
> check all pages for their ref-count. If it's bigger than 1, we know
> there's some user of the page. We then mark the page and wait for up to
> 150ms for those ref-counts to be dropped. If the ref-counts are not
> dropped in time, we refuse the seal operation.
> 
> Signed-off-by: David Herrmann 

Acked-by: Hugh Dickins 

I'd have moved this one up before the testing ones - except changing
the sequence in between postings can be confusing.  I'd be happy if
akpm happened to move it up - but unconcerned if he did not.
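
The "150ms" bound in the commit message follows from the sleep schedule in the quoted patch: scan 0 only drains the LRU, and scans 1 through LAST_SCAN (4) each sleep for (HZ << scan) / 200 jiffies. Here is a small userspace sketch of that arithmetic, assuming every sleep runs to completion; total_wait_ms() and its hz parameter are illustrative names, not part of the patch.

```c
#define LAST_SCAN 4	/* same value as in the quoted patch */

/*
 * Worst-case total sleep in milliseconds for a given tick rate,
 * assuming no sleep is interrupted. Scan 0 does not sleep.
 */
static unsigned int total_wait_ms(unsigned int hz)
{
	unsigned int scan, jiffies = 0;

	for (scan = 1; scan <= LAST_SCAN; scan++)
		jiffies += (hz << scan) / 200;

	return jiffies * 1000 / hz;
}
```

With HZ=1000 the four sleeps are 10, 20, 40 and 80 jiffies, 150ms in total. Integer division makes the result slightly smaller for tick rates that do not divide evenly, for example 148ms at HZ=250.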

> ---
>  mm/shmem.c | 110
>  1 file changed, 109 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 770e072..df1aceb 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1780,9 +1780,117 @@ static loff_t shmem_file_llseek(struct file *file, loff_t offset, int whence)
>   return offset;
>  }
>  
> +/*
> + * We need a tag: a new tag would expand every radix_tree_node by 8 bytes,
> + * so reuse a tag which we firmly believe is never set or cleared on shmem.
> + */
> +#define SHMEM_TAG_PINNED        PAGECACHE_TAG_TOWRITE
> +#define LAST_SCAN               4       /* about 150ms max */
> +
> +static void shmem_tag_pins(struct address_space *mapping)
> +{
> + struct radix_tree_iter iter;
> + void **slot;
> + pgoff_t start;
> + struct page *page;
> +
> + lru_add_drain();
> + start = 0;
> + rcu_read_lock();
> +
> +restart:
> +	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
> +		page = radix_tree_deref_slot(slot);
> +		if (!page || radix_tree_exception(page)) {
> +			if (radix_tree_deref_retry(page))
> +				goto restart;
> +		} else if (page_count(page) - page_mapcount(page) > 1) {
> +			spin_lock_irq(&mapping->tree_lock);
> +			radix_tree_tag_set(&mapping->page_tree, iter.index,
> +					   SHMEM_TAG_PINNED);
> +			spin_unlock_irq(&mapping->tree_lock);
> +		}
> +
> +		if (need_resched()) {
> +			cond_resched_rcu();
> +			start = iter.index + 1;
> +			goto restart;
> +		}
> +	}
> +	rcu_read_unlock();
> +}
> +
> +/*
> + * Setting SEAL_WRITE requires us to verify there's no pending writer. However,
> + * via get_user_pages(), drivers might have some pending I/O without any active
> + * user-space mappings (eg., direct-IO, AIO). Therefore, we look at all pages
> + * and see whether it has an elevated ref-count. If so, we tag them and wait for
> + * them to be dropped.
> + * The caller must guarantee that no new user will acquire writable references
> + * to those pages to avoid races.
> + */
>  static int shmem_wait_for_pins(struct address_space *mapping)
>  {
> -	return 0;
> +	struct radix_tree_iter iter;
> +	void **slot;
> +	pgoff_t start;
> +	struct page *page;
> +	int error, scan;
> +
> +	shmem_tag_pins(mapping);
> +
> +	error = 0;
> +	for (scan = 0; scan <= LAST_SCAN; scan++) {
> +		if (!radix_tree_tagged(&mapping->page_tree, SHMEM_TAG_PINNED))
> +			break;
> +
> +		if (!scan)
> +			lru_add_drain_all();
> +		else if (schedule_timeout_killable((HZ << scan) / 200))
> +			scan = LAST_SCAN;
> +
> +		start = 0;
> +		rcu_read_lock();
> +restart:
> +		radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
> +					   start, SHMEM_TAG_PINNED) {
> +
> +			page = radix_tree_deref_slot(slot);
> +			if (radix_tree_exception(page)) {
> +				if (radix_tree_deref_retry(page))
> +					goto restart;
> +
> +				page = NULL;
> +			}
> +
> +			if (page &&
> +			    page_count(page) - page_mapcount(page) != 1) {
> +				if (scan < LAST_SCAN)
> +					goto continue_resched;
> +
> +				/*
> +				 * On the last scan, we clean up all those tags
> +				 * we inserted; but make a note that we still
> +

Re: [PATCH v4 5/6] selftests: add memfd/sealing page-pinning tests

2014-07-23 Thread Hugh Dickins

On Sun, 20 Jul 2014, David Herrmann wrote:

> Setting SEAL_WRITE is not possible if there're pending GUP users. This
> commit adds selftests for memfd+sealing that use FUSE to create pending
> page-references. FUSE is very helpful here in that it allows us to delay
> direct-IO operations for an arbitrary amount of time. This way, we can
> force the kernel to pin pages and then run our normal selftests.
> 
> Signed-off-by: David Herrmann 

Acked-by: Hugh Dickins 

My "Permission denied" problem was actually not with /dev/fuse,
but with the executability (or not) of ./run_fuse_test.sh.
I see now that your git patch has create mode 100755, but that
got missed when I applied it to my tree with "patch -p1".
I would not be surprised if it goes missing on its way through
the quilt-style mmotm, or I may be under-rating akpm.  Personally,
I'd change the Makefile one way or another, not to rely on 755.

> ---
>  tools/testing/selftests/memfd/.gitignore   |   2 +
>  tools/testing/selftests/memfd/Makefile |  14 +-
>  tools/testing/selftests/memfd/fuse_mnt.c   | 110 +
>  tools/testing/selftests/memfd/fuse_test.c  | 311 
> +
>  tools/testing/selftests/memfd/run_fuse_test.sh |  14 ++
>  5 files changed, 450 insertions(+), 1 deletion(-)
>  create mode 100755 tools/testing/selftests/memfd/fuse_mnt.c
>  create mode 100644 tools/testing/selftests/memfd/fuse_test.c
>  create mode 100755 tools/testing/selftests/memfd/run_fuse_test.sh
> 
> diff --git a/tools/testing/selftests/memfd/.gitignore b/tools/testing/selftests/memfd/.gitignore
> index bcc8ee2..afe87c4 100644
> --- a/tools/testing/selftests/memfd/.gitignore
> +++ b/tools/testing/selftests/memfd/.gitignore
> @@ -1,2 +1,4 @@
> +fuse_mnt
> +fuse_test
>  memfd_test
>  memfd-test-file
> diff --git a/tools/testing/selftests/memfd/Makefile b/tools/testing/selftests/memfd/Makefile
> index 36653b9..6816c49 100644
> --- a/tools/testing/selftests/memfd/Makefile
> +++ b/tools/testing/selftests/memfd/Makefile
> @@ -7,6 +7,7 @@ ifeq ($(ARCH),x86_64)
>   ARCH := X86
>  endif
>  
> +CFLAGS += -D_FILE_OFFSET_BITS=64
>  CFLAGS += -I../../../../arch/x86/include/generated/uapi/
>  CFLAGS += -I../../../../arch/x86/include/uapi/
>  CFLAGS += -I../../../../include/uapi/
> @@ -25,5 +26,16 @@ ifeq ($(ARCH),X86)
>  endif
>   @./memfd_test || echo "memfd_test: [FAIL]"
>  
> +build_fuse:
> +ifeq ($(ARCH),X86)
> + gcc $(CFLAGS) fuse_mnt.c `pkg-config fuse --cflags --libs` -o fuse_mnt
> + gcc $(CFLAGS) fuse_test.c -o fuse_test
> +else
> + echo "Not an x86 target, can't build memfd selftest"
> +endif
> +
> +run_fuse: build_fuse
> + @./run_fuse_test.sh || echo "fuse_test: [FAIL]"
> +
>  clean:
> - $(RM) memfd_test
> + $(RM) memfd_test fuse_test
> diff --git a/tools/testing/selftests/memfd/fuse_mnt.c b/tools/testing/selftests/memfd/fuse_mnt.c
> new file mode 100755
> index 000..feacf12
> --- /dev/null
> +++ b/tools/testing/selftests/memfd/fuse_mnt.c
> @@ -0,0 +1,110 @@
> +/*
> + * memfd test file-system
> + * This file uses FUSE to create a dummy file-system with only one file /memfd.
> + * This file is read-only and takes 1s per read.
> + *
> + * This file-system is used by the memfd test-cases to force the kernel to pin
> + * pages during reads(). Due to the 1s delay of this file-system, this is a
> + * nice way to test race-conditions against get_user_pages() in the kernel.
> + *
> + * We use direct_io==1 to force the kernel to use direct-IO for this
> + * file-system.
> + */
> +
> +#define FUSE_USE_VERSION 26
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static const char memfd_content[] = "memfd-example-content";
> +static const char memfd_path[] = "/memfd";
> +
> +static int memfd_getattr(const char *path, struct stat *st)
> +{
> + memset(st, 0, sizeof(*st));
> +
> + if (!strcmp(path, "/")) {
> + st->st_mode = S_IFDIR | 0755;
> + st->st_nlink = 2;
> + } else if (!strcmp(path, memfd_path)) {
> + st->st_mode = S_IFREG | 0444;
> + st->st_nlink = 1;
> + st->st_size = strlen(memfd_content);
> + } else {
> + return -ENOENT;
> + }
> +
> + return 0;
> +}
> +
> +static int memfd_readdir(const char *path,
> +  void *buf,
> +  fuse_fill_dir_t filler,
> +  off_t offset,
> +  struct fuse_file_info *fi)
> +{
> + if (strcmp(path, "/"))
> + return -ENOENT;
> +
> + filler(buf, ".", NULL, 0);
> + filler(buf, "..", NULL, 0);
> + filler(buf, memfd_path + 1, NULL, 0);
> +
> + return 0;
> +}
> +
> +static int memfd_open(const char *path, struct fuse_file_info *fi)
> +{
> + if (strcmp(path, memfd_path))
> + return -ENOENT;
> +
> + if ((fi->flags & 3) != O_RDONLY)
> + return -EACCES;
> +
> + /* force direct-IO */
> + fi->direct_io =

[PATCH 1/1] alienware-wmi: make hdmi_mux enabled on case-by-case basis

2014-07-23 Thread Mario Limonciello

Not all HW supporting WMAX method will support the HDMI mux feature.
Explicitly quirk the HW that does support it.

Signed-off-by: Mario Limonciello 
---
 drivers/platform/x86/alienware-wmi.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/alienware-wmi.c b/drivers/platform/x86/alienware-wmi.c
index 297b664..7b1fe5f 100644
--- a/drivers/platform/x86/alienware-wmi.c
+++ b/drivers/platform/x86/alienware-wmi.c
@@ -59,16 +59,24 @@ enum WMAX_CONTROL_STATES {
 
 struct quirk_entry {
u8 num_zones;
+   u8 hdmi_mux;
 };
 
 static struct quirk_entry *quirks;
 
 static struct quirk_entry quirk_unknown = {
.num_zones = 2,
+   .hdmi_mux = 0,
 };
 
 static struct quirk_entry quirk_x51_family = {
.num_zones = 3,
+   .hdmi_mux = 0,
+};
+
+static struct quirk_entry quirk_asm100 = {
+   .num_zones = 2,
+   .hdmi_mux = 1,
 };
 
 static int dmi_matched(const struct dmi_system_id *dmi)
@@ -96,6 +104,15 @@ static struct dmi_system_id alienware_quirks[] = {
 },
 .driver_data = _x51_family,
 },
+   {
+.callback = dmi_matched,
+.ident = "Alienware ASM100",
+.matches = {
+DMI_MATCH(DMI_SYS_VENDOR, "Alienware"),
+DMI_MATCH(DMI_PRODUCT_NAME, "ASM100"),
+},
+.driver_data = &quirk_asm100,
+},
{}
 };
 
@@ -537,7 +554,8 @@ static struct attribute_group hdmi_attribute_group = {
 
 static void remove_hdmi(struct platform_device *dev)
 {
-   sysfs_remove_group(&dev->dev.kobj, &hdmi_attribute_group);
+   if (quirks->hdmi_mux > 0)
+       sysfs_remove_group(&dev->dev.kobj, &hdmi_attribute_group);
 }
 
 static int create_hdmi(struct platform_device *dev)
@@ -583,7 +601,7 @@ static int __init alienware_wmi_init(void)
if (ret)
goto fail_platform_device2;
 
-   if (interface == WMAX) {
+   if (quirks->hdmi_mux > 0) {
ret = create_hdmi(platform_device);
if (ret)
goto fail_prep_hdmi;
-- 
1.9.1


Re: [PATCH v2] fb: backlight: add driver for iPAQ micro backlight

2014-07-23 Thread Jingoo Han

On Wednesday, July 23, 2014 6:04 PM, Linus Walleij wrote:
> 
> This adds a driver for the backlight controlled by the microcontroller
> on the Compaq iPAQ series of handheld computers: h3100, h3600
> and h3700.
> 
> Signed-off-by: Linus Walleij 
> ---
> ChangeLog v1->v2:
> - Add a comment to clarify message format
> - Coding format and style fixes
> - Drop driver announce boilerplate
> - Drop empty remove() function
> ---
>  drivers/video/backlight/Kconfig |  9 
>  drivers/video/backlight/Makefile|  1 +
>  drivers/video/backlight/ipaq_micro_bl.c | 83 
> +
>  3 files changed, 93 insertions(+)
>  create mode 100644 drivers/video/backlight/ipaq_micro_bl.c

[.]

> --- /dev/null
> +++ b/drivers/video/backlight/ipaq_micro_bl.c
> @@ -0,0 +1,83 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * iPAQ microcontroller backlight support
> + * Author : Linus Walleij 
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Would you re-order these headers alphabetically?
It enhances the readability.

[.]

> +MODULE_LICENSE("GPL");

How about using "GPL v2" as below?
+MODULE_LICENSE("GPL v2");

Others look good. Thanks.
Acked-by: Jingoo Han 

Best regards,
Jingoo Han

> +MODULE_DESCRIPTION("driver for iPAQ Atmel micro backlight");
> +MODULE_ALIAS("platform:ipaq-micro-backlight");
> --
> 1.9.3


Re: [PATCH tip/core/rcu] Do not keep timekeeping CPU tick running for non-nohz_full= CPUs

2014-07-23 Thread Mike Galbraith

On Wed, 2014-07-23 at 18:02 +0200, Frederic Weisbecker wrote:

> Yes. Distros seem to want to make full dynticks available for users but they
> also want the off case (when nohz_full= isn't passed) to keep overhead as
> low as possible.

Yup, zero being the _desired_ number.  The general case can't afford any
fastpath cycles being added by a fringe feature, we're too fat now.

Imagines marketeer highlighting shiny new fringe feature: "...and most
network benchmarks show that we only lost a few percent throughput, so
you won't have to increase the size of your server farm _too_ much".

Ok, so my imagination conjured up a _stupid_ marketeer, you get it ;-)

-Mike


Re: [PATCH v4 3/6] shm: add memfd_create() syscall

2014-07-23 Thread Hugh Dickins

On Sun, 20 Jul 2014, David Herrmann wrote:

> memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
> that you can pass to mmap(). It can support sealing and avoids any
> connection to user-visible mount-points. Thus, it's not subject to quotas
> on mounted file-systems, but can be used like malloc()'ed memory, but
> with a file-descriptor to it.
> 
> memfd_create() returns the raw shmem file, so calls like ftruncate() can
> be used to modify the underlying inode. Also calls like fstat()
> will return proper information and mark the file as regular file. If you
> want sealing, you can specify MFD_ALLOW_SEALING. Otherwise, sealing is not
> supported (like on all other regular files).
> 
> Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
> subject to a filesystem size limit. It is still properly accounted to
> memcg limits, though, and to the same overcommit or no-overcommit
> accounting as all user memory.
> 
> Signed-off-by: David Herrmann 

Acked-by: Hugh Dickins 

It appears to be the new syscall season, and I'm afraid I've delayed
you just long enough for two or three more to interpose themselves
after sys_renameat2.  If he agrees, I think that messiness is best
left to akpm, who I expect would adapt the patch according to what
has actually reached Linus by the time he's ready to send this in.
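
To make the description above concrete, here is a minimal userspace sketch of the behaviour the commit message promises: the fd is a regular file as far as fstat() is concerned, ftruncate() sizes it, and it can be mmap()ed like any other file. It assumes a libc that ships a memfd_create() wrapper (newer glibc does; at the time of this posting one would have to go through syscall(), as the selftests do). memfd_demo() is a name made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create an anonymous file of 'size' bytes, copy 'msg' into it through a
 * shared mapping, and return the fd (or -1 on error).
 * Hypothetical helper for illustration; assumes size > 0.
 */
int memfd_demo(const char *name, size_t size, const char *msg)
{
	int fd;
	char *map;

	if (size == 0)
		return -1;

	fd = memfd_create(name, MFD_CLOEXEC);
	if (fd < 0)
		return -1;

	/* The new file starts out empty; size it like any regular file. */
	if (ftruncate(fd, size) < 0)
		goto err;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto err;

	/* The file is zero-filled, so the copy stays NUL-terminated. */
	strncpy(map, msg, size - 1);
	munmap(map, size);
	return fd;

err:
	close(fd);
	return -1;
}
```

A caller can then fstat() the returned fd and should see S_ISREG() with st_size equal to the requested size; no tmpfs mount point is involved at any step.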

> ---
>  arch/x86/syscalls/syscall_32.tbl |  1 +
>  arch/x86/syscalls/syscall_64.tbl |  1 +
>  include/linux/syscalls.h |  1 +
>  include/uapi/linux/memfd.h   |  8 +
>  kernel/sys_ni.c  |  1 +
>  mm/shmem.c   | 73 
> 
>  6 files changed, 85 insertions(+)
>  create mode 100644 include/uapi/linux/memfd.h
> 
> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
> index d6b8679..e7495b4 100644
> --- a/arch/x86/syscalls/syscall_32.tbl
> +++ b/arch/x86/syscalls/syscall_32.tbl
> @@ -360,3 +360,4 @@
>  351	i386	sched_setattr		sys_sched_setattr
>  352	i386	sched_getattr		sys_sched_getattr
>  353	i386	renameat2		sys_renameat2
> +354	i386	memfd_create		sys_memfd_create
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index ec255a1..28be0e1 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -323,6 +323,7 @@
>  314	common	sched_setattr		sys_sched_setattr
>  315	common	sched_getattr		sys_sched_getattr
>  316	common	renameat2		sys_renameat2
> +317	common	memfd_create		sys_memfd_create
>  
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index b0881a0..de00585 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -802,6 +802,7 @@ asmlinkage long sys_timerfd_settime(int ufd, int flags,
>  asmlinkage long sys_timerfd_gettime(int ufd, struct itimerspec __user *otmr);
>  asmlinkage long sys_eventfd(unsigned int count);
>  asmlinkage long sys_eventfd2(unsigned int count, int flags);
> +asmlinkage long sys_memfd_create(const char __user *uname_ptr, unsigned int flags);
>  asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);
>  asmlinkage long sys_old_readdir(unsigned int, struct old_linux_dirent __user *, unsigned int);
>  asmlinkage long sys_pselect6(int, fd_set __user *, fd_set __user *,
> diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
> new file mode 100644
> index 000..534e364
> --- /dev/null
> +++ b/include/uapi/linux/memfd.h
> @@ -0,0 +1,8 @@
> +#ifndef _UAPI_LINUX_MEMFD_H
> +#define _UAPI_LINUX_MEMFD_H
> +
> +/* flags for memfd_create(2) (unsigned int) */
> +#define MFD_CLOEXEC		0x0001U
> +#define MFD_ALLOW_SEALING	0x0002U
> +
> +#endif /* _UAPI_LINUX_MEMFD_H */
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 36441b5..489a4e6 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -197,6 +197,7 @@ cond_syscall(compat_sys_timerfd_settime);
>  cond_syscall(compat_sys_timerfd_gettime);
>  cond_syscall(sys_eventfd);
>  cond_syscall(sys_eventfd2);
> +cond_syscall(sys_memfd_create);
>  
>  /* performance counters: */
>  cond_syscall(sys_perf_event_open);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 51dccd0..770e072 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -66,7 +66,9 @@ static struct vfsmount *shm_mnt;
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -2678,6 +2680,77 @@ static int shmem_show_options(struct seq_file *seq, struct dentry *root)
>   shmem_show_mpol(seq, sbinfo->mpol);
>   return 0;
>  }
> +
> +#define MFD_NAME_PREFIX "memfd:"
> +#define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
> +#define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
> +
> +#define MFD_ALL_FLAGS (MFD_CLOEXEC

Re: [PATCH v4 4/6] selftests: add memfd_create() + sealing tests

2014-07-23 Thread Hugh Dickins

On Sun, 20 Jul 2014, David Herrmann wrote:

> Some basic tests to verify sealing on memfds works as expected and
> guarantees the advertised semantics.
> 
> Signed-off-by: David Herrmann 

Acked-by: Hugh Dickins 

> ---
>  tools/testing/selftests/Makefile   |   1 +
>  tools/testing/selftests/memfd/.gitignore   |   2 +
>  tools/testing/selftests/memfd/Makefile |  29 +
>  tools/testing/selftests/memfd/memfd_test.c | 913 
> +
>  4 files changed, 945 insertions(+)
>  create mode 100644 tools/testing/selftests/memfd/.gitignore
>  create mode 100644 tools/testing/selftests/memfd/Makefile
>  create mode 100644 tools/testing/selftests/memfd/memfd_test.c
> 
> diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
> index e66e710..5ef80cb 100644
> --- a/tools/testing/selftests/Makefile
> +++ b/tools/testing/selftests/Makefile
> @@ -2,6 +2,7 @@ TARGETS = breakpoints
>  TARGETS += cpu-hotplug
>  TARGETS += efivarfs
>  TARGETS += kcmp
> +TARGETS += memfd
>  TARGETS += memory-hotplug
>  TARGETS += mqueue
>  TARGETS += net
> diff --git a/tools/testing/selftests/memfd/.gitignore b/tools/testing/selftests/memfd/.gitignore
> new file mode 100644
> index 000..bcc8ee2
> --- /dev/null
> +++ b/tools/testing/selftests/memfd/.gitignore
> @@ -0,0 +1,2 @@
> +memfd_test
> +memfd-test-file
> diff --git a/tools/testing/selftests/memfd/Makefile b/tools/testing/selftests/memfd/Makefile
> new file mode 100644
> index 000..36653b9
> --- /dev/null
> +++ b/tools/testing/selftests/memfd/Makefile
> @@ -0,0 +1,29 @@
> +uname_M := $(shell uname -m 2>/dev/null || echo not)
> +ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
> +ifeq ($(ARCH),i386)
> + ARCH := X86
> +endif
> +ifeq ($(ARCH),x86_64)
> + ARCH := X86
> +endif
> +
> +CFLAGS += -I../../../../arch/x86/include/generated/uapi/
> +CFLAGS += -I../../../../arch/x86/include/uapi/
> +CFLAGS += -I../../../../include/uapi/
> +CFLAGS += -I../../../../include/
> +
> +all:
> +ifeq ($(ARCH),X86)
> + gcc $(CFLAGS) memfd_test.c -o memfd_test
> +else
> + echo "Not an x86 target, can't build memfd selftest"
> +endif
> +
> +run_tests: all
> +ifeq ($(ARCH),X86)
> + gcc $(CFLAGS) memfd_test.c -o memfd_test
> +endif
> + @./memfd_test || echo "memfd_test: [FAIL]"
> +
> +clean:
> + $(RM) memfd_test
> diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
> new file mode 100644
> index 000..3634c90
> --- /dev/null
> +++ b/tools/testing/selftests/memfd/memfd_test.c
> @@ -0,0 +1,913 @@
> +#define _GNU_SOURCE
> +#define __EXPORTED_HEADERS__
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define MFD_DEF_SIZE 8192
> +#define STACK_SIZE 65535
> +
> +static int sys_memfd_create(const char *name,
> + unsigned int flags)
> +{
> + return syscall(__NR_memfd_create, name, flags);
> +}
> +
> +static int mfd_assert_new(const char *name, loff_t sz, unsigned int flags)
> +{
> + int r, fd;
> +
> + fd = sys_memfd_create(name, flags);
> + if (fd < 0) {
> + printf("memfd_create(\"%s\", %u) failed: %m\n",
> +name, flags);
> + abort();
> + }
> +
> + r = ftruncate(fd, sz);
> + if (r < 0) {
> + printf("ftruncate(%llu) failed: %m\n", (unsigned long long)sz);
> + abort();
> + }
> +
> + return fd;
> +}
> +
> +static void mfd_fail_new(const char *name, unsigned int flags)
> +{
> + int r;
> +
> + r = sys_memfd_create(name, flags);
> + if (r >= 0) {
> + printf("memfd_create(\"%s\", %u) succeeded, but failure expected\n",
> +name, flags);
> + close(r);
> + abort();
> + }
> +}
> +
> +static __u64 mfd_assert_get_seals(int fd)
> +{
> + long r;
> +
> + r = fcntl(fd, F_GET_SEALS);
> + if (r < 0) {
> + printf("GET_SEALS(%d) failed: %m\n", fd);
> + abort();
> + }
> +
> + return r;
> +}
> +
> +static void mfd_assert_has_seals(int fd, __u64 seals)
> +{
> + __u64 s;
> +
> + s = mfd_assert_get_seals(fd);
> + if (s != seals) {
> + printf("%llu != %llu = GET_SEALS(%d)\n",
> +(unsigned long long)seals, (unsigned long long)s, fd);
> + abort();
> + }
> +}
> +
> +static void mfd_assert_add_seals(int fd, __u64 seals)
> +{
> + long r;
> + __u64 s;
> +
> + s = mfd_assert_get_seals(fd);
> + r = fcntl(fd, F_ADD_SEALS, seals);
> + if (r < 0) {
> + printf("ADD_SEALS(%d, %llu -> %llu) failed: %m\n",
> +fd, (unsigned long long)s, (unsigned long long)seals);
> + abort();
> + }
> +}
> +
> +static void mfd_fail_add_seals(int fd, __u64 seals)
> +{
> + long r;
> + __u64 s;

Re: [PATCH v5 0/2] i2c: add DMA support for freescale i2c driver

2014-07-23 Thread Marek Vasut

On Thursday, July 24, 2014 at 05:36:34 AM, Yao Yuan wrote:
> Hi,
> 
> Marek Vasut wrote:
> > On Wednesday, July 23, 2014 at 10:24:41 AM, Yuan Yao wrote:
> > > Changed in v5:
> > > - add "*chan_dev = dma->chan_using->device->dev" to reduce the call
> > > time.
> > 
> > Did you check if the compiler generates different code ?
> 
> Sorry, I didn't compare the assembly code. It's a subtle change.
> As you mentioned the "noodle" before.
> 
> Old:
> dma_map_single(dma->chan_using->device->dev, ...);
> dma_mapping_error(dma->chan_using->device->dev, ...);
> dma_unmap_single(dma->chan_using->device->dev, ...);
> 
> New:
> struct device *chan_dev = dma->chan_using->device->dev;
> dma_map_single(chan_dev, ...);
> dma_mapping_error(chan_dev, ...);
> dma_unmap_single(chan_dev, ...);

You should not use optimization and code cleanup interchangeably. Thanks for
clarifying what this is.

Best regards,
Marek Vasut

Re: [PATCH v4 1/2] i2c: add DMA support for freescale i2c driver

2014-07-23 Thread Marek Vasut

On Thursday, July 24, 2014 at 05:19:27 AM, Yao Yuan wrote:
> On Fri, May 21, 2014 at 4:01 PM +0200, Wolfram Sang wrote:
> > Ping. Any updates? I think this was pretty close to inclusion?
> 
> Hi, Wolfram
>   Thanks for your concern, and sorry for replying so late. I was on a business
> trip for months and had no device to debug this with. Now that I'm back, I
> will try my best to finish it. I have sent the patch as v5.
> Thanks for your review.

So you have no means to test this ? Can this be tested on MX5 ? MX6 ?

Best regards,
Marek Vasut

Re: [PATCH v4 2/6] shm: add sealing API

2014-07-23 Thread Hugh Dickins

On Sun, 20 Jul 2014, David Herrmann wrote:

> If two processes share a common memory region, they usually want some
> guarantees to allow safe access. This often includes:
>   - one side cannot overwrite data while the other reads it
>   - one side cannot shrink the buffer while the other accesses it
>   - one side cannot grow the buffer beyond previously set boundaries
> 
> If there is a trust-relationship between both parties, there is no need
> for policy enforcement. However, if there's no trust relationship (eg.,
> for general-purpose IPC) sharing memory-regions is highly fragile and
> often not possible without local copies. Look at the following two
> use-cases:
>   1) A graphics client wants to share its rendering-buffer with a
>  graphics-server. The memory-region is allocated by the client for
>  read/write access and a second FD is passed to the server. While
>  scanning out from the memory region, the server has no guarantee that
>  the client doesn't shrink the buffer at any time, requiring rather
>  cumbersome SIGBUS handling.
>   2) A process wants to perform an RPC on another process. To avoid huge
>  bandwidth consumption, zero-copy is preferred. After a message is
>  assembled in-memory and a FD is passed to the remote side, both sides
>  want to be sure that neither modifies this shared copy, anymore. The
>  source may have put sensible data into the message without a separate
>  copy and the target may want to parse the message inline, to avoid a
>  local copy.
> 
> While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
> ways to achieve most of this, the first one is disproportionately ugly to
> use in libraries and the latter two are broken/racy or even disabled due
> to denial of service attacks.
> 
> This patch introduces the concept of SEALING. If you seal a file, a
> specific set of operations is blocked on that file forever.
> Unlike locks, seals can only be set, never removed. Hence, once you
> verified a specific set of seals is set, you're guaranteed that no-one can
> perform the blocked operations on this file, anymore.
> 
> An initial set of SEALS is introduced by this patch:
>   - SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
>             in size. This affects ftruncate() and open(O_TRUNC).
>   - GROW: If SEAL_GROW is set, the file in question cannot be increased
>           in size. This affects ftruncate(), fallocate() and write().
>   - WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
>            are possible. This affects fallocate(PUNCH_HOLE), mmap() and
>            write().
>   - SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
>           This basically prevents the F_ADD_SEALS operation on a file and
>           can be set to prevent others from adding further seals that you
>           don't want.
> 
> The described use-cases can easily use these seals to provide safe use
> without any trust-relationship:
>   1) The graphics server can verify that a passed file-descriptor has
>  SEAL_SHRINK set. This allows safe scanout, while the client is
>  allowed to increase buffer size for window-resizing on-the-fly.
>  Concurrent writes are explicitly allowed.
>   2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
>  SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
>  process can modify the data while the other side parses it.
>  Furthermore, it guarantees that even with writable FDs passed to the
>  peer, it cannot increase the size to hit memory-limits of the source
>  process (in case the file-storage is accounted to the source).
> 
> The new API is an extension to fcntl(), adding two new commands:
>   F_GET_SEALS: Return a bitset describing the seals on the file. This
>                can be called on any FD if the underlying file supports
>                sealing.
>   F_ADD_SEALS: Change the seals of a given file. This requires WRITE
>                access to the file and F_SEAL_SEAL may not already be set.
>                Furthermore, the underlying file must support sealing and
>                there may not be any existing shared mapping of that file.
>                Otherwise, EBADF/EPERM is returned.
>                The given seals are _added_ to the existing set of seals
>                on the file. You cannot remove seals again.
> 
> The fcntl() handler is currently specific to shmem and disabled on all
> files. A file needs to explicitly support sealing for this interface to
> work. A separate syscall is added in a follow-up, which creates files that
> support sealing. There is no intention to support this on other
> file-systems. Semantics are unclear for non-volatile files and we lack any
> use-case right now. Therefore, the implementation is specific to shmem.
> 
> Signed-off-by: David Herrmann 

Acked-by: Hugh Dickins 
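
The general-purpose IPC case (use-case 2 above) might look roughly like this from userspace. This is only a sketch: it assumes the F_ADD_SEALS/F_GET_SEALS commands and F_SEAL_* constants as later exposed through <fcntl.h> by newer glibc, and a memfd_create() wrapper from the follow-up patch; make_sealed_memfd() and has_seals() are names invented here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Sender side: create a sealable file, size it, then forbid shrinking,
 * growing and writing. Returns the fd, or -1 on error.
 * Hypothetical helper for illustration only.
 */
int make_sealed_memfd(size_t size)
{
	int fd;

	fd = memfd_create("sealed-example", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;

	if (ftruncate(fd, size) < 0 ||
	    fcntl(fd, F_ADD_SEALS,
		  F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}

/*
 * Receiver side: returns 1 only if every seal in 'wanted' is set on fd.
 * Because seals can never be removed, a positive answer stays valid.
 */
int has_seals(int fd, int wanted)
{
	int seals = fcntl(fd, F_GET_SEALS);

	return seals >= 0 && (seals & wanted) == wanted;
}
```

The receiver would call has_seals() once on the passed fd before parsing the message in place. In this sketch F_SEAL_SEAL is deliberately left unset, so either side could still add further seals later.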

We've just changed the context lines of your

Re: [PATCH v4 1/6] mm: allow drivers to prevent new writable mappings

2014-07-23 Thread Hugh Dickins

On Sun, 20 Jul 2014, David Herrmann wrote:

> The i_mmap_writable field counts existing writable mappings of an
> address_space. To allow drivers to prevent new writable mappings, make
> this counter signed and prevent new writable mappings if it is negative.
> This is modelled after i_writecount and DENYWRITE.
> 
> This will be required by the shmem-sealing infrastructure to prevent any
> new writable mappings after the WRITE seal has been set. In case there
> exists a writable mapping, this operation will fail with EBUSY.
> 
> Note that we rely on the fact that iff you already own a writable mapping,
> you can increase the counter without using the helpers. This is the same
> that we do for i_writecount.
> 
> Acked-by: Hugh Dickins 

Yup!
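
For anyone skimming the helpers in the diff below, the signed counter works like this: positive values count writable mappings, a negative value means writable mappings are denied, and the two states exclude each other. Here is a userspace model using C11 atomics; inc_unless_negative() and dec_unless_positive() are stand-ins written for this sketch, not the kernel's atomic_inc_unless_negative()/atomic_dec_unless_positive(), and the wrapper names merely echo the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is negative; model of the kernel primitive. */
static bool inc_unless_negative(atomic_int *v)
{
	int cur = atomic_load(v);

	while (cur >= 0) {
		/* On failure, cur is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Decrement *v unless it is positive; model of the kernel primitive. */
static bool dec_unless_positive(atomic_int *v)
{
	int cur = atomic_load(v);

	while (cur <= 0) {
		if (atomic_compare_exchange_weak(v, &cur, cur - 1))
			return true;
	}
	return false;
}

/* A new writable mapping is refused while mappings are denied. */
int map_writable(atomic_int *count)
{
	return inc_unless_negative(count) ? 0 : -EPERM;
}

void unmap_writable(atomic_int *count)
{
	atomic_fetch_sub(count, 1);
}

/* Denying is refused while any writable mapping exists. */
int deny_writable(atomic_int *count)
{
	return dec_unless_positive(count) ? 0 : -EBUSY;
}

void allow_writable(atomic_int *count)
{
	atomic_fetch_add(count, 1);
}
```

Starting from zero, map_writable() followed by deny_writable() yields -EBUSY; after unmap_writable(), deny_writable() succeeds and a further map_writable() yields -EPERM until allow_writable() is called.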

> Signed-off-by: David Herrmann 
> ---
>  fs/inode.c |  1 +
>  include/linux/fs.h | 29 +++--
>  kernel/fork.c  |  2 +-
>  mm/mmap.c  | 30 --
>  mm/swap_state.c|  1 +
>  5 files changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 6eecb7f..9945b11 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -165,6 +165,7 @@ int inode_init_always(struct super_block *sb, struct inode *inode)
>  	mapping->a_ops = &empty_aops;
>  	mapping->host = inode;
>  	mapping->flags = 0;
> +	atomic_set(&mapping->i_mmap_writable, 0);
>  	mapping_set_gfp_mask(mapping, GFP_HIGHUSER_MOVABLE);
>  	mapping->private_data = NULL;
>  	mapping->backing_dev_info = &default_backing_dev_info;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e11d60c..152b254 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -387,7 +387,7 @@ struct address_space {
>  	struct inode		*host;		/* owner: inode, block_device */
>  	struct radix_tree_root	page_tree;	/* radix tree of all pages */
>  	spinlock_t		tree_lock;	/* and lock protecting it */
> -	unsigned int		i_mmap_writable;/* count VM_SHARED mappings */
> +	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
>  	struct rb_root		i_mmap;		/* tree of private and shared mappings */
>  	struct list_head	i_mmap_nonlinear;/*list VM_NONLINEAR mappings */
>  	struct mutex		i_mmap_mutex;	/* protect tree, count, list */
> @@ -470,10 +470,35 @@ static inline int mapping_mapped(struct address_space *mapping)
>   * Note that i_mmap_writable counts all VM_SHARED vmas: do_mmap_pgoff
>   * marks vma as VM_SHARED if it is shared, and the file was opened for
>   * writing i.e. vma may be mprotected writable even if now readonly.
> + *
> + * If i_mmap_writable is negative, no new writable mappings are allowed. You
> + * can only deny writable mappings, if none exists right now.
>   */
>  static inline int mapping_writably_mapped(struct address_space *mapping)
>  {
> -	return mapping->i_mmap_writable != 0;
> +	return atomic_read(&mapping->i_mmap_writable) > 0;
> +}
> +
> +static inline int mapping_map_writable(struct address_space *mapping)
> +{
> +	return atomic_inc_unless_negative(&mapping->i_mmap_writable) ?
> +		0 : -EPERM;
> +}
> +
> +static inline void mapping_unmap_writable(struct address_space *mapping)
> +{
> +	atomic_dec(&mapping->i_mmap_writable);
> +}
> +
> +static inline int mapping_deny_writable(struct address_space *mapping)
> +{
> +	return atomic_dec_unless_positive(&mapping->i_mmap_writable) ?
> +		0 : -EBUSY;
> +}
> +
> +static inline void mapping_allow_writable(struct address_space *mapping)
> +{
> +	atomic_inc(&mapping->i_mmap_writable);
>  }
>  
>  /*
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 6a13c46..9d37dfd 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -421,7 +421,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
>  				atomic_dec(&inode->i_writecount);
>  			mutex_lock(&mapping->i_mmap_mutex);
>  			if (tmp->vm_flags & VM_SHARED)
> -				mapping->i_mmap_writable++;
> +				atomic_inc(&mapping->i_mmap_writable);
>  			flush_dcache_mmap_lock(mapping);
>  			/* insert tmp into the share list, just after mpnt */
>  			if (unlikely(tmp->vm_flags & VM_NONLINEAR))
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 129b847..7f548e4 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -216,7 +216,7 @@ static void __remove_shared_vm_struct(struct vm_area_struct *vma,
>   if (vma->vm_flags & VM_DENYWRITE)
>   atomic_inc(&file_inode(file)->i_writecount);
>   if (vma->vm_flags & VM_SHARED)
> - mapping->i_mmap_writable--;
> + mapping_unmap_writable(mapping);
>  
>   flush_dcache_mmap_lock(mapping);
>   if (unlikely(vma->vm_flags & VM_NONLINEAR))
> @@ -617,7 +617,7 @@ static void __vma_link_file(struct vm_area_struct *vma)
>   if (vma->vm_flags & VM_DENYWRITE)
>
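
The counter semantics described in the quoted comment (positive while writable shared mappings exist, negative while writable mappings are denied) can be sketched in user space. This is only an illustrative model built on C11 atomics, not the kernel implementation: struct mapping_model and the model_* helpers are hypothetical names, and the two compare-and-swap loops merely imitate what atomic_inc_unless_negative() and atomic_dec_unless_positive() are expected to do.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the i_mmap_writable counter. */
struct mapping_model {
	atomic_int writable;	/* > 0: writable mappings exist, < 0: denied */
};

/* Increment unless the current value is negative. */
static bool inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	while (old >= 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Decrement unless the current value is positive. */
static bool dec_unless_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old <= 0) {
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

/* Take a writable mapping; fails with -EPERM while writes are denied. */
static int model_map_writable(struct mapping_model *m)
{
	assert(m != NULL);
	return inc_unless_negative(&m->writable) ? 0 : -EPERM;
}

/* Drop a writable mapping taken with model_map_writable(). */
static void model_unmap_writable(struct mapping_model *m)
{
	assert(m != NULL);
	atomic_fetch_sub(&m->writable, 1);
}

/* Deny new writable mappings; fails with -EBUSY if one already exists. */
static int model_deny_writable(struct mapping_model *m)
{
	assert(m != NULL);
	return dec_unless_positive(&m->writable) ? 0 : -EBUSY;
}

/* Undo a successful model_deny_writable(). */
static void model_allow_writable(struct mapping_model *m)
{
	assert(m != NULL);
	atomic_fetch_add(&m->writable, 1);
}
```

In this model a deny request fails with -EBUSY while any writable mapping is counted, and a map request fails with -EPERM while a deny is in effect, which matches the "you can only deny writable mappings if none exists right now" rule in the comment.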

Re: [PATCH net-next 0/2] net: filter: split eBPF interpreter out of core networking

2014-07-23 Thread David Miller

From: Alexei Starovoitov 
Date: Tue, 22 Jul 2014 23:01:57 -0700

> I believe my recent set of RFC/patches [1] provided good visibility on where
> I would like to take the eBPF subsystem. These two trivial patches are a first step
> in that direction:
> patch 1 - mechanical split of eBPF interpreter out of filter.c
> patch 2 - nominate myself as a maintainer for eBPF core pieces
> In the foreseeable future eBPF patches will be going through net-next,
> so put netdev as a primary mailing list
> 
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf master

Series applied to net-next.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] sched_clock: Avoid corrupting hrtimer tree during suspend

2014-07-23 Thread John Stultz

From: Stephen Boyd 

During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but we update the epoch during
resume which seems counter-intuitive.

Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer similar to what we would do if we were starting
the clock for the first time.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: stable 
Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd 
Signed-off-by: John Stultz 
---

Ingo/Thomas: Here's a fix for tip/timers/urgent for 3.16 and -stable.
Let me know if you have any objections.

thanks
-john

 kernel/time/sched_clock.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 445106d..01d2d15 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -191,7 +191,8 @@ void __init sched_clock_postinit(void)
 
 static int sched_clock_suspend(void)
 {
-   sched_clock_poll(&sched_clock_timer);
+   update_sched_clock();
+   hrtimer_cancel(&sched_clock_timer);
cd.suspended = true;
return 0;
 }
@@ -199,6 +200,7 @@ static int sched_clock_suspend(void)
 static void sched_clock_resume(void)
 {
cd.epoch_cyc = read_sched_clock();
+   hrtimer_start(&sched_clock_timer, cd.wrap_kt, HRTIMER_MODE_REL);
cd.suspended = false;
 }
 
-- 
1.9.1


Re: [PATCH 11/16] rcu: Check for spurious wakeup using return value

2014-07-23 Thread Pranith Kumar

On Wed, Jul 23, 2014 at 11:43 PM, Paul E. McKenney
 wrote:
> On Wed, Jul 23, 2014 at 10:36:19PM -0400, Pranith Kumar wrote:
>> On Wed, Jul 23, 2014 at 8:26 AM, Paul E. McKenney
>>  wrote:
>> > On Wed, Jul 23, 2014 at 01:09:48AM -0400, Pranith Kumar wrote:
>> >> When the gp_kthread wakes up from the wait event, it returns 0 if the wake up is
>> >> due to the condition having been met. This commit checks this return value
>> >> for a spurious wake up before calling rcu_gp_init().
>> >>
>> >> Signed-off-by: Pranith Kumar 
>> >
>> > How does this added check help?  I don't see that it does.  If the flag
>> > is set, we want to wake up.  If we get a spurious wakeup, but then the
>> > flag gets set before we actually wake up, we still want to wake up.
>>
>> So I took a look at the docs again, and using the return value is the
>> recommended way to check for spurious wakeups.
>>
>> The condition in wait_event_interruptible() is checked when the task
>> is woken up (either due to stray signals or explicitly) and it returns
>> true if condition evaluates to true.

Correction: this should read "returns 0 if the condition evaluates to true".

>>
>> In the current scenario, if we get a spurious wakeup, we take the
>> costly path of checking this condition again (with a barrier and lock)
>> before going back to wait.
>>
>> The scenario of getting an actual wakeup after getting a spurious
>> wakeup exists even today, this is the window after detecting a
>> spurious wakeup and before going back to wait. I am not sure if using
>> the return value enlarges that window as we are going back to sleep
>> immediately.
>>
>> Thoughts?
>
> If the flag is set, why should we care whether or not the wakeup was
> spurious?  If the flag is not set, why should we care whether or not
> wait_event_interruptible() thought that the wakeup was not spurious?
>

A correction about the return value above: return will be 0 if the
condition is true, in this case if the flag is set.

If the flag is set, ret will be 0 and we will go ahead with
rcu_gp_init(). (no change wrt current behavior)

If the flag is not set, currently we go ahead and call rcu_gp_init()
from where we check if the flag is set (after a lock+barrier) and
return.

If we care about what wait_event_interruptible() returns, we can go
back and wait for an actual wakeup much earlier without the additional
overhead of calling rcu_gp_init().

-- 
Pranith
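
The difference between the two approaches discussed above can be sketched as a small user-space model. It assumes that wait_event_interruptible() returns 0 when the condition is true and -ERESTARTSYS when the wait is interrupted by a signal; model_count_init_calls() and MODEL_ERESTARTSYS are hypothetical names used only for illustration. The model just counts how often the rcu_gp_init() path would be entered and says nothing about whether the saving matters in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same numeric value as the kernel's ERESTARTSYS; the name is model-only. */
#define MODEL_ERESTARTSYS 512

/*
 * wait_results holds the value each wait returned. With check_ret set,
 * a non-zero result sends the thread straight back to waiting; without
 * it, every return enters the init path, which re-checks the flag itself.
 */
static int model_count_init_calls(const int *wait_results, size_t n,
				  bool check_ret)
{
	int calls = 0;

	assert(wait_results != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (check_ret && wait_results[i] != 0)
			continue;	/* go back to waiting early */
		calls++;		/* would call rcu_gp_init() */
	}
	return calls;
}
```

For a sequence of results such as 0, -ERESTARTSYS, 0, the unchecked variant enters the init path three times and the checked variant twice.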

Re: [PATCH 2/2] ARM: mvebu: Added dts defintion for Lenovo Iomega ix4-300d NAS

2014-07-23 Thread Baruch Siach

Hi Benoit,

On Wed, Jul 23, 2014 at 03:52:53PM -0700, Benoit Masson wrote:
> The Lenovo Iomega ix4-300d is a 4-Bay sata NAS with dual Gb,
>  USB2.0 & 3.0, powered by a Marvell Armada XP MV78230 dual core CPU.
> 
> http://shop.lenovo.com/fr/fr/servers/network-storage/lenovoemc/ix4-300d/

I guess most users would prefer an English URL:
http://shop.lenovo.com/us/en/servers/network-storage/lenovoemc/ix4-300d/

Also, the accepted convention is to add a blank line between the commit log 
body and the Signed-off-by line.

> Signed-off-by: Benoit Masson 

baruch

-- 
 http://baruch.siach.name/blog/  ~. .~   Tk Open Systems
=}ooO--U--Ooo{=
   - bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

Re: [PATCH RFC] suspend/hibernation: Fix racing timers

2014-07-23 Thread John Stultz

On 07/21/2014 10:35 AM, Soren Brinkmann wrote:
> On platforms that do not power off during suspend, successfully entering
> suspend races with timers.
>
> The race, which happens in a couple of locations, is:
>
>   1. disable IRQs (e.g. arch_suspend_disable_irqs())
>  ...
>   2. syscore_suspend()
> -> tick_suspend() (timers are turned off here)
>  ...
>   3. wfi  (wait for wake-IRQ here)
>
> Between steps 1 and 2 the timers can still generate interrupts that are
> not handled and stay pending until step 3. That pending IRQ causes an
> immediate - spurious - wake.
>
> The solution is to remove the timekeeping suspend/resume functions from
> the syscore functions and explicitly call them at the appropriate time in
> the suspend/hibernation paths, i.e. timers are suspended _before_ IRQs
> get disabled, and correspondingly in the resume path.

So.. I sort of follow this, though from the description disabling
timekeeping to turn off timers seems a little indirect (I do see that
suspending timekeeping calls clockevents_suspend() which is the key
part). Maybe this could be clarified in a future version of the patch
description?

I worry that moving timekeeping_suspend earlier in the suspend process
might cause problems where things access time in the suspend path. I
recall these orderings have been problematic in the past, and slightly
tweaking them can often destabilize things badly.

I wonder if it would be better just to move the clockevents_suspend()
call to the earlier site, that way timers are halted but timekeeping
continues until its normal suspend point.

thanks
-john


Re: [PATCH 9/9] fs: dlm: lockd: Convert int result to unsigned char type

2014-07-23 Thread Joe Perches

On Wed, 2014-07-23 at 14:11 -0400, Jeff Layton wrote:
> On Sun, 20 Jul 2014 11:23:43 -0700 Joe Perches  wrote:
> > op->info.rv is an s32, but it's only used as a u8.
> I don't understand this patch. info.rv is s32 (and I assume that "rv"
> stands for "return value").

In this case it's not a return value but an input.

> What I don't get is why you think it's just
> used as a u8.

Because it's tested only in nlmsvc_grant_deferred
and nlmsvc_update_deferred_block against 0.

As far as I can tell, it's not possible to set it
to a negative value.

> It seems to be used more like a bool than anything else,
> and I'm not sure that "type" is really a good description for it. Maybe
> it should be a "bool" and named "conflict",

Maybe.  But it seemed likely and possible to expand
it from a single bool to a value.

> given the comments in dlm_posix_get ?

Maybe, though I don't see how the comments relate to
this change.  The rv value returned from that call
is either -ENOMEM or 0 and is unchanged by this patch.

> > diff --git a/include/linux/fs.h b/include/linux/fs.h
[]
> > @@ -842,7 +842,7 @@ struct lock_manager_operations {
> > int (*lm_compare_owner)(struct file_lock *fl1, struct file_lock *fl2);
> > unsigned long (*lm_owner_key)(struct file_lock *fl);
> > void (*lm_notify)(struct file_lock *fl);/* unblock callback */
> > -   int (*lm_grant)(struct file_lock *fl, int result);
> > +   int (*lm_grant)(struct file_lock *fl, unsigned char type);
> > void (*lm_break)(struct file_lock *fl);
> > int (*lm_change)(struct file_lock **fl, int type);
> >  };

I used variable name "type" because that's what
lm_change uses.  No worries if you think a name
like conflict is better.

The only in-kernel setter of lm_grant is:

fs/lockd/svclock.c: .lm_grant = nlmsvc_grant_deferred,

and for that, I think using a variable name of
"result" is misleading at best.


Re: [PATCH v3] video: hyperv: hyperv_fb: refresh the VM screen by force on VM panic

2014-07-23 Thread gre...@linuxfoundation.org

On Thu, Jul 24, 2014 at 01:54:58AM +, Dexuan Cui wrote:
> Ping again -- does anyone have further comments on the v3 version?
> 
> It looks like the framebuffer layer's Tomi and Jean-Christophe have been out recently?
> I haven't seen mail from them on the lists since Jul 1 and Jul 8.
> 
> Though the patch belongs to driver/video/fbdev/, it only changes hyper-v stuff
> and only changes one file and the patch itself is straightforward IMO.
> 
> So, hi Greg and all, 
> If you think the patch itself is OK, may I know if it's OK for the patch to go
> into Greg's char-misc.git tree, as some other hyper-v patches did?

No, it needs to go through the fb tree, not mine, sorry.

greg k-h

Re: [PATCH v5 1/3] arm64: ptrace: reload a syscall number after ptrace operations

2014-07-23 Thread Andy Lutomirski


On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:

Arm64 holds a syscall number in w8(x8) register. Ptrace tracer may change
its value either to:
   * any valid syscall number to alter a system call, or
   * -1 to skip a system call

This patch implements this behavior by reloading that value into syscallno
in struct pt_regs after tracehook_report_syscall_entry() or
secure_computing(). In case of '-1', a return value of system call can also
be changed by the tracer setting the value to x0 register, and so
sys_ni_nosyscall() should not be called.

See also:
 42309ab4, ARM: 8087/1: ptrace: reload syscall number after
  secure_computing() check

Signed-off-by: AKASHI Takahiro 
---
  arch/arm64/kernel/entry.S  |2 ++
  arch/arm64/kernel/ptrace.c |   13 +
  2 files changed, 15 insertions(+)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 5141e79..de8bdbc 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -628,6 +628,8 @@ ENDPROC(el0_svc)
  __sys_trace:
mov x0, sp
bl  syscall_trace_enter
+   cmp w0, #-1 // skip syscall?
+   b.eq    ret_to_user


Does this mean that skipped syscalls will cause exit tracing to be 
skipped?  If so, then you risk (at least) introducing a nice 
user-triggerable OOPS if audit is enabled.  This bug existed for *years* 
on x86_32, and it amazes me that no one ever triggered it by accident. 
(Grr, audit.)


--Andy
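
The reload-and-skip behaviour described in the patch can be modelled in user space. This is a sketch under stated assumptions, not the arm64 entry code: struct regs_model, the two tracer callbacks, and model_syscall() are hypothetical, and the syscall "table" holds two made-up entries. It shows the syscall number being re-read from x8 after the tracer has run, and a value of -1 skipping the call while keeping whatever the tracer left in x0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_NO_SYSCALL (-1L)

/* Hypothetical, reduced register state. */
struct regs_model {
	long x0;	/* first argument on entry, return value on exit */
	long x8;	/* syscall number register */
	long syscallno;	/* cached copy of the syscall number */
};

typedef void (*tracer_fn)(struct regs_model *regs);

/* Two made-up syscalls; anything else reports -ENOSYS. */
static long model_syscall(long nr, long arg)
{
	switch (nr) {
	case 0:
		return arg + 1;
	case 1:
		return arg * 2;
	default:
		return -ENOSYS;
	}
}

/* Tracer that redirects the call to syscall 1. */
static void tracer_redirect(struct regs_model *regs)
{
	regs->x8 = 1;
}

/* Tracer that skips the call and supplies its own return value. */
static void tracer_skip(struct regs_model *regs)
{
	regs->x8 = MODEL_NO_SYSCALL;
	regs->x0 = -EPERM;
}

static long model_trace_and_dispatch(struct regs_model *regs, tracer_fn tracer)
{
	assert(regs != NULL);
	if (tracer != NULL)
		tracer(regs);		/* tracer may rewrite x0 and x8 */

	regs->syscallno = regs->x8;	/* reload after the tracer ran */
	if (regs->syscallno == MODEL_NO_SYSCALL)
		return regs->x0;	/* skipped: keep the tracer's value */

	regs->x0 = model_syscall(regs->syscallno, regs->x0);
	return regs->x0;
}
```

The model deliberately leaves out exit tracing, so it does not settle the question raised above of whether a skipped syscall should still be reported on exit.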

Re: [PATCH v5 3/3] arm64: Add seccomp support

2014-07-23 Thread Andy Lutomirski


On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:

secure_computing() should always be called first in syscall_trace_enter().

If secure_computing() returns -1, we should stop further handling. Then
that system call may eventually fail with a specified return value (errno),
be trapped or the process itself be killed depending on loaded rules.
In these cases, syscall_trace_enter() also returns -1, which results in
skipping normal syscall handling as well as syscall_trace_exit().

Signed-off-by: AKASHI Takahiro 
---
  arch/arm64/Kconfig   |   14 ++
  arch/arm64/include/asm/seccomp.h |   25 +
  arch/arm64/include/asm/unistd.h  |3 +++
  arch/arm64/kernel/ptrace.c   |5 +
  4 files changed, 47 insertions(+)
  create mode 100644 arch/arm64/include/asm/seccomp.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3a18571..eeac003 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -32,6 +32,7 @@ config ARM64
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KGDB
+   select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
select HAVE_C_RECORDMCOUNT
select HAVE_DEBUG_BUGVERBOSE
@@ -259,6 +260,19 @@ config ARCH_HAS_CACHE_LINE_SIZE

  source "mm/Kconfig"

+config SECCOMP
+   bool "Enable seccomp to safely compute untrusted bytecode"
+   ---help---
+ This kernel feature is useful for number crunching applications
+ that may need to compute untrusted bytecode during their
+ execution. By using pipes or other transports made available to
+ the process as file descriptors supporting the read/write
+ syscalls, it's possible to isolate those applications in
+ their own address space using seccomp. Once seccomp is
+ enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
+ and the task is only allowed to execute a few safe syscalls
+ defined by each seccomp mode.
+
  config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm64/include/asm/seccomp.h b/arch/arm64/include/asm/seccomp.h
new file mode 100644
index 000..c76fac9
--- /dev/null
+++ b/arch/arm64/include/asm/seccomp.h
@@ -0,0 +1,25 @@
+/*
+ * arch/arm64/include/asm/seccomp.h
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_SECCOMP_H
+#define _ASM_SECCOMP_H
+
+#include 
+
+#ifdef CONFIG_COMPAT
+#define __NR_seccomp_read_32   __NR_compat_read
+#define __NR_seccomp_write_32  __NR_compat_write
+#define __NR_seccomp_exit_32   __NR_compat_exit
+#define __NR_seccomp_sigreturn_32  __NR_compat_rt_sigreturn
+#endif /* CONFIG_COMPAT */
+
+#include 
+
+#endif /* _ASM_SECCOMP_H */
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index c980ab7..729c155 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -31,6 +31,9 @@
   * Compat syscall numbers used by the AArch64 kernel.
   */
  #define __NR_compat_restart_syscall   0
+#define __NR_compat_exit   1
+#define __NR_compat_read   3
+#define __NR_compat_write  4
  #define __NR_compat_sigreturn 119
  #define __NR_compat_rt_sigreturn  173

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 100d7d1..e477f6f 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1115,6 +1116,10 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
saved_x0 = regs->regs[0];
saved_x8 = regs->regs[8];

+   if (secure_computing(regs->syscallno) == -1)
+   /* seccomp failures shouldn't expose any additional code. */
+   return -1;
+


This will conflict with the fastpath stuff in Kees' tree.  (Actually, 
it's likely to apply cleanly, but fail to compile.)  The fix is trivial, 
but, given that the fastpath stuff is new, can you take a look and see 
if arm64 can use it effectively?


I suspect that the performance considerations are rather different on 
arm64 as compared to x86 (I really hope that x86 is the only 
architecture with the absurd sysret vs. iret distinction), but at least 
the seccomp_data stuff ought to help anywhere.  (It looks like there's a 
distinct fast path, too, so the two-phase thing might also be a fairly 
large win if it's supportable.)


See:

https://git.kernel.org/cgit/linux/kernel/git/kees/linux.git/log/?h=seccomp/fastpath

Also, I'll ask the usual question:  What are all of the factors other 
than nr and args that affect syscall execution?  What are the audit arch 
values?  Do they match correctly?


For example,

[PATCH] staging: Remove checkpatch errors in InterfaceMarcos.h

2014-07-23 Thread Nicholas Krause

This removes the two errors I get when running checkpatch on this
file. The first being not to use C99 comments and the second to
remove spacing issues with the define statement on line 7.

Signed-off-by: Nicholas Krause 
---
 drivers/staging/bcm/InterfaceMacros.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/bcm/InterfaceMacros.h b/drivers/staging/bcm/InterfaceMacros.h
index 7001caf..531aadf 100644
--- a/drivers/staging/bcm/InterfaceMacros.h
+++ b/drivers/staging/bcm/InterfaceMacros.h
@@ -4,13 +4,13 @@
 #define BCM_USB_MAX_READ_LENGTH 2048
 
 #define MAXIMUM_USB_TCB  128
-#define MAXIMUM_USB_RCB 128
+#define MAXIMUM_USB_RCB 128
 
 #define MAX_BUFFERS_PER_QUEUE   256
 
 #define MAX_DATA_BUFFER_SIZE    2048
 
-//Num of Asynchronous reads pending
+/* Num of Asynchronous reads pending */
 #define NUM_RX_DESC 64
 
 #define SYS_CFG 0x0F000C00
-- 
1.9.1


Re: [PATCH 11/16] rcu: Check for spurious wakeup using return value

2014-07-23 Thread Paul E. McKenney

On Wed, Jul 23, 2014 at 10:36:19PM -0400, Pranith Kumar wrote:
> On Wed, Jul 23, 2014 at 8:26 AM, Paul E. McKenney
>  wrote:
> > On Wed, Jul 23, 2014 at 01:09:48AM -0400, Pranith Kumar wrote:
> >> When the gp_kthread wakes up from the wait event, it returns 0 if the wake up is
> >> due to the condition having been met. This commit checks this return value
> >> for a spurious wake up before calling rcu_gp_init().
> >>
> >> Signed-off-by: Pranith Kumar 
> >
> > How does this added check help?  I don't see that it does.  If the flag
> > is set, we want to wake up.  If we get a spurious wakeup, but then the
> > flag gets set before we actually wake up, we still want to wake up.
> 
> So I took a look at the docs again, and using the return value is the
> recommended way to check for spurious wakeups.
> 
> The condition in wait_event_interruptible() is checked when the task
> is woken up (either due to stray signals or explicitly) and it returns
> true if condition evaluates to true.
> 
> In the current scenario, if we get a spurious wakeup, we take the
> costly path of checking this condition again (with a barrier and lock)
> before going back to wait.
> 
> The scenario of getting an actual wakeup after getting a spurious
> wakeup exists even today, this is the window after detecting a
> spurious wakeup and before going back to wait. I am not sure if using
> the return value enlarges that window as we are going back to sleep
> immediately.
> 
> Thoughts?

If the flag is set, why should we care whether or not the wakeup was
spurious?  If the flag is not set, why should we care whether or not
wait_event_interruptible() thought that the wakeup was not spurious?

Thanx, Paul


Re: [PATCH v5 2/3] asm-generic: Add generic seccomp.h for secure computing mode 1

2014-07-23 Thread Andy Lutomirski


On 07/22/2014 02:14 AM, AKASHI Takahiro wrote:

Those values (__NR_seccomp_*) are used solely in secure_computing()
to identify mode 1 system calls. If compat system calls have different
syscall numbers, asm/seccomp.h may override them.

Acked-by: Arnd Bergmann 
Signed-off-by: AKASHI Takahiro 
---
  include/asm-generic/seccomp.h |   28 
  1 file changed, 28 insertions(+)
  create mode 100644 include/asm-generic/seccomp.h

diff --git a/include/asm-generic/seccomp.h b/include/asm-generic/seccomp.h
new file mode 100644
index 000..5e97022
--- /dev/null
+++ b/include/asm-generic/seccomp.h
@@ -0,0 +1,28 @@
+/*
+ * include/asm-generic/seccomp.h
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_GENERIC_SECCOMP_H
+#define _ASM_GENERIC_SECCOMP_H
+
+#include 
+
+#if defined(CONFIG_COMPAT) && !defined(__NR_seccomp_read_32)
+#define __NR_seccomp_read_32   __NR_read
+#define __NR_seccomp_write_32  __NR_write
+#define __NR_seccomp_exit_32   __NR_exit
+#define __NR_seccomp_sigreturn_32  __NR_rt_sigreturn
+#endif /* CONFIG_COMPAT && ! already defined */
+
+#define __NR_seccomp_read  __NR_read
+#define __NR_seccomp_write __NR_write
+#define __NR_seccomp_exit  __NR_exit
+#define __NR_seccomp_sigreturn __NR_rt_sigreturn


I don't like these names.  __NR_seccomp_read sounds like the number of a 
syscall called seccomp_read.


Also, shouldn't something be including this header?  I'm confused.

--Andy
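
How such per-ABI syscall numbers might be consumed by a mode 1 check can be sketched in user space. This is an illustrative model, not the kernel's secure_computing(): mode1_allowed() and the two tables are hypothetical names. The native numbers are the asm-generic values for read, write, exit and rt_sigreturn; the compat numbers are the __NR_compat_* values quoted in patch 3/3 of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Native numbers from the asm-generic syscall table. */
static const int mode1_native[] = {
	63,	/* read */
	64,	/* write */
	93,	/* exit */
	139,	/* rt_sigreturn */
};

/* Compat numbers as quoted in patch 3/3. */
static const int mode1_compat[] = {
	3,	/* read */
	4,	/* write */
	1,	/* exit */
	173,	/* rt_sigreturn */
};

/* Return true if syscall nr is on the mode 1 list for the task's ABI. */
static bool mode1_allowed(int nr, bool compat)
{
	const int *list = compat ? mode1_compat : mode1_native;
	size_t n = compat ? sizeof(mode1_compat) / sizeof(mode1_compat[0])
			  : sizeof(mode1_native) / sizeof(mode1_native[0]);

	assert(nr >= 0);
	for (size_t i = 0; i < n; i++) {
		if (list[i] == nr)
			return true;
	}
	return false;
}
```

The same number can mean different syscalls in the two ABIs, which is why the model keeps separate lists rather than a single merged one.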

RE: [PATCH v5 0/2] i2c: add DMA support for freescale i2c driver

2014-07-23 Thread Yao Yuan

Hi,

Marek Vasut wrote:
> On Wednesday, July 23, 2014 at 10:24:41 AM, Yuan Yao wrote:
> > Changed in v5:
> > - add "*chan_dev = dma->chan_using->device->dev" for reduce the call time.
> 
> Did you check if the compiler generates different code ?
> 

Sorry, I didn't compare the assembly code. It's a subtle change that
addresses the "noodle" you mentioned before.

Old:
dma_map_single(dma->chan_using->device->dev, ...);
dma_mapping_error(dma->chan_using->device->dev, ...);
dma_unmap_single(dma->chan_using->device->dev, ...);

New:
struct device *chan_dev = dma->chan_using->device->dev;
dma_map_single(chan_dev, ...);
dma_mapping_error(chan_dev, ...);
dma_unmap_single(chan_dev, ...);

> > - add the test logs.
> 
> [...]
> 
> Best regards,
> Marek Vasut
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] staging: Join lines in IntefaceIdleMode.c

2014-07-23 Thread Nicholas Krause

This joins two lines that need to be joined as this improves
the coding style and makes it much easier to read.

Signed-off-by: Nicholas Krause 
---
 drivers/staging/bcm/InterfaceIdleMode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/bcm/InterfaceIdleMode.c b/drivers/staging/bcm/InterfaceIdleMode.c
index c84ee49..e075f0e 100644
--- a/drivers/staging/bcm/InterfaceIdleMode.c
+++ b/drivers/staging/bcm/InterfaceIdleMode.c
@@ -211,8 +211,7 @@ static int InterfaceAbortIdlemode(struct bcm_mini_adapter *Adapter,
else
BCM_DEBUG_PRINT(Adapter, DBG_TYPE_OTHERS,
IDLE_MODE, DBG_LVL_ALL,
-   "Number of completed iteration to"
-   "read chip-id :%lu", itr);
+   "Number of completed iteration to read chip-id :%lu", itr);
 
status = wrmalt(Adapter, SW_ABORT_IDLEMODE_LOC,
, sizeof(status));
-- 
1.9.1


Re: [PATCHv3 1/2] mm: introduce vm_ops->map_pages()

2014-07-23 Thread Sasha Levin

On 02/27/2014 02:53 PM, Kirill A. Shutemov wrote:
> The patch introduces new vm_ops callback ->map_pages() and uses it for
> mapping easy accessible pages around fault address.
> 
> On read page fault, if filesystem provides ->map_pages(), we try to map
> up to FAULT_AROUND_PAGES pages around page fault address in hope to
> reduce number of minor page faults.
> 
> We call ->map_pages first and use ->fault() as fallback if page by the
> offset is not ready to be mapped (cold page cache or something).
> 
> Signed-off-by: Kirill A. Shutemov 
> ---

Hi all,

This patch triggers use-after-free when fuzzing using trinity and the KASAN
patchset.

KASAN's report is:

[  663.269187] AddressSanitizer: use after free in 
do_read_fault.isra.40+0x3c2/0x510 at addr 88048a733110
[  663.275260] page:ea001229ccc0 count:0 mapcount:0 mapping:  
(null) index:0x0
[  663.277061] page flags: 0xaf80008000(tail)
[  663.278759] page dumped because: kasan error
[  663.280645] CPU: 6 PID: 9262 Comm: trinity-c104 Not tainted 
3.16.0-rc6-next-20140723-sasha-00047-g289342b-dirty #929
[  663.282898]  00fb  ea001229ccc0 
88038ac0fb78
[  663.288759]  a5e40903 88038ac0fc48 88038ac0fc38 
a142acfc
[  663.291496]  0001 880509ff5aa8 88038ac10038 
88038ac0fbb0
[  663.294379] Call Trace:
[  663.294806] dump_stack (lib/dump_stack.c:52)
[  663.300665] kasan_report_error (mm/kasan/report.c:98 mm/kasan/report.c:166)
[  663.301659] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[  663.304645] ? preempt_count_sub (kernel/sched/core.c:2606)
[  663.305800] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 
kernel/locking/lockdep.c:254)
[  663.306839] ? do_read_fault.isra.40 (mm/memory.c:2784 mm/memory.c:2849 
mm/memory.c:2898)
[  663.307515] __asan_load8 (mm/kasan/kasan.c:364)
[  663.308038] ? do_read_fault.isra.40 (mm/memory.c:2864 mm/memory.c:2898)
[  663.309158] do_read_fault.isra.40 (mm/memory.c:2864 mm/memory.c:2898)
[  663.310311] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:98 
include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:183)
[  663.311282] ? __pte_alloc (mm/memory.c:598)
[  663.312331] handle_mm_fault (mm/memory.c:3092 mm/memory.c:3225 
mm/memory.c:3345 mm/memory.c:3374)
[  663.313895] ? pud_huge (./arch/x86/include/asm/paravirt.h:611 
arch/x86/mm/hugetlbpage.c:76)
[  663.314793] __get_user_pages (mm/gup.c:286 mm/gup.c:478)
[  663.315775] __mlock_vma_pages_range (mm/mlock.c:262)
[  663.316879] __mm_populate (mm/mlock.c:710)
[  663.317813] SyS_remap_file_pages (mm/mmap.c:2653 mm/mmap.c:2593)
[  663.318848] tracesys (arch/x86/kernel/entry_64.S:541)
[  663.319683] Read of size 8 by thread T9262:
[  663.320580] Memory state around the buggy address:
[  663.321392]  88048a732e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.322573]  88048a732f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.323802]  88048a732f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.325080]  88048a733000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.326327]  88048a733080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.327572] >88048a733100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.328840]  ^
[  663.329487]  88048a733180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.330762]  88048a733200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.331994]  88048a733280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.333262]  88048a733300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb
[  663.334488]  88048a733380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb 
fb

This is then followed by the traditional:

[  663.474532] BUG: unable to handle kernel paging request at 88048a635de8
[  663.474548] IP: do_read_fault.isra.40 (mm/memory.c:2864 mm/memory.c:2898)

But the rest of it got scrambled between the KASAN prints and the other BUG
info + trace (who broke printk?!).


Thanks,
Sasha

[PATCH] staging: Fix space issues for header of headers.h

2014-07-23 Thread Nicholas Krause

This patch fixes the space errors checkpatch gives on this file
for the header of this file.

Signed-off-by: Nicholas Krause 
---
 drivers/staging/bcm/headers.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/bcm/headers.h b/drivers/staging/bcm/headers.h
index 6f3270c..af5b6df 100644
--- a/drivers/staging/bcm/headers.h
+++ b/drivers/staging/bcm/headers.h
@@ -1,6 +1,6 @@
 
 /***
-*  Headers.h
+* Headers.h
 ***/
 #ifndef __HEADERS_H__
 #define __HEADERS_H__
-- 
1.9.1


RE: [PATCH v4 1/2] i2c: add DMA support for freescale i2c driver

2014-07-23 Thread Yao Yuan

On Fri, May 21, 2014 at 4:01 PM +0200, Wolfram Sang wrote:
> 
> Ping. Any updates? I think this was pretty close to inclusion?

Hi Wolfram,
Thanks for your concern, and sorry for such a late reply. I have been on a
business trip for months, and during that time I had no device to debug with.
Now I am back and will do my best to finish it.
I have sent V5 of the patch. Thanks for your review.

Best regards,
Yuan Yao

Re: [PATCH 9/9] block: loop: support to submit I/O via kernel aio based

2014-07-23 Thread Ming Lei

On Thu, Jul 24, 2014 at 6:55 AM, Ming Lei  wrote:
> Part of the patch is based on Dave's previous post.
>
> It is easy to observe that loop block device throughput
> can be increased by > 100% in single job randread,
> libaio engine, direct I/O fio test.
>
> Cc: Zach Brown 
> Cc: Dave Kleikamp 
> Cc: Benjamin LaHaise 
> Signed-off-by: Ming Lei 
> ---
>  drivers/block/loop.c  |  128 ++---
>  drivers/block/loop.h  |1 +
>  include/uapi/linux/loop.h |1 +
>  3 files changed, 122 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 96a8913..2279d26 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c

> +static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
> +bool write, loff_t pos)
> +{
> +   struct file *file = lo->lo_backing_file;
> +   struct request *rq = cmd->rq;
> +   struct kiocb *iocb;
> +   unsigned int op, i = 0;
> +   struct iov_iter iter;
> +   struct bio_vec *bvec, bv;
> +   size_t nr_segs = 0;
> +   struct req_iterator r_iter;
> +   int ret = -EIO;
> +
> +   /* how many segments */
> +   rq_for_each_segment(bv, rq, r_iter)
> +   nr_segs++;
> +
> +   iocb = aio_kernel_alloc(GFP_NOIO, nr_segs * sizeof(*bvec));
> +   if (!iocb) {
> +   ret = -ENOMEM;
> +   goto failed;
> +   }
> +
> +   cmd->iocb = iocb;
> +   bvec = (struct bio_vec *)(iocb + 1);
> +   rq_for_each_segment(bv, rq, r_iter)
> +   bvec[i++] = bv;
> +
> +   if (write)
> +   op = IOCB_CMD_WRITE_ITER;
> +   else
> +   op = IOCB_CMD_READ_ITER;
> +
> +   iter.type = ITER_BVEC | (write ? WRITE : 0);
> +   iter.bvec = bvec;
> +   iter.nr_segs = nr_segs;
> +   iter.count = blk_rq_bytes(rq);
> +   iter.iov_offset = 0;
> +
> +   aio_kernel_init_rw(iocb, file, iov_iter_count(&iter), pos,
> +  lo_rw_aio_complete, (u64)(uintptr_t)cmd);
> +   ret = aio_kernel_submit(iocb, op, &iter);
> +
> +   /*
> +* use same policy with userspace aio, req may have been
> +* completed already, so release it by aio completion.
> +*/
> +   if (ret != -EIOCBQUEUED)
> +   lo_rw_aio_complete((u64)cmd, ret);

The above stuff should have been put inside aio_kernel_submit(),
will do it in V1.
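
For readers following the quoted lo_rw_aio(), the allocation idiom it relies on (count the segments, allocate one object with room for the array right behind the header, then address that tail with "hdr + 1") can be sketched in plain userspace C. This is only an illustration under assumptions: struct seg, struct req_hdr, req_alloc() and req_segs() are hypothetical stand-ins and are not part of the patch or of the aio_kernel_*() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio_vec. */
struct seg {
	const void *base;
	size_t len;
};

/* Hypothetical stand-in for the iocb that owns the trailing array. */
struct req_hdr {
	size_t nr_segs;
	size_t bytes;
};

/*
 * Allocate a header with room for nr_segs segments directly behind it,
 * then copy the caller's segments into that tail.  Returns NULL on
 * allocation failure; release the result with free().
 */
static struct req_hdr *req_alloc(const struct seg *src, size_t nr_segs)
{
	struct req_hdr *hdr;
	struct seg *tail;
	size_t i;

	assert(src != NULL || nr_segs == 0);

	hdr = malloc(sizeof(*hdr) + nr_segs * sizeof(*tail));
	if (!hdr)
		return NULL;

	/* Same idiom as "bvec = (struct bio_vec *)(iocb + 1)". */
	tail = (struct seg *)(hdr + 1);
	hdr->nr_segs = nr_segs;
	hdr->bytes = 0;
	for (i = 0; i < nr_segs; i++) {
		tail[i] = src[i];
		hdr->bytes += src[i].len;
	}
	return hdr;
}

/* Accessor for the segment array stored behind the header. */
static struct seg *req_segs(struct req_hdr *hdr)
{
	return (struct seg *)(hdr + 1);
}
```

The single allocation means the segment array is released together with the header, which is presumably why the patch sizes the iocb allocation by nr_segs.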

Thanks,

linux-next: build failure after merge of the crypto tree

2014-07-23 Thread Stephen Rothwell

: symbol DRM_KMS_FB_HELPER is selected by DRM_KMS_CMA_HELPER
drivers/gpu/drm/Kconfig:74: symbol DRM_KMS_CMA_HELPER is selected by DRM_TILCDC
drivers/gpu/drm/tilcdc/Kconfig:1:   symbol DRM_TILCDC depends on OF
drivers/of/Kconfig:4:   symbol OF is selected by CRYPTO_DEV_CCP_DD
drivers/crypto/ccp/Kconfig:1:   symbol CRYPTO_DEV_CCP_DD depends on CRYPTO
crypto/Kconfig:15:  symbol CRYPTO is selected by IP_SCTP
net/sctp/Kconfig:5: symbol IP_SCTP is selected by DLM
fs/dlm/Kconfig:1:   symbol DLM depends on SYSFS
fs/sysfs/Kconfig:1: symbol SYSFS is selected by AT91_ADC
drivers/iio/adc/Kconfig:110:symbol AT91_ADC depends on IIO
drivers/iio/Kconfig:5:  symbol IIO is selected by RTC_DRV_HID_SENSOR_TIME
drivers/rtc/Kconfig:1370:   symbol RTC_DRV_HID_SENSOR_TIME depends on USB_HID
drivers/hid/usbhid/Kconfig:4:   symbol USB_HID depends on USB
*
* Restart config...
*
*
* USB HID Boot Protocol drivers
*
USB HIDBP Keyboard (simple Boot) support (USB_KBD) [N/m/y/?] (NEW) aborted!

Console input/output is redirected. Run 'make oldconfig' to update configuration.


I am not sure exactly what caused all this, but it is probably commit
126ae9adc1ec ("crypto: ccp - Base AXI DMA cache settings on device
tree").

I have used the version of the crypto tree from next-20140723 for today.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: [PATCH] cpufreq: Don't destroy/realloc policy/sysfs on hotplug/suspend

2014-07-23 Thread Saravana Kannan


On 07/16/2014 03:02 PM, Rafael J. Wysocki wrote:
> On Wednesday, July 09, 2014 07:37:30 PM Saravana Kannan wrote:
>> Preliminary patch. Not tested. Just sending out to give an idea of what I'm
>> looking to do. Expect a lot more simplification when it's done.
>>
>> Benefits:
>> * A lot more simpler code.
>> * Less stability issues.
>> * Suspend/resume time would improve.
>> * Hotplug time would improve.
>> * Sysfs file permissions would be maintained.
>> * More policy settings would be maintained across suspend/resume.
>> * cpufreq stats would be maintained across hotplug for all CPUs.
>
> One problem.  The real hotplug (when the CPU actually goes away) depends on
> offline removing all that stuff for it.  How are you going to address that?

Ok, I think I've figured this out. But one question. Is it possible to
physically remove one CPU in a bunch of "related cpus" without also
unplugging the rest? Put another way, can you unplug one core from a
cluster?

It's not too hard to support that too, but if it's not a realistic case,
I would rather not write code for that.


-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

[PATCH] staging: Add blank line in sync.c

2014-07-23 Thread Nicholas Krause

This patch adds a blank line after line 708, as reported by
checkpatch when run on this file.

Signed-off-by: Nicholas Krause 
---
 drivers/staging/android/sync.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index e7b2e02..0d37495 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -705,6 +705,7 @@ static long sync_fence_ioctl(struct file *file, unsigned int cmd,
 unsigned long arg)
 {
struct sync_fence *fence = file->private_data;
+
switch (cmd) {
case SYNC_IOC_WAIT:
return sync_fence_ioctl_wait(fence, arg);
-- 
1.9.1


[PATCH] staging: Add blank lines in sw_sync.c

2014-07-23 Thread Nicholas Krause

This adds two blank lines, as requested by checkpatch, before lines
100 and 159 respectively.

Signed-off-by: Nicholas Krause 
---
 drivers/staging/android/sw_sync.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/staging/android/sw_sync.c b/drivers/staging/android/sw_sync.c
index a76db3f..863d4b1 100644
--- a/drivers/staging/android/sw_sync.c
+++ b/drivers/staging/android/sw_sync.c
@@ -97,6 +97,7 @@ static void sw_sync_pt_value_str(struct sync_pt *sync_pt,
   char *str, int size)
 {
struct sw_sync_pt *pt = (struct sw_sync_pt *)sync_pt;
+
snprintf(str, size, "%d", pt->value);
 }
 
@@ -156,6 +157,7 @@ static int sw_sync_open(struct inode *inode, struct file *file)
 static int sw_sync_release(struct inode *inode, struct file *file)
 {
struct sw_sync_timeline *obj = file->private_data;
+
sync_timeline_destroy(&obj->obj);
return 0;
 }
-- 
1.9.1


linux-next: manual merge of the net-next tree with the wireless tree

2014-07-23 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in
net/mac80211/cfg.c between commit fa8f136fe9a8 ("mac80211: fix crash on
getting sta info with uninitialized rate control") from the wireless
tree and commit b7ffbd7ef675 ("cfg80211: make ethtool the driver's
responsibility") from the net-next tree.

I fixed it up (the latter moved the code modified by the former, so I
removed the code and added the following patch) and can carry the fix
as necessary (no action is required).

From: Stephen Rothwell 
Date: Thu, 24 Jul 2014 12:32:49 +1000
Subject: [PATCH] mac80211: fix for code movement

Signed-off-by: Stephen Rothwell 
---
 net/mac80211/sta_info.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index f41177f58b30..c6ee2139fbc5 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1724,12 +1724,15 @@ void sta_set_sinfo(struct sta_info *sta, struct station_info *sinfo)
 {
struct ieee80211_sub_if_data *sdata = sta->sdata;
struct ieee80211_local *local = sdata->local;
-   struct rate_control_ref *ref = local->rate_ctrl;
+   struct rate_control_ref *ref = NULL;
struct timespec uptime;
u64 packets = 0;
u32 thr = 0;
int i, ac;
 
+   if (test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
+   ref = local->rate_ctrl;
+
sinfo->generation = sdata->local->sta_generation;
 
sinfo->filled = STATION_INFO_INACTIVE_TIME |
-- 
2.0.1

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au



Re: WARNING: at kernel/cpuset.c:1139

2014-07-23 Thread Mike Qiu


On 07/24/2014 08:27 AM, Li Zefan wrote:
> On 2014/7/23 23:12, Tejun Heo wrote:
>> On Wed, Jul 23, 2014 at 10:50:29AM +0800, Mike Qiu wrote:
>>> commit 734d45130cb ("cpuset: update cs->effective_{cpus, mems} when config
>>> changes") introduce the below warning in my server.
>>>
>>> [   35.652137] [ cut here ]
>>> [   35.652141] WARNING: at kernel/cpuset.c:1139
>>
>> Hah, can you reproduce it?  If so, can you detail how?
>
> It's a typo.
>
> WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
> nodes_equal(cp->mems_allowed, cp->effective_mems));
>
> should be
>
> WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
> !nodes_equal(cp->mems_allowed, cp->effective_mems));

Yes, it is. This warning disappeared after this patch.

Reported-and-Tested-by: Mike Qiu 
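
To make the one-character difference concrete, here is a small userspace model of the two predicates. It assumes, as the fix implies, that on a legacy (non-default) hierarchy the effective mask is expected to equal the configured mask, so the warning should fire only when they differ. The function names and the use of plain unsigned long masks are hypothetical; on_dfl stands in for cgroup_on_dfl() and the equality test for nodes_equal().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the corrected check: warn only on a legacy hierarchy whose
 * configured and effective masks have diverged.
 */
static bool should_warn(bool on_dfl, unsigned long allowed,
			unsigned long effective)
{
	bool masks_equal = (allowed == effective);

	return !on_dfl && !masks_equal;
}

/* Model of the pre-fix check, which fired in the healthy case instead. */
static bool should_warn_buggy(bool on_dfl, unsigned long allowed,
			      unsigned long effective)
{
	return !on_dfl && (allowed == effective);
}
```

With equal masks on a legacy hierarchy, should_warn_buggy() returns true while should_warn() returns false, which matches the report that the warning disappeared once the negation was added.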





[PATCH] staging: Remove checkpatch error from ion.c

2014-07-23 Thread Nicholas Krause

This patch removes a checkpatch error by adding a line below
the definitions of the character array buf and the pointer path.

Signed-off-by: Nicholas Krause 
---
 drivers/staging/android/ion/ion.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 2703609..cad76ae 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -805,6 +805,7 @@ struct ion_client *ion_client_create(struct ion_device *dev,
client, _client_fops);
if (!client->debug_root) {
char buf[256], *path;
+
path = dentry_path(dev->clients_debug_root, buf, 256);
pr_err("Failed to create client debugfs at %s/%s\n",
path, client->display_name);
-- 
1.9.1


Re: [PATCH 11/16] rcu: Check for spurious wakeup using return value

2014-07-23 Thread Pranith Kumar

On Wed, Jul 23, 2014 at 8:26 AM, Paul E. McKenney
 wrote:
> On Wed, Jul 23, 2014 at 01:09:48AM -0400, Pranith Kumar wrote:
>> When the gp_kthread wakes up from the wait event, it returns 0 if the wake
>> up is due to the condition having been met. This commit checks this return
>> value for a spurious wake up before calling rcu_gp_init().
>>
>> Signed-off-by: Pranith Kumar 
>
> How does this added check help?  I don't see that it does.  If the flag
> is set, we want to wake up.  If we get a spurious wakeup, but then the
> flag gets set before we actually wake up, we still want to wake up.
>

So I took a look at the docs again, and using the return value is the
recommended way to check for spurious wakeups.

The condition in wait_event_interruptible() is checked when the task
is woken up (either due to stray signals or explicitly) and it returns
0 if the condition evaluates to true.

In the current scenario, if we get a spurious wakeup, we take the
costly path of checking this condition again (with a barrier and lock)
before going back to wait.

The scenario of getting an actual wakeup after getting a spurious
wakeup exists even today, this is the window after detecting a
spurious wakeup and before going back to wait. I am not sure if using
the return value enlarges that window as we are going back to sleep
immediately.

Thoughts?
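
For reference, the contract being discussed can be modelled in userspace. wait_event_interruptible() is documented to return 0 once the condition evaluates to true and -ERESTARTSYS if a signal interrupts the sleep; a wakeup with neither is spurious and the sleeper waits again. The sketch below is only a model under assumptions: struct wakeup, the scripted array of wakeups and model_wait_event_interruptible() are hypothetical, and MODEL_ERESTARTSYS merely mirrors the kernel-internal value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ERESTARTSYS 512	/* mirrors the kernel-internal ERESTARTSYS */

/* One wakeup as seen by the sleeper. */
struct wakeup {
	bool cond;	/* the waited-for flag is set */
	bool signal;	/* a signal is pending */
};

/*
 * Walk a script of wakeups.  Return 0 as soon as the condition holds,
 * -MODEL_ERESTARTSYS as soon as a signal is pending, and keep waiting on
 * spurious wakeups.  *rechecks counts how often the condition was tested.
 */
static int model_wait_event_interruptible(const struct wakeup *w, size_t n,
					  int *rechecks)
{
	size_t i;

	assert(w != NULL && rechecks != NULL && n > 0);

	for (i = 0; i < n; i++) {
		(*rechecks)++;
		if (w[i].cond)
			return 0;
		if (w[i].signal)
			return -MODEL_ERESTARTSYS;
	}
	/* Script exhausted without a real wakeup: treat as interrupted. */
	return -MODEL_ERESTARTSYS;
}
```

In this model the condition is re-tested on every wakeup inside the wait itself, so a caller that sees 0 already knows the flag was observed set at least once.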

-- 
Pranith

iommu/vt-d: Fix build error caused by unknown definition of acpi_handle

2014-07-23 Thread Jiang Liu

When both CONFIG_ACPI and CONFIG_DMAR_TABLE are disabled, commit
"Implement DMAR unit hotplug framework" causes build failure as below:
  CC  arch/x86/kernel/pci-dma.o
In file included from arch/x86/kernel/pci-dma.c:3:0:
include/linux/dmar.h:168:35: error: unknown type name ‘acpi_handle’
 static inline int dmar_device_add(acpi_handle handle)
   ^
include/linux/dmar.h:173:38: error: unknown type name ‘acpi_handle’
 static inline int dmar_device_remove(acpi_handle handle)
  ^
make[2]: *** [arch/x86/kernel/pci-dma.o] Error 1
make[1]: *** [arch/x86/kernel] Error 2
make: *** [arch/x86] Error 2

Signed-off-by: Jiang Liu 
---
Hi Joerg,
Could you please help to merge or fold this patch to fix the
build error?
Regards!
Gerry
---
 include/linux/dmar.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 9c06bb4b5b14..594d4ac79e75 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -165,12 +165,12 @@ static inline int dmar_ir_hotplug(struct dmar_drhd_unit *dmaru, bool insert)
 
 #else /* CONFIG_DMAR_TABLE */
 
-static inline int dmar_device_add(acpi_handle handle)
+static inline int dmar_device_add(void *handle)
 {
return 0;
 }
 
-static inline int dmar_device_remove(acpi_handle handle)
+static inline int dmar_device_remove(void *handle)
 {
return 0;
 }
-- 
1.7.10.4


Re: [PATCH v3 2/2] pwm: rockchip: Added to support for RK3288 SoC

2014-07-23 Thread caesar


Hi Heiko & thierry,

Thank you for your suggestion.

On 2014-07-24 00:01, Heiko Stübner wrote:

Hi Caesar.

On Wednesday, 23 July 2014, 14:38:41, Caesar Wang wrote:

This patch adds support for the PWM controller found on the
RK3288 SoC.

Signed-off-by: Caesar Wang 
---
  drivers/pwm/pwm-rockchip.c | 141 +++--
  1 file changed, 122 insertions(+), 19 deletions(-)

diff --git a/drivers/pwm/pwm-rockchip.c b/drivers/pwm/pwm-rockchip.c
index eec2145..8d72a98 100644
--- a/drivers/pwm/pwm-rockchip.c
+++ b/drivers/pwm/pwm-rockchip.c
@@ -2,6 +2,7 @@
   * PWM driver for Rockchip SoCs
   *
   * Copyright (C) 2014 Beniamino Galvani 
+ * Copyright (C) 2014 Caesar Wang 

you might want to check who actually holds the copyright for your
contributions, I guess a "Copyright (C) 2014 Rockchip"-something would be more
appropriate?


Yes, you are right.

   *
   * This program is free software; you can redistribute it and/or
   * modify it under the terms of the GNU General Public License
@@ -12,6 +13,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -25,17 +27,89 @@

  #define PRESCALER 2

+#define PWM_ENABLE (1 << 0)
+#define PWM_CONTINUOUS (1 << 1)
+#define PWM_DUTY_POSITIVE  (1 << 3)
+#define PWM_INACTIVE_NEGATIVE  (0 << 4)
+#define PWM_OUTPUT_LEFT(0 << 5)
+#define PWM_LP_DISABLE (0 << 8)
+
  struct rockchip_pwm_chip {
struct pwm_chip chip;
struct clk *clk;
+   const struct rockchip_pwm_data *data;
void __iomem *base;
  };

+struct rockchip_pwm_regs {
+   unsigned long duty;
+   unsigned long period;
+   unsigned long cntr;
+   unsigned long ctrl;
+};
+
+struct rockchip_pwm_data {
+   struct rockchip_pwm_regs regs;
+   unsigned int prescaler;
+
+   void (*set_enable)(struct pwm_chip *chip, bool enable);
+};
+
  static inline struct rockchip_pwm_chip *to_rockchip_pwm_chip(struct
pwm_chip *c) {
return container_of(c, struct rockchip_pwm_chip, chip);
  }

+static void rockchip_pwm_set_enable_v1(struct pwm_chip *chip, bool enable)
+{
+   struct rockchip_pwm_chip *pc = to_rockchip_pwm_chip(chip);
+   u32 val = 0;
+   u32 enable_conf = PWM_CTRL_OUTPUT_EN | PWM_CTRL_TIMER_EN;
+
+   val = readl_relaxed(pc->base + pc->data->regs.ctrl);
+
+   if (enable)
+   val |= enable_conf;
+   else
+   val &= ~enable_conf;
+
+   writel_relaxed(val, pc->base + pc->data->regs.ctrl);
+}
+
+static void rockchip_pwm_set_enable_v2(struct pwm_chip *chip, bool enable)
+{
+   struct rockchip_pwm_chip *pc = to_rockchip_pwm_chip(chip);
+   u32 val = 0;
+   u32 enable_conf = PWM_OUTPUT_LEFT | PWM_LP_DISABLE | PWM_ENABLE |
+   PWM_CONTINUOUS | PWM_DUTY_POSITIVE | PWM_INACTIVE_NEGATIVE;
+
+   val = readl_relaxed(pc->base + pc->data->regs.ctrl);
+
+   if (enable)
+   val |= enable_conf;
+   else
+   val &= ~enable_conf;
+
+   writel_relaxed(val, pc->base + pc->data->regs.ctrl);
+}
+
+static void rockchip_pwm_set_enable_vop(struct pwm_chip *chip, bool enable)
+{
+   struct rockchip_pwm_chip *pc = to_rockchip_pwm_chip(chip);
+   u32 val = 0;
+   u32 enable_conf = PWM_OUTPUT_LEFT | PWM_LP_DISABLE | PWM_ENABLE |
+   PWM_CONTINUOUS | PWM_DUTY_POSITIVE | PWM_INACTIVE_NEGATIVE;
+
+   val = readl_relaxed(pc->base + pc->data->regs.ctrl);
+
+   if (enable)
+   val |= enable_conf;
+   else
+   val &= ~enable_conf;
+
+   writel_relaxed(val, pc->base + pc->data->regs.ctrl);
+}

not sure if I'm just blind ... do rockchip_pwm_set_enable_v2 and
rockchip_pwm_set_enable_vop differ at all?

If they don't differ, I guess pwm_data_vop should just use
rockchip_pwm_set_enable_v2 instead of duplicating it.


Heiko

Yes, rockchip_pwm_set_enable_v1, _v2 and _vop are similar.

That is why my v2 patch used a "u32 enable_conf" field instead:
+struct rockchip_pwm_data {
+	...
+	u32 enable_conf;
+};


Thierry suggested the following [1] in his review of my v2 patch:

For this I think it would be more readable to provide function pointers
rather than a variable. That is:

struct rockchip_pwm_data {
...
int (*enable)(struct pwm_chip *chip, struct pwm_device *pwm);
int (*disable)(struct pwm_chip *chip, struct pwm_device *pwm);
};
Then you can implement these for each variant of the chip and call them
from the common rockchip_pwm_enable(), somewhat like this.


Perhaps I misunderstood Thierry's suggestion.

Hi Thierry & Heiko :-)
Could you suggest a reasonable way to solve this? Thanks.

[1]: https://lkml.org/lkml/2014/7/21/113

+
  static int rockchip_pwm_config(struct pwm_chip *chip, struct pwm_device
*pwm, int duty_ns, int period_ns)
  {
@@ -52,20 +126,20 @@ static int rockchip_pwm_config(struct pwm_chip *chip,
struct pwm_device *pwm, * default prescaler value for all practical

Re: [RFC PATCH v1 12/70] x86, pci, amd_bus: _FROZEN Cleanup

2014-07-23 Thread Chen, Gong

On Wed, Jul 23, 2014 at 08:07:18PM +0200, Borislav Petkov wrote:
> 
> On Tue, Jul 22, 2014 at 09:58:48PM -0400, Chen, Gong wrote:
> > Remove XXX_FROZEN state from x86/pci/amd_bus.
> > 
> > Signed-off-by: Chen, Gong 
> > ---
> >  arch/x86/pci/amd_bus.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
> > index c20d2cc..30f0fca9 100644
> > --- a/arch/x86/pci/amd_bus.c
> > +++ b/arch/x86/pci/amd_bus.c
> > @@ -341,9 +341,8 @@ static int amd_cpu_notify(struct notifier_block *self, unsigned long action,
> >   void *hcpu)
> >  {
> > int cpu = (long)hcpu;
> > -   switch (action) {
> > +   switch (action & ~CPU_TASKS_FROZEN) {
> > case CPU_ONLINE:
> > -   case CPU_ONLINE_FROZEN:
> > smp_call_function_single(cpu, enable_pci_io_ecs, NULL, 0);
> > break;
> > default:
> 
> Or you can kill all the switch-case gunk and make it even more readable:
> 
But what if a new action is added? We would have to change it back. If you
prefer that style, I can update it in the next version.
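
For context, the effect of masking the action with ~CPU_TASKS_FROZEN can be shown in a few lines of userspace C. The numeric values below are given for illustration and should be treated as assumptions rather than as a copy of the kernel header; wants_ecs_enable() is a hypothetical stand-in for the body of amd_cpu_notify().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values; the _FROZEN variant is the base action plus a flag. */
#define CPU_ONLINE		0x0002
#define CPU_UP_PREPARE		0x0003
#define CPU_TASKS_FROZEN	0x0010
#define CPU_ONLINE_FROZEN	(CPU_ONLINE | CPU_TASKS_FROZEN)

/* Returns true when the notifier would enable PCI IO ECS for this action. */
static bool wants_ecs_enable(unsigned long action)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_ONLINE:
		return true;
	default:
		return false;
	}
}
```

Keeping the switch, as in the patch, leaves room for another case label later without restructuring, which is the concern raised in the reply above.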



Re: [RFC PATCH] PM/CPU: Parallel enabling nonboot cpus with resume devices

2014-07-23 Thread Lan Tianyu

On 2014-07-23 18:53, Pavel Machek wrote:
> Hi!
> 
>> In the current world, all nonboot cpus are enabled serially during system
>> resume. System resume sequence is that boot cpu enables nonboot cpu one by
>> one and then resume devices. Before resuming devices, there are few tasks
>> assigned to nonboot cpus after they are brought up. This waste cpu usage.
>>
>> To accelerate S3, this patches adds a new kernel configure
>> PM_PARALLEL_CPU_UP_FOR_SUSPEND to allow boot cpu to go forward to resume
>> devices after bringing up one nonboot cpu. The nonboot cpu will be in charge
>> of bringing up other cpus. This makes enabling cpu2~x parallel with resuming
>> devices. From the test result on 4 logical core laptop, the time of resume
>> device almost wasn't affected by enabling nonboot cpus lately while the start
>> point is almost 30ms earlier than before.
> 
> Does this mean that userspace can now run seeing the "offlined" cpus
> still in offline state?

No, the PM event handler of cpu hotplug will reenable cpu hotplug via
cpu_hotplug_enable() at the end of system resume. The function will wait
for currently running cpu hotplug operations to complete.

> 
>   Pavel
> 


-- 
Best regards
Tianyu Lan

Re: [PATCH 7/9] aio: add aio_kernel_() interface

2014-07-23 Thread Ming Lei

On Thu, Jul 24, 2014 at 7:16 AM, Zach Brown  wrote:
> On Thu, Jul 24, 2014 at 06:55:28AM +0800, Ming Lei wrote:
>> From: Dave Kleikamp 
>>
>> This adds an interface that lets kernel callers submit aio iocbs without
>> going through the user space syscalls.  This lets kernel callers avoid
>> the management limits and overhead of the context.  It will also let us
>> integrate aio operations with other kernel apis that the user space
>> interface doesn't have access to.
>>
>> This patch is based on Dave's posts in below links:
>>
>>   https://lkml.org/lkml/2013/10/16/365
>>   https://groups.google.com/forum/#!topic/linux.kernel/l7mogGJZoKQ
>
> This was originally written a billion years ago when dinosaurs roamed
> the earth.  Also, notably, before Kent and Ben reworked a bunch of the

Not that long ago: this patch is based on Dave's last version, V9, which
was posted in Oct 2013 :-)

> aio core.  I'd want them to take a look at this patch to make sure that
> it doesn't rely on any assumptions that have changed.

Looks like I forgot to Cc Kent :-(

>
>> +/* opcode values not exposed to user space */
>> +enum {
>> + IOCB_CMD_READ_ITER = 0x10000,
>> + IOCB_CMD_WRITE_ITER = 0x10001,
>> +};
>
> And I think the consensus was that this isn't good enough.  Find a way
> to encode the kernel caller ops without polluting the uiocb cmd name
> space.

That is easy: since the two cmd names are only used for kernel AIO, any
encoding should be OK. I don't recall seeing such a comment, though.

>
> (I've now come to think that this entire approach of having loop use aio
> is misguided and that the way forward is to have dio consume what loop
> naturally produces -- bios, blk-mq rqs, whatever -- but I'm on to other

Yes, that is what these patches are doing, and actually AIO's
model is a good match to driver's interface. Lots of drivers
use the asynchronous model(submit, complete, ...).

> things these days.)

At least loop can improve its throughput considerably with kernel AIO,
without big changes to fs/direct-io (largely thanks to ITER_BVEC),
and vhost-scsi should benefit from it too.

Thanks,

RE: [PATCH v3] video: hyperv: hyperv_fb: refresh the VM screen by force on VM panic

2014-07-23 Thread Dexuan Cui

Ping again -- does anyone have further comments on the v3 version?

It looks like Tomi and Jean-Christophe from the framebuffer layer have been
away recently? I haven't seen mail from them on the lists since Jul 1 and Jul 8.

Though the patch belongs to drivers/video/fbdev/, it only changes hyper-v stuff,
touches only one file, and is straightforward IMO.

So, hi Greg and all,
If you think the patch itself is OK, may I know if it's OK for the patch to go
into Greg's char-misc.git tree, as some other hyper-v patches did?

Please let me know what next step I should take.
I appreciate your reply.

Thanks!

-- Dexuan

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Dexuan Cui
> Sent: Wednesday, July 9, 2014 11:04 AM
> To: gre...@linuxfoundation.org; dan.carpen...@oracle.com; linux-
> ker...@vger.kernel.org; driverdev-de...@linuxdriverproject.org;
> plagn...@jcrosoft.com; tomi.valkei...@ti.com; linux-fb...@vger.kernel.org
> Cc: o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; KY
> Srinivasan; Haiyang Zhang
> Subject: [PATCH v3] video: hyperv: hyperv_fb: refresh the VM screen by force
> on VM panic
> 
> Currently the VSC has no chance to notify the VSP of the dirty rectangle on VM
> panic because the notification work is done in a workqueue, and in panic() the
> kernel typically ends up in an infinite loop, and a typical kernel config has
> CONFIG_PREEMPT_VOLUNTARY=y and CONFIG_PREEMPT is not set, so a context switch
> can't happen in panic() and the workqueue won't have a chance to run. As a
> result, the VM Connection window can't refresh until it's closed and we
> re-connect to the VM.
> 
> We can register a handler on panic_notifier_list: the handler can notify
> the VSC and switch the framebuffer driver to a "synchronous mode", meaning
> the VSC flushes any future framebuffer change to the VSP immediately.
> 
> v2: removed the MS-TFS line in the commit message
> v3: remove some 'unlikely' markings
> 
> Signed-off-by: Dexuan Cui 
> Reviewed-by: Haiyang Zhang 
> ---
>  drivers/video/fbdev/hyperv_fb.c | 58 ++---
>  1 file changed, 55 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
> index e23392e..a7b98e1 100644
> --- a/drivers/video/fbdev/hyperv_fb.c
> +++ b/drivers/video/fbdev/hyperv_fb.c
> @@ -226,11 +226,16 @@ struct hvfb_par {
>   u8 recv_buf[MAX_VMBUS_PKT_SIZE];
>  };
> 
> +static struct fb_info *hvfb_info;
> +
>  static uint screen_width = HVFB_WIDTH;
>  static uint screen_height = HVFB_HEIGHT;
>  static uint screen_depth;
>  static uint screen_fb_size;
> 
> +/* If true, the VSC notifies the VSP on every framebuffer change */
> +static bool synchronous_fb;
> +
>  /* Send message to Hyper-V host */
>  static inline int synthvid_send(struct hv_device *hdev,
>   struct synthvid_msg *msg)
> @@ -532,6 +537,20 @@ static void hvfb_update_work(struct work_struct *w)
>   schedule_delayed_work(&par->dwork, HVFB_UPDATE_DELAY);
>  }
> 
> +static int hvfb_on_panic(struct notifier_block *nb,
> + unsigned long e, void *p)
> +{
> + if (hvfb_info)
> + synthvid_update(hvfb_info);
> +
> + synchronous_fb = true;
> +
> + return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block hvfb_panic_nb = {
> + .notifier_call = hvfb_on_panic,
> +};
> 
>  /* Framebuffer operation handlers */
> 
> @@ -582,14 +601,41 @@ static int hvfb_blank(int blank, struct fb_info *info)
>   return 1;   /* get fb_blank to set the colormap to all black */
>  }
> 
> +static void hvfb_cfb_fillrect(struct fb_info *p,
> + const struct fb_fillrect *rect)
> +{
> + cfb_fillrect(p, rect);
> +
> + if (synchronous_fb)
> + synthvid_update(p);
> +}
> +
> +static void hvfb_cfb_copyarea(struct fb_info *p,
> + const struct fb_copyarea *area)
> +{
> + cfb_copyarea(p, area);
> +
> + if (synchronous_fb)
> + synthvid_update(p);
> +}
> +
> +static void hvfb_cfb_imageblit(struct fb_info *p,
> + const struct fb_image *image)
> +{
> + cfb_imageblit(p, image);
> +
> + if (synchronous_fb)
> + synthvid_update(p);
> +}
> +
>  static struct fb_ops hvfb_ops = {
>   .owner = THIS_MODULE,
>   .fb_check_var = hvfb_check_var,
>   .fb_set_par = hvfb_set_par,
>   .fb_setcolreg = hvfb_setcolreg,
> - .fb_fillrect = cfb_fillrect,
> - .fb_copyarea = cfb_copyarea,
> - .fb_imageblit = cfb_imageblit,
> + .fb_fillrect = hvfb_cfb_fillrect,
> + .fb_copyarea = hvfb_cfb_copyarea,
> + .fb_imageblit = hvfb_cfb_imageblit,
>   .fb_blank = hvfb_blank,
>  };
> 
> @@ -801,6 +847,9 @@ static int hvfb_probe(struct hv_device *hdev,
> 
>   par->fb_ready = true;
> 
> + hvfb_info = info;
> +
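
The control flow of the quoted patch (draw as usual, flush to the host immediately only once a panic has switched the driver into synchronous mode) can be modelled in a few lines of userspace C. This is a sketch under assumptions: struct fb_model and the model_*() functions are hypothetical, and the counters stand in for the real drawing routines and for synthvid_update().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: counters stand in for drawing and host updates. */
struct fb_model {
	bool synchronous;	/* set once by the panic handler */
	int draws;
	int flushes;
};

/* Stand-in for notifying the host of the dirty area. */
static void model_flush(struct fb_model *fb)
{
	assert(fb != NULL);
	fb->flushes++;
}

/* Wrapper in the spirit of hvfb_cfb_fillrect(): draw, then flush if asked. */
static void model_fillrect(struct fb_model *fb)
{
	assert(fb != NULL);
	fb->draws++;
	if (fb->synchronous)
		model_flush(fb);
}

/* Panic path: push what is already dirty, then switch modes. */
static void model_on_panic(struct fb_model *fb)
{
	model_flush(fb);
	fb->synchronous = true;
}
```

Before the panic the wrapper leaves flushing to the deferred work, as the driver does today; afterwards every draw is followed by a flush because the workqueue can no longer be relied on to run.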

Re: [PATCH 3.8 076/116] xfs: ioctl check for capabilities in the current user namespace

2014-07-23 Thread Eric W. Biederman

Kamal Mostafa  writes:

> On Wed, 2014-07-23 at 09:12 +1000, Dave Chinner wrote:
>> On Tue, Jul 22, 2014 at 03:21:27PM -0700, Kamal Mostafa wrote:
>> > 3.8.13.27 -stable review patch.  If anyone has any objections, please let me know.
>> > 
>> > --
>> > 
>> > From: Dwight Engen 
>> > 
>> > commit fd5e2aa8653665ae1cc60f7aca1069abdbcad3f6 upstream.
>> > 
>> > Use inode_capable() to check if SUID|SGID bits should be cleared to match
>> > [...]
>> 
>> Why are you backporting this to 3.8? namespace support didn't come
>> along until much later, so grabbing one patch out of the middle of a
>> patch series to allow userns support in XFS is likely to cause
>> problems because there's no supporting code in XFS for it.
>> 
>> Please don't randomly cherry pick userns support patches that change
>> permission checks back into kernels that don't have userns support.
>
> Yup, that's why we ask for reviews all right!  I've dropped these from
> the 3.8-stable queue:
>
> fs,userns: Change inode_capable to capable_wrt_inode_uidgid

The "fs,userns: Change inode_capable to capable_wrt_inode_uidgid" patch is
appropriate for 3.8.  I think that one is applicable all of the way
back to 3.4.

I don't know if xfs in 3.8 called inode_capable, and if it didn't
you can remove that hunk.  To keep things very simple you could just
skip the rename of inode_capable to capable_wrt_inode_uidgid and just
include the one line change to add kgid_has_mapping.

But that bug fix is very much applicable to older kernels.

Eric
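
As a rough illustration of the check being discussed, here is a minimal
userspace sketch. It assumes, from the message above, that the fix adds a
gid-mapping test next to the existing capability and uid-mapping tests. The
struct and function names (user_ns_model, model_capable_wrt_inode_uidgid) are
hypothetical stand-ins, not kernel APIs, and each ID map is simplified to a
single contiguous range.

```c
#include <stdbool.h>

/* Hypothetical model of one ID mapping: a single contiguous range. */
struct id_range {
	unsigned int first;	/* first ID mapped into the namespace */
	unsigned int count;	/* number of consecutive IDs mapped */
};

/* Hypothetical model of a user namespace, for illustration only. */
struct user_ns_model {
	struct id_range uid_map;
	struct id_range gid_map;
	bool has_cap;		/* stands in for the capability check */
};

static bool id_has_mapping(const struct id_range *map, unsigned int id)
{
	return id >= map->first && id - map->first < map->count;
}

/*
 * The caller is treated as capable with respect to an inode only if it holds
 * the capability AND both the inode's uid and gid are mapped in its
 * namespace. The gid test models the "one line change" mentioned above.
 */
bool model_capable_wrt_inode_uidgid(const struct user_ns_model *ns,
				    unsigned int i_uid, unsigned int i_gid)
{
	return ns->has_cap &&
	       id_has_mapping(&ns->uid_map, i_uid) &&
	       id_has_mapping(&ns->gid_map, i_gid);
}
```

With a namespace that maps IDs 1000..1009, an inode owned by uid 1005 and
gid 0 fails the check in this model, because the gid has no mapping.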
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Linux 3.16-rc6

2014-07-23 Thread David Rientjes

On Wed, 23 Jul 2014, Linus Torvalds wrote:

> > Well, it looks like we f*cked up something after -rc5 since I'm starting
> > to see lockdep splats all over the place which I didn't see before. I'm
> > running rc6 + tip/master.
> >
> > There was one in r8169 yesterday:
> >
> > https://lkml.kernel.org/r/20140722081840.ga6...@pd.tnic
> >
> > and now I'm seeing the following in a kvm guest. I'm adding some more
> > lists to CC which look like might be related, judging from the stack
> > traces.
> 
> Hmm. I'm not seeing the reason for this.
> 
> > [   31.704282] [ INFO: possible irq lock inversion dependency detected ]
> > [   31.704282] 3.16.0-rc6+ #1 Not tainted
> > [   31.704282] -
> > [   31.704282] Xorg/3484 just changed the state of lock:
> > [   31.704282]  (tasklist_lock){.?.+..}, at: [] 
> > send_sigio+0x59/0x1b0
> > [   31.704282] but this lock took another, HARDIRQ-unsafe lock in the past:
> > [   31.704282]  (&(&p->alloc_lock)->rlock){+.+...}
> 
> Ok, so the claim is that there's a 'p->alloc_lock' (ie "task_lock()")
> that is inside the tasklist_lock, which would indeed be wrong. But I'm
> not seeing it. The "shortest dependencies" thing seems to imply
> __set_task_comm(), but that only takes task_lock.
> 

It's the reverse, task_lock() inside tasklist_lock is fine but it's 
complaining about taking tasklist_lock inside task_lock().

I don't think it's anything that's sitting in tip/master nor is it 
something that was introduced during this merge window.  I think this has 
been the behavior dating back to commit 94dfd7edfd5c ("USB: HCD: support 
giveback of URB in tasklet context").

Re: [PATCH] Input: synaptics-rmi4 - fix compiler warnings in F11

2014-07-23 Thread Christopher Heiny


On 07/22/2014 11:11 PM, Dmitry Torokhov wrote:

Signed-off-by: Dmitry Torokhov 


I've reviewed this, and can say:

Acked-by: Christopher Heiny 

but I haven't had a chance to apply it to my build tree.

Andrew - I'll be OOO for a couple of days.  Can you do that, and add a 
Tested-by: or rev the patch, as appropriate?


Thanks,
Chris


---
  drivers/input/rmi4/rmi_f11.c | 135 +++
  1 file changed, 71 insertions(+), 64 deletions(-)

diff --git a/drivers/input/rmi4/rmi_f11.c b/drivers/input/rmi4/rmi_f11.c
index b739d31..7af4f68 100644
--- a/drivers/input/rmi4/rmi_f11.c
+++ b/drivers/input/rmi4/rmi_f11.c
@@ -553,7 +553,7 @@ struct f11_data {
unsigned long *result_bits;
  };

-enum finger_state_values {
+enum f11_finger_state {
F11_NO_FINGER   = 0x00,
F11_PRESENT = 0x01,
F11_INACCURATE  = 0x02,
@@ -563,12 +563,14 @@ enum finger_state_values {
  /** F11_INACCURATE state is overloaded to indicate pen present. */
  #define F11_PEN F11_INACCURATE

-static int get_tool_type(struct f11_2d_sensor *sensor, u8 finger_state)
+static int rmi_f11_get_tool_type(struct f11_2d_sensor *sensor,
+enum f11_finger_state finger_state)
  {
if (IS_ENABLED(CONFIG_RMI4_F11_PEN) &&
sensor->sens_query.has_pen &&
finger_state == F11_PEN)
return MT_TOOL_PEN;
+
return MT_TOOL_FINGER;
  }

@@ -603,36 +605,32 @@ static void rmi_f11_rel_pos_report(struct f11_2d_sensor 
*sensor, u8 n_finger)

  static void rmi_f11_abs_pos_report(struct f11_data *f11,
   struct f11_2d_sensor *sensor,
-  u8 finger_state, u8 n_finger)
+  enum f11_finger_state finger_state,
+  u8 n_finger)
  {
        struct f11_2d_data *data = &sensor->data;
+       struct input_dev *input = sensor->input;
        struct rmi_f11_2d_axis_alignment *axis_align = &sensor->axis_align;
+       u8 *pos_data = &data->abs_pos[n_finger * RMI_F11_ABS_BYTES];
u16 x, y, z;
int w_x, w_y, w_max, w_min, orient;
-   int temp;
-   u8 abs_base = n_finger * RMI_F11_ABS_BYTES;
+   int tool_type = rmi_f11_get_tool_type(sensor, finger_state);
+
+   if (sensor->type_a) {
+   input_report_abs(input, ABS_MT_TRACKING_ID, n_finger);
+   input_report_abs(input, ABS_MT_TOOL_TYPE, tool_type);
+   } else {
+   input_mt_slot(input, n_finger);
+   input_mt_report_slot_state(input, tool_type,
+  finger_state != F11_NO_FINGER);
+   }

if (finger_state) {
-   x = (data->abs_pos[abs_base] << 4) |
-   (data->abs_pos[abs_base + 2] & 0x0F);
-   y = (data->abs_pos[abs_base + 1] << 4) |
-   (data->abs_pos[abs_base + 2] >> 4);
-   w_x = data->abs_pos[abs_base + 3] & 0x0F;
-   w_y = data->abs_pos[abs_base + 3] >> 4;
-   w_max = max(w_x, w_y);
-   w_min = min(w_x, w_y);
-   z = data->abs_pos[abs_base + 4];
-
-   if (axis_align->swap_axes) {
-   temp = x;
-   x = y;
-   y = temp;
-   temp = w_x;
-   w_x = w_y;
-   w_y = temp;
-   }
+   x = (pos_data[0] << 4) | (pos_data[2] & 0x0F);
+   y = (pos_data[1] << 4) | (pos_data[2] >> 4);

-   orient = w_x > w_y ? 1 : 0;
+   if (axis_align->swap_axes)
+   swap(x, y);

if (axis_align->flip_x)
x = max(sensor->max_x - x, 0);
@@ -641,13 +639,13 @@ static void rmi_f11_abs_pos_report(struct f11_data *f11,
y = max(sensor->max_y - y, 0);

/*
-   * here checking if X offset or y offset are specified is
-   *  redundant.  We just add the offsets or, clip the values
-   *
-   * note: offsets need to be done before clipping occurs,
-   * or we could get funny values that are outside
-   * clipping boundaries.
-   */
+* Here checking if X offset or y offset are specified is
+* redundant. We just add the offsets or clip the values.
+*
+* Note: offsets need to be applied before clipping occurs,
+* or we could get funny values that are outside of
+* clipping boundaries.
+*/
x += axis_align->offset_x;
y += axis_align->offset_y;
x =  max(axis_align->clip_x_low, x);
@@ -657,41 +655,44 @@ static void rmi_f11_abs_pos_report(struct f11_data *f11,
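
The nibble packing used by the new x/y computation in the hunk above can be
tried out in userspace. The sketch below is only an illustration of that bit
layout: the names f11_pos and f11_decode_pos are made up for the example, and
the axis swap is written out by hand instead of using the kernel's swap()
macro.

```c
#include <stdint.h>

/* Hypothetical container for one decoded position. */
struct f11_pos {
	uint16_t x;
	uint16_t y;
};

/*
 * Unpack two 12-bit coordinates from three packed bytes, following the
 * expressions in the patch:
 *   byte 0 holds X bits 11..4
 *   byte 1 holds Y bits 11..4
 *   byte 2 holds Y bits 3..0 in its high nibble and X bits 3..0 in its low one
 */
struct f11_pos f11_decode_pos(const uint8_t *pos_data, int swap_axes)
{
	struct f11_pos p;

	p.x = (uint16_t)((pos_data[0] << 4) | (pos_data[2] & 0x0F));
	p.y = (uint16_t)((pos_data[1] << 4) | (pos_data[2] >> 4));

	if (swap_axes) {
		uint16_t tmp = p.x;

		p.x = p.y;
		p.y = tmp;
	}
	return p;
}
```

For the bytes 0x12, 0x34, 0x56 this gives x = 0x126 and y = 0x345, or the
reverse when swap_axes is set.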

Re: Scheduler regression from caffcdd8d27ba78730d5540396ce72ad022aff2c

2014-07-23 Thread Bruno Wolff III


On Wed, Jul 23, 2014 at 17:11:40 +0200,
 Peter Zijlstra  wrote:


OK, so that's become the below patch. I'll feed it to Ingo if that's OK
with hpa.


I tested this patch on 3 machines and it continued to fix the one that 
was broken and didn't seem to break anything on the two that weren't 
broken.


Thanks for developing this patch so quickly.

Re: [PATCH 3/3] crypto: Add Allwinner Security System crypto accelerator

2014-07-23 Thread Herbert Xu

On Wed, Jul 23, 2014 at 09:38:35PM +0200, Marek Vasut wrote:
> On Wednesday, July 23, 2014 at 08:52:12 PM, Corentin LABBE wrote:
> > Le 23/07/2014 17:51, Marek Vasut a écrit :
> > > On Wednesday, July 23, 2014 at 04:13:09 PM, Herbert Xu wrote:
> > >> On Wed, Jul 23, 2014 at 04:07:20PM +0200, Marek Vasut wrote:
> > >>> On Wednesday, July 23, 2014 at 03:57:35 PM, Herbert Xu wrote:
> >  On Sat, May 24, 2014 at 02:00:03PM +0200, Marek Vasut wrote:
> > >> +}
> > >> +#endif
> > >> +
> > >> +#ifdef CONFIG_CRYPTO_DEV_SUNXI_SS_MD5
> > >> +err = crypto_register_shash(_md5_alg);
> > > 
> > > Do not use shash for such a device. This is clearly an ahash (and
> > > async in general) device. The rule of thumb here is that you use
> > > sync algos only for devices which have dedicated instructions for
> > > computing the transformation. For devices which are attached to some
> > > kind of bus, you use async algos (ahash etc).
> >  
> >  I'm sorry that I didn't catch this earlier but there is no such
> >  rule.
> >  
> >  Unless you need the async interface you should stick to the sync
> >  interfaces for the sake of simplicity.
> >  
> >  We have a number of existing drivers that are synchronous but
> >  using the async interface.  They should either be converted
> >  over to the sync interface or made interrupt-driven if possible.
> > >>> 
> > >>> Sure, but this device is interrupt driven and uses DMA to feed the
> > >>> crypto engine, therefore async, right ?
> > >> 
> > >> If it's interrupt-driven, then yes it would certainly make sense to
> > >> be async.  But all I see is polling in the latest posting, was the
> > >> first version different?
> > > 
> > > I stand corrected then, sorry.
> > > 
> > > Is it possible to use DMA to feed the crypto accelerator, Corentin?
> > > 
> > > Best regards,
> > > Marek Vasut
> > 
> > Yes, DMA is possible and will be implemented soon.
> > So if I have understood correctly, I will keep using the async interface.
> 
> Yeah, in this case, using DMA and async interface combo is the way to go.

OK fair enough.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

RE: [PATCH v8 05/22] Add vm_replace_mixed()

2014-07-23 Thread Zhang, Tianfei

This is a double page_table_lock issue. Wouldn't it be simpler and more
readable to release the lock and reacquire it?

+	if (!pte_none(*pte)) {
+		if (!replace)
+			goto out_unlock;
+		VM_BUG_ON(!mutex_is_locked(&vma->vm_file->f_mapping->i_mmap_mutex));
+		pte_unmap_unlock(pte, ptl);
+		zap_page_range_single(vma, addr, PAGE_SIZE, NULL);
+		pte = get_locked_pte(mm, addr, &ptl);
+	}

Best,
Figo

> -Original Message-
> From: owner-linux...@kvack.org [mailto:owner-linux...@kvack.org] On
> Behalf Of Kirill A. Shutemov
> Sent: Wednesday, July 23, 2014 11:55 PM
> To: Matthew Wilcox
> Cc: Wilcox, Matthew R; linux-fsde...@vger.kernel.org; linux...@kvack.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v8 05/22] Add vm_replace_mixed()
> 
> On Wed, Jul 23, 2014 at 10:27:45AM -0400, Matthew Wilcox wrote:
> > On Wed, Jul 23, 2014 at 05:20:48PM +0300, Kirill A. Shutemov wrote:
> > > On Wed, Jul 23, 2014 at 09:52:22AM -0400, Matthew Wilcox wrote:
> > > > I'd love to use a lighter-weight weapon!  What would you recommend
> > > > using, zap_pte_range()?
> > >
> > > The most straight-forward way: extract body of pte cycle from
> > > zap_pte_range() to separate function -- zap_pte() -- and use it.
> >
> > OK, I can do that.  What about the other parts of zap_page_range(), do
> > I need to call them?
> >
> > lru_add_drain();
> 
> No, I guess..
> 
> > tlb_gather_mmu(&tlb, mm, address, end);
> > tlb_finish_mmu(&tlb, address, end);
> 
> New zap_pte() should tolerate tlb == NULL and does flush_tlb_page() or
> pte_clear_*flush or something.
> 
> > update_hiwater_rss(mm);
> 
> No: you cannot end up with lower rss after replace, iiuc.
> 
> > mmu_notifier_invalidate_range_start(mm, address, end);
> > mmu_notifier_invalidate_range_end(mm, address, end);
> 
> mmu_notifier_invalidate_page() should be enough.
> 
> > > > if ((fd = open(argv[1], O_CREAT|O_RDWR, 0666)) < 0) {
> > > > perror(argv[1]);
> > > > exit(1);
> > > > }
> > > >
> > > > if (ftruncate(fd, 4096) < 0) {
> > >
> > > Shouldn't this be ftruncate(fd, 0)? Otherwise the memcpy() below
> > > will fault in page from backing storage, not hole and write will not
> > > replace anything.
> >
> > Ah, it was starting with a new file, hence the O_CREAT up above.
> 
> Do you mean you pointed to new file all the time? O_CREAT doesn't truncate
> file if it exists, iirc.
> 
> --
>  Kirill A. Shutemov
> 
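
The O_CREAT point in the quoted discussion is easy to check from userspace.
The sketch below is a small illustration, not the test program from the
thread: the helper names are made up, and it only shows that open() with
O_CREAT leaves an existing file's contents alone unless O_TRUNC is also
passed.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path with O_CREAT | O_RDWR plus any extra flags and report the file
 * size seen right after the open. Returns -1 on error.
 */
long size_after_open(const char *path, int extra_flags)
{
	struct stat st;
	long size = -1;
	int fd = open(path, O_CREAT | O_RDWR | extra_flags, 0666);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) == 0)
		size = (long)st.st_size;
	close(fd);
	return size;
}

/* Create path if needed and make it exactly len bytes long. 0 on success. */
int make_file_of_size(const char *path, long len)
{
	int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0666);
	int ret;

	if (fd < 0)
		return -1;
	ret = ftruncate(fd, (off_t)len);
	close(fd);
	return ret;
}
```

After make_file_of_size(path, 4096), a plain O_CREAT open still reports 4096
bytes, while adding O_TRUNC reports 0. A test that wants to start from a hole
therefore needs either a fresh file name or an explicit truncate to zero.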

Re: Subject: [PATCH 1/1] mtd:nand:fix nand_lock/unlock() function

2014-07-23 Thread Brian Norris

On Wed, Jul 23, 2014 at 06:27:30PM -0700, Brian Norris wrote:
> Hi White,
> 
> On Thu, Jul 24, 2014 at 01:00:01AM +, bpqw wrote:
[...]

And...I just got a rejection email from micron.com:

  The error that the other server returned was:
  550 5.1.1 ... ... User unknown

I'm not sure how to get a reply to the submitter now. Perhaps he reads
LKML.

Brian

Re: [RESEND PATCH 4/4] aio: use iovec array rather than the single one

2014-07-23 Thread Gu Zheng

Hi Jeff,
On 07/23/2014 09:25 PM, Jeff Moyer wrote:

> Gu Zheng  writes:
> 
>> Previously, we offered only a single iovec to handle all the read/write
>> cases, so PREADV/PWRITEV requests always needed to allocate more iovec
>> buffer when copying user vectors.
>> If we use a tmp iovec array rather than the single one, some small
>> PREADV/PWRITEV workloads (vector size smaller than the tmp buffer) will not
>> need to allocate more iovec buffer when copying user vectors.
> 
> Hi, Gu,
> 
> This still doesn't explain why you decided to look into this.

The comment is clear, just want to avoid some needless memory allocation in
the io submit path.

> Did you
> notice a performance issue in this path?  Do you have benchmarks that
> show some speedup due to this change?

Just some common tests based on fio: they show a slight improvement (~3%) over
the previous code when the iodepth is in [5,6,7]. I did not paste the numbers
here because I think others (especially those with production environments) who
are interested in this can give us more meaningful feedback.

Thanks,
Gu

> 
> Thanks,
> Jeff
> 
>>
>> Reviewed-by: Jeff Moyer 
>> Signed-off-by: Gu Zheng 
>> ---
>>  fs/aio.c |   10 +-
>>  1 files changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/aio.c b/fs/aio.c
>> index f1fede2..df3491a 100644
>> --- a/fs/aio.c
>> +++ b/fs/aio.c
>> @@ -1267,12 +1267,12 @@ static ssize_t aio_setup_vectored_rw(struct kiocb 
>> *kiocb,
>>  if (compat)
>>  ret = compat_rw_copy_check_uvector(rw,
>>  (struct compat_iovec __user *)buf,
>> -*nr_segs, 1, *iovec, iovec);
>> +*nr_segs, UIO_FASTIOV, *iovec, iovec);
>>  else
>>  #endif
>>  ret = rw_copy_check_uvector(rw,
>>  (struct iovec __user *)buf,
>> -*nr_segs, 1, *iovec, iovec);
>> +*nr_segs, UIO_FASTIOV, *iovec, iovec);
>>  if (ret < 0)
>>  return ret;
>>  
>> @@ -1309,7 +1309,7 @@ static ssize_t aio_run_iocb(struct kiocb *req, 
>> unsigned opcode,
>>  fmode_t mode;
>>  aio_rw_op *rw_op;
>>  rw_iter_op *iter_op;
>> -struct iovec inline_vec, *iovec = &inline_vec;
>> +struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
>>  struct iov_iter iter;
>>  
>>  switch (opcode) {
>> @@ -1344,7 +1344,7 @@ rw_common:
>>  if (!ret)
>>  ret = rw_verify_area(rw, file, &req->ki_pos, req->ki_nbytes);
>>  if (ret < 0) {
>> -if (iovec != &inline_vec)
>> +if (iovec != inline_vecs)
>>  kfree(iovec);
>>  return ret;
>>  }
>> @@ -1391,7 +1391,7 @@ rw_common:
>>  return -EINVAL;
>>  }
>>  
>> -if (iovec != &inline_vec)
>> +if (iovec != inline_vecs)
>>  kfree(iovec);
>>  
>>  if (ret != -EIOCBQUEUED) {
> .
> 
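
The idea behind the patch, an inline array that covers small vectors with a
heap fallback for larger ones, can be sketched in userspace as follows. This
is an illustration only: copy_iovecs() and release_iovecs() are hypothetical
helpers, malloc()/free() stand in for the kernel allocator, and FAST_IOV is
set to 8 on the assumption that it mirrors UIO_FASTIOV.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define FAST_IOV 8	/* assumed to mirror the kernel's UIO_FASTIOV */

/*
 * Copy nr_segs iovecs from src. The caller's inline array is used when it is
 * large enough; otherwise a heap buffer is allocated. Returns the array that
 * holds the copy, or NULL if the allocation failed.
 */
struct iovec *copy_iovecs(const struct iovec *src, size_t nr_segs,
			  struct iovec *inline_vecs, size_t inline_count)
{
	struct iovec *iov = inline_vecs;

	if (nr_segs > inline_count) {
		iov = malloc(nr_segs * sizeof(*iov));
		if (!iov)
			return NULL;
	}
	memcpy(iov, src, nr_segs * sizeof(*iov));
	return iov;
}

/* Free only a heap buffer, the same test the patch applies before kfree(). */
void release_iovecs(struct iovec *iov, struct iovec *inline_vecs)
{
	if (iov != inline_vecs)
		free(iov);
}
```

A request with 4 segments stays in the inline array and costs no allocation,
while one with 16 segments still falls back to the heap. The trade-off is a
larger stack frame in the submit path.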



RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables

2014-07-23 Thread Ren, Qiaowei



On 2014-07-24, Hansen, Dave wrote:
> On 07/23/2014 05:49 PM, Ren, Qiaowei wrote:
>> I can check a lot of debug information when one VMA and related
>> bounds tables are allocated and freed through adding a lot of
>> print() like log into kernel/runtime. Do you think this is enough?
> 
> I thought the entire reason we grabbed a VM_ flag was to make it
> possible to figure out without resorting to this.

All cleanup work certainly depends on this VM_ flag. In addition, as we
discussed, the main use of this new VM_ flag is to let the runtime know how
much memory is occupied by MPX.

Thanks,
Qiaowei


Re: Subject: [PATCH 1/1] mtd:nand:fix nand_lock/unlock() function

2014-07-23 Thread Brian Norris

Hi White,

On Thu, Jul 24, 2014 at 01:00:01AM +, bpqw wrote:
> Do a NAND reset before the write-protect check.
> If we want to determine whether WP# is low or high through STATUS READ by
> checking bit 7, we must reset the device first, because other operations
> (e.g. erasing or programming a locked block) can also clear bit 7 of the
> status register.
> 
> Signed-off-by: White Ding 
> ---
>  drivers/mtd/nand/nand_base.c |   18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index 41167e9..22dd3aa 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -965,6 +965,15 @@ int nand_unlock(struct mtd_info *mtd, loff_t ofs, 
> uint64_t len)
[...]
> @@ -1015,6 +1024,15 @@ int nand_lock(struct mtd_info *mtd, loff_t ofs, 
> uint64_t len)
[...]

I don't see any in-tree users of nand_{un,}lock(). I recently caught a
bug in nand_lock() via inspection (still need to send a fix), but I was
considering dropping the functions entirely.

I presume you have some out-of-tree driver that uses these functions,
then?

Brian

Re: [PATCH v2 1/2] staging: cxt1e1: Prefix ambiguous variable names with 'cxt1e1_' for clarity

2014-07-23 Thread Greg KH

On Tue, Jul 22, 2014 at 08:34:55PM -0400, Jeff Oczek wrote:
> Changed names of ambiguous sounding variable names as follows
> 
> error_flag  ->  cxt1e1_error_flag
> max_mtu_default ->  cxt1e1_max_mtu_default
> max_txdesc_used ->  cxt1e1_max_txdesc_used
> max_txdesc_default  ->  cxt1e1_max_txdesc_default
> max_rxdesc_used ->  cxt1e1_max_rxdesc_used
> max_rxdesc_default  ->  cxt1e1_max_rxdesc_default
> 
> Since max_txdesc_used, max_rxdesc_used are module parameters, these were
> changed from global to static and the module init function assigns the values
> to the newly named global variables
> 
> Signed-off-by: Jeff Oczek 
> ---
>  drivers/staging/cxt1e1/hwprobe.c   |  7 ++---
>  drivers/staging/cxt1e1/linux.c | 53 
> +-
>  drivers/staging/cxt1e1/musycc.c|  4 +--
>  drivers/staging/cxt1e1/pmcc4_drv.c | 22 +---
>  drivers/staging/cxt1e1/sbeproc.c   |  6 ++---
>  5 files changed, 51 insertions(+), 41 deletions(-)

This driver isn't even in my kernel tree anymore, so how can I apply it?

What kernel branch/version did you make it against?  Please always work
against linux-next, or the staging-next branch of my staging.git kernel tree
when sending patches.

thanks,

greg k-h

Re: [PATCH] staging: lustre: obdclass: fix sparse warnings about static declaration

2014-07-23 Thread Greg KH

On Wed, Jul 23, 2014 at 07:23:43PM +0400, Andrey Skvortsov wrote:
> Signed-off-by: Andrey Skvortsov 
> ---
>  .../lustre/lustre/obdclass/linux/linux-sysctl.c|   28 
> ++--
>  1 file changed, 14 insertions(+), 14 deletions(-)

You obviously did not build with your patch applied, otherwise you would
not have sent it out :(

Please _ALWAYS_ test your patches, or end up with grumpy maintainers
when you break their build...

greg k-h

[patch 3/3] mm, oom: rename zonelist locking functions

2014-07-23 Thread David Rientjes

try_set_zonelist_oom() and clear_zonelist_oom() are not named properly to imply 
that they require locking semantics to avoid out_of_memory() being reordered.

zone_scan_lock is required for both functions to ensure that there is proper 
locking synchronization.

Rename try_set_zonelist_oom() to oom_zonelist_trylock() and rename
clear_zonelist_oom() to oom_zonelist_unlock() to imply there are proper locking
semantics.

At the same time, convert oom_zonelist_trylock() to return bool instead of int 
since only success and failure are tested.

Signed-off-by: David Rientjes 
---
 include/linux/oom.h |  4 ++--
 mm/oom_kill.c   | 30 +-
 mm/page_alloc.c |  6 +++---
 3 files changed, 18 insertions(+), 22 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -55,8 +55,8 @@ extern void oom_kill_process(struct task_struct *p, gfp_t 
gfp_mask, int order,
 struct mem_cgroup *memcg, nodemask_t *nodemask,
 const char *message);
 
-extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
-extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
+extern bool oom_zonelist_trylock(struct zonelist *zonelist, gfp_t gfp_flags);
+extern void oom_zonelist_unlock(struct zonelist *zonelist, gfp_t gfp_flags);
 
 extern void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
   int order, const nodemask_t *nodemask);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -557,28 +557,25 @@ EXPORT_SYMBOL_GPL(unregister_oom_notifier);
  * if a parallel OOM killing is already taking place that includes a zone in
  * the zonelist.  Otherwise, locks all zones in the zonelist and returns 1.
  */
-int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_mask)
+bool oom_zonelist_trylock(struct zonelist *zonelist, gfp_t gfp_mask)
 {
struct zoneref *z;
struct zone *zone;
-   int ret = 1;
+   bool ret = true;
 
        spin_lock(&zone_scan_lock);
-   for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask)) {
+   for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask))
if (zone_is_oom_locked(zone)) {
-   ret = 0;
+   ret = false;
goto out;
}
-   }
 
-   for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask)) {
-   /*
-* Lock each zone in the zonelist under zone_scan_lock so a
-* parallel invocation of try_set_zonelist_oom() doesn't succeed
-* when it shouldn't.
-*/
+   /*
+* Lock each zone in the zonelist under zone_scan_lock so a parallel
+* call to oom_zonelist_trylock() doesn't succeed when it shouldn't.
+*/
+   for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask))
zone_set_flag(zone, ZONE_OOM_LOCKED);
-   }
 
 out:
        spin_unlock(&zone_scan_lock);
@@ -590,15 +587,14 @@ out:
  * allocation attempts with zonelists containing them may now recall the OOM
  * killer, if necessary.
  */
-void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_mask)
+void oom_zonelist_unlock(struct zonelist *zonelist, gfp_t gfp_mask)
 {
struct zoneref *z;
struct zone *zone;
 
        spin_lock(&zone_scan_lock);
-   for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask)) {
+   for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask))
zone_clear_flag(zone, ZONE_OOM_LOCKED);
-   }
        spin_unlock(&zone_scan_lock);
 }
 
@@ -693,8 +689,8 @@ void pagefault_out_of_memory(void)
return;
 
zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
-   if (try_set_zonelist_oom(zonelist, GFP_KERNEL)) {
+   if (oom_zonelist_trylock(zonelist, GFP_KERNEL)) {
out_of_memory(zonelist, 0, 0, NULL, false);
-   clear_zonelist_oom(zonelist, GFP_KERNEL);
+   oom_zonelist_unlock(zonelist, GFP_KERNEL);
}
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2201,8 +2201,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 {
struct page *page;
 
-   /* Acquire the OOM killer lock for the zones in zonelist */
-   if (!try_set_zonelist_oom(zonelist, gfp_mask)) {
+   /* Acquire the per-zone oom lock for each zone */
+   if (!oom_zonelist_trylock(zonelist, gfp_mask)) {
schedule_timeout_uninterruptible(1);
return NULL;
}
@@ -2240,7 +2240,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
out_of_memory(zonelist, gfp_mask, order, nodemask, false);
 
 out:
-   clear_zonelist_oom(zonelist, gfp_mask);
+   oom_zonelist_unlock(zonelist, gfp_mask);
return page;
 }
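
The trylock semantics that the rename is meant to convey can be modelled in
userspace. The sketch below is an assumption-laden illustration, not the
kernel code: zone_model, zonelist_trylock_model() and
zonelist_unlock_model() are hypothetical names, a pthread mutex stands in
for zone_scan_lock, and a zonelist is reduced to an array of pointers.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone carrying a ZONE_OOM_LOCKED flag. */
struct zone_model {
	bool oom_locked;
};

/* Stand-in for zone_scan_lock: serializes the check-then-set sequence. */
static pthread_mutex_t zone_scan_lock_model = PTHREAD_MUTEX_INITIALIZER;

/*
 * Succeeds only if no zone in the list is already OOM-locked. On success
 * every zone in the list is marked locked; on failure nothing is changed.
 */
bool zonelist_trylock_model(struct zone_model **zones, size_t n)
{
	bool ret = true;
	size_t i;

	pthread_mutex_lock(&zone_scan_lock_model);
	for (i = 0; i < n; i++) {
		if (zones[i]->oom_locked) {
			ret = false;
			goto out;
		}
	}
	/* Mark all zones under the same lock so a racing caller cannot win too. */
	for (i = 0; i < n; i++)
		zones[i]->oom_locked = true;
out:
	pthread_mutex_unlock(&zone_scan_lock_model);
	return ret;
}

/* Clears the flag on every zone in the list. */
void zonelist_unlock_model(struct zone_model **zones, size_t n)
{
	size_t i;

	pthread_mutex_lock(&zone_scan_lock_model);
	for (i = 0; i < n; i++)
		zones[i]->oom_locked = false;
	pthread_mutex_unlock(&zone_scan_lock_model);
}
```

Two lists that share a zone cannot both be locked at once: the second
trylock fails without touching any flags and succeeds once the first list
has been unlocked. Returning bool fits because callers only test success or
failure.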

[patch 2/3] mm, oom: remove unnecessary check for NULL zonelist

2014-07-23 Thread David Rientjes

If the pagefault handler is modified to pass a non-NULL zonelist then an 
unnecessary check for a NULL zonelist in constrained_alloc() can be removed.

Signed-off-by: David Rientjes 
---
 mm/oom_kill.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -208,8 +208,6 @@ static enum oom_constraint constrained_alloc(struct 
zonelist *zonelist,
/* Default to all available memory */
*totalpages = totalram_pages + total_swap_pages;
 
-   if (!zonelist)
-   return CONSTRAINT_NONE;
/*
 * Reach here only when __GFP_NOFAIL is used. So, we should avoid
 * to kill current.We have to random task kill in this case.
@@ -696,7 +694,7 @@ void pagefault_out_of_memory(void)
 
zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
if (try_set_zonelist_oom(zonelist, GFP_KERNEL)) {
-   out_of_memory(NULL, 0, 0, NULL, false);
+   out_of_memory(zonelist, 0, 0, NULL, false);
clear_zonelist_oom(zonelist, GFP_KERNEL);
}
 }

[patch 1/3] mm, oom: ensure memoryless node zonelist always includes zones

2014-07-23 Thread David Rientjes

With memoryless node support being worked on, it's possible that, as an
optimization, a node may not have a non-NULL zonelist.  When CONFIG_NUMA is
enabled and node 0 is memoryless, this means the zonelist for first_online_node
may become NULL.

The oom killer requires a zonelist that includes all memory zones for the sysrq 
trigger and pagefault out of memory handler.

Ensure that a non-NULL zonelist is always passed to the oom killer.

Signed-off-by: David Rientjes 
---
 drivers/tty/sysrq.c  |  2 +-
 include/linux/nodemask.h | 10 +-
 mm/oom_kill.c|  2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -355,7 +355,7 @@ static struct sysrq_key_op sysrq_term_op = {
 
 static void moom_callback(struct work_struct *ignored)
 {
-   out_of_memory(node_zonelist(first_online_node, GFP_KERNEL), GFP_KERNEL,
+   out_of_memory(node_zonelist(first_memory_node, GFP_KERNEL), GFP_KERNEL,
  0, NULL, true);
 }
 
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -430,7 +430,15 @@ static inline int num_node_state(enum node_states state)
for_each_node_mask((__node), node_states[__state])
 
 #define first_online_node  first_node(node_states[N_ONLINE])
-#define next_online_node(nid)  next_node((nid), node_states[N_ONLINE])
+#define first_memory_node  first_node(node_states[N_MEMORY])
+static inline int next_online_node(int nid)
+{
+   return next_node(nid, node_states[N_ONLINE]);
+}
+static inline int next_memory_node(int nid)
+{
+   return next_node(nid, node_states[N_MEMORY]);
+}
 
 extern int nr_node_ids;
 extern int nr_online_nodes;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -694,7 +694,7 @@ void pagefault_out_of_memory(void)
if (mem_cgroup_oom_synchronize(true))
return;
 
-   zonelist = node_zonelist(first_online_node, GFP_KERNEL);
+   zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
if (try_set_zonelist_oom(zonelist, GFP_KERNEL)) {
out_of_memory(NULL, 0, 0, NULL, false);
clear_zonelist_oom(zonelist, GFP_KERNEL);
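
The difference between picking the first online node and the first node with
memory can be shown with plain bitmasks. This is a simplified sketch under
stated assumptions: a 64-bit integer replaces nodemask_t, and
first_node_model() and MODEL_MAX_NODES are hypothetical names chosen for the
example.

```c
#include <stdint.h>

#define MODEL_MAX_NODES 64	/* illustrative limit, not the kernel's */

/*
 * Return the lowest node id whose bit is set in mask, or MODEL_MAX_NODES
 * when the mask is empty.
 */
int first_node_model(uint64_t mask)
{
	int nid;

	for (nid = 0; nid < MODEL_MAX_NODES; nid++)
		if (mask & (UINT64_C(1) << nid))
			return nid;
	return MODEL_MAX_NODES;
}
```

With nodes 0..2 online (mask 0x7) but only nodes 1 and 2 holding memory
(mask 0x6), the first online node is 0 while the first memory node is 1.
Node 0 is the memoryless node whose zonelist may be missing.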

Re: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables

2014-07-23 Thread Dave Hansen

On 07/23/2014 05:49 PM, Ren, Qiaowei wrote:
> I can check a lot of debug information when one VMA and related
> bounds tables are allocated and freed through adding a lot of print()
> like log into kernel/runtime. Do you think this is enough?

I thought the entire reason we grabbed a VM_ flag was to make it
possible to figure out without resorting to this.


Subject: [PATCH 1/1] mtd:nand:fix nand_lock/unlock() function

2014-07-23 Thread bpqw

Do a NAND reset before the write-protect check.
If we want to determine whether WP# is low or high through STATUS READ by
checking bit 7, we must reset the device first, because other operations
(e.g. erasing or programming a locked block) can also clear bit 7 of the
status register.

Signed-off-by: White Ding 
---
 drivers/mtd/nand/nand_base.c |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index 41167e9..22dd3aa 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -965,6 +965,15 @@ int nand_unlock(struct mtd_info *mtd, loff_t ofs, uint64_t 
len)
 
chip->select_chip(mtd, chipnr);
 
+   /*
+* Reset the chip.
+* If we want to check the WP through READ STATUS and check the bit 7
+* we must reset the chip
+* some operation can also clear the bit 7 of status register
+* eg. erase/program a locked block
+*/
+   chip->cmdfunc(mtd, NAND_CMD_RESET, -1, -1);
+
/* Check, if it is write protected */
if (nand_check_wp(mtd)) {
pr_debug("%s: device is write protected!\n",
@@ -1015,6 +1024,15 @@ int nand_lock(struct mtd_info *mtd, loff_t ofs, uint64_t 
len)
 
chip->select_chip(mtd, chipnr);
 
+   /*
+* Reset the chip.
+* If we want to check the WP through READ STATUS and check the bit 7
+* we must reset the chip
+* some operation can also clear the bit 7 of status register
+* eg. erase/program a locked block
+*/
+   chip->cmdfunc(mtd, NAND_CMD_RESET, -1, -1);
+
/* Check, if it is write protected */
if (nand_check_wp(mtd)) {
pr_debug("%s: device is write protected!\n",
-- 
1.7.9.5
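
The bit test that the patch description relies on can be written out as a
small sketch. It assumes the usual convention that bit 7 of the status byte
is set when the device is not write protected. STATUS_WP_BIT and
status_is_write_protected() are names chosen for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_WP_BIT 0x80	/* bit 7 of the status byte */

/*
 * Interpret a status byte: bit 7 clear is read as "write protected".
 * A stale status with bit 7 cleared by an earlier failed operation would
 * therefore also read as write protected, which is the case the reset in
 * the patch is meant to rule out.
 */
bool status_is_write_protected(uint8_t status)
{
	return (status & STATUS_WP_BIT) == 0;
}
```

A status of 0xE0 is reported as writable and 0x60 as write protected.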

Br
White Ding 

EBU APAC Application Engineering
Tel:86-21-38997078
Mobile: 86-13761729112
Address: No 601 Fasai Rd, Waigaoqiao Free Trade Zone Pudong, Shanghai, China


Re: [PATCH v2] irqchip: add keystone irq controller ip driver

2014-07-23 Thread Varka Bhadram



On Wednesday 23 July 2014 11:31 PM, Grygorii Strashko wrote:

Hi,

On 07/23/2014 06:32 PM, Varka Bhadram wrote:

On Wednesday 23 July 2014 08:10 PM, Grygorii Strashko wrote:

On Keystone SOCs, DSP cores can send interrupts to ARM
host using the IRQ controller IP. It provides 28 IRQ
signals to ARM. The IRQ handler running on HOST OS can
identify DSP signal source by analyzing SRCCx bits in
IPCARx registers. This is one of the component used by
the IPC mechanism used on Keystone SOCs.

(...)


+Required Properties:
+- compatible: should be "ti,keystone-irq"
+- ti,syscon-dev : phandle and offset pair. The phandle to syscon used to
+access device control registers and the offset inside
+device control registers range.
+- interrupt-controller : Identifies the node as an interrupt controller
+- #interrupt-cells : Specifies the number of cells needed to encode interrupt
+ source should be 1.
+- interrupts: interrupt reference to primary interrupt controller

Please use proper indentation for the properties:

- compatible: Should be "ti,keystone-irq"
- ti,syscon-dev: phandle and offset pair. The phandle to syscon used to
		 access device control registers and the offset inside
		 device control registers range.


+
+Please refer to interrupts.txt in this directory for details of the common
+Interrupt Controllers bindings used by client devices.
+

(...)


+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 

Please keep the includes in alphabetical order.

Leave one blank line before local includes:

...
#include 

#include "irqchip.h"


+#include "irqchip.h"
+
+
+/* The source ID bits start from 4 to 31 (total 28 bits)*/
+#define BIT_OFS			4
+#define KEYSTONE_N_IRQ		(32 - BIT_OFS)
+
+struct keystone_irq_device {
+	struct device		*dev;
+	struct irq_chip		chip;
+	u32			mask;
+	u32			irq;
+	struct irq_domain	*irqd;
+	struct regmap		*devctrl_regs;
+	u32			devctrl_offset;
+};
+
+static inline u32 keystone_irq_readl(struct keystone_irq_device *kirq)
+{
+	int ret;
+	u32 val = 0;
+
+	ret = regmap_read(kirq->devctrl_regs, kirq->devctrl_offset, &val);
+	if (ret < 0)
+		dev_dbg(kirq->dev, "irq read failed ret(%d)\n", ret);
+	return val;
+}
+
+static inline void
+keystone_irq_writel(struct keystone_irq_device *kirq, u32 value)
+{
+	int ret;
+
+	ret = regmap_write(kirq->devctrl_regs, kirq->devctrl_offset, value);
+	if (ret < 0)
+		dev_dbg(kirq->dev, "irq write failed ret(%d)\n", ret);

It could be written as:

if (regmap_write(kirq->devctrl_regs, kirq->devctrl_offset, value))
	dev_dbg(kirq->dev, "irq write failed\n");


+}
+
+

Please note that I'd like to see the return code here in case of failure.


What would we do with the return code?
In case of failure, only this debug message will be printed.
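
The disagreement above can be illustrated with a minimal, self-contained sketch. It is not the driver code: fake_regmap_write() is a hypothetical stand-in that, like regmap_write(), returns 0 on success and a negative errno on failure. Keeping the value in ret lets the debug message say why the write failed, and testing ret < 0 avoids the trap of a negated test that fires on success.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for regmap_write(): 0 on success, negative errno on failure. */
static int fake_regmap_write(int should_fail, unsigned int value)
{
	(void)value;
	return should_fail ? -EIO : 0;
}

/*
 * Performs the write and, on failure, formats a message that includes the
 * return code. Returns 1 if a failure message was written to msg, else 0.
 */
static int write_and_log(int should_fail, unsigned int value, char *msg, size_t len)
{
	int ret;

	assert(msg != NULL && len > 0);

	ret = fake_regmap_write(should_fail, value);
	if (ret < 0) {
		snprintf(msg, len, "irq write failed ret(%d)", ret);
		return 1;
	}
	return 0;
}
```

With this shape the only cost of keeping ret is one local variable, and the log line distinguishes, say, an I/O error from an invalid argument.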


(...)


+}
+
+static int keystone_irq_map(struct irq_domain *h, unsigned int virq,
+				irq_hw_number_t hw)

should match open parenthesis:

static int keystone_irq_map(struct irq_domain *h, unsigned int virq,
                            irq_hw_number_t hw)


+{
+	struct keystone_irq_device *kirq = h->host_data;
+
+	irq_set_chip_data(virq, kirq);
+	irq_set_chip_and_handler(virq, &kirq->chip, handle_level_irq);
+	set_irq_flags(virq, IRQF_VALID | IRQF_PROBE);
+	return 0;
+}
+
+static struct irq_domain_ops keystone_irq_ops = {
+	.map	= keystone_irq_map,
+	.xlate	= irq_domain_xlate_onecell,
+};
+
+static int keystone_irq_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct device_node *np = dev->of_node;
+	struct keystone_irq_device *kirq;
+	int ret;
+
+	if (np == NULL)
+		return -EINVAL;

return -ENODEV??

If probe is executed, the device is present, but it was created in a wrong or
unsupported way, or the dev structure contains wrong data.


Here we are trying to get the device tree node. If it is not present, we may
return an error code saying that no device is present.


(...)


+static struct platform_driver keystone_irq_device_driver = {
+	.probe		= keystone_irq_probe,
+	.remove		= keystone_irq_remove,
+	.driver		= {
+		.name	= "keystone_irq",
+		.owner	= THIS_MODULE,

No need to set it. It's done by module_platform_driver().


+		.of_match_table	= of_match_ptr(keystone_irq_dt_ids),

This driver is always populated through the DTS file, so there is no need to
use of_match_ptr().



--
-Varka Bhadram


RE: [PATCH v7 03/10] x86, mpx: add macro cpu_has_mpx

2014-07-23 Thread Ren, Qiaowei



On 2014-07-24, Hansen, Dave wrote:
> On 07/22/2014 07:35 PM, Ren, Qiaowei wrote:
>> The check for the MPX feature should be as follows:
>>
>> if (pcntxt_mask & XSTATE_EAGER) {
>>         if (eagerfpu == DISABLE) {
>>                 pr_err("eagerfpu not present, disabling some xstate features: 0x%llx\n",
>>                        pcntxt_mask & XSTATE_EAGER);
>>                 pcntxt_mask &= ~XSTATE_EAGER;
>>         } else {
>>                 eagerfpu = ENABLE;
>>         }
>> }
>> This patch was merged into the kernel at the end of last year
>> (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=e7d820a5e549b3eb6c3f9467507566565646a669 )
> 
> Should we be doing a clear_cpu_cap(X86_FEATURE_MPX) in here?
> 
> This isn't major, but I can't _ever_ imagine a user being able to
> track down why MPX is not working from this message.  Should we spruce it up 
> somehow?

Maybe. If the error log "disabling some xstate features:" is changed to 
"disabling MPX xstate features:", do you think it is OK?

Thanks,
Qiaowei


Re: [PATCH 1/1] Staging: comedi: amplc_pc236: a blank line inserted

2014-07-23 Thread Greg KH

On Wed, Jul 23, 2014 at 05:06:28AM +0300, Sam Asadi wrote:
> A 'Missing a blank line after declarations' warning fixed by inserting
> a blank line after struct pointer declaration.
> 
> Signed-off-by: Sam Asadi 
> ---
>  drivers/staging/comedi/drivers/amplc_pc236.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/staging/comedi/drivers/amplc_pc236.c b/drivers/staging/comedi/drivers/amplc_pc236.c
> index c9a96ad..d0f81f8 100644
> --- a/drivers/staging/comedi/drivers/amplc_pc236.c
> +++ b/drivers/staging/comedi/drivers/amplc_pc236.c
> @@ -515,6 +515,7 @@ static void pc236_detach(struct comedi_device *dev)
>   comedi_legacy_detach(dev);
>   } else if (is_pci_board(thisboard)) {
>   struct pci_dev *pcidev = comedi_to_pci_dev(dev);
> +
>   if (dev->irq)
>   free_irq(dev->irq, dev);
>   comedi_pci_disable(dev);

This is already in my tree, sorry :(

greg k-h

RE: [PATCH v7 09/10] x86, mpx: cleanup unused bound tables

2014-07-23 Thread Ren, Qiaowei



On 2014-07-24, Hansen, Dave wrote:
> On 07/20/2014 10:38 PM, Qiaowei Ren wrote:
>> Since the kernel allocated those tables on-demand without userspace
>> knowledge, it is also responsible for freeing them when the
>> associated mappings go away.
>> 
>> Here, the solution for this issue is to hook do_munmap() to check
>> whether one process is MPX enabled. If yes, those bounds tables
>> covered in the virtual address region which is being unmapped will
>> be freed also.
> 
> This is the part of the code that I'm the most concerned about.
> 
> Could you elaborate on how you've tested this to make sure it works OK?

By adding many print()-like log statements to the kernel and runtime, I can
inspect the debug output whenever a VMA and its related bounds tables are
allocated and freed. Do you think this is enough?

Thanks,
Qiaowei

Re: [PATCH RFC v2 net-next 10/16] bpf: add eBPF verifier

2014-07-23 Thread Alexei Starovoitov

On Wed, Jul 23, 2014 at 4:38 PM, Kees Cook  wrote:
>> +Program that doesn't check return value of map_lookup_elem() before accessing
>> +map element:
>> +  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
>> +  BPF_ALU64_REG(BPF_MOV, BPF_REG_2, BPF_REG_10),
>> +  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
>> +  BPF_ALU64_IMM(BPF_MOV, BPF_REG_1, 1),
>> +  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
>
> Is the expectation that these pointers are direct kernel function
> addresses? It looks like they're indexes in the check_call routine
> below. What specifically were the pointer leaks you'd mentioned?

Yes, the pointer returned from map_lookup_elem() is a direct pointer
to the map element value. If a program prints it, that's obviously a leak.
Therefore I'm planning to add a 'secure' mode to the verifier where such
pointer leaks are detected and rejected. This mode will be on for
any non-root syscall.

>> +#define _(OP) ({ int ret = OP; if (ret < 0) return ret; })
>
> This seems overly terse. :) And the meaning tends to be overloaded
> (this obviously isn't a translatable string, etc). Perhaps call it
> "chk" or "ret_fail"? And I think OP in the body should have ()s around
> it to avoid potential macro expansion silliness.

Sure, I'll wrap OP in ().
You've missed the previous thread about my favorite _ macro:
http://www.spinics.net/lists/netdev/msg288070.html
I think I gave a ton of 'pro' arguments already.
Looks like I have to order a bunch of t-shirts with '#define _()' on
them and give them to everyone at the next conference :)
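
For readers who have not seen the macro in use, here is a minimal sketch of the parenthesized form agreed on above. It relies on GNU C statement expressions, so it assumes GCC or Clang. check_size() and check_both() are hypothetical helpers invented for the illustration; only the macro shape comes from the quoted patch.

```c
#include <assert.h>
#include <errno.h>

/* GNU C statement expression (GCC/Clang extension), with OP parenthesized. */
#define _(OP) ({ int ret = (OP); if (ret < 0) return ret; })

/* Hypothetical check: accepts access sizes 1, 2, 4 and 8 only. */
static int check_size(int size)
{
	return (size == 1 || size == 2 || size == 4 || size == 8) ? size : -EINVAL;
}

/* Returns the first size if both are valid, otherwise the first error. */
static int check_both(int a, int b)
{
	int size;

	_(size = check_size(a));	/* an assignment as the macro argument */
	_(check_size(b));
	return size;
}
```

The parentheses matter for arguments such as the assignment above or a comma expression: without them the argument would be spliced directly into the initializer of ret.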

>> +static const char *const bpf_jmp_string[] = {
>> +   "jmp", "==", ">", ">=", "&", "!=", "s>", "s>=", "call", "exit"
>> +};
>
> It seems like these string arrays should have literal initializers
> like reg_type_str does.

yeah. good point. will do.

>> +static int check_reg_arg(struct reg_state *regs, int regno, bool is_src)
>> +{
>
> Since regno is always populated with dst_reg/src_reg (u8 :4 sized),
> shouldn't this be u8 instead of int? (And in check_* below too?) More

why? 'int' type is much friendlier to compiler. u8,u16 is a pain to deal with.
unsigned types in general are much harder for optimizer.

> importantly, regno needs bounds checking. MAX_BPF_REG is 10, but
> dst_reg/src_reg could be up to 15, IIUC.

grr. yes. somehow lost this check in this version. good catch.
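
The missing check can be sketched as follows. This is an illustration, not the verifier code: EXAMPLE_NUM_REGS assumes that R0 through R10 are the valid registers (the quoted test program uses BPF_REG_10), and struct example_insn is a hypothetical two-field layout that only demonstrates that a 4-bit field can carry values up to 15.

```c
#include <assert.h>
#include <errno.h>

/* Assumption for this sketch: R0..R10 are valid, so there are 11 registers. */
#define EXAMPLE_NUM_REGS 11

/* Two 4-bit register fields: each can encode 0..15. */
struct example_insn {
	unsigned char dst_reg:4;
	unsigned char src_reg:4;
};

/* Rejects register numbers that the 4-bit encoding allows but the machine lacks. */
static int check_regno(unsigned int regno)
{
	if (regno >= EXAMPLE_NUM_REGS)
		return -EINVAL;
	return 0;
}
```

Running the check before regno is used as an array index keeps a crafted instruction from reaching past the end of the register state.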

>> +   } else {
>> +   if (regno == BPF_REG_FP)
>> +   /* frame pointer is read only */
>
> Why no verbose() call here?

No good reason. Will add.

>> +   slot = &state->stack[MAX_BPF_STACK + off];
>> +   slot->stype = STACK_SPILL;
>> +   /* save register state */
>> +   slot->type = state->regs[value_regno].type;
>> +   slot->imm = state->regs[value_regno].imm;
>> +   for (i = 1; i < 8; i++) {
>> +           slot = &state->stack[MAX_BPF_STACK + off + i];
>
> off and size need bounds checking here and below.

off and size were checked in check_mem_access().
Here size is 1,2,4,8 and off is within [-MAX_BPF_STACK,0)
so no extra checks needed.

>> +/* check read/write into map element returned by bpf_map_lookup_elem() */
>> +static int check_map_access(struct verifier_env *env, int regno, int off,
>> +   int size)
>> +{
>> +   struct bpf_map *map;
>> +   int map_id = env->cur_state.regs[regno].imm;
>> +
>> +   _(get_map_info(env, map_id, &map));
>> +
>> +   if (off < 0 || off + size > map->value_size) {
>
> This could be tricked with a negative size, or a giant size, wrapping negative.

Nope, it cannot. check_map_access() is called from check_mem_access(),
where off and size were already checked.
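
To make the argument concrete, here is a standalone sketch of a range check that stays safe against negative or huge values. It is an illustration under stated assumptions, not the verifier code: check_range() is a hypothetical helper, and it validates size itself instead of relying on a caller to have done so.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Returns 0 if [off, off + size) lies inside [0, value_size).
 * size is restricted to the fixed access widths, so value_size - size
 * cannot overflow and off + size is never computed.
 */
static int check_range(int off, int size, int value_size)
{
	if (value_size < 0)
		return -EINVAL;
	if (size != 1 && size != 2 && size != 4 && size != 8)
		return -EINVAL;
	if (off < 0 || off > value_size - size)
		return -EACCES;
	return 0;
}
```

Once size is known to be one of 1, 2, 4 or 8, the comparison cannot wrap, which is the property the reply above relies on.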

>> +static int check_mem_access(struct verifier_env *env, int regno, int off,
>> +   int bpf_size, enum bpf_access_type t,
>> +   int value_regno)
>> +{
>> +   struct verifier_state *state = &env->cur_state;
>> +   int size;
>> +
>> +   _(size = bpf_size_to_bytes(bpf_size));
>> +
>> +   if (off % size != 0) {
>> +   verbose("misaligned access off %d size %d\n", off, size);
>> +   return -EACCES;
>> +   }
>
> I think more off and size checking is needed here.

I don't see the problem. This is the main entry point into the other checks.
The alignment check above is common to all memory accesses.
All other, stricter checks are in check_map_access(), check_stack_*(), and
check_ctx_access(), which are called from this check_mem_access() function.
Why do you think more checking is needed?
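
The common entry check described above can be sketched in isolation. The EX_BPF_* values are local copies of the BPF size-field encodings, and size_to_bytes() and check_alignment() are hypothetical names for this illustration; the real bpf_size_to_bytes() is not quoted in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Size field encodings of the BPF instruction set (local copies for the sketch). */
#define EX_BPF_W	0x00
#define EX_BPF_H	0x08
#define EX_BPF_B	0x10
#define EX_BPF_DW	0x18

/* Maps a size encoding to a byte count, or -EINVAL for anything else. */
static int size_to_bytes(int bpf_size)
{
	switch (bpf_size) {
	case EX_BPF_B:
		return 1;
	case EX_BPF_H:
		return 2;
	case EX_BPF_W:
		return 4;
	case EX_BPF_DW:
		return 8;
	default:
		return -EINVAL;
	}
}

/* Returns the access size in bytes if off is aligned to it, else an error. */
static int check_alignment(int off, int bpf_size)
{
	int size = size_to_bytes(bpf_size);

	if (size < 0)
		return size;
	if (off % size != 0)
		return -EACCES;
	return size;
}
```

Because the size conversion rejects everything except the four known widths, the stricter per-region checks that follow only ever see a size of 1, 2, 4 or 8.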

>> +/* when register 'regno' is passed into function that will read 'access_size'
>> + * bytes from that pointer, make sure that it's within stack boundary
>> + * and all elements of stack are initialized
>> + */
>> +static int check_stack_boundary(struct verifier_env *env,
>> +   int regno, int access_size)
>> +{
>> +   struct verifier_state *state = &env->cur_state;
>> +   struct reg_state

Re: [PATCH v2] mm/highmem: make kmap cache coloring aware

2014-07-23 Thread Max Filippov

Hi Andrew,

thanks for your feedback, I'll address your points in the next version of this
series.

On Thu, Jul 24, 2014 at 1:17 AM, Andrew Morton
 wrote:
> Fifthly, it would be very useful to publish the performance testing
> results for at least one architecture so that we can determine the
> patchset's desirability.  And perhaps to motivate other architectures
> to implement this.

What sort of performance numbers would be relevant?
For xtensa this patch enables highmem use for cores with aliasing cache,
that is access to a gigabyte of memory (typical on KC705 FPGA board) vs.
only 128MBytes of low memory, which is highly desirable. But performance
comparison of these two configurations seems to make little sense.
OTOH performance comparison of highmem variants with and without
cache aliasing would show the quality of our cache flushing code.

-- 
Thanks.
-- Max

Re: Linux 3.16-rc6

2014-07-23 Thread Linus Torvalds

On Wed, Jul 23, 2014 at 2:53 AM, Borislav Petkov  wrote:
>
> Well, it looks like we f*cked up something after -rc5 since I'm starting
> to see lockdep splats all over the place which I didn't see before. I'm
> running rc6 + tip/master.
>
> There was one in r8169 yesterday:
>
> https://lkml.kernel.org/r/20140722081840.ga6...@pd.tnic
>
> and now I'm seeing the following in a kvm guest. I'm adding some more
> lists to CC which look like might be related, judging from the stack
> traces.

Hmm. I'm not seeing the reason for this.

> [   31.704282] [ INFO: possible irq lock inversion dependency detected ]
> [   31.704282] 3.16.0-rc6+ #1 Not tainted
> [   31.704282] -
> [   31.704282] Xorg/3484 just changed the state of lock:
> [   31.704282]  (tasklist_lock){.?.+..}, at: [] send_sigio+0x59/0x1b0
> [   31.704282] but this lock took another, HARDIRQ-unsafe lock in the past:
> [   31.704282]  (&(&p->alloc_lock)->rlock){+.+...}

Ok, so the claim is that there's a 'p->alloc_lock' (ie "task_lock()")
that is inside the tasklist_lock, which would indeed be wrong. But I'm
not seeing it. The "shortest dependencies" thing seems to imply
__set_task_comm(), but that only takes task_lock.

Unless there is something in tip/master. Can you check that this is
actually in plain -rc6?

Or maybe I'm just blind. Those lockdep splats are easy to get wrong.
Adding PeterZ and Ingo to the list just because they are my lockdep
go-to people.

 Linus

Re: [RFCv2 PATCH 01/23] sched: Documentation for scheduler energy cost model

2014-07-23 Thread Rafael J. Wysocki

Hi Morten,

Sorry for the late response, I've been swamped with other stuff lately.

I have a couple of remarks regarding the terminology and one general concern
(please see below).

On Thursday, July 03, 2014 05:25:48 PM Morten Rasmussen wrote:
> This documentation patch provides an overview of the experimental
> scheduler energy costing model, associated data structures, and a
> reference recipe on how platforms can be characterized to derive energy
> models.
> 
> Signed-off-by: Morten Rasmussen 
> ---

[cut]

> +
> +Platform topology
> +------------------
> +
> +The system topology (cpus, caches, and NUMA information, not peripherals) is
> +represented in the scheduler by the sched_domain hierarchy which has
> +sched_groups attached at each level that covers one or more cpus (see
> +sched-domains.txt for more details). To add energy awareness to the scheduler
> +we need to consider power and frequency domains.
> +
> +Power domain:
> +
> +A power domain is a part of the system that can be powered on/off
> +independently. Power domains are typically organized in a hierarchy where you
> +may be able to power down just a cpu or a group of cpus along with any
> +associated resources (e.g.  shared caches). Powering up a cpu means that all
> +power domains it is a part of in the hierarchy must be powered up. Hence, it is
> +more expensive to power up the first cpu that belongs to a higher level power
> +domain than powering up additional cpus in the same high level domain. Two
> +level power domain hierarchy example:
> +
> +                Power source
> +                         +-------------------------------+----...
> +per group PD             G                               G
> +                         |           +----------+        |
> +                    +---------+------| Shared   |  (other groups)
> +per-cpu PD          G         G      | resource |
> +                    |         |      +----------+
> +                +-------+ +-------+
> +                | CPU 0 | | CPU 1 |
> +                +-------+ +-------+
> +
> +Frequency domain:
> +
> +Frequency domains (P-states) typically cover the same group of cpus as one of
> +the power domain levels. That is, there might be several smaller power 
> domains
> +sharing the same frequency (P-state) or there might be a power domain 
> spanning
> +multiple frequency domains.
> +
> +From a scheduling point of view there is no need to know the actual frequencies
> +[Hz]. All the scheduler cares about is the compute capacity available at the
> +current state (P-state) the cpu is in and any other available states. For that
> +reason, and to also factor in any cpu micro-architecture differences, compute
> +capacity scaling states are called 'capacity states' in this document. For SMP
> +systems this is equivalent to P-states. For mixed micro-architecture systems
> +(like ARM big.LITTLE) it is P-states scaled according to the micro-architecture
> +performance relative to the other cpus in the system.
> +
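
One way to picture the 'capacity state' idea from the quoted text is the sketch below. The formula and every number in it are assumptions made for illustration; the quoted document does not define them. Capacity is normalized so that the fastest cpu at its highest P-state gets 1024, and a slower micro-architecture scales down in proportion to a hypothetical performance-per-MHz figure.

```c
#include <assert.h>

#define CAPACITY_SCALE 1024u

struct example_cpu_state {
	unsigned int freq_mhz;		/* hypothetical P-state frequency */
	unsigned int perf_per_mhz;	/* hypothetical relative micro-architecture performance */
};

/* Capacity of state s relative to the best state in the system, scaled to 1024. */
static unsigned int capacity_of(struct example_cpu_state s, struct example_cpu_state max)
{
	unsigned long long work = (unsigned long long)s.freq_mhz * s.perf_per_mhz;
	unsigned long long max_work = (unsigned long long)max.freq_mhz * max.perf_per_mhz;

	assert(max_work != 0);
	return (unsigned int)(CAPACITY_SCALE * work / max_work);
}
```

On an SMP system perf_per_mhz is the same for every cpu, so the capacity states collapse back to scaled P-states, which matches the statement in the quoted text.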

I am used to slightly different terminology here.  Namely, there are voltage
domains (parts sharing a voltage rail or a voltage regulator, such that you
can only apply/remove/change voltage to all of them at the same time) and clock
domains (analogously, but for clocks).  A power domain (which in your description
above seems to correspond to a voltage domain) may be a voltage domain, a clock
domain or a combination thereof.

In addition to that, in a voltage domain it may be possible to apply many
different levels of voltage, which case doesn't seem to be covered at all by
the above (or I'm missing something).

Also a P-state is not just a frequency level, but a combination of frequency
and voltage that has to be applied for that frequency to be stable.  You may
regard them as Operation Performance Points of the CPU, but that very well may
go beyond frequencies and voltages.  Thus it actually is better not to talk
about P-states as "frequencies".

Now, P-states may or may not have to be coordinated between all CPUs in a
package (cluster), by hardware or software, such that all CPUs in a cluster
need to be kept in the same P-state.  That you can regard as a "P-state
domain", but it usually means a specific combination of voltage and frequency.

C-states in turn are states in which CPUs don't execute instructions.
That need not mean the removal of voltage or even frequency from them.
Of course, they do mean some sort of power draw reduction, but that may
be achieved in many different ways.  Some C-states require coordination
too (for example, a single C-state may apply to a whole package or cluster
at the same time) and you can think about "domains" here too, but there
need not be a direct mapping to physical parameters such as the frequency
or the voltage.

Moreover, P-states and C-states may overlap.  That is, a CPU may be in Px
and Cy at the same time, which means that after leaving Cy it will execute
instructions in Px.  Things like leakage may depend on x in that

Re: WARNING: at kernel/cpuset.c:1139

2014-07-23 Thread Li Zefan

On 2014/7/23 23:12, Tejun Heo wrote:
> On Wed, Jul 23, 2014 at 10:50:29AM +0800, Mike Qiu wrote:
>> commit 734d45130cb ("cpuset: update cs->effective_{cpus, mems} when config
>> changes") introduce the below warning in my server.
>>
>> [   35.652137] [ cut here ]
>> [   35.652141] WARNING: at kernel/cpuset.c:1139
> 
> Hah, can you reproduce it?  If so, can you detail how?
> 

It's a typo.

	WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
		nodes_equal(cp->mems_allowed, cp->effective_mems));

should be

	WARN_ON(!cgroup_on_dfl(cp->css.cgroup) &&
		!nodes_equal(cp->mems_allowed, cp->effective_mems));
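
The effect of the missing negation can be checked with a small truth-table sketch. The two functions below are hypothetical reductions of the conditions to plain booleans: on_dfl stands for cgroup_on_dfl() and masks_equal for nodes_equal().

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as committed: fires on the legacy hierarchy when the masks match. */
static bool warn_buggy(bool on_dfl, bool masks_equal)
{
	return !on_dfl && masks_equal;
}

/* Corrected condition: fires on the legacy hierarchy only when the masks differ. */
static bool warn_fixed(bool on_dfl, bool masks_equal)
{
	return !on_dfl && !masks_equal;
}
```

On the legacy hierarchy with matching masks, which is the healthy case, the committed condition evaluates to true and produces the reported warning, while the corrected one stays quiet.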


Re: [PATCH v2 1/2] clk: samsung: exynos4: Enable ARMCLK down feature

2014-07-23 Thread Mike Turquette

Quoting Krzysztof Kozlowski (2014-07-18 07:36:32)
> Enable ARMCLK down feature on all Exynos4 SoCs. The frequency of
> ARMCLK will be reduced upon entering idle mode (WFI or WFE).
> 
> The feature behaves like very fast cpufreq ondemand governor. In idle
> mode this reduces energy consumption on full frequency chosen by
> cpufreq governor by approximately:
>  - Trats2:  6.5% (153 mA -> 143 mA)
>  - Trats:  33.0% (180 mA -> 120 mA)
>  - Gear1:  27.0% (180 mA -> 130 mA)

Nice power savings! Just a quick question on this feature: the clock
frequency is changed in hardware as a result of WFI/WFE? And this only
happens when all CPUs in a cluster (e.g. all 4 CPUs in Exynos 4412) are
in WFI/WFE state?

Thanks,
Mike

> 
> The patch uses similar settings as Exynos5250 (clk-exynos5250.c),
> except it disables clock up feature and on Exynos4412 ARMCLK down is
> enabled for all 4 cores.
> 
> Tested on Trats board (Exynos4210), Trats2 board (Exynos4412) and
> Samsung Gear 1 (Exynos4212).
> 
> Signed-off-by: Krzysztof Kozlowski 
> 
> ---
> 
> Changes since v1:
> 1. Add PWR_CTRL registers to the list of saved clk registers on
>Exynos4x12. Suggested by Tomasz Figa.
> 2. Disable the clock up feature. (sug. Tomasz Figa)
> 3. Use macros for setting clock down ratio. (sug. Tomasz Figa)
> 4. Use num_possible_cpus() for exception on Exynos4x12. (sug. Tomasz
>Figa)
> 5. Enable the clock down feature also on Exynos4210 Trats board.
> ---
>  drivers/clk/samsung/clk-exynos4.c | 46 +++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/drivers/clk/samsung/clk-exynos4.c b/drivers/clk/samsung/clk-exynos4.c
> index 7f4a473a7ad7..86c7709dc6d6 100644
> --- a/drivers/clk/samsung/clk-exynos4.c
> +++ b/drivers/clk/samsung/clk-exynos4.c
> @@ -114,11 +114,27 @@
>  #define DIV_CPU1	0x14504
>  #define GATE_SCLK_CPU	0x14800
>  #define GATE_IP_CPU	0x14900
> +#define PWR_CTRL1	0x15020
> +#define E4X12_PWR_CTRL2	0x15024
>  #define E4X12_DIV_ISP0	0x18300
>  #define E4X12_DIV_ISP1	0x18304
>  #define E4X12_GATE_ISP0	0x18800
>  #define E4X12_GATE_ISP1	0x18804
>  
> +/* Below definitions are used for PWR_CTRL settings */
> +#define PWR_CTRL1_CORE2_DOWN_RATIO(x)	(((x) & 0x7) << 28)
> +#define PWR_CTRL1_CORE1_DOWN_RATIO(x)	(((x) & 0x7) << 16)
> +#define PWR_CTRL1_DIV2_DOWN_EN	(1 << 9)
> +#define PWR_CTRL1_DIV1_DOWN_EN	(1 << 8)
> +#define PWR_CTRL1_USE_CORE3_WFE	(1 << 7)
> +#define PWR_CTRL1_USE_CORE2_WFE	(1 << 6)
> +#define PWR_CTRL1_USE_CORE1_WFE	(1 << 5)
> +#define PWR_CTRL1_USE_CORE0_WFE	(1 << 4)
> +#define PWR_CTRL1_USE_CORE3_WFI	(1 << 3)
> +#define PWR_CTRL1_USE_CORE2_WFI	(1 << 2)
> +#define PWR_CTRL1_USE_CORE1_WFI	(1 << 1)
> +#define PWR_CTRL1_USE_CORE0_WFI	(1 << 0)
> +
>  /* the exynos4 soc type */
>  enum exynos4_soc {
> EXYNOS4210,
> @@ -155,6 +171,7 @@ static unsigned long exynos4210_clk_save[] __initdata = {
> E4210_GATE_IP_LCD1,
> E4210_GATE_IP_PERIR,
> E4210_MPLL_CON0,
> +   PWR_CTRL1,
>  };
>  
>  static unsigned long exynos4x12_clk_save[] __initdata = {
> @@ -164,6 +181,8 @@ static unsigned long exynos4x12_clk_save[] __initdata = {
> E4X12_DIV_ISP,
> E4X12_DIV_CAM1,
> E4X12_MPLL_CON0,
> +   PWR_CTRL1,
> +   E4X12_PWR_CTRL2,
>  };
>  
>  static unsigned long exynos4_clk_pll_regs[] __initdata = {
> @@ -1164,6 +1183,32 @@ static struct samsung_pll_clock exynos4x12_plls[nr_plls] __initdata = {
> VPLL_LOCK, VPLL_CON0, NULL),
>  };
>  
> +static void __init exynos4_core_down_clock(enum exynos4_soc soc)
> +{
> +	unsigned int tmp;
> +
> +	/*
> +	 * Enable arm clock down (in idle) and set arm divider
> +	 * ratios in WFI/WFE state.
> +	 */
> +	tmp = (PWR_CTRL1_CORE2_DOWN_RATIO(7) | PWR_CTRL1_CORE1_DOWN_RATIO(7) |
> +		PWR_CTRL1_DIV2_DOWN_EN | PWR_CTRL1_DIV1_DOWN_EN |
> +		PWR_CTRL1_USE_CORE1_WFE | PWR_CTRL1_USE_CORE0_WFE |
> +		PWR_CTRL1_USE_CORE1_WFI | PWR_CTRL1_USE_CORE0_WFI);
> +	/* On Exynos4412 enable it also on core 2 and 3 */
> +	if (num_possible_cpus() == 4)
> +		tmp |= PWR_CTRL1_USE_CORE3_WFE | PWR_CTRL1_USE_CORE2_WFE |
> +		       PWR_CTRL1_USE_CORE3_WFI | PWR_CTRL1_USE_CORE2_WFI;
> +	__raw_writel(tmp, reg_base + PWR_CTRL1);
> +
> +	/*
> +	 * Disable the clock up feature on Exynos4x12, in case it was
> +	 * enabled by bootloader.
> +	 */
> +	if (exynos4_soc == EXYNOS4X12)
> +		__raw_writel(0x0, reg_base + E4X12_PWR_CTRL2);
> +}
> +
>  /* register exynos4 clocks */
>  static void __init
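
As a side note on the quoted exynos4_core_down_clock(), the register value it builds can be reproduced in a standalone sketch. The bit positions are copied from the quoted macro definitions; pwr_ctrl1_value() and its ncpus parameter are hypothetical and stand in for the num_possible_cpus() test, and nothing here touches hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from the quoted PWR_CTRL1 definitions. */
#define EX_CORE2_DOWN_RATIO(x)	(((uint32_t)(x) & 0x7u) << 28)
#define EX_CORE1_DOWN_RATIO(x)	(((uint32_t)(x) & 0x7u) << 16)
#define EX_DIV2_DOWN_EN		(1u << 9)
#define EX_DIV1_DOWN_EN		(1u << 8)
#define EX_USE_WFE(core)	(1u << (4 + (core)))
#define EX_USE_WFI(core)	(1u << (core))

/* Value written to PWR_CTRL1 for a 2-core or a 4-core SoC. */
static uint32_t pwr_ctrl1_value(unsigned int ncpus)
{
	uint32_t tmp = EX_CORE2_DOWN_RATIO(7) | EX_CORE1_DOWN_RATIO(7) |
		       EX_DIV2_DOWN_EN | EX_DIV1_DOWN_EN |
		       EX_USE_WFE(1) | EX_USE_WFE(0) |
		       EX_USE_WFI(1) | EX_USE_WFI(0);

	/* Cores 2 and 3 only exist on the 4-core part. */
	if (ncpus == 4)
		tmp |= EX_USE_WFE(3) | EX_USE_WFE(2) |
		       EX_USE_WFI(3) | EX_USE_WFI(2);
	return tmp;
}
```

The low byte shows which cores take part: 0x33 selects WFI and WFE for cores 0 and 1, and 0xff adds cores 2 and 3.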

Re: [PATCH] driver/rtc/class.c: check the error after rtc_read_time()

2014-07-23 Thread Andrew Morton

On Thu, 24 Jul 2014 01:49:44 +0200 "Rafael J. Wysocki"  
wrote:

> On Thursday, July 24, 2014 01:47:57 AM Rafael J. Wysocki wrote:
> > On Wednesday, July 23, 2014 02:56:34 PM Andrew Morton wrote:
> > > On Tue, 15 Jul 2014 17:25:23 +0900 Hyogi Gim  wrote:
> > > 
> > > > In rtc_suspend() and rtc_resume(), the error after rtc_read_time() is not
> > > > checked. If rtc device fail to read time, we cannot guarantee the following
> > > > process.
> > > > 
> > > > Add the verification code for returned rtc_read_time() error.
> > > > 
> > > > ...
> > > >
> > > > --- a/drivers/rtc/class.c
> > > > +++ b/drivers/rtc/class.c
> > > > @@ -53,6 +53,7 @@ static int rtc_suspend(struct device *dev)
> > > > struct rtc_device   *rtc = to_rtc_device(dev);
> > > > struct rtc_time tm;
> > > > struct timespec delta, delta_delta;
> > > > +   int err;
> > > >  
> > > > if (has_persistent_clock())
> > > > return 0;
> > > > @@ -61,7 +62,12 @@ static int rtc_suspend(struct device *dev)
> > > > return 0;
> > > >  
> > > > /* snapshot the current RTC and system time at suspend*/
> > > > -   rtc_read_time(rtc, &tm);
> > > > +   err = rtc_read_time(rtc, &tm);
> > > > +   if (err < 0) {
> > > > +           pr_debug("%s:  fail to read rtc time\n", dev_name(&rtc->dev));
> > > > +           return 0;
> > > > +   }
> > > 
> > > OK, it makes no sense to go ahead and set the system time from a
> > > garbage rtc_time.
> > > 
> > > But I'm wondering if we should propagate the error back to the
> > > rtc_suspend() caller.  What does the PM core do if a particular
> > > device's ->suspend or ->resume fails?
> > 
> > It aborts the suspend.
> 
> I mean, if ->suspend fails, the suspend is aborted.

So what should rtc do in this case?  At present it pretends the read
succeeded.  Either way, this doesn't seem to be the place to be making
such policy decisions..



Re: [PATCH v2 14/16] cpufreq: Add cpufreq driver for Tegra124

2014-07-23 Thread Viresh Kumar

On 24 July 2014 00:47, Tuomas Tynkkynen  wrote:
> It's this:
>
> +static int tegra124_cpufreq_probe(struct platform_device *pdev)
> +{
> [...]
> +
> +   dfll_clk = of_clk_get_by_name(cpu_dev->of_node, "dfll");
> +   if (IS_ERR(dfll_clk)) {
> +   ret = PTR_ERR(dfll_clk);
> +   goto out_put_cpu_clk;
> +   }

This would search for clocks passed via DT, right? Why would we
get EPROBE_DEFER for that? Sorry for the stupid question.

--
viresh

Re: [PATCH 3.8 107/116] hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry

2014-07-23 Thread Hugh Dickins

On Wed, 23 Jul 2014, Kamal Mostafa wrote:
> On Tue, 2014-07-22 at 16:08 -0700, Hugh Dickins wrote:
> > On Tue, 22 Jul 2014, Kamal Mostafa wrote:
> > 
> > > > 3.8.13.27 -stable review patch.  If anyone has any objections, please let me know.
> > > 
> > > --
> > > 
> > > From: Naoya Horiguchi 
> > > 
> > > commit 4a705fef986231a3e7a6b1a6d3c37025f021f49f upstream.
> > > 
> > > There's a race between fork() and hugepage migration, as a result we try
> > > [...]
> > 
> > Please drop this one for now: other -stables have carried it, but it
> > was found last week to contain a bug of its own, arguably worse than
> > what it's fixing.  Naoya-san has done the fix for that, it's in mmotm
> > and should make its way to Linus probably this week: so please hold
> > this back until that can join it - thanks.
> > 
> > Hugh
> 
> OK, I've dropped it from the 3.8-stable queue, and will watch for the
> fix to land.  Thanks very much, Hugh!

commit 0253d634e0803a8376a0d88efee0bf523d8673f9
Author: Naoya Horiguchi 
Date:   Wed Jul 23 14:00:19 2014 -0700
mm: hugetlb: fix copy_hugetlb_page_range()

is now in Linus's tree: so the original patch is good to go into
your -stables, so long as you add 0253d634e080 on top.

Hugh

Re: [PATCH RFC v2 net-next 13/16] tracing: allow eBPF programs to be attached to events

2014-07-23 Thread Alexei Starovoitov

On Wed, Jul 23, 2014 at 4:46 PM, Kees Cook  wrote:
>>
>> eBPF programs can call in-kernel helper functions to:
>> - lookup/update/delete elements in maps
>> - memcmp
>> - trace_printk
>> - load_pointer
>> - dump_stack
>
> Ah, this must be the pointer leaking you mentioned. :)
>
>
> Can the existing tracing mechanisms already expose kernel addresses? I
> suspect "yes". So I guess existing limitations on tracing exposure
> should already cover access control here? (I'm trying to figure out if
> a separate CONFIG is needed -- I don't think so: nothing "new" is
> exposed via eBPF, is that right?)

Correct. Through debugfs/tracing the whole kernel is already exposed.
The idea of eBPF for tracing is to give kernel developers and performance
engineers a tool to analyze what the kernel is doing by writing programs
in C and attaching them to kprobe/tracepoint events, so it's definitely
for root only.

Re: [PATCH v4] Lattice ECP3 FPGA: Correct endianness

2014-07-23 Thread Greg KH

On Wed, Jul 23, 2014 at 11:39:35AM +0200, Jean-Michel Hautbois wrote:
> This code corrects endianness and avoids a sparse error.
> Tested with Lattice ECP3-35 with Freescale i.MX6.
> It also sends uevent in order to load it.
> 
> Signed-off-by: Jean-Michel Hautbois 
> ---
>  drivers/misc/lattice-ecp3-config.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/misc/lattice-ecp3-config.c
> b/drivers/misc/lattice-ecp3-config.c
> index bb26f08..2c86319 100644
> --- a/drivers/misc/lattice-ecp3-config.c
> +++ b/drivers/misc/lattice-ecp3-config.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #define FIRMWARE_NAME"lattice-ecp3.bit"
> 
> @@ -92,8 +93,8 @@ static void firmware_load(const struct firmware *fw,
> void *context)
>  /* Trying to speak with the FPGA via SPI... */
>  txbuf[0] = FPGA_CMD_READ_ID;
>  ret = spi_write_then_read(spi, txbuf, 8, rxbuf, rx_len);
> -dev_dbg(&spi->dev, "FPGA JTAG ID=%08x\n", *(u32 *)&rxbuf[4]);
> -jedec_id = *(u32 *)&rxbuf[4];
> +jedec_id = get_unaligned_be32(&rxbuf[4]);
> +dev_dbg(&spi->dev, "FPGA JTAG ID=%08x\n", jedec_id);

Your email client ate all the tabs and spit out spaces and then
line-wrapped the patch, making it impossible to apply :(

Can you fix it up and resend it in a format we can use?

thanks,

greg k-h
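
For readers following the endianness point in the quoted patch: casting a byte buffer to u32 yields a value that depends on the host byte order (and on alignment), while an explicit big-endian read does not. The snippet below is a minimal userspace sketch of that idea, not the driver code. read_be32() and jedec_id_from_response() are hypothetical names standing in for the kernel's get_unaligned_be32() and the rxbuf[4] access, and the 8-byte minimum length is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for get_unaligned_be32(): assemble a
 * 32-bit value from four bytes, most significant byte first. Reading
 * byte by byte avoids both alignment and host-endianness assumptions.
 */
static uint32_t read_be32(const uint8_t *p)
{
	assert(p != NULL);
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Hypothetical helper: extract the ID from a response buffer. The
 * offset of 4 mirrors the rxbuf[4] access in the quoted patch; the
 * length check is an assumption of this sketch.
 */
static uint32_t jedec_id_from_response(const uint8_t *rxbuf, size_t len)
{
	assert(rxbuf != NULL);
	assert(len >= 8);
	return read_be32(&rxbuf[4]);
}
```

With this approach the same bytes on the wire produce the same ID on little-endian and big-endian hosts, which is presumably what the sparse warning was pointing at.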

Re: [PATCH] drivers/misc/ti-st: Load firmware from ti-connectivity directory.

2014-07-23 Thread Greg KH

On Tue, Jul 22, 2014 at 01:08:38PM +0200, Enric Balletbo i Serra wrote:
> Looks like the default location for TI firmware is inside the ti-connectivity
> directory, to be coherent with other firmware request used by TI drivers, load
> the TIInit firmware from this directory instead of /lib/firmware directly.

Ah, it's this way in the linux-firmware package, I'll go queue this up
now, sorry for the delay.

greg k-h

[PATCH 02/10] of: rename of_aliases_mutex to just of_mutex

2014-07-23 Thread Grant Likely

From: Pantelis Antoniou 

We're overloading usage of of_aliases_mutex for sysfs changes,
so rename to something that is more generic.

Signed-off-by: Pantelis Antoniou 
Signed-off-by: Grant Likely 
---
 drivers/of/base.c       | 19 +++++++++----------
 drivers/of/device.c     |  4 ++--
 drivers/of/of_private.h |  2 +-
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index b9864806e9b8..e48a1b90a392 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -40,10 +40,9 @@ static struct device_node *of_stdout;
 static struct kset *of_kset;
 
 /*
- * Used to protect the of_aliases; but also overloaded to hold off addition of
- * nodes to sysfs
+ * Used to protect the of_aliases, to hold off addition of nodes to sysfs
  */
-DEFINE_MUTEX(of_aliases_mutex);
+DEFINE_MUTEX(of_mutex);
 
 /* use when traversing tree through the allnext, child, sibling,
  * or parent members of struct device_node.
@@ -255,13 +254,13 @@ int of_node_add(struct device_node *np)
 	 * Grab the mutex here so that in a race condition between of_init() and
 	 * of_node_add(), node addition will still be consistent.
 	 */
-	mutex_lock(&of_aliases_mutex);
+	mutex_lock(&of_mutex);
 	if (of_kset)
 		rc = __of_node_add(np);
 	else
 		/* This scenario may be perfectly valid, but report it anyway */
 		pr_info("of_node_add(%s) before of_init()\n", np->full_name);
-	mutex_unlock(&of_aliases_mutex);
+	mutex_unlock(&of_mutex);
 	return rc;
 }
 
@@ -289,15 +288,15 @@ static int __init of_init(void)
 	struct device_node *np;
 
 	/* Create the kset, and register existing nodes */
-	mutex_lock(&of_aliases_mutex);
+	mutex_lock(&of_mutex);
 	of_kset = kset_create_and_add("devicetree", NULL, firmware_kobj);
 	if (!of_kset) {
-		mutex_unlock(&of_aliases_mutex);
+		mutex_unlock(&of_mutex);
 		return -ENOMEM;
 	}
 	for_each_of_allnodes(np)
 		__of_node_add(np);
-	mutex_unlock(&of_aliases_mutex);
+	mutex_unlock(&of_mutex);
 
 	/* Symlink in /proc as required by userspace ABI */
 	if (of_allnodes)
@@ -2122,7 +2121,7 @@ int of_alias_get_id(struct device_node *np, const char *stem)
 	struct alias_prop *app;
 	int id = -ENODEV;
 
-	mutex_lock(&of_aliases_mutex);
+	mutex_lock(&of_mutex);
 	list_for_each_entry(app, &aliases_lookup, link) {
 		if (strcmp(app->stem, stem) != 0)
 			continue;
@@ -2132,7 +2131,7 @@ int of_alias_get_id(struct device_node *np, const char *stem)
 			break;
 		}
 	}
-	mutex_unlock(&of_aliases_mutex);
+	mutex_unlock(&of_mutex);
 
 	return id;
 }
diff --git a/drivers/of/device.c b/drivers/of/device.c
index dafb9736ab9b..46d6c75c1404 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -160,7 +160,7 @@ void of_device_uevent(struct device *dev, struct kobj_uevent_env *env)
 	add_uevent_var(env, "OF_COMPATIBLE_N=%d", seen);
 
 	seen = 0;
-	mutex_lock(&of_aliases_mutex);
+	mutex_lock(&of_mutex);
 	list_for_each_entry(app, &aliases_lookup, link) {
 		if (dev->of_node == app->np) {
 			add_uevent_var(env, "OF_ALIAS_%d=%s", seen,
@@ -168,7 +168,7 @@ void of_device_uevent(struct device *dev, struct kobj_uevent_env *env)
 			seen++;
 		}
 	}
-	mutex_unlock(&of_aliases_mutex);
+	mutex_unlock(&of_mutex);
 }
 
 int of_device_uevent_modalias(struct device *dev, struct kobj_uevent_env *env)
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index ff350c8fa7ac..fcc70e74dfe0 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -31,6 +31,6 @@ struct alias_prop {
 	char stem[0];
 };
 
-extern struct mutex of_aliases_mutex;
+extern struct mutex of_mutex;
 extern struct list_head aliases_lookup;
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
1.9.1
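
As a rough illustration of why a generically named lock fits here: once a single mutex guards both the alias list and other registry state, a name tied to aliases alone becomes misleading. The following is a minimal userspace sketch using pthreads, not the kernel implementation. struct alias, registry_mutex, alias_add(), alias_get_id() and node_register() are all hypothetical names chosen for this example; only the overall pattern (lock, walk the list comparing stems, unlock, return -ENODEV when nothing matches) is taken from the patch above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct alias_prop. */
struct alias {
	const char *stem;
	int id;
	struct alias *next;
};

/* One generic lock for all registry state, like of_mutex after the rename. */
static pthread_mutex_t registry_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct alias *alias_list;	/* guarded by registry_mutex */
static int nodes_registered;		/* guarded by registry_mutex */

/* Push an alias onto the list while holding the shared lock. */
static void alias_add(struct alias *a)
{
	assert(a != NULL);
	pthread_mutex_lock(&registry_mutex);
	a->next = alias_list;
	alias_list = a;
	pthread_mutex_unlock(&registry_mutex);
}

/* Look up an alias id by stem; returns -ENODEV when no entry matches. */
static int alias_get_id(const char *stem)
{
	int id = -ENODEV;
	struct alias *a;

	pthread_mutex_lock(&registry_mutex);
	for (a = alias_list; a != NULL; a = a->next) {
		if (strcmp(a->stem, stem) == 0) {
			id = a->id;
			break;
		}
	}
	pthread_mutex_unlock(&registry_mutex);
	return id;
}

/* Unrelated registry update that takes the same lock. */
static int node_register(void)
{
	int count;

	pthread_mutex_lock(&registry_mutex);
	count = ++nodes_registered;
	pthread_mutex_unlock(&registry_mutex);
	return count;
}
```

Because node_register() has nothing to do with aliases yet takes the same lock, a name like registry_mutex (or of_mutex in the patch) describes the lock's role more accurately than an alias-specific one.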

