Re: [PATCH 2/2] mm: limit VmData with RLIMIT_DATA

2016-01-22 Thread Cyrill Gorcunov
On Sat, Jan 23, 2016 at 10:39:47AM +0300, Konstantin Khlebnikov wrote:
> This adds a correct version of the RLIMIT_DATA check,
> and a kernel boot option "ignore_rlimit_data" for reverting to the old behavior.
> It can also be set via /sys/module/kernel/parameters/ignore_rlimit_data.
> 
> Signed-off-by: Konstantin Khlebnikov 
Acked-by: Cyrill Gorcunov 



Re: [PATCH 1/2] mm: do not limit VmData with RLIMIT_DATA

2016-01-22 Thread Cyrill Gorcunov
On Sat, Jan 23, 2016 at 10:39:40AM +0300, Konstantin Khlebnikov wrote:
> This partially reverts 84638335900f ("mm: rework virtual memory accounting")
> 
> Before that commit, RLIMIT_DATA controlled only the size of the brk region.
> But that change has caused problems with all existing versions of valgrind,
> because they set RLIMIT_DATA to zero for some reason.
> 
> Moreover, the current check has a major flaw: RLIMIT_DATA is in bytes,
> not pages. So some problems might have slipped through testing.
> Let's revert it for now and put it back in the next release.
> 
> Signed-off-by: Konstantin Khlebnikov 
> Link: http://lkml.kernel.org/r/20151228211015.GL2194@uranus
> Reported-by: Christian Borntraeger 

Looks great to me. Thanks a lot, Kostya!
Acked-by: Cyrill Gorcunov 


Re: [Gta04-owner] [PATCH 0/4] UART slave device support - version 4

2016-01-22 Thread Andreas Kemnade
On Fri, 22 Jan 2016 20:12:29 +
One Thousand Gnomes  wrote:

> > I would have expected that the main (and IMO sufficient) reason why
> > the kernel should do it is because the particular bus used to connect
> > a BT chip to the CPU is a hw detail that a kernel that does its job
> > should keep to itself. Same as userspace not needing to care if a BT
> > chip is behind SDIO or USB, why does it have to tell the kernel behind
> > which UART a BT chip is sitting?
> 
> Lots of reasons, some historic some not
> 
> 1. Different BT chips have different interfaces, especially when it gets
> to stuff like firmware reprogramming
> 
> 2. In many cases we don't know at the kernel level where there are BT
> UARTs. It's improving with recent ACPI, but for many systems it's simply
> not available to the OS
> 
Same is true for i2c devices. The solution there is that you have various
methods for providing the information to the kernel: some
are autoprobed, some come from board files, and you can also tell the kernel via sysfs
that there is a device.

> 3. The power management for a lot of BT (especially on device tree) is
> not actually expressed, so you need a slightly customised daemon for each
> device - that one is ugly but the serial and bt layers can't fix it.
> 
That boils down to a circular argument: it is not there because it is not there.
If we express the power management, it can be done in the kernel.

> 4. Because you don't want to just automatically load and turn on
> bluetooth just because it is there - it burns power
> 
Exactly the same is true for wifi and for many other devices for
which drivers are automatically handled in kernel, too.
Well, do you have a list of devices which do not burn power?
I would be highly interested in those.

Regards,
Andreas




[PATCH 2/2] mm: limit VmData with RLIMIT_DATA

2016-01-22 Thread Konstantin Khlebnikov
This adds a correct version of the RLIMIT_DATA check,
and a kernel boot option "ignore_rlimit_data" for reverting to the old behavior.
It can also be set via /sys/module/kernel/parameters/ignore_rlimit_data.

Signed-off-by: Konstantin Khlebnikov 
---
 Documentation/kernel-parameters.txt |5 +
 mm/mmap.c   |8 
 2 files changed, 13 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index cfb2c0f1a4a8..850239102e86 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1461,6 +1461,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
could change it dynamically, usually by
/sys/module/printk/parameters/ignore_loglevel.
 
+   ignore_rlimit_data
+   Ignore setrlimit(RLIMIT_DATA) setting for private
+   mappings (as it was before). Could be changed by
+   /sys/module/kernel/parameters/ignore_rlimit_data.
+
ihash_entries=  [KNL]
Set number of hash buckets for inode cache.
 
diff --git a/mm/mmap.c b/mm/mmap.c
index e0cd98c510ba..af272025b1b9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -69,6 +70,8 @@ const int mmap_rnd_compat_bits_max = CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX;
 int mmap_rnd_compat_bits __read_mostly = CONFIG_ARCH_MMAP_RND_COMPAT_BITS;
 #endif
 
+static bool ignore_rlimit_data = false;
+core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
 
 static void unmap_region(struct mm_struct *mm,
struct vm_area_struct *vma, struct vm_area_struct *prev,
@@ -2982,6 +2985,11 @@ bool may_expand_vm(struct mm_struct *mm, vm_flags_t flags, unsigned long npages)
if (mm->total_vm + npages > rlimit(RLIMIT_AS) >> PAGE_SHIFT)
return false;
 
+   if (!ignore_rlimit_data && (flags & (VM_WRITE | VM_SHARED |
+   (VM_STACK_FLAGS & (VM_GROWSUP | VM_GROWSDOWN)))) == VM_WRITE &&
+   mm->data_vm + npages > rlimit(RLIMIT_DATA) >> PAGE_SHIFT)
+   return false;
+
return true;
 }
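
The new condition charges a mapping against RLIMIT_DATA only when it is private and writable and is not a stack; data_vm and npages are page counts, so the byte limit is shifted down by PAGE_SHIFT. A minimal userspace sketch of that logic, under stated assumptions: the flag values are assumed to match include/linux/mm.h on x86 (where VM_GROWSUP is 0), VM_STACK_FLAGS is simplified to VM_GROWSDOWN, and counts_as_data() and data_limit_exceeded() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed x86 values; VM_GROWSUP is 0 where stacks only grow down. */
#define VM_WRITE       0x00000002UL
#define VM_SHARED      0x00000008UL
#define VM_GROWSDOWN   0x00000100UL
#define VM_GROWSUP     0x00000000UL
#define VM_STACK_FLAGS VM_GROWSDOWN /* simplified assumption */
#define PAGE_SHIFT     12           /* assumption: 4 KiB pages */

/* Hypothetical helper: true only for private, writable, non-stack mappings. */
static bool counts_as_data(unsigned long flags)
{
	return (flags & (VM_WRITE | VM_SHARED |
			 (VM_STACK_FLAGS & (VM_GROWSUP | VM_GROWSDOWN)))) == VM_WRITE;
}

/*
 * Hypothetical helper mirroring the added check: data_vm and npages are in
 * pages, rlim_bytes is in bytes, so the limit is converted to pages first.
 */
static bool data_limit_exceeded(unsigned long flags, unsigned long data_vm,
				unsigned long npages, unsigned long rlim_bytes)
{
	return counts_as_data(flags) &&
	       data_vm + npages > rlim_bytes >> PAGE_SHIFT;
}
```

Shared, stack and read-only mappings fall through to the RLIMIT_AS check alone; only the private writable case reaches the data limit.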
 



[PATCH 1/2] mm: do not limit VmData with RLIMIT_DATA

2016-01-22 Thread Konstantin Khlebnikov
This partially reverts 84638335900f ("mm: rework virtual memory accounting")

Before that commit, RLIMIT_DATA controlled only the size of the brk region.
But that change has caused problems with all existing versions of valgrind,
because they set RLIMIT_DATA to zero for some reason.

Moreover, the current check has a major flaw: RLIMIT_DATA is in bytes,
not pages. So some problems might have slipped through testing.
Let's revert it for now and put it back in the next release.

Signed-off-by: Konstantin Khlebnikov 
Link: http://lkml.kernel.org/r/20151228211015.GL2194@uranus
Reported-by: Christian Borntraeger 
---
 mm/mmap.c |4 
 1 file changed, 4 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 84b12624ceb0..e0cd98c510ba 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2982,10 +2982,6 @@ bool may_expand_vm(struct mm_struct *mm, vm_flags_t flags, unsigned long npages)
if (mm->total_vm + npages > rlimit(RLIMIT_AS) >> PAGE_SHIFT)
return false;
 
-   if ((flags & (VM_WRITE | VM_SHARED | (VM_STACK_FLAGS &
-   (VM_GROWSUP | VM_GROWSDOWN)))) == VM_WRITE)
-   return mm->data_vm + npages <= rlimit(RLIMIT_DATA);
-
return true;
 }
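
To make the "bytes, not pages" flaw concrete: mm->data_vm and npages count pages, while rlimit(RLIMIT_DATA) returns bytes, so the removed comparison was more permissive than intended by a factor of the page size. A minimal userspace sketch, assuming 4 KiB pages; old_check() and fixed_check() are hypothetical names for the removed comparison and for the shifted comparison used in patch 2/2.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * The comparison removed by this patch: data_vm and npages are page
 * counts, but rlim_bytes is a byte count, so the limit is 4096x too generous.
 */
static bool old_check(unsigned long data_vm, unsigned long npages,
		      unsigned long rlim_bytes)
{
	return data_vm + npages <= rlim_bytes;
}

/* Same comparison with the byte limit converted to pages first. */
static bool fixed_check(unsigned long data_vm, unsigned long npages,
			unsigned long rlim_bytes)
{
	return data_vm + npages <= rlim_bytes >> PAGE_SHIFT;
}
```

With a 1 MiB limit (256 pages), a 1024-page (4 MiB) request passes old_check() but fails fixed_check().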
 



Re: [PATCH] ideapad-laptop: Add Lenovo Yoga 700 to no_hw_rfkill dmi list

2016-01-22 Thread Darren Hart
On Mon, Jan 18, 2016 at 02:32:53PM -0500, Josh Boyer wrote:
> Like the Yoga 900 models, the Lenovo Yoga 700 does not have a
> hw rfkill switch, and trying to read the hw rfkill switch through the
> ideapad module causes it to always report blocked, breaking wifi.
> 
> This commit adds the Lenovo Yoga 700 to the no_hw_rfkill dmi list, fixing
> the wifi breakage.
> 
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1295272
> Tested-by: 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Josh Boyer 
> ---
> 
> This applies to the for-next branch of the platform-x86-drivers tree
> 
> 
>  drivers/platform/x86/ideapad-laptop.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/platform/x86/ideapad-laptop.c b/drivers/platform/x86/ideapad-laptop.c
> index d28db0e793df..51178626305d 100644
> --- a/drivers/platform/x86/ideapad-laptop.c
> +++ b/drivers/platform/x86/ideapad-laptop.c
> @@ -900,6 +900,13 @@ static const struct dmi_system_id no_hw_rfkill_list[] = {
>   },
>   },
>   {
> + .ident = "Lenogo Yoga 700",

Josh,

Is this a typo? "Lenogo"? Please tell me it's a typo :-)

-- 
Darren Hart
Intel Open Source Technology Center


[patch-rt] sched: fix ->nr_cpus_allowed = 1 transcription error during migrate_disable() cleanup

2016-01-22 Thread Mike Galbraith

Setting p->nr_cpus_allowed accidentally wandered into migrate_disable()
during the cleanup - kill it. 

Signed-off-by: Mike Galbraith 
---
 kernel/sched/core.c |1 -
 1 file changed, 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3180,7 +3180,6 @@ void migrate_disable(void)
preempt_lazy_disable();
pin_current_cpu();
p->migrate_disable = 1;
-   p->nr_cpus_allowed = 1;
preempt_enable();
 }
 EXPORT_SYMBOL(migrate_disable);


[PATCH] dump_stack: avoid potential deadlocks

2016-01-22 Thread Eric Dumazet
From: Eric Dumazet 

Some servers experienced fatal deadlocks because of a combination
of bugs, leading to multiple CPUs calling dump_stack().

The checksumming bug was fixed in commit 34ae6a1aa054
("ipv6: update skb->csum when CE mark is propagated").

The second problem is faulty locking in dump_stack():

CPU1 runs in process context and calls dump_stack(), grabs dump_lock.

   CPU2 receives a TCP packet under softirq, grabs the socket spinlock, and
   calls dump_stack() from netdev_rx_csum_fault().

   dump_stack() spins on atomic_cmpxchg(&dump_lock, -1, 2), since
   dump_lock is owned by CPU1.

While dumping its stack, CPU1 is interrupted by a softirq, and happens
to process a packet for the TCP socket locked by CPU2.

CPU1 spins forever in spin_lock(): deadlock.

The stack trace on CPU1 looked like this:

[306295.402231] NMI backtrace for cpu 1
[306295.402238] RIP: 0010:[]  [] _raw_spin_lock+0x25/0x30
...
[306295.402255] Stack:
[306295.402256]  88407f023cb0 a99cbdc3 88407f023ca0 88012f496bb0
[306295.402266]  aa4dc1f0 8820d94f0dc0 000a aa4b4280
[306295.402275]  88407f023ce0 a98a21d0 88407f023cc0 88407f023ca0
[306295.402284] Call Trace:
[306295.402286]   
[306295.402288] 
[306295.402291]  [] tcp_v6_rcv+0x243/0x620
[306295.402304]  [] ip6_input_finish+0x11f/0x330
[306295.402309]  [] ip6_input+0x38/0x40
[306295.402313]  [] ip6_rcv_finish+0x3c/0x90
[306295.402318]  [] ipv6_rcv+0x2a9/0x500
[306295.402323]  [] process_backlog+0x461/0xaa0
[306295.402332]  [] net_rx_action+0x147/0x430
[306295.402337]  [] __do_softirq+0x167/0x2d0
[306295.402341]  [] call_softirq+0x1c/0x30
[306295.402345]  [] do_softirq+0x3f/0x80
[306295.402350]  [] irq_exit+0x6e/0xc0
[306295.402355]  [] smp_call_function_single_interrupt+0x35/0x40
[306295.402360]  [] call_function_single_interrupt+0x6a/0x70
[306295.402361]   
[306295.402364] 
[306295.402376]  [] printk+0x4d/0x4f
[306295.402390]  [] printk_address+0x31/0x33
[306295.402395]  [] print_trace_address+0x33/0x3c
[306295.402408]  [] print_context_stack+0x7f/0x119
[306295.402412]  [] dump_trace+0x26b/0x28e
[306295.402417]  [] show_trace_log_lvl+0x4f/0x5c
[306295.402421]  [] show_stack_log_lvl+0x104/0x113
[306295.402425]  [] show_stack+0x42/0x44
[306295.402429]  [] dump_stack+0x46/0x58
[306295.402434]  [] netdev_rx_csum_fault+0x38/0x3c
[306295.402439]  [] __skb_checksum_complete_head+0x6e/0x80
[306295.402444]  [] __skb_checksum_complete+0x11/0x20
[306295.402449]  [] tcp_rcv_established+0x2bd5/0x2fd0
[306295.402468]  [] tcp_v6_do_rcv+0x13c/0x620
[306295.402477]  [] sk_backlog_rcv+0x15/0x30
[306295.402482]  [] release_sock+0xd2/0x150
[306295.402486]  [] tcp_recvmsg+0x1c1/0xfc0
[306295.402491]  [] inet_recvmsg+0x7d/0x90
[306295.402495]  [] sock_recvmsg+0xaf/0xe0
[306295.402505]  [] ___sys_recvmsg+0x111/0x3b0
[306295.402528]  [] SyS_recvmsg+0x5c/0xb0
[306295.402532]  [] system_call_fastpath+0x16/0x1b


Fixes: b58d977432c8 ("dump_stack: serialize the output from dump_stack()")
Signed-off-by: Eric Dumazet 
Cc: Alex Thorlton 
---
 lib/dump_stack.c |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index 6745c6230db3..c30d07e99dba 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -25,6 +25,7 @@ static atomic_t dump_lock = ATOMIC_INIT(-1);
 
 asmlinkage __visible void dump_stack(void)
 {
+   unsigned long flags;
int was_locked;
int old;
int cpu;
@@ -33,9 +34,8 @@ asmlinkage __visible void dump_stack(void)
 * Permit this cpu to perform nested stack dumps while serialising
 * against other CPUs
 */
-   preempt_disable();
-
 retry:
+   local_irq_save(flags);
cpu = smp_processor_id();
old = atomic_cmpxchg(&dump_lock, -1, cpu);
if (old == -1) {
@@ -43,6 +43,7 @@ retry:
} else if (old == cpu) {
was_locked = 1;
} else {
+   local_irq_restore(flags);
cpu_relax();
goto retry;
}
@@ -52,7 +53,7 @@ retry:
if (!was_locked)
atomic_set(&dump_lock, -1);
 
-   preempt_enable();
+   local_irq_restore(flags);
 }
 #else
 asmlinkage __visible void dump_stack(void)
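
The lock in dump_stack() tracks its owner: -1 means free, and a CPU that already owns it may nest another dump. A minimal userspace sketch of that ownership protocol with C11 atomics; CPU numbers are plain ints, and dump_trylock()/dump_unlock() are hypothetical names. It deliberately does not model interrupts: in the patch, interrupts stay disabled via local_irq_save() for the whole time the lock is owned, the intent being that no softirq can run on the owning CPU mid-dump, and a contender calls local_irq_restore() before cpu_relax() and retrying.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means unowned; otherwise the id of the owning CPU. */
static atomic_int dump_lock = -1;

/*
 * Returns true if 'cpu' now holds the lock, either freshly acquired
 * (*was_locked = false) or already held by itself (*was_locked = true,
 * a nested dump). Returns false if another CPU owns it; the caller is
 * expected to back off (re-enable IRQs in the kernel version) and retry.
 */
static bool dump_trylock(int cpu, bool *was_locked)
{
	int old = -1;

	if (atomic_compare_exchange_strong(&dump_lock, &old, cpu)) {
		*was_locked = false;
		return true;
	}
	/* On failure, 'old' now holds the current owner. */
	if (old == cpu) {
		*was_locked = true;
		return true;
	}
	return false;
}

/* Only the outermost holder releases the lock. */
static void dump_unlock(bool was_locked)
{
	if (!was_locked)
		atomic_store(&dump_lock, -1);
}
```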




[PATCH] x86/head_64.S: do not use temporary register to check alignment

2016-01-22 Thread Alexander Kuleshov
We use the temporary %rax register when checking the kernel address
alignment. We can get rid of it, since the testl instruction only sets
flags and does not change the value of the %rbp register.

Signed-off-by: Alexander Kuleshov 
Suggested-by: Brian Gerst 
---
 arch/x86/kernel/head_64.S | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index ffdc0e8..7c21029 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -76,9 +76,7 @@ startup_64:
subq    $_text - __START_KERNEL_map, %rbp
 
/* Is the address not 2M aligned? */
-   movq    %rbp, %rax
-   andl    $~PMD_PAGE_MASK, %eax
-   testl   %eax, %eax
+   testl   $~PMD_PAGE_MASK, %ebp
jnz bad_address
 
/*
-- 
2.7.0.25.gfc10eb5
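
Why a single testl is enough: ~PMD_PAGE_MASK equals PMD_PAGE_SIZE - 1 (0x1fffff for 2 MiB), which fits in 32 bits, so testing %ebp covers every bit that matters, and testl only updates the flags without writing either operand. A userspace sketch comparing the two sequences; the macro values are assumed to match x86-64 (PMD_SHIFT = 21), and misaligned_old()/misaligned_new() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values: one PMD entry maps 2 MiB. */
#define PMD_SHIFT     21
#define PMD_PAGE_SIZE (UINT64_C(1) << PMD_SHIFT)
#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE - 1))

/* Old sequence: copy to a scratch register, mask it, then test it. */
static bool misaligned_old(uint64_t rbp)
{
	uint64_t rax = rbp;                                       /* movq */
	uint32_t eax = (uint32_t)rax & (uint32_t)~PMD_PAGE_MASK;  /* andl */
	return eax != 0;                                          /* testl; jnz */
}

/* New sequence: AND the immediate with %ebp into the flags only. */
static bool misaligned_new(uint64_t rbp)
{
	return ((uint32_t)rbp & (uint32_t)~PMD_PAGE_MASK) != 0;   /* testl; jnz */
}
```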



Re: [RFC PATCH] x86/head_64.S: remove redundant check that kernel address is 2M aligned

2016-01-22 Thread Alexander Kuleshov
Hello Brian,

On 01-22-16, Brian Gerst wrote:
> >
> > -   /* Is the address not 2M aligned? */
> > -   movq    %rbp, %rax
> > -   andl    $~PMD_PAGE_MASK, %eax
> > -   testl   %eax, %eax
> > -   jnz bad_address
> > -
> > /*
> >  * Is the address too large?
> >  */
> 
> I think we still need to do the check, in case we came from a 64-bit
> bootloader that directly jumped to startup_64.  However, this check
> can be simplified to:
> 
> testl $~PMD_PAGE_MASK, %ebp
> jnz bad_address

Ah, OK, so we can't trust the bootloader. I had assumed that the 64-bit
entry point is always startup_64 from arch/x86/boot/compressed/head_64.S.

Thank you.


Re: [LKP] [lkp] [spi] 2baed30cb3: BUG: scheduling while atomic: systemd-udevd/134/0x00000002

2016-01-22 Thread Sudip Mukherjee
Hi Huang, Ying,
On Thu, Jan 21, 2016 at 11:36:52AM +0530, Sudip Mukherjee wrote:
> On Thu, Jan 21, 2016 at 01:47:10PM +0800, Huang, Ying wrote:
> > Sudip Mukherjee  writes:
> > 
> > > On Wed, Jan 20, 2016 at 01:00:40PM +0800, Huang, Ying wrote:
> > >> Sudip Mukherjee  writes:
> > >> 
> > >> > On Wed, Jan 20, 2016 at 08:44:37AM +0800, kernel test robot wrote:
> 
> > >
> > > I am not able to reproduce this. Tested just with the kernel and
> > > yocto-minimal-i386.cgz filesystem and it booted properly.
> > >
> > > I guess I need at least your job file to reproduce this.
> > 
> > This is a boot test, so I did not attach the job file.  But the test
> > result may depend on the specific root file system.  For example, the
> > process in the BUG report is always systemd-udevd.  Maybe you need a
> > systemd-based root file system.
> 
> So silly of me. Since you said 2baed30cb3, I kept looking at that
> patch.
> Can you please test again after reverting:
> ebd43516d387 ("Staging: panel: usleep_range is preferred over udelay")
> 
> If it solves the problem then I will submit a formal patch.

Did you get a chance to test it?

regards
sudip


Re: [BUG] Devices breaking due to CONFIG_ZONE_DEVICE

2016-01-22 Thread Dan Williams
On Fri, Jan 22, 2016 at 9:47 PM, Dan Williams  wrote:
> On Fri, Jan 22, 2016 at 8:46 PM, Sudip Mukherjee
>  wrote:
>> Hi All,
>> Commit 033fbae988fc ("mm: ZONE_DEVICE for "device memory"") has
>> introduced CONFIG_ZONE_DEVICE while sacrificing CONFIG_ZONE_DMA.
>> Distributions like Ubuntu have started enabling CONFIG_ZONE_DEVICE,
>> thus breaking the parallel port. Please have a look at
>> https://bugzilla.kernel.org/show_bug.cgi?id=110931 for the bug report.
>>
>> Apart from parallel port I can see some sound drivers will also break.
>>
>> Now what is the possible solution for this?
>
> The tradeoff here is enabling direct-I/O for persistent memory vs
> support for legacy devices.
>
> One possible solution is to alias ZONE_DMA and ZONE_DEVICE.  At early
> boot, if pmem is detected, disable these legacy devices; or, in reverse,
> disable DMA to persistent memory if a legacy device is detected.  The
> latter is a bit harder to do, as I think we would want to make the
> decision early during memory init, before we would know if any parallel
> ports or ISA sound cards are present.

...another option that might be cleaner is to teach GFP_DMA to get
memory from a different mechanism.  I.e. don't use the mm-zone
infrastructure to organize that small 16MB pool of memory.
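
A toy illustration of what "a different mechanism" could mean: a page-granular bitmap allocator over a fixed pool placed below the 16 MiB limit that ISA DMA can reach, so the addressing constraint is enforced by where the pool sits rather than by a memory zone. Everything here is hypothetical (pool base and size, isa_pool_alloc(), isa_pool_free()); it is not a kernel API, and a real version would need locking and a reservation made at early boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_PAGE_SIZE 4096u
#define POOL_PAGES     64u                   /* hypothetical 256 KiB pool */
#define POOL_BASE      UINT32_C(0x00800000)  /* hypothetical: 8 MiB physical */
#define ISA_DMA_LIMIT  UINT32_C(0x01000000)  /* ISA DMA reaches only 16 MiB */

_Static_assert(POOL_BASE + POOL_PAGES * POOL_PAGE_SIZE <= ISA_DMA_LIMIT,
	       "pool must lie entirely below the ISA DMA limit");

static bool page_used[POOL_PAGES];

/* First-fit search for 'n' contiguous free pages; returns 0 on failure. */
static uint32_t isa_pool_alloc(unsigned int n)
{
	if (n == 0)
		return 0;
	for (unsigned int start = 0; start + n <= POOL_PAGES; start++) {
		unsigned int i = 0;

		while (i < n && !page_used[start + i])
			i++;
		if (i == n) {
			for (i = 0; i < n; i++)
				page_used[start + i] = true;
			return POOL_BASE + start * POOL_PAGE_SIZE;
		}
	}
	return 0;
}

static void isa_pool_free(uint32_t addr, unsigned int n)
{
	unsigned int start = (addr - POOL_BASE) / POOL_PAGE_SIZE;

	assert(addr >= POOL_BASE && start + n <= POOL_PAGES);
	for (unsigned int i = 0; i < n; i++)
		page_used[start + i] = false;
}
```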


RE: [PATCH] drivers/scsi/emcctd: drivers/scsi/emcctd: Client driver implementation for EMC-Symmetrix GuestOS emulated Cut-Through Device

2016-01-22 Thread Singhal, Maneesh
Hello Greg,

Thanks for taking the time to review the patch. Please find my replies
inline. I will post the next version of the patch with the modifications soon.

> -----Original Message-----
> From: Greg KH [mailto:g...@kroah.com]
> Sent: Tuesday, January 19, 2016 11:42 PM
> To: Singhal, Maneesh
> Cc: linux-s...@vger.kernel.org; linux-kernel@vger.kernel.org;
> jbottom...@odin.com; martin.peter...@oracle.com; linux-a...@vger.kernel.org
> Subject: Re: [PATCH] drivers/scsi/emcctd: drivers/scsi/emcctd: Client
> driver implementation for EMC-Symmetrix GuestOS emulated Cut-Through Device
> 
> On Tue, Jan 19, 2016 at 11:58:06AM +, Singhal, Maneesh wrote:
> > Hello,
> > Kindly review the following patch for the following driver to be added
> > in SCSI subsystem -
> >
> > Regards
> > Maneesh
> >
> >
> > From f3c4b836d6f130b1d7ded618002c8164f8f4a06d Mon Sep 17 00:00:00 2001
> > From: "maneesh.singhal"
> > Date: Tue, 19 Jan 2016 06:39:35 -0500
> > Subject: [PATCH] [PATCH] drivers/scsi/emcctd: Client driver
> > implementation for EMC-Symmetrix GuestOS emulated Cut-Through Device.
> >
> > The patch is a driver implementation  EMC-Symmetrix GuestOS emulated
> > Cut-Through Device. The Cut-Through Device PCI emulation is
> > implemented for GuestOS environments in the HyperMax OS. GuestOS
> > environments allows loading of any x86 compliant operating systems
> > like Linux/FreeBSD etc.
> >
> > The client driver is a SCSI HBA implementation which interfaces with
> > SCSI midlayer in the north-bound interfaces and connects with the
> > emulated PCI device on the south side.
> >
> > The PCI vendor ID:product ID for emulated Cut-Through Device is
> > 0x1120:0x1B00.
> >
> > Signed-off-by: maneesh.singhal 
> > ---
> >  Documentation/scsi/emcctd.txt   |   57 +
> >  MAINTAINERS |9 +
> >  drivers/scsi/Kconfig|1 +
> >  drivers/scsi/Makefile   |1 +
> >  drivers/scsi/emcctd/Kconfig |7 +
> >  drivers/scsi/emcctd/Makefile|1 +
> >  drivers/scsi/emcctd/README  |   10 +
> >  drivers/scsi/emcctd/emc_ctd_interface.h |  386 +
> >  drivers/scsi/emcctd/emcctd.c| 2840 +++
> >  drivers/scsi/emcctd/emcctd.h|  232 +++
> >  10 files changed, 3544 insertions(+)
> >  create mode 100644 Documentation/scsi/emcctd.txt
> >  create mode 100644 drivers/scsi/emcctd/Kconfig
> >  create mode 100644 drivers/scsi/emcctd/Makefile
> >  create mode 100644 drivers/scsi/emcctd/README
> >  create mode 100644 drivers/scsi/emcctd/emc_ctd_interface.h
> >  create mode 100644 drivers/scsi/emcctd/emcctd.c
> >  create mode 100644 drivers/scsi/emcctd/emcctd.h
> >
> > diff --git a/Documentation/scsi/emcctd.txt b/Documentation/scsi/emcctd.txt
> > new file mode 100644
> > index 000..bcafc87
> > --- /dev/null
> > +++ b/Documentation/scsi/emcctd.txt
> > @@ -0,0 +1,56 @@
> > +This file contains brief information about the EMC Cut-Through Driver (emcctd).
> > +The driver is currently maintained by Singhal, Maneesh
> > +(maneesh.sing...@emc.com)
> > +
> > +Last modified: Mon Jan 18 2016 by Maneesh Singhal
> > +
> > +BASICS
> > +
> > +Its a client driver implementation for EMC-Symmetrix GuestOS emulated
> > +Cut-Through Device. The Cut-Through Device PCI emulation is
> > +implemented for GuestOS environments in the HyperMax OS. GuestOS
> > +environments allows loading of any x86 compliant operating
> > +systems like Linux/FreeBSD etc.
> > +
> > +The client driver is a SCSI HBA implementation which interfaces with
> > +SCSI midlayer in the north-bound interfaces and connects with the
> > +emulated PCI device on the south side.
> > +
> > +The PCI vendor ID:product ID for emulated Cut-Through Device is 0x1120:0x1B00.
> > +
> > +VERSIONING
> > +
> > +The Version of the driver is maintained as 2.0.0.X, where 2 refers to
> > +the CTD protocol in use, and X refers to the ongoing version of the driver.
> > +
> > +
> > +SYSFS SUPPORT
> > +
> > +The driver creates the directory /sys/module/emcctd and populates it
> > +with version file and a directory for various parameters as described
> > +in MODULE PARAMETERS section.
> > +
> > +PROCFS SUPPORT
> > +
> > +The driver creates the directory /proc/emc and creates files
> > +emcctd_stats_x where 'x' refers to the PCI emulation number this
> > +client driver connected to.
> > +These files cotains WWN information and IO statistics for the
> > +particular PCI emulation.
> 
> No, no driver should be adding proc files, please use sysfs or debugfs
> for debugging things.
[MS>] Sure.
> 
> > +MODULE PARAMETERS
> 
> No driver should be using module parameters anymore, again, please
> use the correct interfaces.
> 
[MS>] Yes, got that.
> > +
> > +The supported parameters which could add debuggability or change the
> > +runtime behavior of the driver are as following:
> 

RE: [PATCH] drivers/scsi/emcctd: drivers/scsi/emcctd: Client driver implementation for EMC-Symmetrix GuestOS emulated Cut-Through Device

2016-01-22 Thread Singhal, Maneesh
Thanks for your time. My replies are inline.

> -----Original Message-----
> From: Johannes Thumshirn [mailto:jthumsh...@suse.de]
> Sent: Tuesday, January 19, 2016 9:34 PM
> To: Singhal, Maneesh
> Cc: linux-s...@vger.kernel.org; linux-kernel@vger.kernel.org;
> jbottom...@odin.com; martin.peter...@oracle.com; linux-a...@vger.kernel.org
> Subject: Re: [PATCH] drivers/scsi/emcctd: drivers/scsi/emcctd: Client
> driver implementation for EMC-Symmetrix GuestOS emulated Cut-Through Device
> 
> On Tue, Jan 19, 2016 at 11:58:06AM +, Singhal, Maneesh wrote:
> > Hello,
> > Kindly review the following patch for the following driver to be
> > added in SCSI subsystem -
> >
> > Regards
> > Maneesh
> >
> > 
> > From f3c4b836d6f130b1d7ded618002c8164f8f4a06d Mon Sep 17 00:00:00 2001
> > From: "maneesh.singhal"
> > Date: Tue, 19 Jan 2016 06:39:35 -0500
> > Subject: [PATCH] [PATCH] drivers/scsi/emcctd: Client driver implementation for
> >  EMC-Symmetrix GuestOS emulated Cut-Through Device.
> >
> > The patch is a driver implementation  EMC-Symmetrix GuestOS emulated Cut-Through
> > Device. The Cut-Through Device PCI emulation is implemented for GuestOS
> > environments in the HyperMax OS. GuestOS environments allows loading of
> > any x86 compliant operating systems like Linux/FreeBSD etc.
> >
> > The client driver is a SCSI HBA implementation which interfaces with SCSI
> > midlayer in the north-bound interfaces and connects with the emulated PCI device
> > on the south side.
> >
> > The PCI vendor ID:product ID for emulated Cut-Through Device is 0x1120:0x1B00.
> >
> > Signed-off-by: maneesh.singhal 
> > ---
> >  Documentation/scsi/emcctd.txt   |   57 +
> >  MAINTAINERS |9 +
> >  drivers/scsi/Kconfig|1 +
> >  drivers/scsi/Makefile   |1 +
> >  drivers/scsi/emcctd/Kconfig |7 +
> >  drivers/scsi/emcctd/Makefile|1 +
> >  drivers/scsi/emcctd/README  |   10 +
> >  drivers/scsi/emcctd/emc_ctd_interface.h |  386 +
> >  drivers/scsi/emcctd/emcctd.c| 2840 +++
> >  drivers/scsi/emcctd/emcctd.h|  232 +++
> >  10 files changed, 3544 insertions(+)
> >  create mode 100644 Documentation/scsi/emcctd.txt
> >  create mode 100644 drivers/scsi/emcctd/Kconfig
> >  create mode 100644 drivers/scsi/emcctd/Makefile
> >  create mode 100644 drivers/scsi/emcctd/README
> >  create mode 100644 drivers/scsi/emcctd/emc_ctd_interface.h
> >  create mode 100644 drivers/scsi/emcctd/emcctd.c
> >  create mode 100644 drivers/scsi/emcctd/emcctd.h
> >
> > diff --git a/Documentation/scsi/emcctd.txt b/Documentation/scsi/emcctd.txt
> > new file mode 100644
> > index 000..bcafc87
> > --- /dev/null
> > +++ b/Documentation/scsi/emcctd.txt
> > @@ -0,0 +1,56 @@
> > +This file contains brief information about the EMC Cut-Through Driver (emcctd).
> > +The driver is currently maintained by Singhal, Maneesh (maneesh.sing...@emc.com)
> > +
> > +Last modified: Mon Jan 18 2016 by Maneesh Singhal
> > +
> > +BASICS
> > +
> > +Its a client driver implementation for EMC-Symmetrix GuestOS emulated
> > +Cut-Through Device. The Cut-Through Device PCI emulation is implemented for
> > +GuestOS environments in the HyperMax OS. GuestOS environments allows loading of
> > +any x86 compliant operating systems like Linux/FreeBSD etc.
> > +
> > +The client driver is a SCSI HBA implementation which interfaces with SCSI
> > +midlayer in the north-bound interfaces and connects with the emulated PCI device
> > +on the south side.
> > +
> > +The PCI vendor ID:product ID for emulated Cut-Through Device is 0x1120:0x1B00.
> > +
> > +VERSIONING
> > +
> > +The Version of the driver is maintained as 2.0.0.X, where 2 refers to the CTD
> > +protocol in use, and X refers to the ongoing version of the driver.
> > +
> > +
> > +SYSFS SUPPORT
> > +
> > +The driver creates the directory /sys/module/emcctd and populates it with
> > +version file and a directory for various parameters as described in MODULE
> > +PARAMETERS section.
> > +
> > +PROCFS SUPPORT
> > +
> > +The driver creates the directory /proc/emc and creates files emcctd_stats_x
> > +where 'x' refers to the PCI emulation number this client driver connected to.
> > +These files cotains WWN information and IO statistics for the particular PCI
> > +emulation.
> > +
> > +MODULE PARAMETERS
> > +
> > +The supported parameters which could add debuggability or change the runtime
> > +behavior of the driver are as following:
> > +
> > +ctd_debug=0 | 1     Enable driver debug messages(0=off, 1=on)
> > +
> > +max_luns=xx         Specify the maximum number of LUN's per
> > +                    host(default=16384)
> > +
> > +cmd_per_lun=xx      Specify the maximum commands per lun(default=16)
> > +
> > +DEBUGGING HINTS
> > +

Re: [BUG] Devices breaking due to CONFIG_ZONE_DEVICE

2016-01-22 Thread Dan Williams
On Fri, Jan 22, 2016 at 8:46 PM, Sudip Mukherjee
 wrote:
> Hi All,
> Commit 033fbae988fc ("mm: ZONE_DEVICE for "device memory"") has
> introduced CONFIG_ZONE_DEVICE while sacrificing CONFIG_ZONE_DMA.
> Distributions like Ubuntu have started enabling CONFIG_ZONE_DEVICE,
> thus breaking the parallel port. Please have a look at
> https://bugzilla.kernel.org/show_bug.cgi?id=110931 for the bug report.
>
> Apart from parallel port I can see some sound drivers will also break.
>
> Now what is the possible solution for this?

The tradeoff here is enabling direct-I/O for persistent memory vs
support for legacy devices.

One possible solution is to alias ZONE_DMA and ZONE_DEVICE.  At early
boot, if pmem is detected, disable these legacy devices; or, in reverse,
disable DMA to persistent memory if a legacy device is detected.  The
latter is a bit harder to do, as I think we would want to make the
decision early during memory init, before we would know if any parallel
ports or ISA sound cards are present.


RE: [PATCH] drivers/scsi/emcctd: drivers/scsi/emcctd: Client driver implementation for EMC-Symmetrix GuestOS emulated Cut-Through Device

2016-01-22 Thread Singhal, Maneesh
Hello Thumshirn,
Thanks for taking the time to review the patch; I appreciate it. Please find
my comments inline.

> -----Original Message-----
> From: Johannes Thumshirn [mailto:jthumsh...@suse.de]
> Sent: Tuesday, January 19, 2016 7:20 PM
> To: Singhal, Maneesh
> Cc: linux-s...@vger.kernel.org; linux-kernel@vger.kernel.org;
> jbottom...@odin.com; martin.peter...@oracle.com; linux-a...@vger.kernel.org
> Subject: Re: [PATCH] drivers/scsi/emcctd: drivers/scsi/emcctd: Client
> driver implementation for EMC-Symmetrix GuestOS emulated Cut-Through Device
> 
> On Tue, Jan 19, 2016 at 11:58:06AM +, Singhal, Maneesh wrote:
> > Hello,
> > Kindly review the following patch for the following driver to be
> > added in SCSI subsystem -
> >
> > Regards
> > Maneesh
> >
> 
> Hi Maneesh,
> 
> Very first round of review. No real functionality yet, just a bit on
> readability.
> 
> I'll do a more in depth review later.
> 
> > 
> > From f3c4b836d6f130b1d7ded618002c8164f8f4a06d Mon Sep 17 00:00:00 2001
> > From: "maneesh.singhal"
> > Date: Tue, 19 Jan 2016 06:39:35 -0500
> > Subject: [PATCH] [PATCH] drivers/scsi/emcctd: Client driver implementation for
> >  EMC-Symmetrix GuestOS emulated Cut-Through Device.
> >
> > The patch is a driver implementation  EMC-Symmetrix GuestOS emulated Cut-Through
> > Device. The Cut-Through Device PCI emulation is implemented for GuestOS
> > environments in the HyperMax OS. GuestOS environments allows loading of
> > any x86 compliant operating systems like Linux/FreeBSD etc.
> >
> > The client driver is a SCSI HBA implementation which interfaces with SCSI
> > midlayer in the north-bound interfaces and connects with the emulated PCI device
> > on the south side.
> >
> > The PCI vendor ID:product ID for emulated Cut-Through Device is 0x1120:0x1B00.
> >
> > Signed-off-by: maneesh.singhal 
> > ---
> >  Documentation/scsi/emcctd.txt   |   57 +
> >  MAINTAINERS |9 +
> >  drivers/scsi/Kconfig|1 +
> >  drivers/scsi/Makefile   |1 +
> >  drivers/scsi/emcctd/Kconfig |7 +
> >  drivers/scsi/emcctd/Makefile|1 +
> >  drivers/scsi/emcctd/README  |   10 +
> >  drivers/scsi/emcctd/emc_ctd_interface.h |  386 +
> >  drivers/scsi/emcctd/emcctd.c| 2840 +++
> >  drivers/scsi/emcctd/emcctd.h|  232 +++
> >  10 files changed, 3544 insertions(+)
> >  create mode 100644 Documentation/scsi/emcctd.txt
> >  create mode 100644 drivers/scsi/emcctd/Kconfig
> >  create mode 100644 drivers/scsi/emcctd/Makefile
> >  create mode 100644 drivers/scsi/emcctd/README
> >  create mode 100644 drivers/scsi/emcctd/emc_ctd_interface.h
> >  create mode 100644 drivers/scsi/emcctd/emcctd.c
> >  create mode 100644 drivers/scsi/emcctd/emcctd.h
> >
> > diff --git a/Documentation/scsi/emcctd.txt b/Documentation/scsi/emcctd.txt
> > new file mode 100644
> > index 000..bcafc87
> > --- /dev/null
> > +++ b/Documentation/scsi/emcctd.txt
> > @@ -0,0 +1,56 @@
> > +This file contains brief information about the EMC Cut-Through Driver (emcctd).
> > +The driver is currently maintained by Singhal, Maneesh (maneesh.sing...@emc.com)
> > +
> > +Last modified: Mon Jan 18 2016 by Maneesh Singhal
> > +
> > +BASICS
> > +
> > +Its a client driver implementation for EMC-Symmetrix GuestOS emulated
> > +Cut-Through Device. The Cut-Through Device PCI emulation is implemented for
> > +GuestOS environments in the HyperMax OS. GuestOS environments allows loading of
> > +any x86 compliant operating systems like Linux/FreeBSD etc.
> > +
> > +The client driver is a SCSI HBA implementation which interfaces with SCSI
> > +midlayer in the north-bound interfaces and connects with the emulated PCI device
> > +on the south side.
> > +
> > +The PCI vendor ID:product ID for emulated Cut-Through Device is 0x1120:0x1B00.
> > +
> > +VERSIONING
> > +
> > +The Version of the driver is maintained as 2.0.0.X, where 2 refers to the CTD
> > +protocol in use, and X refers to the ongoing version of the driver.
> > +
> > +
> > +SYSFS SUPPORT
> > +
> > +The driver creates the directory /sys/module/emcctd and populates it with
> > +version file and a directory for various parameters as described in MODULE
> > +PARAMETERS section.
> > +
> > +PROCFS SUPPORT
> > +
> > +The driver creates the directory /proc/emc and creates files emcctd_stats_x
> > +where 'x' refers to the PCI emulation number this client driver connected to.
> > +These files cotains WWN information and IO statistics for the particular PCI
> > +emulation.
> > +
> > +MODULE PARAMETERS
> > +
> > +The supported parameters which could add debuggability or change the runtime
> > +behavior of the driver are as following:
> > +
> > +ctd_debug=0 | 1     Enable driver debug messages(0=off, 1=on)
> > +
> > 

Re: [PATCH v2 1/4] lib/string_helpers: export string_units_{2,10} for others

2016-01-22 Thread James Bottomley
On Thu, 2016-01-21 at 17:22 +0200, Andy Shevchenko wrote:
> There is one user coming which would like to use those string arrays.
> It might be useful for any other user in the future.

Well, let's not do it until we have an actual consumer because that
will help us get the interface correct.

> Signed-off-by: Andy Shevchenko 
> ---
>  include/linux/string_helpers.h |  6 ++
>  lib/string_helpers.c   | 21 -
>  2 files changed, 18 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h
> index dabe643..a55c9cc 100644
> --- a/include/linux/string_helpers.h
> +++ b/include/linux/string_helpers.h
> @@ -10,6 +10,12 @@ enum string_size_units {
>   STRING_UNITS_2, /* use binary powers of 2^10 */
>  };
>  
> +#define STRING_UNITS_10_NUM  9
> +#define STRING_UNITS_2_NUM   9
> +
> +extern const char *const string_units_10[STRING_UNITS_10_NUM];
> +extern const char *const string_units_2[STRING_UNITS_2_NUM];
> +
>  void string_get_size(u64 size, u64 blk_size, enum string_size_units units,
>                      char *buf, int len);
>  
> diff --git a/lib/string_helpers.c b/lib/string_helpers.c
> index 5939f63..7ee4644 100644
> --- a/lib/string_helpers.c
> +++ b/lib/string_helpers.c
> @@ -13,6 +13,15 @@
>  #include 
>  #include 
>  
> +const char * const string_units_10[STRING_UNITS_10_NUM] = {
> + "B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB",
> +};
> +EXPORT_SYMBOL(string_units_10);
> +const char * const string_units_2[STRING_UNITS_2_NUM] = {
> + "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB",
> +};
> +EXPORT_SYMBOL(string_units_2);
> +

This is a pretty silly thing to do; how does someone who adds a unit to
one of the string_units know to increment STRING_UNITS_X_NUM?  Even if
you add a comment admonishing them to do it, it's far better to have
this calculated at compile time like it was before this patch.
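A minimal userspace sketch of the compile-time alternative: the element count is derived from the array definition itself, so adding a unit cannot leave a separate *_NUM constant stale. ARRAY_SIZE here is a simplified stand-in for the kernel macro of the same name (the kernel version additionally rejects pointers via __must_be_array), and units_10_count() is a hypothetical helper.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's ARRAY_SIZE(); no pointer check. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* No explicit bound: the compiler sizes the arrays from the initializers. */
static const char *const string_units_10[] = {
	"B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB",
};
static const char *const string_units_2[] = {
	"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB",
};

/* Compile-time check that both tables stay in step (C11). */
_Static_assert(ARRAY_SIZE(string_units_10) == ARRAY_SIZE(string_units_2),
	       "unit tables must have the same number of entries");

/* Hypothetical accessor: the count is computed, never hand-maintained. */
static size_t units_10_count(void)
{
	return ARRAY_SIZE(string_units_10);
}
```

The catch is that ARRAY_SIZE only works where the full array definition is visible; an extern declaration with an unspecified bound carries no size, which is one reason to keep the tables local until a real consumer appears.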

James



Re: [PATCH 07/13] aio: enabled thread based async fsync

2016-01-22 Thread Benjamin LaHaise
On Sat, Jan 23, 2016 at 03:24:49PM +1100, Dave Chinner wrote:
> On Wed, Jan 20, 2016 at 04:56:30PM -0500, Benjamin LaHaise wrote:
> > On Thu, Jan 21, 2016 at 08:45:46AM +1100, Dave Chinner wrote:
> > > Filesystems *must take locks* in the IO path. We have to serialise
> > > against truncate and other operations at some point in the IO path
> > > (e.g. block mapping vs concurrent allocation and/or removal), and
> > > that can only be done sanely with sleeping locks.  There is no way
> > > of knowing in advance if we are going to block, and so either we
> > > always use threads for IO submission or we accept that occasionally
> > > the AIO submission will block.
> > 
> > I never said we don't take locks.  Still, we can be more intelligent 
> > about when and where we do so.  With the nonblocking pread() and pwrite() 
> > changes being proposed elsewhere, we can do the part of the I/O that 
> > doesn't block in the submitter, which is a huge win when possible.
> > 
> > As it stands today, *every* buffered write takes i_mutex immediately 
> > on entering ->write().  That one issue alone accounts for a nearly 10x 
> > performance difference between an O_SYNC write and an O_DIRECT write, 
> 
> Yes, that locking is for correct behaviour, not for performance
> reasons.  The i_mutex is providing the required semantics for POSIX
> write(2) functionality - writes must serialise against other reads
> and writes so that they are completed atomically w.r.t. other IO.
> i.e. writes to the same offset must not interleave, nor should reads
> be able to see partial data from a write in progress.

No, the locks are not *required* for POSIX semantics, they are a legacy
of how Linux filesystem code has been implemented and how we ensure the
necessary internal consistency needed inside our filesystems is
provided.  There are other ways to achieve the required semantics that
do not involve a single giant lock for the entire file/inode.  And no, I
am not saying that doing this is simple or easy to do.

-ben

> Direct IO does not conform to POSIX concurrency standards, so we
> don't have to serialise concurrent IO against each other.
> 
> > and using O_SYNC writes is a legitimate use-case for users who want 
> > caching of data by the kernel (duplicating that functionality is a huge 
> > amount of work for an application, plus if you want the cache to be 
> > persistent between runs of an app, you have to get the kernel to do it).
> 
> Yes, but you take what you get given. Buffered IO sucks in many ways;
> this is just one of them.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com

-- 
"Thought is the essence of where you are now."


Re: [RFC][PATCH -next 2/2] printk: set may_schedule for some of console_trylock callers

2016-01-22 Thread Sergey Senozhatsky
Hello,

On (01/22/16 10:48), Petr Mladek wrote:
[..]
> > and in console_unlock()
> > 
> > -   if (do_cond_resched)
> > -   cond_resched();
> > +   console_conditional_schedule();
> >
> >
> > but for !CONFIG_PREEMPT_COUNT we can't. because of currently held
> > spin_locks/etc that we don't know about.
> 
> Ah, I was not aware that we did not have information about preemption
> without PREEMPT_COUNT.

yes, for example,

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	preempt_disable();
	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
	LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}

where preempt_disable() comes from include/linux/preempt.h:

...
#else /* !CONFIG_PREEMPT_COUNT */

/*
 * Even if we don't have any preemption, we need preempt disable/enable
 * to be barriers, so that we don't have things like get_user/put_user
 * that can cause faults and scheduling migrate into our preempt-protected
 * region.
 */
#define preempt_disable()   barrier()
#define preempt_enable()    barrier()


so on !CONFIG_PREEMPT_COUNT kernels we can't rely on console_trylock()
'magic', we need the existing rules.
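To illustrate why the two configurations differ, here is a small userspace model, assuming the per-CPU preempt counter is simplified to a single global. With counting, code can ask whether it is in a preemptible context; in the !CONFIG_PREEMPT_COUNT variant, preempt_disable() is only a compiler barrier and leaves nothing to query. All model_* names are hypothetical; only barrier() mirrors the GCC-style definition the kernel uses.

```c
#include <stdbool.h>

/* Compiler barrier in the style of the kernel's barrier() (GCC/Clang). */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* CONFIG_PREEMPT_COUNT=y model: the nesting depth is recorded. */
static int model_preempt_count;

static void model_preempt_disable_counted(void)
{
	model_preempt_count++;
	barrier();
}

static void model_preempt_enable_counted(void)
{
	barrier();
	model_preempt_count--;
}

/* Safe to schedule only if no preempt-disabled section is active. */
static bool model_preemptible_counted(void)
{
	return model_preempt_count == 0;
}

/* CONFIG_PREEMPT_COUNT=n model: a barrier and nothing else, so a
 * spin_lock() taken earlier leaves no trace that could be checked. */
static void model_preempt_disable_uncounted(void)
{
	barrier();
}
```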

> > `console_may_schedule' carries a bit of important information for
> > console_conditional_schedule() caller. if it has acquired console_sem
> > via console_lock() - then it can schedule, if via console_trylock() - it 
> > cannot.
> > 
> > the last `if via console_trylock() - it cannot' rule is not always true,
> > we clearly can have printk()->console_unlock() from non-atomic contexts
> > (if we know that it's non-atomic, which is not the case with !PREEMPT_COUNT).
> 
> In other words, we could automatically detect a safe context for
> cond_resched() only if PREEMPT_COUNT is enabled. Otherwise, we need to
> keep the current logic (heuristic). Do I understand it correctly?

yes, I think so.
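The rule agreed on here can be written as a small decision function. This is a hedged sketch: the struct and function names are hypothetical, and the preempt_count/irqs_disabled fields stand for information the kernel only has reliably when CONFIG_PREEMPT_COUNT is enabled.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the caller's context. */
struct console_ctx {
	bool via_console_lock;   /* console_sem taken with console_lock() */
	bool have_preempt_count; /* CONFIG_PREEMPT_COUNT=y */
	int preempt_count;       /* meaningful only if have_preempt_count */
	bool irqs_disabled;
};

static bool console_may_schedule_sketch(const struct console_ctx *c)
{
	/* console_lock() may sleep, so its caller was schedulable. */
	if (c->via_console_lock)
		return true;
	/* console_trylock() without PREEMPT_COUNT: a spinlock may be held
	 * and we cannot tell, so keep the conservative answer. */
	if (!c->have_preempt_count)
		return false;
	/* With PREEMPT_COUNT the context can be detected automatically. */
	return c->preempt_count == 0 && !c->irqs_disabled;
}
```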

> I would personally wait a bit for Jack's async console printing.
> It will call console only if oops_in_progress is set. It means that
> this partial optimization won't be needed at all.

ok, thanks. I'd love to see Jan's printk() rework merged. I have 99 problems
with printk() and console_unlock(). People are usually not aware of the secrets
that printk-console_unlock have, and tend to think that printk is just 'a kernel
way' of spelling printf, with all the consequences that follow -- excessive
printk usage, RCU stalls, soft lockups, etc. And that printk abuse does not
necessarily hit the abuser. A completely 'innocent' user space application that
does a syscall which involves console_lock-console_unlock can spend seconds in
console_unlock pushing someone's data to console_drivers. console_lock and
console_unlock, I think, have somewhat misleading names: _lock has acquire
semantics, but _unlock does not simply release the lock. I even think it'd be
good to have console_unlock_fast(), which would just up_console_sem() w/o any
penalty. So some of the console_unlock() calls that are 'accessible' from user
space /* for example,
  tty_open()
    tty_lookup_driver()
      console_device()
        console_lock()
        console_unlock()

or reading from /proc/consoles, and so forth */
could be replaced with console_unlock_fast().
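A toy model of the proposed split, assuming a console reduced to a pending-message counter and a lock flag: console_unlock() flushes everything pending before releasing, so the caller pays for other CPUs' messages, while the proposed console_unlock_fast() only releases. Everything here is hypothetical; the real console_unlock() also handles console_seq, retry races and more.

```c
#include <stdbool.h>

/* Toy console state (hypothetical). */
struct toy_console {
	bool locked;
	unsigned int pending; /* messages queued by printk() callers */
	unsigned int flushed; /* messages written to console drivers */
};

static void toy_console_lock(struct toy_console *c)
{
	c->locked = true;
}

/* Like console_unlock(): the unlocker drains all pending output,
 * no matter who queued it, before releasing the lock. */
static unsigned int toy_console_unlock(struct toy_console *c)
{
	unsigned int work = 0;

	while (c->pending > 0) {
		c->pending--;
		c->flushed++;
		work++;
	}
	c->locked = false;
	return work;
}

/* Proposed console_unlock_fast(): release only, no flushing. */
static unsigned int toy_console_unlock_fast(struct toy_console *c)
{
	c->locked = false;
	return 0;
}
```

The trade-off is that with the fast variant pending messages stay queued until some other path takes the flushing unlock.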

The patch in question is simply a further extension on Tejun's work. And
these two patches already made my life a bit simpler, albeit not all of the
printk/console_unlock problems were addressed.

Jan's patch set is a much more complicated effort, and it may take 2 or
3 (??) kernel releases to finish (there are corner cases: for example,
workers can stall during OOM, etc.), I'd be happy to see it in -next for 4.6,
personally, not sure how realistic this expectation is.

> The other (first) patch still makes sense in the simplified form.

thanks. let's do it this way - I'll keep the preempt disable/enable
removal patch the last in the series, so we can easily drop it (if
Jan's rework is much-much closer). How does that sound?

-ss

> Best Regards,
> Petr
> 


[BUG] Devices breaking due to CONFIG_ZONE_DEVICE

2016-01-22 Thread Sudip Mukherjee
Hi All,
Commit 033fbae988fc ("mm: ZONE_DEVICE for "device memory"") has
introduced CONFIG_ZONE_DEVICE while sacrificing CONFIG_ZONE_DMA.
Distributions like Ubuntu have started enabling CONFIG_ZONE_DEVICE and
are thus breaking the parallel port. Please have a look at
https://bugzilla.kernel.org/show_bug.cgi?id=110931 for the bug report.

Apart from parallel port I can see some sound drivers will also break.

Now what is the possible solution for this?

Regards
Sudip



Re: [PATCH 07/13] aio: enabled thread based async fsync

2016-01-22 Thread Dave Chinner
On Wed, Jan 20, 2016 at 03:07:26PM -0800, Linus Torvalds wrote:
> On Jan 20, 2016 1:46 PM, "Dave Chinner"  wrote:
> > >
> > > > That said, I also agree that it would be interesting to hear what the
> > > > performance impact is for existing performance-sensitive users. Could
> > > > we make that "aio_may_use_threads()" case be unconditional, making
> > > > things simpler?
> > >
> > > Making it unconditional is a goal, but some work is required before that
> > > can be the case.  The O_DIRECT issue is one such matter -- it requires some
> > > changes to the filesystems to ensure that they adhere to the non-blocking
> > > nature of the new interface (ie taking i_mutex is a Bad Thing that users
> > > really do not want to be exposed to; if taking it blocks, the code should
> > > punt to a helper thread).
> >
> > Filesystems *must take locks* in the IO path.
> 
> I agree.
> 
> I also would prefer to make the aio code have as little interaction and
> magic flags with the filesystem code as humanly possible.
> 
> I wonder if we could make the rough rule be that the only synchronous case
> the aio code ever has is more or less entirely in the generic vfs caches?
> IOW, could we possibly aim to make the rule be that if we call down to the
> filesystem layer, we do that within a thread?

We have to go through the filesystem layer locking even on page
cache hits, and even if we get into the page cache copy-in/copy-out
code we can still get stuck on things like page locks and page
faults. Even if the pages are cached, we can still get caught on
deeper filesystem locks for block mapping. e.g. read from a hole,
get zeros back, page cache is populated. Write data into range,
fetch page, realise it's unmapped, need to do block/delayed
allocation which requires filesystem locks and potentially
transactions and IO.

> We could do things like that for the name lookup for openat() too, where
> we could handle the successful RCU lookup synchronously, but then if we
> fall out of RCU mode we'd do the thread.

We'd have to do quite a bit of work to unwind back out to the AIO
layer before we can dispatch the open operation again in a thread,
wouldn't we?

So I'm not convinced that conditional thread dispatch makes sense. I
think the simplest thing to do is make all AIO use threads/
workqueues by default, and if the application is smart enough to
only do things that minimise blocking they can turn off the threaded
dispatch and get the same behaviour they get now.
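The two dispatch policies in this thread can be contrasted with a small userspace sketch, assuming a pthread mutex stands in for a filesystem lock and a counter stands in for a workqueue: the conditional policy does the work inline when the lock is free and punts otherwise; the threads-by-default policy always punts unless the application opts out. All names are hypothetical, not the aio API.

```c
#include <pthread.h>
#include <stdbool.h>

enum dispatch { DONE_INLINE, PUNTED };

/* Hypothetical stand-ins for an inode lock and a work queue. */
static pthread_mutex_t toy_inode_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int toy_queued;

static void toy_queue_work(void)
{
	toy_queued++; /* a real system would hand off to a worker thread */
}

/* Conditional policy: never block the submitter on this lock. */
static enum dispatch submit_conditional(void)
{
	if (pthread_mutex_trylock(&toy_inode_lock) == 0) {
		/* ... the non-blocking part of the I/O would run here ... */
		pthread_mutex_unlock(&toy_inode_lock);
		return DONE_INLINE;
	}
	toy_queue_work();
	return PUNTED;
}

/* Threaded-by-default policy with an application opt-out. */
static enum dispatch submit_default_threaded(bool app_opted_out)
{
	if (app_opted_out) {
		/* May block: same behaviour as an in-context submit today. */
		pthread_mutex_lock(&toy_inode_lock);
		pthread_mutex_unlock(&toy_inode_lock);
		return DONE_INLINE;
	}
	toy_queue_work();
	return PUNTED;
}
```

The trylock only covers the one lock it tests; deeper filesystem locks, page locks and page faults can still block, which is the argument made above for using threads by default.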

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH 07/13] aio: enabled thread based async fsync

2016-01-22 Thread Dave Chinner
On Wed, Jan 20, 2016 at 04:56:30PM -0500, Benjamin LaHaise wrote:
> On Thu, Jan 21, 2016 at 08:45:46AM +1100, Dave Chinner wrote:
> > Filesystems *must take locks* in the IO path. We have to serialise
> > against truncate and other operations at some point in the IO path
> > (e.g. block mapping vs concurrent allocation and/or removal), and
> > that can only be done sanely with sleeping locks.  There is no way
> > of knowing in advance if we are going to block, and so either we
> > always use threads for IO submission or we accept that occasionally
> > the AIO submission will block.
> 
> I never said we don't take locks.  Still, we can be more intelligent 
> about when and where we do so.  With the nonblocking pread() and pwrite() 
> changes being proposed elsewhere, we can do the part of the I/O that 
> doesn't block in the submitter, which is a huge win when possible.
> 
> As it stands today, *every* buffered write takes i_mutex immediately 
> on entering ->write().  That one issue alone accounts for a nearly 10x 
> performance difference between an O_SYNC write and an O_DIRECT write, 

Yes, that locking is for correct behaviour, not for performance
reasons.  The i_mutex is providing the required semantics for POSIX
write(2) functionality - writes must serialise against other reads
and writes so that they are completed atomically w.r.t. other IO.
i.e. writes to the same offset must not interleave, nor should reads
be able to see partial data from a write in progress.

Direct IO does not conform to POSIX concurrency standards, so we
don't have to serialise concurrent IO against each other.
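A minimal userspace sketch of the semantics described above, assuming a single per-file lock (the role i_mutex plays) around an in-memory buffer: because writers and readers take the same lock, a reader sees either none or all of a concurrent write to the same range, never part of it. The names are hypothetical, and as Benjamin LaHaise argues elsewhere in this thread, one whole-file lock is one way to get these semantics, not the only one.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define TOY_FILE_SIZE 64

/* Hypothetical file: data plus one lock serialising all I/O. */
struct toy_file {
	pthread_mutex_t lock; /* plays the role of i_mutex */
	char data[TOY_FILE_SIZE];
};

/* Returns bytes written, or 0 if the range is out of bounds. */
static size_t toy_write(struct toy_file *f, size_t off, const char *buf,
			size_t len)
{
	if (off > TOY_FILE_SIZE || len > TOY_FILE_SIZE - off)
		return 0;
	pthread_mutex_lock(&f->lock);
	memcpy(f->data + off, buf, len); /* atomic w.r.t. toy_read() */
	pthread_mutex_unlock(&f->lock);
	return len;
}

/* Returns bytes read, or 0 if the range is out of bounds. */
static size_t toy_read(struct toy_file *f, size_t off, char *buf, size_t len)
{
	if (off > TOY_FILE_SIZE || len > TOY_FILE_SIZE - off)
		return 0;
	pthread_mutex_lock(&f->lock);
	memcpy(buf, f->data + off, len);
	pthread_mutex_unlock(&f->lock);
	return len;
}
```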

> and using O_SYNC writes is a legitimate use-case for users who want 
> caching of data by the kernel (duplicating that functionality is a huge 
> amount of work for an application, plus if you want the cache to be 
> persistent between runs of an app, you have to get the kernel to do it).

Yes, but you take what you get given. Buffered IO sucks in many ways;
this is just one of them.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH RFC 09/15] ARM: dts: sun6i: sina31s: Switch to mmc3 for onboard eMMC

2016-01-22 Thread Chen-Yu Tsai
On Sat, Jan 23, 2016 at 4:39 AM, Maxime Ripard
 wrote:
> Hi,
>
> On Thu, Jan 21, 2016 at 01:26:36PM +0800, Chen-Yu Tsai wrote:
>> According to Allwinner, only mmc3 supports 8 bit DDR transfers for eMMC.
>> Switch to mmc3 for the onboard eMMC, and also assign vqmmc for signal
>> voltage sensing/switching, and "cap-mmc-hw-reset" to denote this
>> instance can use eMMC hardware reset.
>>
>> Signed-off-by: Chen-Yu Tsai 
>> ---
>>  arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi 
>> b/arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi
>> index ea69fb8ad4d8..4ec0c8679b2e 100644
>> --- a/arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi
>> +++ b/arch/arm/boot/dts/sun6i-a31s-sina31s-core.dtsi
>> @@ -61,12 +61,14 @@
>>  };
>>
>>  /* eMMC on core board */
>> - {
>> + {
>>   pinctrl-names = "default";
>> - pinctrl-0 = <_8bit_emmc_pins>;
>> + pinctrl-0 = <_8bit_emmc_pins>;
>>   vmmc-supply = <_dcdc1>;
>> + vqmmc-supply = <_dcdc1>;
>
> That seems odd. IIRC the VCC was supposed to be fixed and VCCQ could
> be either at 1.8 or 3V. Having the same regulator on both would make
> VCCQ forced to 3.3V, which seems to go against your commit log.
>
> What's the catch ? :)

That is how the board is routed, which means the only use for
vqmmc-supply is that the driver will know it can only do 3.3V,
i.e. voltage sensing.

It is the reason I requested Olimex to look into this. Allwinner
reference designs all tie vqmmc directly to 3.3V.

Actually with the latest driver patches, this is not even needed. To
make the driver backward compatible, if no vqmmc-supply is given, it
just assumes 3.3V signaling.

Regards
ChenYu


[PATCH] Staging: xgifb: vb_init.c: Coding style warning fix block comment

2016-01-22 Thread YU Bo

This is a patch to the vb_init.c file that fixes up a warning reported
by checkpatch.pl:

WARNING: Block comments use * on subsequent lines

Signed-off-by: YU BO 
---
 drivers/staging/xgifb/vb_init.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/xgifb/vb_init.c b/drivers/staging/xgifb/vb_init.c
index 360dc95..9259247 100644
--- a/drivers/staging/xgifb/vb_init.c
+++ b/drivers/staging/xgifb/vb_init.c
@@ -700,11 +700,11 @@ static void XGINew_CheckChannel(struct xgi_hw_device_info *HwDeviceExtension,
break;
case XG42:
/*
-XG42 SR14 D[3] Reserve
-D[2] = 1, Dual Channel
-= 0, Single Channel
-
-It's Different from Other XG40 Series.
+* XG42 SR14 D[3] Reserve
+* D[2] = 1, Dual Channel
+* = 0, Single Channel
+*
+* It's Different from Other XG40 Series.
 */
if (XGINew_CheckFrequence(pVBInfo) == 1) { /* DDRII, DDR2x */
pVBInfo->ram_bus = 32; /* 32 bits */
--
1.7.10.4


--
Best Regards
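For reference, the block-comment layout that checkpatch.pl asks for is sketched below on a hypothetical helper: the opening /* on its own line, a leading " * " on every subsequent line, and the closing " */" on its own line aligned under the asterisks. The helper name and its reading of SR14 bit 2 are illustrative only, based on the comment text in the patch.

```c
/*
 * XG42 SR14 D[3]: reserved
 * D[2] = 1: dual channel
 *      = 0: single channel
 *
 * This differs from the other XG40-series chips.
 */
static int xg42_is_dual_channel(unsigned char sr14)
{
	return (sr14 >> 2) & 0x1;
}
```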


Re: [PATCH 6/9] perf, tools, stat: Document CSV format in manpage

2016-01-22 Thread Andi Kleen
> [jolsa@krava perf]$ sudo ./perf stat -e cycles,instructions -a -x, 
> 160517940,,cycles,2357448795,100.00,,Ghz,2357448795,100.00
> 
>  ^  what's this data?
> 
> 84822675,,instructions,2357537479,100.00,0.53,insn per cycle
> 
> 
> ,,,2357537479,100.00,,stalled cycles per insn,2357537479,100.00
> 
> this line is probably wrong, as noted in previous email..?

Yes, the noise was printed twice.
I fixed it now (by removing the first variant, which is much simpler
in the code).

-Andi
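A minimal parser for one line of `perf stat -x,` output, assuming a plain run without -I, -A or -r (which add extra fields) and the field order seen in the lines quoted above: counter value, unit, event name, counter run time, percentage of measurement time, then optional metric value and metric unit. The enum labels are this sketch's own names, not perf's. Empty fields, such as the unit in `84822675,,instructions,...`, are kept as empty strings, which strtok() would not do.

```c
#include <stddef.h>
#include <string.h>

#define CSV_MAX_FIELDS 16

/* Hypothetical labels for the field positions described above. */
enum perf_csv_field {
	CSV_VALUE, CSV_UNIT, CSV_EVENT, CSV_RUN_TIME, CSV_PERCENT,
	CSV_METRIC_VALUE, CSV_METRIC_UNIT,
};

/* Split line in place on sep, keeping empty fields.
 * Returns the number of fields stored in fields[], at most max. */
static size_t split_csv(char *line, char sep, char *fields[], size_t max)
{
	size_t n = 0;
	char *p = line;

	while (n < max) {
		char *end = strchr(p, sep);

		fields[n++] = p;
		if (!end)
			break;
		*end = '\0';
		p = end + 1;
	}
	return n;
}
```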


Re: [PATCH v1] block: fix bio splitting on max sectors

2016-01-22 Thread Jens Axboe

On 01/22/2016 05:05 PM, Ming Lei wrote:

After commit e36f62042880(block: split bios to maxpossible length),
a bio can be split in the middle of a vector entry, so it is easy
to split out a bio whose size isn't aligned with the logical block
size, especially when the block size is bigger than 512.

This patch fixes the issue by making the max io size aligned
to logical block size.

Fixes: e36f62042880(block: split bios to maxpossible length)
Reported-by: Stefan Haberland 
Cc: Keith Busch 
Suggested-by: Linus Torvalds 
Signed-off-by: Ming Lei 
---
V1:
- avoid double shift as suggested by Linus
- compute 'max_sectors' once as suggested by Keith


This looks good to me, I'll apply and run a bit of local testing.

--
Jens Axboe
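The rounding the commit message describes can be sketched as a standalone helper: the largest I/O, in 512-byte sectors, is rounded down to a whole number of logical blocks so that a split can no longer produce a bio whose size is not a multiple of the logical block size. It assumes a power-of-two logical block size of at least 512 bytes, and the helper name is hypothetical rather than the block layer's.

```c
/* Round max_sectors (512-byte units) down to a multiple of the logical
 * block size. logical_block_size must be a power of two >= 512. */
static unsigned int aligned_max_sectors(unsigned int max_sectors,
					unsigned int logical_block_size)
{
	unsigned int sectors_per_block = logical_block_size >> 9;

	return max_sectors & ~(sectors_per_block - 1);
}
```

With 4096-byte logical blocks and a 255-sector limit, for example, the limit drops from 255 sectors (127.5 KiB, not a multiple of 4 KiB) to 248 sectors (124 KiB).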



Re: [PATCH 3/9] perf, tools, stat: Move noise/running printing into printout

2016-01-22 Thread Andi Kleen
> > -   if (run == 0 || ena == 0) {
> > -   fprintf(output, "CPU%*d%s%*s%s",
> > -   csv_output ? 0 : -4,
> > -   perf_evsel__cpus(counter)->map[cpu], csv_sep,
> > -   csv_output ? 0 : 18,
> > -   counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
> > -   csv_sep);
> 
> this hunk is not preserved in the new code.. I guess the output is
> different for -A if the counter wasn't measured?

The code for this is common in printout() now.

-Andi


[PATCH v3 01/17] Xen: ACPI: Hide UART used by Xen

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

ACPI 6.0 introduces a new table, STAO, to list the devices which are used
by Xen and can't be used by Dom0. On Xen virtual platforms the physical
UART is used by Xen, so hide it from Dom0.

Signed-off-by: Shannon Zhao 
---
CC: "Rafael J. Wysocki"  (supporter:ACPI)
CC: Len Brown  (supporter:ACPI)
CC: linux-a...@vger.kernel.org (open list:ACPI)
---
 drivers/acpi/bus.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index a212cef..7f85b54 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -46,6 +46,7 @@ ACPI_MODULE_NAME("bus");
 struct acpi_device *acpi_root;
 struct proc_dir_entry *acpi_root_dir;
 EXPORT_SYMBOL(acpi_root_dir);
+static u64 spcr_uart_addr;
 
 #ifdef CONFIG_X86
 #ifdef CONFIG_ACPI_CUSTOM_DSDT
@@ -105,6 +106,22 @@ acpi_status acpi_bus_get_status_handle(acpi_handle handle,
return status;
 }
 
+static bool acpi_check_device_is_ignored(acpi_handle handle)
+{
+   acpi_status status;
+   u64 addr;
+
+   /* Check if it should ignore the UART device */
+   if (spcr_uart_addr != 0) {
+   status = acpi_evaluate_integer(handle, METHOD_NAME__ADR, NULL,
+  &addr);
+   if (ACPI_SUCCESS(status) && addr == spcr_uart_addr)
+   return true;
+   }
+
+   return false;
+}
+
 int acpi_bus_get_status(struct acpi_device *device)
 {
acpi_status status;
@@ -114,7 +131,8 @@ int acpi_bus_get_status(struct acpi_device *device)
if (ACPI_FAILURE(status))
return -ENODEV;
 
-   acpi_set_device_status(device, sta);
+   if (!acpi_check_device_is_ignored(device->handle))
+   acpi_set_device_status(device, sta);
 
if (device->status.functional && !device->status.present) {
ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Device [%s] status [%08x]: "
@@ -1069,6 +1087,8 @@ EXPORT_SYMBOL_GPL(acpi_kobj);
 static int __init acpi_init(void)
 {
int result;
+   acpi_status status;
+   struct acpi_table_stao *stao_ptr;
 
if (acpi_disabled) {
printk(KERN_INFO PREFIX "Interpreter disabled.\n");
@@ -1081,6 +1101,20 @@ static int __init acpi_init(void)
acpi_kobj = NULL;
}
 
+   /* If there is an STAO table, check whether it needs to ignore the UART
+* device in SPCR table.
+*/
+   status = acpi_get_table(ACPI_SIG_STAO, 0,
+   (struct acpi_table_header **)&stao_ptr);
+   if (ACPI_SUCCESS(status) && stao_ptr->ignore_uart) {
+   struct acpi_table_spcr *spcr_ptr;
+
+   status = acpi_get_table(ACPI_SIG_SPCR, 0,
+   (struct acpi_table_header **)&spcr_ptr);
+   if (ACPI_SUCCESS(status))
+   spcr_uart_addr = spcr_ptr->serial_port.address;
+   }
+
init_acpi_device_notify();
result = acpi_bus_init();
if (result) {
-- 
2.0.4




[PATCH v3 06/17] Xen: ARM: Add support for mapping platform device mmio

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Add a bus_notifier for platform bus devices in order to map the device
MMIO regions when Dom0 boots with ACPI.

Signed-off-by: Shannon Zhao 
Acked-by: Stefano Stabellini 
---
 drivers/xen/Makefile |   1 +
 drivers/xen/arm-device.c | 141 +++
 2 files changed, 142 insertions(+)
 create mode 100644 drivers/xen/arm-device.c

diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index aa8a7f7..0c087b1 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -9,6 +9,7 @@ CFLAGS_features.o   := $(nostackp)
 
 CFLAGS_efi.o   += -fshort-wchar
 
+dom0-$(CONFIG_ARM64) += arm-device.o
 dom0-$(CONFIG_PCI) += pci.o
 dom0-$(CONFIG_USB_SUPPORT) += dbgp.o
 dom0-$(CONFIG_XEN_ACPI) += acpi.o $(xen-pad-y)
diff --git a/drivers/xen/arm-device.c b/drivers/xen/arm-device.c
new file mode 100644
index 000..76e26e5
--- /dev/null
+++ b/drivers/xen/arm-device.c
@@ -0,0 +1,141 @@
+/*
+ * Copyright (c) 2015, Linaro Limited, Shannon Zhao
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int xen_unmap_device_mmio(struct resource *resource, unsigned int count)
+{
+   unsigned int i, j, nr;
+   int rc = 0;
+   struct resource *r;
+   struct xen_remove_from_physmap xrp;
+
+   for (i = 0; i < count; i++) {
+   r = &resource[i];
+   nr = DIV_ROUND_UP(resource_size(r), XEN_PAGE_SIZE);
+   if ((resource_type(r) != IORESOURCE_MEM) || (nr == 0))
+   continue;
+
+   for (j = 0; j < nr; j++) {
+   xrp.domid = DOMID_SELF;
+   xrp.gpfn = XEN_PFN_DOWN(r->start) + j;
+   rc = HYPERVISOR_memory_op(XENMEM_remove_from_physmap,
+ &xrp);
+   if (rc)
+   return rc;
+   }
+   }
+
+   return rc;
+}
+
+static int xen_map_device_mmio(struct resource *resource, unsigned int count)
+{
+   unsigned int i, j, nr;
+   int rc = 0;
+   struct resource *r;
+   xen_pfn_t *gpfns;
+   xen_ulong_t *idxs;
+   int *errs;
+   struct xen_add_to_physmap_range xatp;
+
+   for (i = 0; i < count; i++) {
+   r = &resource[i];
+   nr = DIV_ROUND_UP(resource_size(r), XEN_PAGE_SIZE);
+   if ((resource_type(r) != IORESOURCE_MEM) || (nr == 0))
+   continue;
+
+   gpfns = kzalloc(sizeof(xen_pfn_t) * nr, GFP_KERNEL);
+   idxs = kzalloc(sizeof(xen_ulong_t) * nr, GFP_KERNEL);
+   errs = kzalloc(sizeof(int) * nr, GFP_KERNEL);
+   if (!gpfns || !idxs || !errs) {
+   kfree(gpfns);
+   kfree(idxs);
+   kfree(errs);
+   return -ENOMEM;
+   }
+
+   for (j = 0; j < nr; j++) {
+   gpfns[j] = XEN_PFN_DOWN(r->start) + j;
+   idxs[j] = XEN_PFN_DOWN(r->start) + j;
+   }
+
+   xatp.domid = DOMID_SELF;
+   xatp.size = nr;
+   xatp.space = XENMAPSPACE_dev_mmio;
+
+   set_xen_guest_handle(xatp.gpfns, gpfns);
+   set_xen_guest_handle(xatp.idxs, idxs);
+   set_xen_guest_handle(xatp.errs, errs);
+
+   rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
+   kfree(gpfns);
+   kfree(idxs);
+   kfree(errs);
+   if (rc)
+   return rc;
+   }
+
+   return rc;
+}
+
+static int xen_platform_notifier(struct notifier_block *nb,
+unsigned long action, void *data)
+{
+   struct platform_device *pdev = to_platform_device(data);
+   int r = 0;
+
+   if (pdev->num_resources == 0 || pdev->resource == NULL)
+   return NOTIFY_OK;
+
+   switch (action) {
+   case BUS_NOTIFY_ADD_DEVICE:
+   r = xen_map_device_mmio(pdev->resource, pdev->num_resources);
+   break;
+   case BUS_NOTIFY_DEL_DEVICE:
+   r = xen_unmap_device_mmio(pdev->resource, pdev->num_resources);
+   break;
+   default:
+   return NOTIFY_DONE;
+   }
+   if (r)
+   

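The frame arithmetic used by xen_map_device_mmio() and xen_unmap_device_mmio() above can be checked in isolation. This userspace sketch assumes 4 KiB Xen pages (a shift of 12) and mirrors the patch's DIV_ROUND_UP(resource_size(r), XEN_PAGE_SIZE) and XEN_PFN_DOWN(r->start) + j; the TOY_* macros, struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_XEN_PAGE_SHIFT 12 /* assumption: 4 KiB Xen pages */
#define TOY_XEN_PAGE_SIZE (1ULL << TOY_XEN_PAGE_SHIFT)
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct toy_resource {
	uint64_t start;
	uint64_t end; /* inclusive, as in struct resource */
};

/* Number of frames, computed from the size alone as the patch does. */
static uint64_t toy_nr_frames(const struct toy_resource *r)
{
	uint64_t size = r->end - r->start + 1; /* resource_size() */

	return TOY_DIV_ROUND_UP(size, TOY_XEN_PAGE_SIZE);
}

/* Fill gpfns[] with the frames to map; returns how many were written. */
static size_t toy_fill_gpfns(const struct toy_resource *r, uint64_t *gpfns,
			     size_t max)
{
	uint64_t nr = toy_nr_frames(r);
	size_t j;

	for (j = 0; j < nr && j < max; j++)
		gpfns[j] = (r->start >> TOY_XEN_PAGE_SHIFT) + j;
	return j;
}
```

Because nr is derived from the size alone, it matches the frames actually covered only when r->start is page-aligned.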
[PATCH v3 07/17] Xen: ARM: Add support for mapping AMBA device mmio

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Add a bus_notifier for AMBA bus devices in order to map the device
MMIO regions when Dom0 boots with ACPI.

Signed-off-by: Shannon Zhao 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/arm-device.c | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/drivers/xen/arm-device.c b/drivers/xen/arm-device.c
index 76e26e5..3854043 100644
--- a/drivers/xen/arm-device.c
+++ b/drivers/xen/arm-device.c
@@ -139,3 +139,46 @@ static int __init register_xen_platform_notifier(void)
 }
 
 arch_initcall(register_xen_platform_notifier);
+
+#ifdef CONFIG_ARM_AMBA
+#include 
+
+static int xen_amba_notifier(struct notifier_block *nb,
+unsigned long action, void *data)
+{
+   struct amba_device *adev = to_amba_device(data);
+   int r = 0;
+
+   switch (action) {
+   case BUS_NOTIFY_ADD_DEVICE:
+   r = xen_map_device_mmio(&adev->res, 1);
+   break;
+   case BUS_NOTIFY_DEL_DEVICE:
+   r = xen_unmap_device_mmio(&adev->res, 1);
+   break;
+   default:
+   return NOTIFY_DONE;
+   }
+   if (r)
+   dev_err(&adev->dev, "AMBA: Failed to %s device %s MMIO!\n",
+   action == BUS_NOTIFY_ADD_DEVICE ? "map" :
+   (action == BUS_NOTIFY_DEL_DEVICE ? "unmap" : "?"),
+   adev->dev.init_name);
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block amba_device_nb = {
+   .notifier_call = xen_amba_notifier,
+};
+
+static int __init register_xen_amba_notifier(void)
+{
+   if (!xen_initial_domain() || acpi_disabled)
+   return 0;
+
+   return bus_register_notifier(&amba_bustype, &amba_device_nb);
+}
+
+arch_initcall(register_xen_amba_notifier);
+#endif
-- 
2.0.4




[PATCH v3 02/17] xen/grant-table: Move xlated_setup_gnttab_pages to common place

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Move xlated_setup_gnttab_pages to a common place, so it can be reused by
ARM to set up the grant table.

Rename it to xen_xlate_map_ballooned_pages.

Signed-off-by: Shannon Zhao 
Reviewed-by: Stefano Stabellini 
---
 arch/x86/xen/grant-table.c | 57 +--
 drivers/xen/xlate_mmu.c| 61 ++
 include/xen/xen-ops.h  |  2 ++
 3 files changed, 69 insertions(+), 51 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index e079500..de4144c 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -111,63 +111,18 @@ int arch_gnttab_init(unsigned long nr_shared)
 }
 
 #ifdef CONFIG_XEN_PVH
-#include 
 #include 
-#include 
-static int __init xlated_setup_gnttab_pages(void)
-{
-   struct page **pages;
-   xen_pfn_t *pfns;
-   void *vaddr;
-   int rc;
-   unsigned int i;
-   unsigned long nr_grant_frames = gnttab_max_grant_frames();
-
-   BUG_ON(nr_grant_frames == 0);
-   pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
-   if (!pages)
-   return -ENOMEM;
-
-   pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
-   if (!pfns) {
-   kfree(pages);
-   return -ENOMEM;
-   }
-   rc = alloc_xenballooned_pages(nr_grant_frames, pages);
-   if (rc) {
-   pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
-   nr_grant_frames, rc);
-   kfree(pages);
-   kfree(pfns);
-   return rc;
-   }
-   for (i = 0; i < nr_grant_frames; i++)
-   pfns[i] = page_to_pfn(pages[i]);
-
-   vaddr = vmap(pages, nr_grant_frames, 0, PAGE_KERNEL);
-   if (!vaddr) {
-   pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
-   nr_grant_frames, rc);
-   free_xenballooned_pages(nr_grant_frames, pages);
-   kfree(pages);
-   kfree(pfns);
-   return -ENOMEM;
-   }
-   kfree(pages);
-
-   xen_auto_xlat_grant_frames.pfn = pfns;
-   xen_auto_xlat_grant_frames.count = nr_grant_frames;
-   xen_auto_xlat_grant_frames.vaddr = vaddr;
-
-   return 0;
-}
-
+#include 
 static int __init xen_pvh_gnttab_setup(void)
 {
if (!xen_pvh_domain())
return -ENODEV;
 
-   return xlated_setup_gnttab_pages();
+   xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
+
+   return xen_xlate_map_ballooned_pages(&xen_auto_xlat_grant_frames.pfn,
+&xen_auto_xlat_grant_frames.vaddr,
+xen_auto_xlat_grant_frames.count);
 }
 /* Call it _before_ __gnttab_init as we need to initialize the
  * xen_auto_xlat_grant_frames first. */
diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index 5063c5e..9692656 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -29,6 +29,8 @@
  */
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -37,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 typedef void (*xen_gfn_fn_t)(unsigned long gfn, void *data);
 
@@ -185,3 +188,61 @@ int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
return 0;
 }
 EXPORT_SYMBOL_GPL(xen_xlate_unmap_gfn_range);
+
+/**
+ * xen_xlate_map_ballooned_pages - map a new set of ballooned pages
+ * @gfns: returns the array of corresponding GFNs
+ * @virt: returns the virtual address of the mapped region
+ * @nr_grant_frames: number of GFNs
+ * @return 0 on success, error otherwise
+ *
+ * This allocates a set of ballooned pages and maps them into the
+ * kernel's address space.
+ */
+int __init xen_xlate_map_ballooned_pages(xen_pfn_t **gfns, void **virt,
+unsigned long nr_grant_frames)
+{
+   struct page **pages;
+   xen_pfn_t *pfns;
+   void *vaddr;
+   int rc;
+   unsigned int i;
+
+   BUG_ON(nr_grant_frames == 0);
+   pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
+   if (!pages)
+   return -ENOMEM;
+
+   pfns = kcalloc(nr_grant_frames, sizeof(pfns[0]), GFP_KERNEL);
+   if (!pfns) {
+   kfree(pages);
+   return -ENOMEM;
+   }
+   rc = alloc_xenballooned_pages(nr_grant_frames, pages);
+   if (rc) {
+   pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
+   nr_grant_frames, rc);
+   kfree(pages);
+   kfree(pfns);
+   return rc;
+   }
+   for (i = 0; i < nr_grant_frames; i++)
+   pfns[i] = page_to_pfn(pages[i]);
+
+   vaddr = vmap(pages, nr_grant_frames, 0, PAGE_KERNEL);
+   if (!vaddr) {
+   pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
+   nr_grant_frames, rc);
+   

[PATCH v3 10/17] arm/xen: Get event-channel irq through HVM_PARAM when booting with ACPI

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

When booting with ACPI, the event-channel IRQ can be obtained through
HVM_PARAM_CALLBACK_IRQ.

Signed-off-by: Shannon Zhao 
---
 arch/arm/xen/enlighten.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 6d90a62..0e010bb 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -262,6 +263,35 @@ void __init xen_early_init(void)
add_preferred_console("hvc", 0, NULL);
 }
 
+static void __init xen_acpi_guest_init_events_irq(void)
+{
+#ifdef CONFIG_ACPI
+   struct xen_hvm_param a;
+   int interrupt, trigger, polarity;
+
+   a.domid = DOMID_SELF;
+   a.index = HVM_PARAM_CALLBACK_IRQ;
+   xen_events_irq = 0;
+
+   if (!HYPERVISOR_hvm_op(HVMOP_get_param, )) {
+   if ((a.value >> 56) == HVM_PARAM_CALLBACK_TYPE_EVENT) {
+   interrupt = a.value & 0xff;
+   trigger = ((a.value >> 8) & 0x1) ? ACPI_EDGE_SENSITIVE
+: ACPI_LEVEL_SENSITIVE;
+   polarity = ((a.value >> 8) & 0x2) ? ACPI_ACTIVE_LOW
+ : ACPI_ACTIVE_HIGH;
+   xen_events_irq = acpi_register_gsi(NULL, interrupt,
+  trigger, polarity);
+   }
+   }
+#endif
+}
+
+static void __init xen_dt_guest_init_events_irq(void)
+{
+   xen_events_irq = irq_of_parse_and_map(xen_node, 0);
+}
+
 static int __init xen_guest_init(void)
 {
struct xen_add_to_physmap xatp;
@@ -270,7 +300,11 @@ static int __init xen_guest_init(void)
if (!xen_domain())
return 0;
 
-   xen_events_irq = irq_of_parse_and_map(xen_node, 0);
+   if (!acpi_disabled)
+   xen_acpi_guest_init_events_irq();
+   else
+   xen_dt_guest_init_events_irq();
+
if (!xen_events_irq) {
pr_err("Xen event channel interrupt not found\n");
return -ENODEV;
-- 
2.0.4




[PATCH v3 08/17] Xen: public/hvm: sync changes of HVM_PARAM_CALLBACK_VIA ABI from Xen

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Sync the changes of HVM_PARAM_CALLBACK_VIA ABI introduced by
Xen commit  (public/hvm: export the HVM_PARAM_CALLBACK_VIA
ABI in the API).

Signed-off-by: Shannon Zhao 
---
 include/xen/interface/hvm/params.h | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/include/xen/interface/hvm/params.h b/include/xen/interface/hvm/params.h
index a6c7991..70ad208 100644
--- a/include/xen/interface/hvm/params.h
+++ b/include/xen/interface/hvm/params.h
@@ -27,16 +27,31 @@
  * Parameter space for HVMOP_{set,get}_param.
  */
 
+#define HVM_PARAM_CALLBACK_IRQ 0
 /*
  * How should CPU0 event-channel notifications be delivered?
- * val[63:56] == 0: val[55:0] is a delivery GSI (Global System Interrupt).
- * val[63:56] == 1: val[55:0] is a delivery PCI INTx line, as follows:
- *  Domain = val[47:32], Bus  = val[31:16],
- *  DevFn  = val[15: 8], IntX = val[ 1: 0]
- * val[63:56] == 2: val[7:0] is a vector number.
+ *
  * If val == 0 then CPU0 event-channel notifications are not delivered.
+ * If val != 0, val[63:56] encodes the type, as follows:
+ */
+
+#define HVM_PARAM_CALLBACK_TYPE_GSI  0
+/*
+ * val[55:0] is a delivery GSI.  GSI 0 cannot be used, as it aliases val == 0,
+ * and disables all notifications.
+ */
+
+#define HVM_PARAM_CALLBACK_TYPE_PCI_INTX 1
+/*
+ * val[55:0] is a delivery PCI INTx line:
+ * Domain = val[47:32], Bus = val[31:16] DevFn = val[15:8], IntX = val[1:0]
+ */
+
+#define HVM_PARAM_CALLBACK_TYPE_VECTOR   2
+/*
+ * val[7:0] is a vector number.  Check for XENFEAT_hvm_callback_vector to know
+ * if this delivery method is available.
  */
-#define HVM_PARAM_CALLBACK_IRQ 0
 
 #define HVM_PARAM_STORE_PFN1
 #define HVM_PARAM_STORE_EVTCHN 2
-- 
2.0.4
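The PCI INTx layout documented in the header above can be illustrated with a small user-space sketch that pulls the fields out of a callback value. It assumes the bit positions exactly as written in the header comment. struct pci_intx, callback_type() and decode_pci_intx() are hypothetical helpers, not kernel or Xen APIs.

```c
#include <assert.h>
#include <stdint.h>

#define HVM_PARAM_CALLBACK_TYPE_GSI      0
#define HVM_PARAM_CALLBACK_TYPE_PCI_INTX 1
#define HVM_PARAM_CALLBACK_TYPE_VECTOR   2

/* Hypothetical decoded form of a PCI INTx delivery value. */
struct pci_intx {
	uint16_t domain;  /* val[47:32] */
	uint16_t bus;     /* val[31:16] */
	uint8_t devfn;    /* val[15:8] */
	uint8_t intx;     /* val[1:0] */
};

/* The type lives in val[63:56]; it is only meaningful when val != 0. */
static unsigned int callback_type(uint64_t val)
{
	return (unsigned int)(val >> 56);
}

static struct pci_intx decode_pci_intx(uint64_t val)
{
	struct pci_intx p;

	assert(val != 0 && callback_type(val) == HVM_PARAM_CALLBACK_TYPE_PCI_INTX);
	p.domain = (uint16_t)(val >> 32);
	p.bus = (uint16_t)(val >> 16);
	p.devfn = (uint8_t)(val >> 8);
	p.intx = (uint8_t)(val & 0x3);
	return p;
}
```

Because type 0 is HVM_PARAM_CALLBACK_TYPE_GSI, a GSI of 0 would produce val == 0, which the header defines as "notifications disabled". That is why GSI 0 cannot be used.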




[PATCH v3 16/17] FDT: Add a helper to get specified name subnode

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Sometimes we need to check whether a node exists in the FDT by its full
path. Introduce a helper that returns the subnode with the specified name,
if it exists.

Signed-off-by: Shannon Zhao 
---
CC: Rob Herring 
---
 drivers/of/fdt.c   | 35 +++
 include/linux/of_fdt.h |  2 ++
 2 files changed, 37 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 655f79d..112ec16 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -645,6 +645,41 @@ int __init of_scan_flat_dt(int (*it)(unsigned long node,
 }
 
 /**
+ * of_get_flat_dt_subnode_by_name - get subnode of specified node by name
+ *
+ * @node: the parent node
+ * @uname: the name of subnode
+ * @return offset of the subnode, or -FDT_ERR_NOTFOUND if there is none
+ */
+
+int of_get_flat_dt_subnode_by_name(unsigned long node, const char *uname)
+{
+   const void *blob = initial_boot_params;
+   int offset;
+   const char *pathp;
+
+   /* Find first subnode if it exists */
+   offset = fdt_first_subnode(blob, node);
+   if (offset < 0)
+   return -FDT_ERR_NOTFOUND;
+   pathp = fdt_get_name(blob, offset, NULL);
+   if (strncmp(pathp, uname, strlen(uname)) == 0)
+   return offset;
+
+   /* Find other subnodes */
+   do {
+   offset = fdt_next_subnode(blob, offset);
+   if (offset < 0)
+   return -FDT_ERR_NOTFOUND;
+   pathp = fdt_get_name(blob, offset, NULL);
+   if (strncmp(pathp, uname, strlen(uname)) == 0)
+   return offset;
+   } while (offset >= 0);
+
+   return -FDT_ERR_NOTFOUND;
+}
+
+/**
  * of_get_flat_dt_root - find the root node in the flat blob
  */
 unsigned long __init of_get_flat_dt_root(void)
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index df9ef38..fc28162 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -52,6 +52,8 @@ extern char __dtb_end[];
 extern int of_scan_flat_dt(int (*it)(unsigned long node, const char *uname,
 int depth, void *data),
   void *data);
+extern int of_get_flat_dt_subnode_by_name(unsigned long node,
+ const char *uname);
 extern const void *of_get_flat_dt_prop(unsigned long node, const char *name,
   int *size);
 extern int of_flat_dt_is_compatible(unsigned long node, const char *name);
-- 
2.0.4
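As posted, of_get_flat_dt_subnode_by_name() compares names with strncmp(pathp, uname, strlen(uname)). That is a prefix match, so "uefi" would also match a sibling named, say, "uefi-extra". The sketch below shows a stricter comparison that accepts an exact name or a name followed by a unit address ("@..."). It is a suggestion only. node_name_matches() and find_subnode() are hypothetical, and the array walk simulates fdt_first_subnode()/fdt_next_subnode() without libfdt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stricter comparison: "uefi" matches "uefi" and "uefi@0"
 * (unit address) but not "uefi-extra".
 */
static int node_name_matches(const char *name, const char *uname)
{
	size_t n = strlen(uname);

	if (strncmp(name, uname, n) != 0)
		return 0;
	return name[n] == '\0' || name[n] == '@';
}

/* Simulated subnode walk: returns the index of the first match, or -1. */
static int find_subnode(const char *const *names, size_t count,
			const char *uname)
{
	size_t i;

	assert(names != NULL && uname != NULL);
	for (i = 0; i < count; i++)
		if (node_name_matches(names[i], uname))
			return (int)i;
	return -1;
}
```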




[PATCH v3 14/17] XEN: EFI: Move x86 specific codes to architecture directory

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Move the x86-specific code to the architecture directory and export the
EFI runtime service functions. This will be useful for initializing runtime
services on ARM later.

Signed-off-by: Shannon Zhao 
---
 arch/x86/xen/efi.c| 112 
 drivers/xen/efi.c | 174 ++
 include/xen/xen-ops.h |  30 ++---
 3 files changed, 168 insertions(+), 148 deletions(-)

diff --git a/arch/x86/xen/efi.c b/arch/x86/xen/efi.c
index be14cc3..86527f1 100644
--- a/arch/x86/xen/efi.c
+++ b/arch/x86/xen/efi.c
@@ -20,10 +20,122 @@
 #include 
 #include 
 
+#include 
 #include 
+#include 
 
 #include 
 #include 
+#include 
+
+static efi_char16_t vendor[100] __initdata;
+
+static efi_system_table_t efi_systab_xen __initdata = {
+   .hdr = {
+   .signature  = EFI_SYSTEM_TABLE_SIGNATURE,
+   .revision   = 0, /* Initialized later. */
+   .headersize = 0, /* Ignored by Linux Kernel. */
+   .crc32  = 0, /* Ignored by Linux Kernel. */
+   .reserved   = 0
+   },
+   .fw_vendor  = EFI_INVALID_TABLE_ADDR, /* Initialized later. */
+   .fw_revision= 0,  /* Initialized later. */
+   .con_in_handle  = EFI_INVALID_TABLE_ADDR, /* Not used under Xen. */
+   .con_in = EFI_INVALID_TABLE_ADDR, /* Not used under Xen. */
+   .con_out_handle = EFI_INVALID_TABLE_ADDR, /* Not used under Xen. */
+   .con_out= EFI_INVALID_TABLE_ADDR, /* Not used under Xen. */
+   .stderr_handle  = EFI_INVALID_TABLE_ADDR, /* Not used under Xen. */
+   .stderr = EFI_INVALID_TABLE_ADDR, /* Not used under Xen. */
+   .runtime= (efi_runtime_services_t *)EFI_INVALID_TABLE_ADDR,
+ /* Not used under Xen. */
+   .boottime   = (efi_boot_services_t *)EFI_INVALID_TABLE_ADDR,
+ /* Not used under Xen. */
+   .nr_tables  = 0,  /* Initialized later. */
+   .tables = EFI_INVALID_TABLE_ADDR  /* Initialized later. */
+};
+
+static const struct efi efi_xen __initconst = {
+   .systab   = NULL, /* Initialized later. */
+   .runtime_version  = 0,/* Initialized later. */
+   .mps  = EFI_INVALID_TABLE_ADDR,
+   .acpi = EFI_INVALID_TABLE_ADDR,
+   .acpi20   = EFI_INVALID_TABLE_ADDR,
+   .smbios   = EFI_INVALID_TABLE_ADDR,
+   .smbios3  = EFI_INVALID_TABLE_ADDR,
+   .sal_systab   = EFI_INVALID_TABLE_ADDR,
+   .boot_info= EFI_INVALID_TABLE_ADDR,
+   .hcdp = EFI_INVALID_TABLE_ADDR,
+   .uga  = EFI_INVALID_TABLE_ADDR,
+   .uv_systab= EFI_INVALID_TABLE_ADDR,
+   .fw_vendor= EFI_INVALID_TABLE_ADDR,
+   .runtime  = EFI_INVALID_TABLE_ADDR,
+   .config_table = EFI_INVALID_TABLE_ADDR,
+   .get_time = xen_efi_get_time,
+   .set_time = xen_efi_set_time,
+   .get_wakeup_time  = xen_efi_get_wakeup_time,
+   .set_wakeup_time  = xen_efi_set_wakeup_time,
+   .get_variable = xen_efi_get_variable,
+   .get_next_variable= xen_efi_get_next_variable,
+   .set_variable = xen_efi_set_variable,
+   .query_variable_info  = xen_efi_query_variable_info,
+   .update_capsule   = xen_efi_update_capsule,
+   .query_capsule_caps   = xen_efi_query_capsule_caps,
+   .get_next_high_mono_count = xen_efi_get_next_high_mono_count,
+   .reset_system = NULL, /* Functionality provided by Xen. */
+   .set_virtual_address_map  = NULL, /* Not used under Xen. */
+   .memmap   = NULL, /* Not used under Xen. */
+   .flags= 0 /* Initialized later. */
+};
+
+static efi_system_table_t __init *xen_efi_probe(void)
+{
+   struct xen_platform_op op = {
+   .cmd = XENPF_firmware_info,
+   .u.firmware_info = {
+   .type = XEN_FW_EFI_INFO,
+   .index = XEN_FW_EFI_CONFIG_TABLE
+   }
+   };
+   union xenpf_efi_info *info = _info.u.efi_info;
+
+   if (!xen_initial_domain() || HYPERVISOR_platform_op() < 0)
+   return NULL;
+
+   /* Here we know that Xen runs on EFI platform. */
+
+   efi = efi_xen;
+
+   efi_systab_xen.tables = info->cfg.addr;
+   efi_systab_xen.nr_tables = info->cfg.nent;
+
+   op.cmd = XENPF_firmware_info;
+   op.u.firmware_info.type = XEN_FW_EFI_INFO;
+   op.u.firmware_info.index = XEN_FW_EFI_VENDOR;
+   info->vendor.bufsz = sizeof(vendor);
+   

[PATCH v3 15/17] ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

When running on the Xen hypervisor, runtime services are supported through
hypercalls. Add a Xen-specific function to initialize runtime services.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/xen/xen-ops.h |  6 ++
 arch/arm64/xen/Makefile  |  1 +
 arch/arm64/xen/efi.c | 40 
 drivers/xen/Kconfig  |  2 +-
 4 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/xen/xen-ops.h
 create mode 100644 arch/arm64/xen/efi.c

diff --git a/arch/arm64/include/asm/xen/xen-ops.h b/arch/arm64/include/asm/xen/xen-ops.h
new file mode 100644
index 000..ec154e7
--- /dev/null
+++ b/arch/arm64/include/asm/xen/xen-ops.h
@@ -0,0 +1,6 @@
+#ifndef _ASM_XEN_OPS_H
+#define _ASM_XEN_OPS_H
+
+void xen_efi_runtime_setup(void);
+
+#endif /* _ASM_XEN_OPS_H */
diff --git a/arch/arm64/xen/Makefile b/arch/arm64/xen/Makefile
index 74a8d87..62e6fe2 100644
--- a/arch/arm64/xen/Makefile
+++ b/arch/arm64/xen/Makefile
@@ -1,2 +1,3 @@
 xen-arm-y  += $(addprefix ../../arm/xen/, enlighten.o grant-table.o p2m.o mm.o)
 obj-y  := xen-arm.o hypercall.o
+obj-$(CONFIG_XEN_EFI) += efi.o
diff --git a/arch/arm64/xen/efi.c b/arch/arm64/xen/efi.c
new file mode 100644
index 000..b9cae65
--- /dev/null
+++ b/arch/arm64/xen/efi.c
@@ -0,0 +1,40 @@
+/*
+ * Copyright (c) 2015, Linaro Limited, Shannon Zhao
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+
+/* Set XEN EFI runtime services function pointers. Other fields of struct efi,
+ * e.g. efi.systab, will be set like normal EFI.
+ */
+void __init xen_efi_runtime_setup(void)
+{
+   efi.get_time = xen_efi_get_time;
+   efi.set_time = xen_efi_set_time;
+   efi.get_wakeup_time  = xen_efi_get_wakeup_time;
+   efi.set_wakeup_time  = xen_efi_set_wakeup_time;
+   efi.get_variable = xen_efi_get_variable;
+   efi.get_next_variable= xen_efi_get_next_variable;
+   efi.set_variable = xen_efi_set_variable;
+   efi.query_variable_info  = xen_efi_query_variable_info;
+   efi.update_capsule   = xen_efi_update_capsule;
+   efi.query_capsule_caps   = xen_efi_query_capsule_caps;
+   efi.get_next_high_mono_count = xen_efi_get_next_high_mono_count;
+   efi.reset_system = NULL; /* Functionality provided by Xen. */
+}
+EXPORT_SYMBOL_GPL(xen_efi_runtime_setup);
diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 73708ac..27d216a 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -268,7 +268,7 @@ config XEN_HAVE_PVMMU
 
 config XEN_EFI
def_bool y
-   depends on X86_64 && EFI
+   depends on (ARM64 || X86_64) && EFI
 
 config XEN_AUTO_XLATE
def_bool y
-- 
2.0.4
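xen_efi_runtime_setup(), together with the branch that patch 17 adds to arm64_enable_runtime_services(), amounts to choosing one of two sets of function pointers at boot. Here is a reduced sketch of that idea. struct runtime_ops, setup_runtime() and the stub functions are hypothetical stand-ins for struct efi and its runtime service pointers.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*get_time_fn)(long *t);

/* Hypothetical, reduced version of the runtime service pointers in struct efi. */
struct runtime_ops {
	get_time_fn get_time;
	void (*reset_system)(void);
};

/* Stubs standing in for the native and Xen (hypercall-based) services. */
static int native_get_time(long *t) { *t = 1; return 0; }
static void native_reset(void) { }
static int xen_get_time(long *t) { *t = 2; return 0; }

static void setup_runtime(struct runtime_ops *ops, int paravirt)
{
	assert(ops != NULL);
	if (paravirt) {
		ops->get_time = xen_get_time;
		ops->reset_system = NULL;  /* functionality provided by the hypervisor */
	} else {
		ops->get_time = native_get_time;
		ops->reset_system = native_reset;
	}
}
```

Callers always go through the pointers, so code that uses the EFI runtime services does not need to know whether it runs natively or under Xen.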




[PATCH v3 17/17] Xen: EFI: Parse DT parameters for Xen specific UEFI

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Add a new function to parse the DT parameters for Xen-specific UEFI in the
same way as for normal UEFI, so that the existing code can be reused.

If Xen supports EFI, initialize runtime services.

Signed-off-by: Shannon Zhao 
---
CC: Matt Fleming 
---
 arch/arm/xen/enlighten.c   |  6 ++
 arch/arm64/kernel/efi.c| 17 -
 drivers/firmware/efi/efi.c | 45 ++---
 3 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index cdc0bd2..608d735 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -245,6 +245,12 @@ static int __init fdt_find_hyper_node(unsigned long node, const char *uname,
!strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix)))
hyper_node.version = s + strlen(hyper_node.prefix);
 
+   if (IS_ENABLED(CONFIG_XEN_EFI)) {
+   /* Check if Xen supports EFI */
+   if (of_get_flat_dt_subnode_by_name(node, "uefi") > 0)
+   set_bit(EFI_PARAVIRT, );
+   }
+
return 0;
 }
 
diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 4eeb171..3c46129 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct efi_memory_map memmap;
 
@@ -308,13 +309,19 @@ static int __init arm64_enable_runtime_services(void)
}
set_bit(EFI_SYSTEM_TABLES, );
 
-   if (!efi_virtmap_init()) {
-   pr_err("No UEFI virtual mapping was installed -- runtime services will not be available\n");
-   return -ENOMEM;
+   if (IS_ENABLED(CONFIG_XEN_EFI) && efi_enabled(EFI_PARAVIRT)) {
+   /* Set up runtime services function pointers for Xen Dom0 */
+   xen_efi_runtime_setup();
+   } else {
+   if (!efi_virtmap_init()) {
+   pr_err("No UEFI virtual mapping was installed -- runtime services will not be available\n");
+   return -ENOMEM;
+   }
+
+   /* Set up runtime services function pointers */
+   efi_native_runtime_setup();
}
 
-   /* Set up runtime services function pointers */
-   efi_native_runtime_setup();
set_bit(EFI_RUNTIME_SERVICES, );
 
efi.runtime_version = efi.systab->hdr.revision;
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 027ca21..bdcf6d7 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -498,12 +498,14 @@ device_initcall(efi_load_efivars);
FIELD_SIZEOF(struct efi_fdt_params, field) \
}
 
-static __initdata struct {
+struct params {
const char name[32];
const char propname[32];
int offset;
int size;
-} dt_params[] = {
+};
+
+static struct params fdt_params[] __initdata = {
UEFI_PARAM("System Table", "linux,uefi-system-table", system_table),
UEFI_PARAM("MemMap Address", "linux,uefi-mmap-start", mmap),
UEFI_PARAM("MemMap Size", "linux,uefi-mmap-size", mmap_size),
@@ -511,24 +513,45 @@ static __initdata struct {
UEFI_PARAM("MemMap Desc. Version", "linux,uefi-mmap-desc-ver", desc_ver)
 };
 
+static struct params xen_fdt_params[] __initdata = {
+   UEFI_PARAM("System Table", "xen,uefi-system-table", system_table),
+   UEFI_PARAM("MemMap Address", "xen,uefi-mmap-start", mmap),
+   UEFI_PARAM("MemMap Size", "xen,uefi-mmap-size", mmap_size),
+   UEFI_PARAM("MemMap Desc. Size", "xen,uefi-mmap-desc-size", desc_size),
+   UEFI_PARAM("MemMap Desc. Version", "xen,uefi-mmap-desc-ver", desc_ver)
+};
+
 struct param_info {
int found;
void *params;
+   struct params *dt_params;
+   int size;
 };
 
 static int __init fdt_find_uefi_params(unsigned long node, const char *uname,
   int depth, void *data)
 {
struct param_info *info = data;
+   struct params *dt_params = info->dt_params;
const void *prop;
void *dest;
u64 val;
-   int i, len;
+   int i, len, offset;
 
-   if (depth != 1 || strcmp(uname, "chosen") != 0)
-   return 0;
+   if (efi_enabled(EFI_PARAVIRT)) {
+   if (depth != 1 || strcmp(uname, "hypervisor") != 0)
+   return 0;
 
-   for (i = 0; i < ARRAY_SIZE(dt_params); i++) {
+   offset = of_get_flat_dt_subnode_by_name(node, "uefi");
+   if (offset < 0)
+   return 0;
+   node = offset;
+   } else {
+   if (depth != 1 || strcmp(uname, "chosen") != 0)
+   return 0;
+   }
+
+   for (i = 0; i < info->size; i++) {
prop = of_get_flat_dt_prop(node, dt_params[i].propname, );
if (!prop)
return 0;
@@ -559,12 +582,20 @@ int __init 
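
Patch 17 makes fdt_find_uefi_params() table-driven, selecting either the linux,uefi-* or the xen,uefi-* property names. The sketch below shows the same offsetof()-based technique in user space. It is a simplification under stated assumptions: struct fdt_params_out, parse_params() and demo_lookup() are hypothetical, only two properties are shown, and the lookup is assumed to return values already converted to host byte order (FDT cells are big-endian).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct efi_fdt_params. */
struct fdt_params_out {
	uint64_t system_table;
	uint32_t mmap_size;
};

struct param_desc {
	const char *propname;
	size_t offset;  /* offsetof() into struct fdt_params_out */
	size_t size;    /* 4 or 8 bytes */
};

#define PARAM(prop, field) \
	{ prop, offsetof(struct fdt_params_out, field), \
	  sizeof(((struct fdt_params_out *)0)->field) }

static const struct param_desc native_params[] = {
	PARAM("linux,uefi-system-table", system_table),
	PARAM("linux,uefi-mmap-size", mmap_size),
};

static const struct param_desc xen_params[] = {
	PARAM("xen,uefi-system-table", system_table),
	PARAM("xen,uefi-mmap-size", mmap_size),
};

/* Hypothetical lookup: stores a host-order value and returns 0, or returns -1. */
typedef int (*lookup_fn)(const char *propname, uint64_t *val);

static int parse_params(int paravirt, lookup_fn lookup, struct fdt_params_out *out)
{
	const struct param_desc *tbl = paravirt ? xen_params : native_params;
	size_t n = paravirt ? sizeof(xen_params) / sizeof(xen_params[0])
			    : sizeof(native_params) / sizeof(native_params[0]);
	size_t i;

	assert(lookup != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		char *dest = (char *)out + tbl[i].offset;
		uint64_t val;

		if (lookup(tbl[i].propname, &val) != 0)
			return -1;  /* missing property: report failure */
		if (tbl[i].size == sizeof(uint32_t)) {
			uint32_t v32 = (uint32_t)val;
			memcpy(dest, &v32, sizeof(v32));
		} else {
			memcpy(dest, &val, sizeof(val));
		}
	}
	return 0;
}

/* Example lookup that only knows the Xen property names. */
static int demo_lookup(const char *propname, uint64_t *val)
{
	if (strcmp(propname, "xen,uefi-system-table") == 0) {
		*val = 0x80000000ULL;
		return 0;
	}
	if (strcmp(propname, "xen,uefi-mmap-size") == 0) {
		*val = 0x1000;
		return 0;
	}
	return -1;
}
```

Keeping the field offsets and sizes in the tables lets one parsing loop serve both property sets.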

[PATCH v3 05/17] xen: memory : Add new XENMAPSPACE type XENMAPSPACE_dev_mmio

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Add a new Xen map space type that lets Dom0 map a device's MMIO region.

Signed-off-by: Shannon Zhao 
---
 include/xen/interface/memory.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/xen/interface/memory.h b/include/xen/interface/memory.h
index 2ecfe4f..9aa8988 100644
--- a/include/xen/interface/memory.h
+++ b/include/xen/interface/memory.h
@@ -160,6 +160,7 @@ DEFINE_GUEST_HANDLE_STRUCT(xen_machphys_mapping_t);
 #define XENMAPSPACE_gmfn_foreign 4 /* GMFN from another dom,
* XENMEM_add_to_physmap_range only.
*/
+#define XENMAPSPACE_dev_mmio 5 /* device mmio region */
 
 /*
  * Sets the GPFN at which a particular page appears in the specified guest's
-- 
2.0.4
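A caller using XENMAPSPACE_dev_mmio has to work out which guest frames an MMIO resource covers. The sketch below is a hedged illustration of that arithmetic for an inclusive [start, end] range, as in struct resource, with 4KB Xen frames. mmio_frames() and struct mmio_range are hypothetical, and this is not the code from the platform/AMBA mapping patches in this series.

```c
#include <assert.h>
#include <stdint.h>

#define XEN_PAGE_SHIFT 12  /* 4KB Xen frames (assumption of this sketch) */

struct mmio_range {
	uint64_t first_gfn;
	uint64_t nr_frames;
};

/* start/end are inclusive, like struct resource; every touched frame counts. */
static struct mmio_range mmio_frames(uint64_t start, uint64_t end)
{
	struct mmio_range r;

	assert(end >= start);
	r.first_gfn = start >> XEN_PAGE_SHIFT;
	r.nr_frames = (end >> XEN_PAGE_SHIFT) - r.first_gfn + 1;
	return r;
}
```

Computing the count from the first and last frame numbers also handles a region that starts or ends in the middle of a frame.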




[PATCH v3 04/17] arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Use xen_xlate_map_ballooned_pages to set up the grant table, so that it no
longer relies on DT or ACPI to pass the start address and size of the grant
table.

Signed-off-by: Shannon Zhao 
Acked-by: Stefano Stabellini 
---
 arch/arm/xen/enlighten.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index afe6175..6d90a62 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -266,18 +266,10 @@ static int __init xen_guest_init(void)
 {
struct xen_add_to_physmap xatp;
struct shared_info *shared_info_page = NULL;
-   struct resource res;
-   phys_addr_t grant_frames;
 
if (!xen_domain())
return 0;
 
-   if (of_address_to_resource(xen_node, GRANT_TABLE_PHYSADDR, )) {
-   pr_err("Xen grant table base address not found\n");
-   return -ENODEV;
-   }
-   grant_frames = res.start;
-
xen_events_irq = irq_of_parse_and_map(xen_node, 0);
if (!xen_events_irq) {
pr_err("Xen event channel interrupt not found\n");
@@ -312,7 +304,10 @@ static int __init xen_guest_init(void)
if (xen_vcpu_info == NULL)
return -ENOMEM;
 
-   if (gnttab_setup_auto_xlat_frames(grant_frames)) {
+   xen_auto_xlat_grant_frames.count = gnttab_max_grant_frames();
+   if (xen_xlate_map_ballooned_pages(_auto_xlat_grant_frames.pfn,
+ _auto_xlat_grant_frames.vaddr,
+ xen_auto_xlat_grant_frames.count)) {
free_percpu(xen_vcpu_info);
return -ENOMEM;
}
-- 
2.0.4




[PATCH v3 11/17] ARM: XEN: Move xen_early_init() before efi_init()

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Move xen_early_init() before efi_init(), so that efi_init() can initialize
Xen-specific UEFI.

Check whether the kernel runs on the Xen hypervisor by scanning the
flattened device tree.

Signed-off-by: Shannon Zhao 
---
 arch/arm/xen/enlighten.c  | 56 ++-
 arch/arm64/kernel/setup.c |  2 +-
 2 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 0e010bb..cdc0bd2 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -52,8 +53,6 @@ struct xen_memory_region xen_extra_mem[XEN_EXTRA_MEM_MAX_REGIONS] __initdata;
 
 static __read_mostly unsigned int xen_events_irq;
 
-static __initdata struct device_node *xen_node;
-
 int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   unsigned long addr,
   xen_pfn_t *gfn, int nr,
@@ -222,6 +221,33 @@ static irqreturn_t xen_arm_callback(int irq, void *arg)
return IRQ_HANDLED;
 }
 
+static __initdata struct {
+   const char *compat;
+   const char *prefix;
+   const char *version;
+   bool found;
+} hyper_node = {"xen,xen", "xen,xen-", NULL, false};
+
+static int __init fdt_find_hyper_node(unsigned long node, const char *uname,
+ int depth, void *data)
+{
+   const void *s = NULL;
+   int len;
+
+   if (depth != 1 || strcmp(uname, "hypervisor") != 0)
+   return 0;
+
+   if (of_flat_dt_is_compatible(node, hyper_node.compat))
+   hyper_node.found = true;
+
+   s = of_get_flat_dt_prop(node, "compatible", );
+   if (strlen(hyper_node.prefix) + 3  < len &&
+   !strncmp(hyper_node.prefix, s, strlen(hyper_node.prefix)))
+   hyper_node.version = s + strlen(hyper_node.prefix);
+
+   return 0;
+}
+
 /*
  * see Documentation/devicetree/bindings/arm/xen.txt for the
  * documentation of the Xen Device Tree format.
@@ -229,26 +255,18 @@ static irqreturn_t xen_arm_callback(int irq, void *arg)
 #define GRANT_TABLE_PHYSADDR 0
 void __init xen_early_init(void)
 {
-   int len;
-   const char *s = NULL;
-   const char *version = NULL;
-   const char *xen_prefix = "xen,xen-";
-
-   xen_node = of_find_compatible_node(NULL, NULL, "xen,xen");
-   if (!xen_node) {
+   of_scan_flat_dt(fdt_find_hyper_node, NULL);
+   if (!hyper_node.found) {
pr_debug("No Xen support\n");
return;
}
-   s = of_get_property(xen_node, "compatible", );
-   if (strlen(xen_prefix) + 3  < len &&
-   !strncmp(xen_prefix, s, strlen(xen_prefix)))
-   version = s + strlen(xen_prefix);
-   if (version == NULL) {
+
+   if (hyper_node.version == NULL) {
pr_debug("Xen version not found\n");
return;
}
 
-   pr_info("Xen %s support found\n", version);
+   pr_info("Xen %s support found\n", hyper_node.version);
 
xen_domain_type = XEN_HVM_DOMAIN;
 
@@ -289,6 +307,14 @@ static void __init xen_acpi_guest_init_events_irq(void)
 
 static void __init xen_dt_guest_init_events_irq(void)
 {
+   struct device_node *xen_node;
+
+   xen_node = of_find_compatible_node(NULL, NULL, "xen,xen");
+   if (!xen_node) {
+   pr_err("Xen support was detected before, but it has disappeared\n");
+   return;
+   }
+
xen_events_irq = irq_of_parse_and_map(xen_node, 0);
 }
 
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 8119479..a4a2878 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -313,6 +313,7 @@ void __init setup_arch(char **cmdline_p)
 */
local_async_enable();
 
+   xen_early_init();
efi_init();
arm64_memblock_init();
 
@@ -334,7 +335,6 @@ void __init setup_arch(char **cmdline_p)
} else {
psci_acpi_init();
}
-   xen_early_init();
 
cpu_read_bootcpu_ops();
smp_init_cpus();
-- 
2.0.4
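The version check in fdt_find_hyper_node() works on the raw "compatible" property, which is a sequence of NUL-separated strings. The following user-space sketch mirrors that check. xen_version_from_compat() is a hypothetical name. The sketch also adds a guard for a missing property (NULL pointer or negative length), which the posted code does not check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * 'prop' is the raw "compatible" property and 'len' its length in bytes.
 * Returns a pointer to the version string after "xen,xen-", or NULL.
 */
static const char *xen_version_from_compat(const char *prop, int len)
{
	const char *prefix = "xen,xen-";
	size_t plen = strlen(prefix);

	if (prop == NULL || len < 0)
		return NULL;  /* property missing */
	/* Same condition as the patch: room for the prefix plus a version. */
	if (plen + 3 < (size_t)len && strncmp(prefix, prop, plen) == 0)
		return prop + plen;
	return NULL;
}
```

For the binding example "xen,xen-4.3", "xen,xen", the returned string is "4.3", which xen_early_init() then prints as "Xen 4.3 support found".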




[PATCH v3 03/17] Xen: xlate: Use page_to_xen_pfn instead of page_to_pfn

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Use page_to_xen_pfn to handle 64KB pages.

Signed-off-by: Shannon Zhao 
---
 drivers/xen/xlate_mmu.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
index 9692656..28f728b 100644
--- a/drivers/xen/xlate_mmu.c
+++ b/drivers/xen/xlate_mmu.c
@@ -207,9 +207,12 @@ int __init xen_xlate_map_ballooned_pages(xen_pfn_t **gfns, void **virt,
void *vaddr;
int rc;
unsigned int i;
+   unsigned long nr_pages;
+   xen_pfn_t xen_pfn = 0;
 
BUG_ON(nr_grant_frames == 0);
-   pages = kcalloc(nr_grant_frames, sizeof(pages[0]), GFP_KERNEL);
+   nr_pages = DIV_ROUND_UP(nr_grant_frames, XEN_PFN_PER_PAGE);
+   pages = kcalloc(nr_pages, sizeof(pages[0]), GFP_KERNEL);
if (!pages)
return -ENOMEM;
 
@@ -218,22 +221,25 @@ int __init xen_xlate_map_ballooned_pages(xen_pfn_t **gfns, void **virt,
kfree(pages);
return -ENOMEM;
}
-   rc = alloc_xenballooned_pages(nr_grant_frames, pages);
+   rc = alloc_xenballooned_pages(nr_pages, pages);
if (rc) {
-   pr_warn("%s Couldn't balloon alloc %ld pfns rc:%d\n", __func__,
-   nr_grant_frames, rc);
+   pr_warn("%s Couldn't balloon alloc %ld pages rc:%d\n", __func__,
+   nr_pages, rc);
kfree(pages);
kfree(pfns);
return rc;
}
-   for (i = 0; i < nr_grant_frames; i++)
-   pfns[i] = page_to_pfn(pages[i]);
+   for (i = 0; i < nr_grant_frames; i++) {
+   if ((i % XEN_PFN_PER_PAGE) == 0)
+   xen_pfn = page_to_xen_pfn(pages[i / XEN_PFN_PER_PAGE]);
+   pfns[i] = pfn_to_gfn(xen_pfn++);
+   }
 
-   vaddr = vmap(pages, nr_grant_frames, 0, PAGE_KERNEL);
+   vaddr = vmap(pages, nr_pages, 0, PAGE_KERNEL);
if (!vaddr) {
-   pr_warn("%s Couldn't map %ld pfns rc:%d\n", __func__,
-   nr_grant_frames, rc);
-   free_xenballooned_pages(nr_grant_frames, pages);
+   pr_warn("%s Couldn't map %ld pages rc:%d\n", __func__,
+   nr_pages, rc);
+   free_xenballooned_pages(nr_pages, pages);
kfree(pages);
kfree(pfns);
return -ENOMEM;
-- 
2.0.4
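With 64KB kernel pages, each page backs XEN_PFN_PER_PAGE 4KB Xen frames, so only DIV_ROUND_UP(nr_grant_frames, XEN_PFN_PER_PAGE) pages are ballooned. The loop then hands out consecutive Xen PFNs from each page. The user-space sketch below reproduces that arithmetic. It assumes PAGE_SHIFT = 16 and XEN_PAGE_SHIFT = 12, and it treats pfn_to_gfn() as the identity (an assumption of this sketch; it is meant to reflect the auto-translated case). fill_xen_pfns() and pfn_to_first_xen_pfn() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 64KB kernel pages, 4KB Xen pages. */
#define PAGE_SHIFT 16
#define XEN_PAGE_SHIFT 12
#define XEN_PFN_PER_PAGE (1UL << (PAGE_SHIFT - XEN_PAGE_SHIFT))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Stand-in for page_to_xen_pfn(): first 4KB frame of a 64KB page. */
static uint64_t pfn_to_first_xen_pfn(uint64_t pfn)
{
	return pfn << (PAGE_SHIFT - XEN_PAGE_SHIFT);
}

/*
 * Fill xen_pfns[0..nr_frames) from the kernel PFNs of the backing pages,
 * following the loop in xen_xlate_map_ballooned_pages(). Returns the
 * number of kernel pages that back those frames.
 */
static unsigned long fill_xen_pfns(const uint64_t *page_pfns,
				   unsigned long nr_frames, uint64_t *xen_pfns)
{
	uint64_t xen_pfn = 0;
	unsigned long i;

	assert(nr_frames > 0);
	for (i = 0; i < nr_frames; i++) {
		if ((i % XEN_PFN_PER_PAGE) == 0)
			xen_pfn = pfn_to_first_xen_pfn(page_pfns[i / XEN_PFN_PER_PAGE]);
		xen_pfns[i] = xen_pfn++;
	}
	return DIV_ROUND_UP(nr_frames, XEN_PFN_PER_PAGE);
}
```

For example, 20 grant frames need two 64KB pages: frames 0 to 15 come from the first page and frames 16 to 19 from the second.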




[PATCH v3 12/17] ARM64: ACPI: Check if it runs on Xen to enable or disable ACPI

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

When Xen Dom0 boots with ACPI, it is given a DT containing a /chosen node
and a /hypervisor node. Take the /hypervisor node into account when
checking whether ACPI needs to be enabled.

Signed-off-by: Shannon Zhao 
---
CC: Hanjun Guo 
---
 arch/arm64/kernel/acpi.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index d1ce8e2..4e92be0 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -67,10 +67,13 @@ static int __init dt_scan_depth1_nodes(unsigned long node,
 {
/*
 * Return 1 as soon as we encounter a node at depth 1 that is
-* not the /chosen node.
+* not the /chosen node, or /hypervisor node when running on Xen.
 */
-   if (depth == 1 && (strcmp(uname, "chosen") != 0))
-   return 1;
+   if (depth == 1 && (strcmp(uname, "chosen") != 0)) {
+   if (!xen_initial_domain() || (strcmp(uname, "hypervisor") != 0))
+   return 1;
+   }
+
return 0;
 }
 
@@ -184,7 +187,8 @@ void __init acpi_boot_table_init(void)
/*
 * Enable ACPI instead of device tree unless
 * - ACPI has been disabled explicitly (acpi=off), or
-* - the device tree is not empty (it has more than just a /chosen node)
+* - the device tree is not empty (it has more than just a /chosen node,
+*   and a /hypervisor node when running on Xen)
 *   and ACPI has not been force enabled (acpi=force)
 */
if (param_acpi_off ||
-- 
2.0.4
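After this patch, the "is the DT empty?" test treats /hypervisor like /chosen, but only for the Xen initial domain. A user-space sketch of the same predicate follows. dt_is_empty() and node_makes_dt_nonempty() are hypothetical names, and the xen_dom0 flag stands in for xen_initial_domain().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Mirrors dt_scan_depth1_nodes(): returns true if the depth-1 node
 * 'uname' means the DT is not empty.
 */
static bool node_makes_dt_nonempty(const char *uname, bool xen_dom0)
{
	if (strcmp(uname, "chosen") == 0)
		return false;
	if (xen_dom0 && strcmp(uname, "hypervisor") == 0)
		return false;
	return true;
}

/* True if no depth-1 node other than the allowed ones is present. */
static bool dt_is_empty(const char *const *depth1_names, size_t count,
			bool xen_dom0)
{
	size_t i;

	assert(depth1_names != NULL || count == 0);
	for (i = 0; i < count; i++)
		if (node_makes_dt_nonempty(depth1_names[i], xen_dom0))
			return false;
	return true;
}
```

A DT with only /chosen and /hypervisor therefore counts as empty for Dom0, so ACPI gets enabled, while the same DT on bare metal still counts as non-empty.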




[PATCH v3 13/17] ARM: Xen: Document UEFI support on Xen ARM virtual platforms

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Add a "uefi" node under the /hypervisor node in the FDT so that the Linux
kernel can scan it to get the UEFI information.

Signed-off-by: Shannon Zhao 
Acked-by: Rob Herring 
---
CC: Rob Herring 
---
 Documentation/devicetree/bindings/arm/xen.txt | 34 +++
 1 file changed, 34 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/xen.txt b/Documentation/devicetree/bindings/arm/xen.txt
index 0f7b9c2..aa69405 100644
--- a/Documentation/devicetree/bindings/arm/xen.txt
+++ b/Documentation/devicetree/bindings/arm/xen.txt
@@ -15,6 +15,26 @@ the following properties:
 - interrupts: the interrupt used by Xen to inject event notifications.
   A GIC node is also required.
 
+To support UEFI on Xen ARM virtual platforms, Xen populates the FDT "uefi" node
+under /hypervisor with following parameters:
+
+
+Name                      | Size   | Description
+
+xen,uefi-system-table     | 64-bit | Guest physical address of the UEFI System
+                          |        | Table.
+
+xen,uefi-mmap-start       | 64-bit | Guest physical address of the UEFI memory
+                          |        | map.
+
+xen,uefi-mmap-size        | 32-bit | Size in bytes of the UEFI memory map
+                          |        | pointed to in previous entry.
+
+xen,uefi-mmap-desc-size   | 32-bit | Size in bytes of each entry in the UEFI
+                          |        | memory map.
+
+xen,uefi-mmap-desc-ver    | 32-bit | Version of the mmap descriptor format.
+
 
 Example (assuming #address-cells = <2> and #size-cells = <2>):
 
@@ -22,4 +42,18 @@ hypervisor {
compatible = "xen,xen-4.3", "xen,xen";
reg = <0 0xb000 0 0x2>;
interrupts = <1 15 0xf08>;
+   uefi {
+   xen,uefi-system-table = <0x>;
+   xen,uefi-mmap-start = <0x>;
+   xen,uefi-mmap-size = <0x>;
+   xen,uefi-mmap-desc-size = <0x>;
+   xen,uefi-mmap-desc-ver = <0x>;
+};
 };
+
+The format and meaning of the "xen,uefi-*" parameters are similar to those in
+Documentation/arm/uefi.txt, which are used by normal UEFI. However, Xen ARM
+virtual platforms need a Xen-specific UEFI, in which the Xen hypervisor
+provides hypercalls that Dom0 uses to access the runtime services. These
+parameters are therefore defined under the /hypervisor node, which
+distinguishes them from the normal UEFI parameters.
-- 
2.0.4




[PATCH v3 09/17] xen/hvm/params: Add a new delivery type for event-channel in HVM_PARAM_CALLBACK_IRQ

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

Add a new delivery type:
val[63:56] == 3: val[15:8] is a flag field and val[7:0] is a PPI.
In the flag field, bit 0 indicates whether the interrupt is edge (1) or
level (0) triggered, and bit 1 indicates whether its polarity is active
low (1) or active high (0).

Signed-off-by: Shannon Zhao 
---
 include/xen/interface/hvm/params.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/xen/interface/hvm/params.h b/include/xen/interface/hvm/params.h
index 70ad208..5dd629f 100644
--- a/include/xen/interface/hvm/params.h
+++ b/include/xen/interface/hvm/params.h
@@ -53,6 +53,15 @@
  * if this delivery method is available.
  */
 
+#define HVM_PARAM_CALLBACK_TYPE_EVENT3
+/*
+ * val[15:8] is flag of event-channel interrupt:
+ *  bit 0: interrupt is edge(1) or level(0) triggered
+ *  bit 1: interrupt is active low(1) or high(0)
+ * val[7:0] is PPI number used by event-channel.
+ * This is only used by ARM/ARM64.
+ */
+
 #define HVM_PARAM_STORE_PFN1
 #define HVM_PARAM_STORE_EVTCHN 2
 
-- 
2.0.4
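Seen from the producer side, the new delivery type packs three fields into one 64-bit value. Below is a minimal sketch of that packing, assuming the layout in the header comment above. encode_callback_event() is a hypothetical helper, not a Xen or kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HVM_PARAM_CALLBACK_TYPE_EVENT 3

/* Pack a PPI and its trigger/polarity flags into a callback value. */
static uint64_t encode_callback_event(unsigned int ppi, bool edge,
				      bool active_low)
{
	/* flag bit 0: edge (1) / level (0); bit 1: active low (1) / high (0) */
	uint64_t flags = (edge ? 0x1u : 0) | (active_low ? 0x2u : 0);

	assert(ppi <= 0xff);  /* val[7:0] only */
	return ((uint64_t)HVM_PARAM_CALLBACK_TYPE_EVENT << 56) | (flags << 8) | ppi;
}
```

The type byte is never 0 for this delivery type, so an encoded value can never collide with val == 0, which means "notifications disabled".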




[PATCH v3 00/17] Add ACPI support for Xen Dom0 on ARM64

2016-01-22 Thread Shannon Zhao
From: Shannon Zhao 

This patch set adds ACPI support for Xen Dom0 on ARM64. The relevant Xen
ACPI on ARM64 design document can be found at [1].

This patch set adds a new FDT node, "uefi", under /hypervisor to pass UEFI
information. It introduces a bus notifier for the AMBA and platform buses
to map the MMIO space of newly added devices, makes the Xen domain use
xlated_setup_gnttab_pages to set up the grant table, and uses a new
hypercall to get the event-channel IRQ.

In the Linux kernel initialization flow, xen_early_init() needs to move
before efi_init(). xen_early_init() then checks whether the kernel runs on
Xen through the /hypervisor node, and efi_init() calls a new function,
fdt_find_xen_uefi_params(), to parse the xen,uefi-* parameters just like
the existing efi_get_fdt_params().

In arm64_enable_runtime_services(), the kernel checks whether it runs on
Xen and, if so, calls another new function, xen_efi_runtime_setup(), to set
up runtime services instead of efi_native_runtime_setup().
xen_efi_runtime_setup() assigns the runtime function pointers to the
functions in drivers/xen/efi.c.

Since a /hypervisor node and a /chosen node are passed to Dom0,
acpi_boot_table_init() needs to check whether the DTS contains only these
two nodes.

Patches were tested on the FVP base model. The corresponding Xen patches can
be fetched from [2].

Thanks,
Shannon

[1] http://lists.xen.org/archives/html/xen-devel/2015-11/msg00488.html
[2] http://git.linaro.org/people/shannon.zhao/xen.git  ACPI_XEN_ARM_V3

Changes since v2:
* Use 0 to check if it should ignore the UART
* Fix the use of page_to_xen_pfn
* Factor ACPI and DT parts in xen_guest_init
* Check "uefi" node by full path
* Fix the statement of Documentation/devicetree/bindings/arm/xen.txt

Changes since v1:
* Rebase on linux mainline and wallclock patch from Stefano
* Refactor AMBA and platform device MMIO map to one file
* Use EFI_PARAVIRT to check if it supports XEN EFI
* Refactor Xen EFI codes
* Address other comments

Shannon Zhao (17):
  Xen: ACPI: Hide UART used by Xen
  xen/grant-table: Move xlated_setup_gnttab_pages to common place
  Xen: xlate: Use page_to_xen_pfn instead of page_to_pfn
  arm/xen: Use xen_xlate_map_ballooned_pages to setup grant table
  xen: memory : Add new XENMAPSPACE type XENMAPSPACE_dev_mmio
  Xen: ARM: Add support for mapping platform device mmio
  Xen: ARM: Add support for mapping AMBA device mmio
  Xen: public/hvm: sync changes of HVM_PARAM_CALLBACK_VIA ABI from Xen
  xen/hvm/params: Add a new delivery type for event-channel in HVM_PARAM_CALLBACK_IRQ
  arm/xen: Get event-channel irq through HVM_PARAM when booting with ACPI
  ARM: XEN: Move xen_early_init() before efi_init()
  ARM64: ACPI: Check if it runs on Xen to enable or disable ACPI
  ARM: Xen: Document UEFI support on Xen ARM virtual platforms
  XEN: EFI: Move x86 specific codes to architecture directory
  ARM64: XEN: Add a function to initialize Xen specific UEFI runtime services
  FDT: Add a helper to get specified name subnode
  Xen: EFI: Parse DT parameters for Xen specific UEFI

 Documentation/devicetree/bindings/arm/xen.txt |  34 +
 arch/arm/xen/enlighten.c  | 109 +++
 arch/arm64/include/asm/xen/xen-ops.h  |   6 +
 arch/arm64/kernel/acpi.c  |  12 +-
 arch/arm64/kernel/efi.c   |  17 ++-
 arch/arm64/kernel/setup.c |   2 +-
 arch/arm64/xen/Makefile   |   1 +
 arch/arm64/xen/efi.c  |  40 ++
 arch/x86/xen/efi.c| 112 
 arch/x86/xen/grant-table.c|  57 +---
 drivers/acpi/bus.c|  36 -
 drivers/firmware/efi/efi.c|  45 ++-
 drivers/of/fdt.c  |  35 +
 drivers/xen/Kconfig   |   2 +-
 drivers/xen/Makefile  |   1 +
 drivers/xen/arm-device.c  | 184 ++
 drivers/xen/efi.c | 174 +---
 drivers/xen/xlate_mmu.c   |  67 ++
 include/linux/of_fdt.h|   2 +
 include/xen/interface/hvm/params.h|  36 -
 include/xen/interface/memory.h|   1 +
 include/xen/xen-ops.h |  32 +++--
 22 files changed, 757 insertions(+), 248 deletions(-)
 create mode 100644 arch/arm64/include/asm/xen/xen-ops.h
 create mode 100644 arch/arm64/xen/efi.c
 create mode 100644 drivers/xen/arm-device.c

-- 
2.0.4




Re: [PATCH 1/2] sysctl: expand use of proc_dointvec_minmax_sysadmin

2016-01-22 Thread Eric W. Biederman
Kees Cook  writes:

> Several sysctls expect a state where the highest value (in extra2) is
> locked once set for that boot. Yama does this, and kptr_restrict should
> be doing it. This extracts Yama's logic and adds it to the existing
> proc_dointvec_minmax_sysadmin, taking care to avoid the simple boolean
> states (which do not get locked). Since Yama wants to be checking a
> different capability, we build wrappers for both cases (CAP_SYS_ADMIN
> and CAP_SYS_PTRACE).

Sigh, this sysctl appears susceptible to known attacks.

In my quick skim I believe this sysctl implementation that checks
capabilities is susceptible to attacks where the already open file
descriptor is set as stdout on a setuid root application.

Can we come up with an interface that isn't exploitable by an
application that will act as a setuid cat?

Eric

> -#ifdef CONFIG_PRINTK
> -static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
> - void __user *buffer, size_t *lenp, loff_t *ppos)
> +int proc_dointvec_minmax_cap(int cap, struct ctl_table *table, int write,
> +  void __user *buffer, size_t *lenp, loff_t *ppos)
>  {
> - if (write && !capable(CAP_SYS_ADMIN))
> + struct ctl_table table_copy;
> + int value;
> +
> + /* Require init capabilities to make changes. */
> + if (write && !capable(cap))
>   return -EPERM;
>  
> - return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> + /*
> +  * To deal with const sysctl tables, we make a copy to perform
> +  * the locking. When data is >1 and ==extra2, lock extra1 to
> +  * extra2 to stop the value from being changed any further at
> +  * runtime.
> +  */
> + table_copy = *table;
> + value = *(int *)table_copy.data;
> + if (value > 1 && value == *(int *)table_copy.extra2)
> + table_copy.extra1 = table_copy.extra2;
> +
> + return proc_dointvec_minmax(&table_copy, write, buffer, lenp, ppos);
>  }
> -#endif


Re: [PATCH 0/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Eric W. Biederman
Kees Cook  writes:

> There continues to be unexpected side-effects and security exposures
> via CLONE_NEWUSER. For many end-users running distro kernels with
> CONFIG_USER_NS enabled, there is no way to disable this feature when
> desired. As such, this creates a sysctl to restrict CLONE_NEWUSER so
> admins not running containers or Chrome can avoid the risks of this
> feature.

I don't actually think there do continue to be unexpected side-effects
and security exposures with CLONE_NEWUSER.  It takes a while for all of
the fixes to trickle out to distros.  At most what I have seen recently
are problems with other kernel interfaces being amplified with user
namespaces.  AKA the current mess with devpts, and the unexpected
issues with bind mounts in mount namespaces.

I have a couple of concerns with a sysctl.

1) As user namespaces settle out this sysctl has the potential to
   decrease the security of the system overall as sandboxing
   features of the kernel will not be available to unprivileged
   applications.

   Web browsing with Chrome will be less safe, for example.

2) I strongly suspect the granularity of a sysctl is wrong for access to
   user namespaces on a production system.

   In general I suspect what we want is something like seccomp.  I
   believe all of the relevant bits are in registers.  I actually
   thought that was enough for seccomp.  Does seccomp not work for
   some reason?

3) A sysctl breeds a false sense of security in thinking that if a
   security issue is discovered you can just flip a switch, disable
   all new user namespaces and you won't be vulnerable.

   In fact, most of the issues in the past have only required being in
   a user namespace to trigger, which means any containers or user
   namespaces that already exist could be used to exploit any newly
   found issue.  So I don't think a sysctl will give the desired level
   of protection.

   In my analysis of the issues to date I don't know of anything
   short of a reboot that would meaningfully remove the threat.

4) With applications like Docker coming online, I don't think a
   restriction to processes with capabilities is actually meaningful
   for restricting access to user namespaces.

So I have concerns about both efficacy and usability with the proposed
sysctl.

So to keep this productive.  Please tell me about the threat model
you envision, and how you envision knobs in the kernel being used to
counter those threats.

Eric



Re: [PATCH v2 3/3] mtd: nand: sunxi: add randomizer support

2016-01-22 Thread Brian Norris
All three look good, so pushed to l2-mtd.git/next. One comment below:

On Wed, Dec 02, 2015 at 12:01:07PM +0100, Boris Brezillon wrote:

...

> +static u16 sunxi_nfc_randomizer_state(struct mtd_info *mtd, int page, bool ecc)
> +{
> + const u16 *seeds = sunxi_nfc_randomizer_page_seeds;
> + int mod = mtd->erasesize / mtd->writesize;

Richard suggested you use the mtd.h helper here. Patch below.

> +
> + if (mod > ARRAY_SIZE(sunxi_nfc_randomizer_page_seeds))
> + mod = ARRAY_SIZE(sunxi_nfc_randomizer_page_seeds);
> +
> + if (ecc) {
> + if (mtd->ecc_step_size == 512)
> + seeds = sunxi_nfc_randomizer_ecc512_seeds;
> + else
> + seeds = sunxi_nfc_randomizer_ecc1024_seeds;
> + }
> +
> + return seeds[page % mod];
> +}

From: Brian Norris 
Date: Fri, 22 Jan 2016 18:54:02 -0800
Subject: [PATCH] mtd: nand: sunxi: use mtd_div_by_ws() helper

Suggested-by: Richard Weinberger 
Signed-off-by: Brian Norris 
---
 drivers/mtd/nand/sunxi_nand.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/sunxi_nand.c b/drivers/mtd/nand/sunxi_nand.c
index 5f700719d5c2..b5ea6b312df0 100644
--- a/drivers/mtd/nand/sunxi_nand.c
+++ b/drivers/mtd/nand/sunxi_nand.c
@@ -624,7 +624,7 @@ static u16 sunxi_nfc_randomizer_step(u16 state, int count)
 static u16 sunxi_nfc_randomizer_state(struct mtd_info *mtd, int page, bool ecc)
 {
const u16 *seeds = sunxi_nfc_randomizer_page_seeds;
-   int mod = mtd->erasesize / mtd->writesize;
+   int mod = mtd_div_by_ws(mtd->erasesize, mtd);
 
if (mod > ARRAY_SIZE(sunxi_nfc_randomizer_page_seeds))
mod = ARRAY_SIZE(sunxi_nfc_randomizer_page_seeds);
-- 
2.7.0.rc3.207.g0ac5344
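For reference, here is a userspace sketch of the non-ECC seed selection
with the helper applied. demo_div_by_ws() mirrors what mtd_div_by_ws() is
understood to do: it shifts by the write-size shift when the write size
is a power of two and divides otherwise. The seed table values are
placeholders, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Placeholder seeds; the driver's real table is not reproduced here. */
static const uint16_t demo_page_seeds[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };

/* Rough equivalent of mtd_div_by_ws(): shift for power-of-two write
 * sizes, otherwise fall back to a real division. */
static uint32_t demo_div_by_ws(uint64_t sz, uint32_t writesize)
{
	unsigned int shift = 0;

	assert(writesize != 0);
	if ((writesize & (writesize - 1)) == 0) {
		while ((1u << shift) != writesize)
			shift++;
		return (uint32_t)(sz >> shift);
	}
	return (uint32_t)(sz / writesize);
}

/* Pick a seed by page number modulo the pages-per-block count, capped at
 * the size of the seed table. */
static uint16_t demo_randomizer_state(uint32_t erasesize, uint32_t writesize,
				      unsigned int page)
{
	uint32_t mod = demo_div_by_ws(erasesize, writesize); /* pages per block */

	assert(mod != 0);
	if (mod > ARRAY_SIZE(demo_page_seeds))
		mod = ARRAY_SIZE(demo_page_seeds);
	return demo_page_seeds[page % mod];
}
```

With a 128 KiB block and 2 KiB pages there are 64 pages per block, which
the cap reduces to the 4 placeholder seeds.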



Re: [PATCHv8 2/4] ARM: dts: Add Altera L2 Cache and OCRAM EDAC entries

2016-01-22 Thread Rob Herring
On Thu, Jan 21, 2016 at 11:34:26AM -0600, ttha...@opensource.altera.com wrote:
> From: Thor Thayer 
> 
> Adding the device tree entries and bindings needed to support
> the Altera L2 cache and On-Chip RAM EDAC. This patch relies upon
> an earlier patch to declare and setup On-chip RAM properly.
> http://www.spinics.net/lists/devicetree/msg51117.html
> 
> Signed-off-by: Thor Thayer 
> Signed-off-by: Dinh Nguyen 
> ---
> v8: Fix node names to include chip family and use ecc manager
> to better describe the driver. Rename socfpga-edac.txt to
> socfpga-eccmgr.txt.
> v7: No Change
> v6: Change to nested EDAC device nodes based on community
> feedback. Remove L2 syscon. Use consolidated binding.
> v3-5: No Change
> v2: Remove OCRAM declaration and reference prior patch.
> ---
>  .../bindings/arm/altera/socfpga-eccmgr.txt |   49
>  arch/arm/boot/dts/socfpga.dtsi |   20 
>  2 files changed, 69 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt

A couple of nits, otherwise:

Acked-by: Rob Herring 

> 
> diff --git a/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt b/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt
> new file mode 100644
> index 000..4f45690
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/altera/socfpga-eccmgr.txt
> @@ -0,0 +1,49 @@
> +Altera SoCFPGA ECC Manager
> +This driver uses the EDAC framework to implement the SOCFPGA ECC Manager.
> +The ECC Manager counts and corrects single bit errors and counts/handles
> +double bit errors which are uncorrectable.
> +
> +Required Properties:
> +- compatible : Should be "altr,socfpga-ecc-manager"
> +- #address-cells: must be 1
> +- #size-cells: must be 1
> +- ranges : standard definition, should translate from local addresses
> +
> +Subcomponents:
> +
> +L2 Cache ECC
> +Required Properties:
> +- compatible : Should be "altr,socfpga-l2-ecc"
> +- reg : Address and size for ECC error interrupt clear registers.
> +- interrupts : Should be single bit error interrupt, then double bit error
> + interrupt. Note the rising edge type.
> +
> +On Chip RAM ECC
> +Required Properties:
> +- compatible : Should be "altr,socfpga-ocram-ecc"
> +- reg : Address and size for ECC error interrupt clear registers.
> +- iram : phandle to On-Chip RAM definition.
> +- interrupts : Should be single bit error interrupt, then double bit error
> + interrupt. Note the rising edge type.
> +
> +Example:
> +
> + eccmgr: eccmgr@0xffd08140 {

drop the '0x'

> + compatible = "altr,socfpga-ecc-manager";
> + #address-cells = <1>;
> + #size-cells = <1>;
> + ranges;
> +
> + l2-ecc@ffd08140 {
> + compatible = "altr,socfpga-l2-ecc";
> + reg = <0xffd08140 0x4>;
> + interrupts = <0 36 1>, <0 37 1>;
> + };
> +
> + ocram-ecc@ffd08144 {
> + compatible = "altr,socfpga-ocram-ecc";
> + reg = <0xffd08144 0x4>;
> + iram = <>;
> + interrupts = <0 178 1>, <0 179 1>;
> + };
> + };
> diff --git a/arch/arm/boot/dts/socfpga.dtsi b/arch/arm/boot/dts/socfpga.dtsi
> index 39c470e..9bb383e 100644
> --- a/arch/arm/boot/dts/socfpga.dtsi
> +++ b/arch/arm/boot/dts/socfpga.dtsi
> @@ -656,6 +656,26 @@
>   status = "disabled";
>   };
>  
> + eccmgr: eccmgr@0xffd08140 {

and here.

> + compatible = "altr,socfpga-ecc-manager";
> + #address-cells = <1>;
> + #size-cells = <1>;
> + ranges;
> +
> + l2-ecc@ffd08140 {
> + compatible = "altr,socfpga-l2-ecc";
> + reg = <0xffd08140 0x4>;
> + interrupts = <0 36 1>, <0 37 1>;
> + };
> +
> + ocram-ecc@ffd08144 {
> + compatible = "altr,socfpga-ocram-ecc";
> + reg = <0xffd08144 0x4>;
> + iram = <>;
> + interrupts = <0 178 1>, <0 179 1>;
> + };
> + };
> +
>   L2: l2-cache@fffef000 {
>   compatible = "arm,pl310-cache";
>   reg = <0xfffef000 0x1000>;
> -- 
> 1.7.9.5


[PATCH v5] dma: rename dma_*_writecombine() to dma_*_wc()

2016-01-22 Thread Luis R. Rodriguez
From: "Luis R. Rodriguez" 

Rename dma_*_writecombine() to dma_*_wc(), so that the naming
is coherent across the various write-combining APIs. Keep the
old names for compatibility for a while; these can be removed
at a later time. A guard is left to enable seamless backporting
of the rename and later removal of the old mapping defines.

Build tested successfully with allmodconfig.

The following Coccinelle SmPL patch was used for this simple
transformation:

@ rename_dma_alloc_writecombine @
expression dev, size, dma_addr, gfp;
@@

-dma_alloc_writecombine(dev, size, dma_addr, gfp)
+dma_alloc_wc(dev, size, dma_addr, gfp)

@ rename_dma_free_writecombine @
expression dev, size, cpu_addr, dma_addr;
@@

-dma_free_writecombine(dev, size, cpu_addr, dma_addr)
+dma_free_wc(dev, size, cpu_addr, dma_addr)

@ rename_dma_mmap_writecombine @
expression dev, vma, cpu_addr, dma_addr, size;
@@

-dma_mmap_writecombine(dev, vma, cpu_addr, dma_addr, size)
+dma_mmap_wc(dev, vma, cpu_addr, dma_addr, size)

V5 changes: keep the old names as compatibility helpers, and
guard against their definition to make backporting easier.
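Here is a sketch of the compatibility-alias pattern described above. It
uses hypothetical demo_* names in userspace rather than the real
dma-mapping.h definitions. The new name is the real function, and the old
name is a macro alias wrapped in an #ifndef guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the new, shorter name; malloc() replaces the
 * real coherent write-combining allocation. */
static void *demo_alloc_wc(size_t size)
{
	assert(size > 0);
	return malloc(size);
}

/* The old name survives only as an alias.  The guard means it is provided
 * only when nothing else already defines it, and other code can test for
 * it with #ifdef. */
#ifndef demo_alloc_writecombine
#define demo_alloc_writecombine demo_alloc_wc
#endif
```

Existing callers of the old name keep compiling unchanged. Dropping the
alias later is a one-line removal once no users remain.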

Generated-by: Coccinelle SmPL
Suggested-by: Ingo Molnar 
Signed-off-by: Luis R. Rodriguez 
---

Note: in the future, a linux-oven tree with more advanced
tools could help with the churn of respinning simple patches
such as these, whose changes could either be scripted or made
using Coccinelle. More advanced tools are needed to make
that easier, but such work is being considered [0].

[0] http://kernelnewbies.org/KernelProjects/linux-oven

 arch/arm/mach-lpc32xx/phy3250.c   | 13 ++---
 arch/arm/mach-netx/fb.c   | 14 ++
 arch/arm/mach-nspire/clcd.c   | 13 ++---
 drivers/dma/iop-adma.c|  8 
 drivers/dma/mv_xor.c  |  4 ++--
 drivers/dma/qcom_bam_dma.c| 14 +++---
 drivers/gpu/drm/drm_gem_cma_helper.c  | 13 ++---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c |  8 
 drivers/gpu/drm/omapdrm/omap_dmm_tiler.c  | 13 ++---
 drivers/gpu/drm/omapdrm/omap_gem.c|  8 
 drivers/gpu/drm/sti/sti_cursor.c  | 20 +---
 drivers/gpu/drm/sti/sti_gdp.c |  3 +--
 drivers/gpu/drm/sti/sti_hqvdp.c   |  6 +++---
 drivers/gpu/drm/tegra/gem.c   | 11 +--
 drivers/gpu/drm/vc4/vc4_bo.c  |  5 ++---
 drivers/gpu/host1x/cdma.c |  8 
 drivers/gpu/host1x/job.c  | 10 --
 drivers/media/platform/coda/coda-bit.c| 10 +-
 drivers/video/fbdev/acornfb.c |  4 ++--
 drivers/video/fbdev/amba-clcd-versatile.c | 14 ++
 drivers/video/fbdev/amba-clcd.c   |  4 ++--
 drivers/video/fbdev/atmel_lcdfb.c |  9 +
 drivers/video/fbdev/ep93xx-fb.c   |  8 +++-
 drivers/video/fbdev/gbefb.c   |  8 
 drivers/video/fbdev/imxfb.c   | 12 ++--
 drivers/video/fbdev/mx3fb.c   |  9 -
 drivers/video/fbdev/nuc900fb.c|  8 
 drivers/video/fbdev/omap/lcdc.c   | 16 
 drivers/video/fbdev/pxa168fb.c|  8 
 drivers/video/fbdev/pxafb.c   |  4 ++--
 drivers/video/fbdev/s3c-fb.c  |  7 +++
 drivers/video/fbdev/s3c2410fb.c   |  8 
 drivers/video/fbdev/sa1100fb.c|  8 
 include/linux/dma-mapping.h   | 25 +
 sound/arm/pxa2xx-pcm-lib.c| 12 
 sound/soc/fsl/imx-pcm-fiq.c   | 10 --
 sound/soc/nuc900/nuc900-pcm.c |  6 ++
 sound/soc/omap/omap-pcm.c | 12 
 38 files changed, 176 insertions(+), 197 deletions(-)

diff --git a/arch/arm/mach-lpc32xx/phy3250.c b/arch/arm/mach-lpc32xx/phy3250.c
index 77d6b1bab278..ee06fabdf60e 100644
--- a/arch/arm/mach-lpc32xx/phy3250.c
+++ b/arch/arm/mach-lpc32xx/phy3250.c
@@ -86,8 +86,8 @@ static int lpc32xx_clcd_setup(struct clcd_fb *fb)
 {
dma_addr_t dma;
 
-   fb->fb.screen_base = dma_alloc_writecombine(&fb->dev->dev,
-   PANEL_SIZE, &dma, GFP_KERNEL);
+   fb->fb.screen_base = dma_alloc_wc(&fb->dev->dev, PANEL_SIZE, &dma,
+ GFP_KERNEL);
if (!fb->fb.screen_base) {
printk(KERN_ERR "CLCD: unable to map framebuffer\n");
return -ENOMEM;
@@ -116,15 +116,14 @@ static int lpc32xx_clcd_setup(struct clcd_fb *fb)
 
 static int lpc32xx_clcd_mmap(struct clcd_fb *fb, struct vm_area_struct *vma)
 {
-   return dma_mmap_writecombine(&fb->dev->dev, vma,
-   fb->fb.screen_base, fb->fb.fix.smem_start,
-   fb->fb.fix.smem_len);
+   return dma_mmap_wc(&fb->dev->dev, vma, fb->fb.screen_base,
+  fb->fb.fix.smem_start, fb->fb.fix.smem_len);
 }
 
 static void 

Re: Crashes with 874bbfe600a6 in 3.18.25

2016-01-22 Thread Ben Hutchings
On Fri, 2016-01-22 at 11:09 -0500, Tejun Heo wrote:
> (cc'ing Thomas)
> 
> On Thu, Jan 21, 2016 at 08:10:20PM -0500, Sasha Levin wrote:
> > On 01/21/2016 04:52 AM, Jan Kara wrote:
> > > On Wed 20-01-16 13:39:01, Shaohua Li wrote:
> > > > On Wed, Jan 20, 2016 at 10:19:26PM +0100, Jan Kara wrote:
> > > > > Hello,
> > > > > 
> > > > > a friend of mine started seeing crashes with 3.18.25 kernel - once
> > > > > appropriate load is put on the machine it crashes within minutes. He
> > > > > tracked down that reverting commit 874bbfe600a6 (this is the commit ID from
> > > > > Linus' tree, in stable tree the commit ID is 1e7af294dd03) "workqueue: make
> > > > > sure delayed work run in local cpu" makes the kernel stable again. I'm
> > > > > attaching screenshot of the crash - sadly the initial part is missing but
> > > > > it seems that we crashed when processing timers on otherwise idle CPU. This
> > > > > is a production machine so experimentation is not easy but if we really
> > > > > need more information it may be possible to reproduce the issue again and
> > > > > gather it.
> > > > > 
> > > > > Anyone has idea what is going on? I was looking into the code for a while
> > > > > but so far I have no good explanation.  It would be good to understand the
> > > > > cause instead of just blindly reverting the commit from stable tree...
> > > > 
> > > > Tejun fixed a bug in timer: 22b886dd10180939. is it included in 3.18.25?
> > > 
> > > That doesn't seem to be included in 3.18-stable although it was CCed to stable.
> > > Sasha?
> > 
> > Looks like it requires more than trivial backport (I think). Tejun?
> 
> The timer migration has changed quite a bit.  Given that we've never
> seen vmstat work crashing in 3.18 era, I wonder whether the right
> thing to do here is reverting 874bbfe600a6 from 3.18 stable?

It's not just 3.18 that has this; 874bbfe600a6 was backported to all
stable branches from 3.10 onward.  Only the 4.2-ckt branch has
22b886dd10180939.

Ben.

-- 
Ben Hutchings
Life is what happens to you while you're busy making other plans.
   - John Lennon



Re: [PATCH] mptlan: add checks for dma mapping errors

2016-01-22 Thread kbuild test robot
Hi Alexey,

[auto build test WARNING on v4.4-rc8]
[also build test WARNING on next-20160122]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url:
https://github.com/0day-ci/linux/commits/Alexey-Khoroshilov/mptlan-add-checks-for-dma-mapping-errors/20160123-070633
config: x86_64-randconfig-s1-01230930 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from drivers/message/fusion/mptlan.h:55,
from drivers/message/fusion/mptlan.c:55:
   drivers/message/fusion/mptlan.c: In function 'mpt_lan_sdu_send':
   drivers/message/fusion/mptlan.c:737:24: warning: passing argument 1 of 
'dma_mapping_error' from incompatible pointer type 
[-Wincompatible-pointer-types]
 if (dma_mapping_error(mpt_dev->pcidev, dma)) {
   ^
   include/linux/compiler.h:147:28: note: in definition of macro '__trace_if'
 if (__builtin_constant_p((cond)) ? !!(cond) :   \
   ^
>> drivers/message/fusion/mptlan.c:737:2: note: in expansion of macro 'if'
 if (dma_mapping_error(mpt_dev->pcidev, dma)) {
 ^
   In file included from arch/x86/include/asm/dma-mapping.h:49:0,
from include/linux/dma-mapping.h:87,
from include/linux/skbuff.h:34,
from include/linux/if_ether.h:23,
from include/uapi/linux/ethtool.h:17,
from include/linux/ethtool.h:16,
from include/linux/netdevice.h:42,
from drivers/message/fusion/mptlan.h:58,
from drivers/message/fusion/mptlan.c:55:
   include/asm-generic/dma-mapping-common.h:316:19: note: expected 'struct 
device *' but argument is of type 'struct pci_dev *'
static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
  ^
   In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from drivers/message/fusion/mptlan.h:55,
from drivers/message/fusion/mptlan.c:55:
   drivers/message/fusion/mptlan.c:737:24: warning: passing argument 1 of 
'dma_mapping_error' from incompatible pointer type 
[-Wincompatible-pointer-types]
 if (dma_mapping_error(mpt_dev->pcidev, dma)) {
   ^
   include/linux/compiler.h:147:40: note: in definition of macro '__trace_if'
 if (__builtin_constant_p((cond)) ? !!(cond) :   \
   ^
>> drivers/message/fusion/mptlan.c:737:2: note: in expansion of macro 'if'
 if (dma_mapping_error(mpt_dev->pcidev, dma)) {
 ^
   In file included from arch/x86/include/asm/dma-mapping.h:49:0,
from include/linux/dma-mapping.h:87,
from include/linux/skbuff.h:34,
from include/linux/if_ether.h:23,
from include/uapi/linux/ethtool.h:17,
from include/linux/ethtool.h:16,
from include/linux/netdevice.h:42,
from drivers/message/fusion/mptlan.h:58,
from drivers/message/fusion/mptlan.c:55:
   include/asm-generic/dma-mapping-common.h:316:19: note: expected 'struct 
device *' but argument is of type 'struct pci_dev *'
static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
  ^
   In file included from include/uapi/linux/stddef.h:1:0,
from include/linux/stddef.h:4,
from include/uapi/linux/posix_types.h:4,
from include/uapi/linux/types.h:13,
from include/linux/types.h:5,
from include/linux/list.h:4,
from include/linux/module.h:9,
from drivers/message/fusion/mptlan.h:55,
from drivers/message/fusion/mptlan.c:55:
   drivers/message/fusion/mptlan.c:737:24: warning: passing argument 1 of 
'dma_mapping_error' from incompatible pointer type 
[-Wincompatible-pointer-types]
 if (dma_mapping_error(mpt_dev->pcidev, dma)) {
   ^
   include/linux/compiler.h:158:16: note: i

Re: regression 4.4: deadlock in with cgroup percpu_rwsem

2016-01-22 Thread Paul E. McKenney
On Wed, Jan 20, 2016 at 10:30:07AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Wed, Jan 20, 2016 at 11:47:58AM +0100, Peter Zijlstra wrote:
> > TJ, is css_offline guaranteed to be called in hierarchical order? I
> 
> No, they aren't.  The ancestors of a css are guaranteed to stay around
> until css_free is called on the css and that's the only ordering
> guarantee.
> 
> > got properly lost in the whole cgroup destroy code. There's endless
> > workqueues and rcu callbacks there.
> 
> Yeah, it's hairy.  I wondered about adding support for bouncing to
> workqueue in both percpu_ref and rcu which would make things easier to
> follow.  Not sure how often this pattern happens tho.

This came up recently offlist for call_rcu(), so that a call to (say)
call_rcu_schedule_work() would do a schedule_work() after a grace period
elapsed, invoking the function passed in to call_rcu_schedule_work().
There are several existing cases that do this, so special-casing it seems
worthwhile.  Perhaps something vaguely similar would work for percpu_ref.

Thanx, Paul



Re: [RFC PATCH] dax, ext2, ext4, XFS: fix data corruption race

2016-01-22 Thread Matthew Wilcox
On Fri, Jan 22, 2016 at 04:06:11PM -0700, Ross Zwisler wrote:
> +++ b/fs/block_dev.c
> @@ -1733,13 +1733,28 @@ static const struct address_space_operations def_blk_aops = {
>   */
>  static int blkdev_dax_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  {
> - return __dax_fault(vma, vmf, blkdev_get_block, NULL);
> + int ret;
> +
> + ret = __dax_fault(vma, vmf, blkdev_get_block, NULL, false);
> +
> + if (WARN_ON_ONCE(ret == -EAGAIN))
> + ret = VM_FAULT_SIGBUS;
> +
> + return ret;
>  }

Let's not mix up -E returns and VM_FAULT returns.  We already have a
perfectly good VM_FAULT return value -- VM_FAULT_RETRY.
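Here is a small userspace illustration of why the two conventions should
not mix. It uses hypothetical flag values in the style of VM_FAULT_*.
Callers test the fault result bit by bit, and a small negative errno has
nearly every bit set, so it reads as several flags at once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative single-bit fault flags in the style of VM_FAULT_*; the
 * numeric values here are hypothetical. */
#define DEMO_FAULT_SIGBUS 0x0002u
#define DEMO_FAULT_RETRY  0x0400u

/* Callers of a fault handler test the result as a bitmask of flags. */
static bool demo_fault_has(int ret, unsigned int flag)
{
	assert(flag != 0);
	return ((unsigned int)ret & flag) != 0;
}
```

A proper flag value answers only for itself. -EINVAL, read as a mask,
claims to be both SIGBUS and RETRY.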


Re: [PATCH] pci: fix unavailable irq number 255 reported by BIOS

2016-01-22 Thread Rafael J. Wysocki
On Fri, Jan 22, 2016 at 8:23 PM, David Daney  wrote:
> On 01/22/2016 09:53 AM, Bjorn Helgaas wrote:
>>
>> On Thu, Jan 21, 2016 at 11:58:26PM +0100, Rafael J. Wysocki wrote:
>>>
>>> On Thu, Jan 21, 2016 at 3:41 PM, Cao jin 
>>> wrote:

 Hi,

  IMHO, I think maybe modification on i801_smbus driver is easier.

  Because when i801_smbus request_irq using pci_dev->irq, this
 pci_dev->irq seems still holds the value read from register(
 pci_setup_device->pci_read_irq), if the value is 255, it is invalid in
 register,
>>>
>>>
>>> Right.
>>>
>>> Which is why the PCI core should not leak it into the driver's ->probe
>>> callback.
>>
>>
>> Is there a reserved IRQ value we could use to mean "invalid"?
>
>
> In many (most) cases, zero indicates no irq.

Zero is a valid timer IRQ on x86, though, so it's better not to give
any special meaning to it in general.

Using ~0 as suggested by Bjorn should work as it would cause
request_irq() to return -EINVAL if passed to it AFAICS.
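Here is a sketch of the sentinel idea, with hypothetical names. The 8-bit
Interrupt Line value 255, which on x86 means "unknown" or "no
connection", is normalized to ~0. A range check of the kind request_irq()
is expected to apply then rejects it, while 0 stays usable as a real IRQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel for "no usable IRQ".  0 is avoided because it can
 * be a real IRQ (the timer on x86). */
#define DEMO_IRQ_INVALID (~0u)

/* Normalize the 8-bit Interrupt Line value read from config space. */
static unsigned int demo_irq_from_line(uint8_t line)
{
	return line == 0xff ? DEMO_IRQ_INVALID : line;
}

/* Hypothetical stand-in for the range check request_irq() applies. */
static bool demo_irq_usable(unsigned int irq, unsigned int nr_irqs)
{
	assert(nr_irqs > 0);
	return irq < nr_irqs;
}
```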

Thanks,
Rafael


Re: [RFC PATCH] x86/head_64.S: remove redundant check that kernel address is 2M aligned

2016-01-22 Thread Brian Gerst
On Fri, Jan 22, 2016 at 1:13 PM, Alexander Kuleshov
 wrote:
> We check that the base address of the kernel is 2M aligned in
> the arch/x86/kernel/head_64.S right after jump to the decompressed
> kernel. But we already have a check in the decompress_kernel()
> which validates that kernel location is MIN_KERNEL_ALIGN aligned
> which is 2M too for x86_64.
>
> Signed-off-by: Alexander Kuleshov 
> ---
>  arch/x86/kernel/head_64.S | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index ffdc0e8..4967cba 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -75,12 +75,6 @@ startup_64:
> leaq    _text(%rip), %rbp
> subq    $_text - __START_KERNEL_map, %rbp
>
> -   /* Is the address not 2M aligned? */
> -   movq    %rbp, %rax
> -   andl    $~PMD_PAGE_MASK, %eax
> -   testl   %eax, %eax
> -   jnz bad_address
> -
> /*
>  * Is the address too large?
>  */

I think we still need to do the check, in case we came from a 64-bit
bootloader that directly jumped to startup_64.  However, this check
can be simplified to:

testl $~PMD_PAGE_MASK, %ebp
jnz bad_address
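For readers less used to the assembly, here is the suggested check
expressed in C. It assumes the x86_64 values (a PMD_PAGE_SIZE of 2 MiB),
and the DEMO_* names are local stand-ins. Every bit of ~PMD_PAGE_MASK
lies in the low 21 bits, so testing the 32-bit %ebp is as good as testing
all of %rbp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86_64 values: a PMD entry maps 2 MiB, so PMD_PAGE_MASK clears
 * the low 21 bits. */
#define DEMO_PMD_PAGE_SIZE (UINT64_C(1) << 21)
#define DEMO_PMD_PAGE_MASK (~(DEMO_PMD_PAGE_SIZE - 1))

/* All offset bits fit in the low 32 bits of the register. */
static_assert((~DEMO_PMD_PAGE_MASK) == 0x1fffff, "offset bits are bits 0-20");

/* C equivalent of: testl $~PMD_PAGE_MASK, %ebp; jnz bad_address */
static bool demo_is_2m_aligned(uint64_t base)
{
	uint32_t ebp = (uint32_t)base; /* low half of %rbp */

	return (ebp & (uint32_t)~DEMO_PMD_PAGE_MASK) == 0;
}
```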

--
Brian Gerst


Re: [perf/x86] 75925e1ad7: BUG: unable to handle kernel paging request at 000045b8

2016-01-22 Thread Andi Kleen
On Fri, Jan 22, 2016 at 12:33:24PM +0800, kernel test robot wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Thanks. I managed to break 32bit kernels. The appended patch should
fix it.



x86, perf: Fix perf user stack trace walking

Fix 75925e1ad7 (perf/x86: Optimize stack walk user accesses)
   
Replace the hard coded 64bit frame pointer sizes with sizeof, which
depends on the size of unsigned long on the host.

This avoids a stack smash on 32bit kernels, which was dutifully reported
by the 0day kbuild robot.
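Here is a userspace sketch of the corrected pattern. memcpy() stands in
for __copy_from_user_nmi(), and a hypothetical demo_stack_frame mirrors
the two-field frame layout. Both the offset of return_address and the
copy lengths come from sizeof, so the same code is correct on 32-bit and
64-bit builds. With the hard-coded 8, a 32-bit kernel copied 8 bytes into
each 4-byte field, and the second copy ran past the end of the on-stack
frame.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same two-field layout as the frame walked by perf_callchain_user(): a
 * saved frame pointer followed by a return address. */
struct demo_stack_frame {
	const void *next_frame;
	unsigned long return_address;
};

/* The walker assumes return_address follows next_frame directly. */
static_assert(offsetof(struct demo_stack_frame, return_address) ==
	      sizeof(const void *), "no padding between the two fields");

/* Read one frame from a (simulated) user stack, sizing every copy with
 * sizeof() instead of hard-coded constants. */
static void demo_read_frame(const unsigned char *fp,
			    struct demo_stack_frame *frame)
{
	assert(fp != NULL && frame != NULL);
	memcpy(&frame->next_frame, fp, sizeof(frame->next_frame));
	memcpy(&frame->return_address, fp + sizeof(frame->next_frame),
	       sizeof(frame->return_address));
}
```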

Signed-off-by: Andi Kleen 

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 1b443db..ea4eb5c 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -2328,13 +2328,16 @@ perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
frame.next_frame = NULL;
frame.return_address = 0;
 
-   if (!access_ok(VERIFY_READ, fp, 16))
+   if (!access_ok(VERIFY_READ, fp, sizeof(frame)))
break;
 
-   bytes = __copy_from_user_nmi(&frame.next_frame, fp, 8);
+   bytes = __copy_from_user_nmi(&frame.next_frame, fp,
+   sizeof(frame.next_frame));
if (bytes != 0)
break;
-   bytes = __copy_from_user_nmi(&frame.return_address, fp+8, 8);
+   bytes = __copy_from_user_nmi(&frame.return_address,
+   fp + sizeof(frame.next_frame),
+   sizeof(frame.return_address));
if (bytes != 0)
break;
 


[PATCH] x86/fpu: Revert earlier patch of Disable AVX when eagerfpu is off

2016-01-22 Thread Yu-cheng Yu
AVX was mistakenly believed to depend on the eagerfpu switch.
This turns out to be false, so the earlier patch should be reverted.
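A quick way to see what the revert changes is to write out the masks
using the architectural XSAVE component numbers: FP 0, SSE 1, YMM 2,
BNDREGS 3, BNDCSR 4, OPMASK 5, ZMM_Hi256 6, Hi16_ZMM 7. The DEMO_* names
below are local stand-ins for the XFEATURE_MASK_* macros. The set of
supported features (XCNTXT_MASK) is the same before and after. Only the
AVX-family components move from the eager class to the lazy class.

```c
#include <assert.h>
#include <stdint.h>

/* XSAVE state-component numbers (architectural). */
#define DEMO_BIT(n)            (UINT64_C(1) << (n))
#define DEMO_XMASK_FP          DEMO_BIT(0)
#define DEMO_XMASK_SSE         DEMO_BIT(1)
#define DEMO_XMASK_YMM         DEMO_BIT(2)
#define DEMO_XMASK_BNDREGS     DEMO_BIT(3)
#define DEMO_XMASK_BNDCSR      DEMO_BIT(4)
#define DEMO_XMASK_OPMASK      DEMO_BIT(5)
#define DEMO_XMASK_ZMM_Hi256   DEMO_BIT(6)
#define DEMO_XMASK_Hi16_ZMM    DEMO_BIT(7)

/* Split before this patch: AVX-family state classed as eager. */
#define DEMO_OLD_LAZY  (DEMO_XMASK_FP | DEMO_XMASK_SSE)
#define DEMO_OLD_EAGER (DEMO_XMASK_BNDREGS | DEMO_XMASK_BNDCSR | \
			DEMO_XMASK_YMM | DEMO_XMASK_OPMASK | \
			DEMO_XMASK_ZMM_Hi256 | DEMO_XMASK_Hi16_ZMM)

/* Split after this patch: only the MPX state requires eager saving. */
#define DEMO_NEW_LAZY  (DEMO_XMASK_FP | DEMO_XMASK_SSE | DEMO_XMASK_YMM | \
			DEMO_XMASK_OPMASK | DEMO_XMASK_ZMM_Hi256 | \
			DEMO_XMASK_Hi16_ZMM)
#define DEMO_NEW_EAGER (DEMO_XMASK_BNDREGS | DEMO_XMASK_BNDCSR)

/* The revert only moves components between the two classes. */
static_assert((DEMO_OLD_LAZY | DEMO_OLD_EAGER) ==
	      (DEMO_NEW_LAZY | DEMO_NEW_EAGER), "XCNTXT_MASK is unchanged");
```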

Original patch:
http://git.kernel.org/tip/394db20ca240741a08d472173db13d6f6a6e5a28

Signed-off-by: Yu-cheng Yu 
Reported-by: Leonid Shatz 
Cc: x...@kernel.org
Cc: H. Peter Anvin 
Cc: Thomas Gleixner 
Cc: Dave Hansen 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: Sai Praneeth Prakhya 
Cc: Ravi V. Shankar 
Cc: Leonid Shatz 
Cc: Fenghua Yu 
---
 arch/x86/include/asm/fpu/xstate.h | 9 -
 arch/x86/kernel/fpu/init.c| 6 --
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index af30fde..f23cd8c 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -20,16 +20,15 @@
 
 /* Supported features which support lazy state saving */
 #define XFEATURE_MASK_LAZY (XFEATURE_MASK_FP | \
-XFEATURE_MASK_SSE)
-
-/* Supported features which require eager state saving */
-#define XFEATURE_MASK_EAGER (XFEATURE_MASK_BNDREGS | \
-XFEATURE_MASK_BNDCSR | \
+XFEATURE_MASK_SSE | \
 XFEATURE_MASK_YMM | \
 XFEATURE_MASK_OPMASK | \
 XFEATURE_MASK_ZMM_Hi256 | \
 XFEATURE_MASK_Hi16_ZMM)
 
+/* Supported features which require eager state saving */
+#define XFEATURE_MASK_EAGER (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR)
+
 /* All currently supported features */
 #define XCNTXT_MASK (XFEATURE_MASK_LAZY | XFEATURE_MASK_EAGER)
 
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6d9f0a7..f0ab368 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -300,12 +300,6 @@ u64 __init fpu__get_supported_xfeatures_mask(void)
 static void __init fpu__clear_eager_fpu_features(void)
 {
setup_clear_cpu_cap(X86_FEATURE_MPX);
-   setup_clear_cpu_cap(X86_FEATURE_AVX);
-   setup_clear_cpu_cap(X86_FEATURE_AVX2);
-   setup_clear_cpu_cap(X86_FEATURE_AVX512F);
-   setup_clear_cpu_cap(X86_FEATURE_AVX512PF);
-   setup_clear_cpu_cap(X86_FEATURE_AVX512ER);
-   setup_clear_cpu_cap(X86_FEATURE_AVX512CD);
 }
 
 /*
-- 
1.9.1



Re: [kernel-hardening] Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Ben Hutchings
On Fri, 2016-01-22 at 15:00 -0800, Kees Cook wrote:
> On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
> > 2016-01-22 23:50 GMT+01:00 Kees Cook :
> > 
> > > > Seems that Debian and some older Ubuntu versions are already using
> > > > 
> > > > $ sysctl -a | grep usern
> > > > kernel.unprivileged_userns_clone = 0
> > > > 
> > > > Shall we be consistent wit it?
> > > 
> > > Oh! I didn't see that on systems I checked. On which version did you find that?
> > 
> > $ uname -a
> > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
> > (2016-01-07) x86_64 GNU/Linux
> > $ cat /etc/debian_version
> > 8.2
> 
> Ah-ha, Debian only, though it looks like this was just committed to
> the Ubuntu kernel tree too:
> 
> 
> > IIRC some older kernels delivered with Ubuntu Precise were also using
> > it (but maybe I'm mistaken)
> 
> I don't see it there.
> 
> I think my patch is more complete, but I'm happy to change the name if
> this sysctl has already started to enter the global consciousness. ;)
> 
> Serge, Ben, what do you think?

I agree that using the '_restrict' suffix for new restrictions makes
sense.  I also don't think that a third possible value for
kernel.unprivileged_userns_clone would be understandable.

I would probably make kernel.unprivileged_userns_clone a wrapper for
kernel.userns_restrict in Debian, then deprecate and eventually remove
it.

Ben.

-- 
Ben Hutchings
Life is what happens to you while you're busy making other plans.
   - John Lennon



Re: [PATCH] autofs: show pipe inode in mount options

2016-01-22 Thread Ian Kent
On Sat, 2016-01-23 at 08:30 +0800, Ian Kent wrote:
> On Fri, 2016-01-22 at 12:34 +0100, Stanislav Kinsburskiy wrote:
> > Hi again,
> > 
> > I would like to ask about any progress with this patch?
> > Any other requirements to make it able to merge?
> 
> Sorry for the delay.
> 
> Since there haven't been any comments from Al or Stephen I'm think I
> should include it in the series I plan on sending to linux-next to
> rename autofs4 to autofs (among other things).
> 
> I haven't had anything significant enough for autofs to warrant
> maintaining a tree and sending push requests so I'll need to ask
> Stephen what I need to do (perhaps you could offer some advise on
> that
> now Stephen, please).

Apologies, there appears to be a parse error in my grammar above,
sorry, and clearly "push" should be "pull" in the paragraph above.

> 
> I'm also struggling to get back to this and carry out the needed
> testing and I'll need to re-base the series too now but I'm getting
> there.
> 
> I didn't see a follow up patch with an updated description, did I
> miss
> it?
> 
> Ian


[PATCH 2/5] x86/fpu: Fix FNSAVE usage in eagerfpu mode

2016-01-22 Thread Andy Lutomirski
In eager fpu mode, having deactivated fpu without immediately
reloading some other context is illegal.  Therefore, to recover from
FNSAVE, we can't just deactivate the state -- we need to reload it
if we're not actively context switching.

We had this wrong in fpu__save and fpu__copy.  Fix both.
__kernel_fpu_begin was fine -- add a comment.

This fixes a warning triggerable with nofxsr eagerfpu=on.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/fpu/core.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 08e1e11a05ca..7a9244df33e2 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -114,6 +114,10 @@ void __kernel_fpu_begin(void)
kernel_fpu_disable();
 
if (fpu->fpregs_active) {
+   /*
+* Ignore return value -- we don't care if reg state
+* is clobbered.
+*/
copy_fpregs_to_fpstate(fpu);
} else {
this_cpu_write(fpu_fpregs_owner_ctx, NULL);
@@ -189,8 +193,12 @@ void fpu__save(struct fpu *fpu)
 
preempt_disable();
if (fpu->fpregs_active) {
-   if (!copy_fpregs_to_fpstate(fpu))
-   fpregs_deactivate(fpu);
+   if (!copy_fpregs_to_fpstate(fpu)) {
+   if (use_eager_fpu())
+   copy_kernel_to_fpregs(&fpu->state);
+   else
+   fpregs_deactivate(fpu);
+   }
}
preempt_enable();
 }
@@ -259,7 +267,11 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
preempt_disable();
if (!copy_fpregs_to_fpstate(dst_fpu)) {
memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
-   fpregs_deactivate(src_fpu);
+
+   if (use_eager_fpu())
+   copy_kernel_to_fpregs(&src_fpu->state);
+   else
+   fpregs_deactivate(src_fpu);
}
preempt_enable();
 }
-- 
2.5.0
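The fix in fpu__save() and fpu_copy() boils down to one rule: after a save that clobbers the registers (FNSAVE reinitializes the FPU), lazy mode may simply mark them inactive, but eager mode must reload them, since the commit message notes that eager mode must not leave the FPU deactivated without immediately reloading some context. The user-space sketch below models that rule. Everything in it is hypothetical: `struct toy_fpu`, `toy_copy_regs_to_state()` and `toy_fpu_save()` are simplified stand-ins for `struct fpu`, `copy_fpregs_to_fpstate()` and `fpu__save()`, and the whole register file is a single int.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-task FPU bookkeeping. */
struct toy_fpu {
	int  saved_state;   /* in-memory copy, like fpu->state */
	bool regs_active;   /* registers currently hold this task's state */
};

static int toy_hw_regs;     /* stands in for the physical FPU registers */

/*
 * Save the registers to memory. Returns true if the registers are still
 * valid afterwards (FXSAVE/XSAVE), false if they were reset (FNSAVE).
 */
static bool toy_copy_regs_to_state(struct toy_fpu *fpu, bool uses_fnsave)
{
	fpu->saved_state = toy_hw_regs;
	if (uses_fnsave) {
		toy_hw_regs = 0;    /* FNSAVE reinitializes the FPU */
		return false;
	}
	return true;
}

/* Shape of the fixed fpu__save() logic. */
static void toy_fpu_save(struct toy_fpu *fpu, bool eager, bool uses_fnsave)
{
	if (!fpu->regs_active)
		return;

	if (!toy_copy_regs_to_state(fpu, uses_fnsave)) {
		if (eager)
			toy_hw_regs = fpu->saved_state;  /* reload: stay live */
		else
			fpu->regs_active = false;        /* restore on next use */
	}

	/* Eager-mode invariant: live registers are never left clobbered. */
	assert(!eager || toy_hw_regs == fpu->saved_state);
}
```

In lazy mode the deactivated task gets its state back later through the device-not-available trap; eager mode has no such trap path, which is why the reload has to happen immediately.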



[PATCH 4/5] x86/fpu: Speed up lazy FPU restores slightly

2016-01-22 Thread Andy Lutomirski
If we have an FPU, there's no need to check CR0 for FPU emulation.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/traps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 87f80febf477..183b300f6a8b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -752,7 +752,7 @@ do_device_not_available(struct pt_regs *regs, long error_code)
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
 
 #ifdef CONFIG_MATH_EMULATION
-   if (read_cr0() & X86_CR0_EM) {
+   if (!cpu_has_fpu && (read_cr0() & X86_CR0_EM)) {
struct math_emu_info info = { };
 
conditional_sti(regs);
-- 
2.5.0
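The speedup relies on C's short-circuit evaluation: when the left operand of `&&` is false, the right operand is never evaluated, so on a CPU with an FPU the CR0 read is skipped entirely. The sketch below illustrates this with a hypothetical `toy_read_cr0()` that only counts how often it is called; it does not measure real CR0 access cost. CR0.EM is bit 2.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CR0_EM (1u << 2)   /* CR0.EM ("emulate coprocessor") is bit 2 */

static unsigned int toy_cr0 = TOY_CR0_EM;  /* pretend EM is set */
static unsigned int toy_cr0_reads;         /* counts the "expensive" reads */

/* Hypothetical stand-in for read_cr0(); records every call. */
static unsigned int toy_read_cr0(void)
{
	toy_cr0_reads++;
	return toy_cr0;
}

/* Same shape as the post-patch test in do_device_not_available(). */
static bool toy_should_emulate(bool has_fpu)
{
	return !has_fpu && (toy_read_cr0() & TOY_CR0_EM);
}

void toy_cr0_demo(void)
{
	toy_cr0_reads = 0;
	assert(!toy_should_emulate(true));  /* FPU present: CR0 never read */
	assert(toy_cr0_reads == 0);
	assert(toy_should_emulate(false));  /* no FPU: CR0 is consulted */
	assert(toy_cr0_reads == 1);
}
```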



[PATCH 1/5] x86/fpu: Fix math emulation in eager fpu mode

2016-01-22 Thread Andy Lutomirski
Systems without an FPU are generally old and therefore use lazy FPU
switching.  Unsurprisingly, math emulation in eager FPU mode is a
bit buggy.  Fix it.

There were two bugs involving kernel code trying to use the FPU
registers in eager mode even if they didn't exist and one BUG_ON
that was incorrect.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/fpu/internal.h | 3 ++-
 arch/x86/kernel/fpu/core.c  | 2 +-
 arch/x86/kernel/traps.c | 1 -
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 0fd440df63f1..a1f78a9fbf41 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -589,7 +589,8 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
 * If the task has used the math, pre-load the FPU on xsave processors
 * or if the past 5 consecutive context-switches used math.
 */
-   fpu.preload = new_fpu->fpstate_active &&
+   fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
+ new_fpu->fpstate_active &&
  (use_eager_fpu() || new_fpu->counter > 5);
 
if (old_fpu->fpregs_active) {
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index d25097c3fc1d..08e1e11a05ca 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -423,7 +423,7 @@ void fpu__clear(struct fpu *fpu)
 {
WARN_ON_FPU(fpu != &current->thread.fpu); /* Almost certainly an anomaly */
 
-   if (!use_eager_fpu()) {
+   if (!use_eager_fpu() || !static_cpu_has(X86_FEATURE_FPU)) {
/* FPU state will be reallocated lazily at the first use. */
fpu__drop(fpu);
} else {
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index ade185a46b1d..87f80febf477 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -750,7 +750,6 @@ dotraplinkage void
 do_device_not_available(struct pt_regs *regs, long error_code)
 {
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
-   BUG_ON(use_eager_fpu());
 
 #ifdef CONFIG_MATH_EMULATION
if (read_cr0() & X86_CR0_EM) {
-- 
2.5.0
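The fixed preload decision in switch_fpu_prepare() is a plain boolean predicate, so a standalone copy makes the effect of the new `X86_FEATURE_FPU` check easy to see. This is a sketch: `should_preload()` and its parameter names are hypothetical, and only the threshold of 5 and the structure of the expression come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the post-patch preload decision:
 *   has_fpu        ~ static_cpu_has(X86_FEATURE_FPU)
 *   fpstate_active ~ new_fpu->fpstate_active
 *   eager          ~ use_eager_fpu()
 *   counter        ~ new_fpu->counter (consecutive switches that used math)
 */
static bool should_preload(bool has_fpu, bool fpstate_active,
			   bool eager, unsigned int counter)
{
	/* Without a hardware FPU there are no registers to preload into. */
	return has_fpu && fpstate_active && (eager || counter > 5);
}

void should_preload_examples(void)
{
	/* No FPU: never preload, even in eager mode. */
	assert(!should_preload(false, true, true, 0));
	/* Eager mode with an FPU: always preload an active fpstate. */
	assert(should_preload(true, true, true, 0));
	/* Lazy mode: only after more than 5 consecutive math-using switches. */
	assert(!should_preload(true, true, false, 5));
	assert(should_preload(true, true, false, 6));
}
```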



[PATCH 5/5] x86/fpu: Default eagerfpu=on on all CPUs

2016-01-22 Thread Andy Lutomirski
We have eager and lazy fpu modes, introduced in 304bceda6a18 ("x86,
fpu: use non-lazy fpu restore for processors supporting xsave").

The result is rather messy.  There are two code paths in almost all
of the FPU code, and only one of them (the eager case) is tested
frequently, since most kernel developers have new enough hardware
that we use eagerfpu.

It seems that, on any remotely recent hardware, eagerfpu is a win:
glibc uses SSE2, so laziness is probably overoptimistic, and, in any
case, manipulating TS is far slower than saving and restoring the
full state.  (Stores to CR0.TS are serializing and are poorly
optimized.)

To try to shake out any latent issues on old hardware, this changes
the default to eager on all CPUs.  If no performance or functionality
problems show up, a subsequent patch could remove lazy mode entirely.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/fpu/init.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index d53ab3d3b8e8..e12cc0ad368e 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -262,7 +262,10 @@ static void __init fpu__init_system_xstate_size_legacy(void)
  * not only saved the restores along the way, but we also have the
  * FPU ready to be used for the original task.
  *
- * 'eager' switching is used on modern CPUs, there we switch the FPU
+ * 'lazy' is deprecated because it's almost never a performance win
+ * and it's much more complicated than 'eager'.
+ *
+ * 'eager' switching is by default on all CPUs, there we switch the FPU
  * state during every context switch, regardless of whether the task
  * has used FPU instructions in that time slice or not. This is done
  * because modern FPU context saving instructions are able to optimize
@@ -273,7 +276,7 @@ static void __init fpu__init_system_xstate_size_legacy(void)
  *   to use 'eager' restores, if we detect that a task is using the FPU
  *   frequently. See the fpu->counter logic in fpu/internal.h for that. ]
  */
-static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
+static enum { ENABLE, DISABLE } eagerfpu = ENABLE;
 
 /*
  * Find supported xfeatures based on cpu features and command-line input.
@@ -350,15 +353,9 @@ static void __init fpu__init_system_ctx_switch(void)
  */
 static void __init fpu__init_parse_early_param(void)
 {
-   /*
-* No need to check "eagerfpu=auto" again, since it is the
-* initial default.
-*/
if (cmdline_find_option_bool(boot_command_line, "eagerfpu=off")) {
eagerfpu = DISABLE;
fpu__clear_eager_fpu_features();
-   } else if (cmdline_find_option_bool(boot_command_line, "eagerfpu=on")) {
-   eagerfpu = ENABLE;
}
 
if (cmdline_find_option_bool(boot_command_line, "no387"))
-- 
2.5.0
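After this patch the early-parameter logic has a single knob: eager mode is the default, and only `eagerfpu=off` switches it off. The user-space sketch below mirrors that control flow. `cmdline_has_word()` is a hypothetical, simplified stand-in for the kernel's `cmdline_find_option_bool()` that matches whole whitespace-separated tokens; it makes no attempt to reproduce the kernel helper's exact parsing rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum eagerfpu_mode { EAGERFPU_ENABLE, EAGERFPU_DISABLE };

static bool is_sep(char c)
{
	return c == ' ' || c == '\t';
}

/* Hypothetical helper: true if @word occurs as a whole token in @cmdline. */
static bool cmdline_has_word(const char *cmdline, const char *word)
{
	const char *p = cmdline;
	size_t len;

	assert(cmdline != NULL && word != NULL);
	len = strlen(word);
	if (len == 0)
		return false;

	while ((p = strstr(p, word)) != NULL) {
		bool starts = (p == cmdline) || is_sep(p[-1]);
		bool ends = p[len] == '\0' || is_sep(p[len]);

		if (starts && ends)
			return true;
		p++;  /* partial match only; keep searching */
	}
	return false;
}

/* Post-patch behaviour: eager by default, "eagerfpu=off" disables it. */
static enum eagerfpu_mode parse_eagerfpu(const char *cmdline)
{
	if (cmdline_has_word(cmdline, "eagerfpu=off"))
		return EAGERFPU_DISABLE;
	return EAGERFPU_ENABLE;
}
```

In this model `eagerfpu=on` is no longer recognized at all, yet still ends up with eager mode, because that is now the default.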



[PATCH 3/5] x86/fpu: Fold fpu_copy into fpu__copy

2016-01-22 Thread Andy Lutomirski
Splitting it into two functions needlessly obfuscated the code.
While we're at it, improve the comment slightly.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/kernel/fpu/core.c | 32 +++-
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7a9244df33e2..299b58bb975b 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -231,14 +231,15 @@ void fpstate_init(union fpregs_state *state)
 }
 EXPORT_SYMBOL_GPL(fpstate_init);
 
-/*
- * Copy the current task's FPU state to a new task's FPU context.
- *
- * In both the 'eager' and the 'lazy' case we save hardware registers
- * directly to the destination buffer.
- */
-static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
+int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 {
+   dst_fpu->counter = 0;
+   dst_fpu->fpregs_active = 0;
+   dst_fpu->last_cpu = -1;
+
+   if (!src_fpu->fpstate_active || !cpu_has_fpu)
+   return 0;
+
WARN_ON_FPU(src_fpu != &current->thread.fpu);
 
/*
@@ -251,10 +252,9 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
/*
 * Save current FPU registers directly into the child
 * FPU context, without any memory-to-memory copying.
-*
-* If the FPU context got destroyed in the process (FNSAVE
-* done on old CPUs) then copy it back into the source
-* context and mark the current task for lazy restore.
+* In lazy mode, if the FPU context isn't loaded into
+* fpregs, CR0.TS will be set and do_device_not_available
+* will load the FPU context.
 *
 * We have to do all this with preemption disabled,
 * mostly because of the FNSAVE case, because in that
@@ -274,16 +274,6 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
fpregs_deactivate(src_fpu);
}
preempt_enable();
-}
-
-int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
-{
-   dst_fpu->counter = 0;
-   dst_fpu->fpregs_active = 0;
-   dst_fpu->last_cpu = -1;
-
-   if (src_fpu->fpstate_active && cpu_has_fpu)
-   fpu_copy(dst_fpu, src_fpu);
 
return 0;
 }
-- 
2.5.0



[PATCH 0/5] x86/fpu: eagerfpu fixes, speedups, and default enablement

2016-01-22 Thread Andy Lutomirski
Hi all-

Patches 1, 2, and 3 are fixes.

Patch 4 is probably a small speedup.  It also only matters in lazy
FPU mode, which means that, most likely, no one cares.  Apply or
don't -- I don't care much.

Patch 5 is, in some sense, a radical change.  Currently we select
eager or lazy mode depending on CPU type.  I think that lazy mode
sucks and that we should deprecate and remove it.

With patches 1-3 applied, I think that eagerfpu works on all
systems.  Patch 5 will use it on all systems subject to a chicken
flag -- eagerfpu=off will still disable it.

I propose that we apply patch 5, let it soak in -next until the 4.6
merge window opens, possibly let it actually land in 4.6, and then
remove lazy mode entirely for 4.7.  This will open up enormous
cleanup possibilities, and it will make the fpu code vastly more
comprehensible.

Thoughts?

Andy Lutomirski (5):
  x86/fpu: Fix math emulation in eager fpu mode
  x86/fpu: Fix FNSAVE usage in eagerfpu mode
  x86/fpu: Fold fpu_copy into fpu__copy
  x86/fpu: Speed up lazy FPU restores slightly
  x86/fpu: Default eagerfpu=on on all CPUs

 arch/x86/include/asm/fpu/internal.h |  3 ++-
 arch/x86/kernel/fpu/core.c  | 52 +++--
 arch/x86/kernel/fpu/init.c  | 13 --
 arch/x86/kernel/traps.c |  3 +--
 4 files changed, 35 insertions(+), 36 deletions(-)

-- 
2.5.0



Re: [Xen-devel] [PATCH v1 04/12] xen/hvmlite: Bootstrap HVMlite guest

2016-01-22 Thread Luis R. Rodriguez
On Fri, Jan 22, 2016 at 4:30 PM, Andrew Cooper
 wrote:
> the DMLite boot
> protocol is OS agnostic, and will be staying that way.

What's the DMLite boot protocol? Is that the protocol that is defined
by Xen to boot Xen guests and dom0? Is this well documented somewhere?

To be clear, are you saying that Xen will never change so that it
sets up, say, a zero page instead, and that it will always stuff a Xen
struct, pass that to the boot entry, and expect the guest kernel to
always parse it?

If so, then no matter how hard we try, and no matter what we do on the
Linux front to help clean things up, we will never be able to have a
unified bare-metal / Xen entry. I'm noting that it could be possible,
though, provided we just set up the zero page, set the subarch to Xen,
and set subarch_data to the Xen custom data structure.

 Luis


Re: [Xen-devel] [PATCH v1 04/12] xen/hvmlite: Bootstrap HVMlite guest

2016-01-22 Thread Luis R. Rodriguez
On Fri, Jan 22, 2016 at 4:30 PM, Andrew Cooper
 wrote:
> I would have thought the correct way to do direct Linux support would be
> to have a very small init stub which constructs an appropriate zero
> page, and lets the native entry point get on with things.

As hpa noted recently in another thread [0], that is precisely what
hardware_subarch and hardware_subarch_data were meant to be used for,
and it's what I'm alluding to.

The only catch is that, as far as we're concerned on x86, we had
expected hardware_subarch and hardware_subarch_data to be used only for
PV, not for HVM. This seems to be HVM-related, but I think this is
just a rebranding of PVH as HVMLite, right? If so, I think using
hardware_subarch and hardware_subarch_data is still welcome, as
expected in the original design.

[0] http://lkml.kernel.org/r/56a130b5.8060...@zytor.com

 Luis


Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Serge Hallyn
Quoting Kees Cook (keesc...@chromium.org):
> On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
> > 2016-01-22 23:50 GMT+01:00 Kees Cook :
> >
> >>> Seems that Debian and some older Ubuntu versions are already using
> >>>
> >>> $ sysctl -a | grep usern
> >>> kernel.unprivileged_userns_clone = 0
> >>>
> >>> Shall we be consistent wit it?
> >>
> >> Oh! I didn't see that on systems I checked. On which version did you find that?
> >
> > $ uname -a
> > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
> > (2016-01-07) x86_64 GNU/Linux
> > $ cat /etc/debian_version
> > 8.2
> 
> Ah-ha, Debian only, though it looks like this was just committed to
> the Ubuntu kernel tree too:
> 
> 
> > IIRC some older kernels delivered with Ubuntu Precise were also using
> > it (but maybe I'm mistaken)
> 
> I don't see it there.
> 
> I think my patch is more complete, but I'm happy to change the name if
> this sysctl has already started to enter the global consciousness. ;)
> 
> Serge, Ben, what do you think?

Oh, sorry - as for the name of it, what is the alternative you are proposing?


Re: [PATCH 2/2] sysctl: allow CLONE_NEWUSER to be disabled

2016-01-22 Thread Serge Hallyn
Quoting Kees Cook (keesc...@chromium.org):
> On Fri, Jan 22, 2016 at 2:55 PM, Robert Święcki  wrote:
> > 2016-01-22 23:50 GMT+01:00 Kees Cook :
> >
> >>> Seems that Debian and some older Ubuntu versions are already using
> >>>
> >>> $ sysctl -a | grep usern
> >>> kernel.unprivileged_userns_clone = 0
> >>>
> >>> Shall we be consistent wit it?
> >>
> >> Oh! I didn't see that on systems I checked. On which version did you find that?
> >
> > $ uname -a
> > Linux bc1 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-5~bpo8+1
> > (2016-01-07) x86_64 GNU/Linux
> > $ cat /etc/debian_version
> > 8.2
> 
> Ah-ha, Debian only, though it looks like this was just committed to
> the Ubuntu kernel tree too:
> 
> 
> > IIRC some older kernels delivered with Ubuntu Precise were also using
> > it (but maybe I'm mistaken)
> 
> I don't see it there.
> 
> I think my patch is more complete, but I'm happy to change the name if
> this sysctl has already started to enter the global consciousness. ;)
> 
> Serge, Ben, what do you think?
> 
> -Kees

Hey,

I had originally written this for Ubuntu when userns was still new
and not upstream.  Then we dropped it when it got upstream.

The reason we are re-adding it is because we're going to be pushing the
envelope again wrt unprivileged userns usage.  Seth has been working on
supporting mounts of fuse, for instance.  When everything is upstream,
(or we drop it :) we'll drop the patch again.

-serge


mmotm 2016-01-22-16-43 uploaded

2016-01-22 Thread akpm
The mm-of-the-moment snapshot 2016-01-22-16-43 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (4.x
or 4.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/

To develop on top of mmotm git:

  $ git remote add mmotm git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master  topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 4.4:
(patches marked "*" will be included in linux-next)

  origin.patch
* dax-fix-null-pointer-dereference-in-__dax_dbg.patch
* dax-fix-conversion-of-holes-to-pmds.patch
* pmem-add-wb_cache_pmem-to-the-pmem-api.patch
* dax-support-dirty-dax-entries-in-radix-tree.patch
* mm-add-find_get_entries_tag.patch
* dax-add-support-for-fsync-sync.patch
* ext2-call-dax_pfn_mkwrite-for-dax-fsync-msync.patch
* ext4-call-dax_pfn_mkwrite-for-dax-fsync-msync.patch
* xfs-call-dax_pfn_mkwrite-for-dax-fsync-msync.patch
* dax-never-rely-on-bhb_dev-being-set-by-get_block.patch
* tree-wide-use-kvfree-than-conditional-kfree-vfree.patch
* maintainers-return-arch-sh-to-maintained-state-with-new-maintainers.patch
  i-need-old-gcc.patch
  arch-alpha-kernel-systblss-remove-debug-check.patch
  drivers-gpu-drm-i915-intel_spritec-fix-build.patch
  drivers-gpu-drm-i915-intel_tvc-fix-build.patch
  arm-mm-do-not-use-virt_to_idmap-for-nommu-systems.patch
* thp-make-split_queue-per-node.patch
* thp-change-deferred_split_count-to-return-number-of-thp-in-queue.patch
* thp-change-deferred_split_count-to-return-number-of-thp-in-queue-fix.patch
* thp-limit-number-of-object-to-scan-on-deferred_split_scan.patch
* lib-test-string_helpersc-fix-and-improve-string_get_size-tests.patch
* phys_to_pfn_t-use-phys_addr_t.patch
* ocfs2-cluster-fix-memory-leak-in-o2hb_region_release.patch
* vmstat-remove-bug_on-from-vmstat_update.patch
* proc-revert-proc-pid-maps-annotation.patch
* proc-fix-missing-reference-of-mm.patch
* kallsyms-add-support-for-relative-offsets-in-kallsyms-address-table.patch
* kallsyms-add-support-for-relative-offsets-in-kallsyms-address-table-fix.patch
* fs-ext4-fsyncc-generic_file_fsync-call-based-on-barrier-flag.patch
* ocfs2-add-ocfs2_write_type_t-type-to-identify-the-caller-of-write.patch
* ocfs2-use-c_new-to-indicate-newly-allocated-extents.patch
* ocfs2-test-target-page-before-change-it.patch
* ocfs2-do-not-change-i_size-in-write_end-for-direct-io.patch
* ocfs2-return-the-physical-address-in-ocfs2_write_cluster.patch
* ocfs2-record-unwritten-extents-when-populate-write-desc.patch
* ocfs2-fix-sparse-file-data-ordering-issue-in-direct-io.patch
* ocfs2-code-clean-up-for-direct-io.patch
* ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue.patch
* ocfs2-fix-ip_unaligned_aio-deadlock-with-dio-work-queue-fix.patch
* ocfs2-take-ip_alloc_sem-in-ocfs2_dio_get_block-ocfs2_dio_end_io_write.patch
* ocfs2-fix-disk-file-size-and-memory-file-size-mismatch.patch
* ocfs2-fix-a-deadlock-issue-in-ocfs2_dio_end_io_write.patch
* ocfs2-dlm-fix-race-between-convert-and-recovery.patch
* 

[PATCH v2] mptlan: add checks for dma mapping errors

2016-01-22 Thread Alexey Khoroshilov
mpt_lan_sdu_send() and mpt_lan_post_receive_buckets() do not check
whether DMA mapping succeeded.
The patch adds the checks and failure handling.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/message/fusion/mptlan.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/message/fusion/mptlan.c b/drivers/message/fusion/mptlan.c
index cbe96072a6cc..3b6c8a755713 100644
--- a/drivers/message/fusion/mptlan.c
+++ b/drivers/message/fusion/mptlan.c
@@ -734,6 +734,12 @@ mpt_lan_sdu_send (struct sk_buff *skb, struct net_device *dev)
 
 dma = pci_map_single(mpt_dev->pcidev, skb->data, skb->len,
 PCI_DMA_TODEVICE);
+   if (pci_dma_mapping_error(mpt_dev->pcidev, dma)) {
+   netif_stop_queue(dev);
+
+   printk (KERN_ERR "%s: dma mapping failed\n", __func__);
+   return NETDEV_TX_BUSY;
+   }
 
priv->SendCtl[ctx].skb = skb;
priv->SendCtl[ctx].dma = dma;
@@ -1232,6 +1238,14 @@ mpt_lan_post_receive_buckets(struct mpt_lan_priv *priv)
 
dma = pci_map_single(mpt_dev->pcidev, skb->data,
 len, PCI_DMA_FROMDEVICE);
+   if (pci_dma_mapping_error(mpt_dev->pcidev, dma)) {
+   printk (KERN_WARNING
+   MYNAM "/%s: dma mapping failed\n",
+   __func__);
+   priv->mpt_rxfidx[++priv->mpt_rxfidx_tail] = ctx;
+   spin_unlock_irqrestore(&priv->rxfidx_lock, flags);
+   break;
+   }
 
priv->RcvCtl[ctx].skb = skb;
priv->RcvCtl[ctx].dma = dma;
-- 
1.9.1
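The receive-path fix hinges on a small invariant: a context index popped from the free stack must be pushed back if its buffer cannot be mapped; otherwise that slot is lost for good. The user-space sketch below models that pop/push-back pattern. `struct toy_rx_pool`, `toy_post_one()` and the `map_ok` flag are hypothetical and only mimic the `mpt_rxfidx`/`mpt_rxfidx_tail` handling in the patch; locking and the skb itself are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CTX 4

/* Hypothetical stand-in for the driver's stack of free context indices. */
struct toy_rx_pool {
	int fidx[TOY_NR_CTX];  /* free context indices */
	int tail;              /* index of the top entry, -1 when empty */
};

static void toy_pool_init(struct toy_rx_pool *pool)
{
	for (int i = 0; i < TOY_NR_CTX; i++)
		pool->fidx[i] = i;
	pool->tail = TOY_NR_CTX - 1;
}

/*
 * Post one receive buffer; @map_ok simulates the DMA mapping result.
 * Returns the context used, or -1 if nothing was posted.
 */
static int toy_post_one(struct toy_rx_pool *pool, bool map_ok)
{
	int ctx;

	if (pool->tail < 0)
		return -1;                    /* no free contexts */
	ctx = pool->fidx[pool->tail--];       /* take a free context */

	if (!map_ok) {
		/* Mapping failed: give the context back so it is not leaked. */
		pool->fidx[++pool->tail] = ctx;
		assert(pool->tail < TOY_NR_CTX);
		return -1;
	}
	return ctx;
}
```

Without the push-back, each failed mapping would permanently shrink the pool by one context.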



Re: [PATCH V3 00/10] Introduce ACPI world to GICv3 & ITS irqchip

2016-01-22 Thread Robert Richter
On 19.01.16 14:11:13, Tomasz Nowicki wrote:
> Patches base on Suravee's ACPI GICv2m support:
> https://lkml.org/lkml/2015/12/10/475
> 
> The following git branch contains submitted patches along with
> the useful patches from the test point of view (mainly ACPI ARM64 PCI support).
> https://github.com/semihalf-nowicki-tomasz/linux.git (gicv3-its-acpi-v3)
> 
> Series has been tested on Cavium ThunderX server.
> 
> v2 -> v3
> - rebased on top of 4.4
> - fixes and improvements for redistributor init via GICC structures
> - fixes as per kbuild reports
> 
> v1 -> v2
> - rebased on top of 4.4-rc4
> - use pci_msi_domain_get_msi_rid for requester ID to device ID translation
> 
> Hanjun Guo (1):
>   irqchip / GICv3: remove gic root node in ITS
> 
> Tomasz Nowicki (9):
>   irqchip / GICv3: Refactor gic_of_init() for GICv3 driver
>   irqchip / GICv3: Add ACPI support for GICv3+ initialization
>   irqchip,GICv3,ACPI: Add redistributor support via GICC structures.
>   irqchip, gicv3, its: Mark its_init() and its children as __init
>   irqchip, GICv3, ITS: Refator ITS dt init code to prepare for ACPI.
>   ARM64, ACPI, PCI: I/O Remapping Table (IORT) initial support.
>   irqchip, gicv3, its: Probe ITS in the ACPI way.
>   acpi, gicv3, msi: Factor out code that might be reused for ACPI
> equivalent.
>   acpi, gicv3, its: Use MADT ITS subtable to do PCI/MSI domain
> initialization.
> 
>  drivers/acpi/Kconfig |   3 +
>  drivers/acpi/Makefile|   1 +
>  drivers/acpi/iort.c  | 326 +
>  drivers/irqchip/Kconfig  |   1 +
>  drivers/irqchip/irq-gic-v3-its-pci-msi.c |  85 ++--
>  drivers/irqchip/irq-gic-v3-its.c | 143 +
>  drivers/irqchip/irq-gic-v3.c | 344 ++-
>  drivers/pci/msi.c|   3 +
>  include/linux/iort.h |  38 
>  include/linux/irqchip/arm-gic-v3.h   |   2 +-
>  10 files changed, 845 insertions(+), 101 deletions(-)
>  create mode 100644 drivers/acpi/iort.c
>  create mode 100644 include/linux/iort.h

Tested on various ThunderX single and dual node systems.

Tested-by: Robert Richter 

Thanks Tomasz,

-Robert


Re: [PATCH] mtd: nand: mpc5121: use 'of_machine_is_compatible' to simplify code

2016-01-22 Thread Brian Norris
On Mon, Jan 11, 2016 at 01:55:58PM +0100, Boris Brezillon wrote:
> On Sun, 10 Jan 2016 07:46:39 +0100
> Christophe JAILLET  wrote:
> 
> > The current code is the same as 'of_machine_is_compatible'.
> > So use it in order to remove a few lines of code and to be more
> > consistent with other parts of the kernel.
> > 
> > Signed-off-by: Christophe JAILLET 
> 
> Your patch looks good from a functional POV, but what is this board
> detection hack doing in the NAND driver code in the first place?
> 
> Let's say I didn't see that :-).

:)

> Reviewed-by: Boris Brezillon 

Applied to l2-mtd.git/next


Re: [Xen-devel] [PATCH v1 04/12] xen/hvmlite: Bootstrap HVMlite guest

2016-01-22 Thread Andrew Cooper
On 22/01/2016 23:32, Luis R. Rodriguez wrote:
> On Fri, Jan 22, 2016 at 04:35:50PM -0500, Boris Ostrovsky wrote:
>> +/*
>> + * See Documentation/x86/boot.txt.
>> + *
>> + * Version 2.12 supports Xen entry point but we will use default x86/PC
>> + * environment (i.e. hardware_subarch 0).
>> + */
>> +xen_hvmlite_boot_params.hdr.version = 0x212;
>> +xen_hvmlite_boot_params.hdr.type_of_loader = 9; /* Xen loader */
>> +}
> I realize PV got away with setting up boot_params on C code but best
> ask now that this new code is being introduced: why can't we just have
> the Xen hypervisor fill this in? It'd save us all this code.

I agree that this looks to be a mess.  Having said that, the DMLite boot
protocol is OS agnostic, and will be staying that way.

It happens to look suspiciously like multiboot; a flat 32bit protected
mode entry (at a location chosen in an ELF note), with %ebx pointing to
an in-ram structure containing things like a command line and module list.

I would have thought the correct way to do direct Linux support would be
to have a very small init stub which constructs an appropriate zero
page, and lets the native entry point get on with things.

This covers the usecase where people wish to boot a specific Linux
kernel straight out of the dom0 filesystem.

For the alternative usecase of general OS support, dom0 would boot
something such as grub2 as the DMLite "kernel", at which point all
stooging around in the guest's filesystem is done from guest context,
rather than control context (mitigating a substantial attack surface).

~Andrew
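One reading of Andrew's suggestion, a very small stub that translates the hypervisor's start-of-day structure into a zero page before handing over to the native entry point, can be sketched in user-space C. All types here are hypothetical and heavily trimmed: `struct toy_start_info` stands in for the in-RAM structure %ebx points to, and `struct toy_zero_page` keeps only a few boot_params-like fields. The version (0x212), loader type (9) and hardware_subarch 0 values are the ones used in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed view of the structure %ebx points to at entry. */
struct toy_start_info {
	uint32_t nr_modules;
	uint64_t cmdline_paddr;   /* physical address of the command line */
	uint64_t modlist_paddr;   /* physical address of the module list */
};

/* Hypothetical, trimmed subset of struct boot_params (the zero page). */
struct toy_zero_page {
	uint16_t hdr_version;
	uint8_t  type_of_loader;
	uint32_t cmd_line_ptr;
	uint32_t hardware_subarch;
};

/* Build the zero page from the start info; returns 0 on success. */
static int toy_build_zero_page(const struct toy_start_info *si,
			       struct toy_zero_page *zp)
{
	assert(si != NULL && zp != NULL);

	/* This trimmed zero page only has a 32-bit command-line pointer. */
	if (si->cmdline_paddr > UINT32_MAX)
		return -1;

	memset(zp, 0, sizeof(*zp));
	zp->hdr_version = 0x212;     /* boot protocol 2.12, as in the patch */
	zp->type_of_loader = 9;      /* Xen loader, as in the patch */
	zp->hardware_subarch = 0;    /* default x86/PC environment */
	zp->cmd_line_ptr = (uint32_t)si->cmdline_paddr;
	return 0;
}
```

A real stub would then jump to the native entry point with the zero page; that hand-off (and module-list translation) is assembly- and layout-specific and is not modelled here.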


Re: [PATCH] autofs: show pipe inode in mount options

2016-01-22 Thread Ian Kent
On Fri, 2016-01-22 at 12:34 +0100, Stanislav Kinsburskiy wrote:
> Hi again,
> 
> I would like to ask about any progress with this patch?
> Any other requirements to make it able to merge?

Sorry for the delay.

Since there haven't been any comments from Al or Stephen I'm think I
should include it in the series I plan on sending to linux-next to
rename autofs4 to autofs (among other things).

I haven't had anything significant enough for autofs to warrant
maintaining a tree and sending push requests so I'll need to ask
Stephen what I need to do (perhaps you could offer some advice on that
now Stephen, please).

I'm also struggling to get back to this and carry out the needed
testing and I'll need to re-base the series too now but I'm getting
there.

I didn't see a follow up patch with an updated description, did I miss
it?

Ian


Re: [PATCH v15 5/6] fpga: fpga-area and fpga-bus: device tree control for FPGA

2016-01-22 Thread Moritz Fischer
On Fri, Jan 22, 2016 at 5:37 PM, atull  wrote:
> On Fri, 22 Jan 2016, Moritz Fischer wrote:
>
>> Alan,
>>
>> On Wed, Jan 20, 2016 at 8:24 PM,   wrote:
>>
>> > +static int fpga_area_probe(struct platform_device *pdev)
>> > +{
>> > +   struct device *dev = &pdev->dev;
>> > +   struct device_node *np = dev->of_node;
>> > +   struct fpga_area *area;
>> > +   int ret;
>> > +
>> > +   area = devm_kzalloc(dev, sizeof(*area), GFP_KERNEL);
>> > +   if (!area)
>> > +   return -ENOMEM;
>> > +
>> > +   INIT_LIST_HEAD(&area->bridge_list);
>> > +
>> > +   ret = fpga_bridge_register(dev, "FPGA Area", NULL, area);
>> > +   if (ret)
>> > +   return ret;
>> > +   area->br = dev_get_drvdata(dev);
>> > +
>> > +   if (of_property_read_string(np, "firmware-name",
>> > +   &area->firmware_name)) {
>> > +   of_platform_populate(np, of_default_bus_match_table, NULL, dev);
>> > +   return 0;
>> > +   }
>>
>> This is the use case where the bootloader loaded the fpga, and you
>> just want to populate
>> the devices in the fabric, right?
>
> Hi Moritz,
>
> Yes
>
>>
>> > +   if (of_property_read_bool(np, "partial-reconfig"))
>> > +   area->flags |= FPGA_MGR_PARTIAL_RECONFIG;
>> > +
>> > +   ret = fpga_area_get_bus(area);
>> > +   if (ret) {
>> > +   dev_dbg(dev, "Should be child of a FPGA Bus");
>> > +   goto err_unreg;
>> > +   }
>>
>> Looking at socfpga.dtsi, would that mean that the fpgamgr0 node would
>> need to become a subnode of fpgabus@0 at the same place?
>>
>> i.e. /soc/fpgamgr@ff706000 -> /soc/fpgabus@0/fpgamgr@ff706000
>>
>> and the ranges property would be used to translate to the fpga memory
>> mapped space?
>>
>> I know we're going back and forth on this. I think Rob brought up a
>> similar question:
>> "Does the bus really go thru the fpgamgr and then the bridge as this
>> implies? Or fpgamgr is a sideband controller?"
>>
>> To which I think the answer is 'sideband' controller, yet with the new
>> bindings it looks like the bus goes through the fpgamgr.
>
> Yeah, let's get this right.  First, let's be clear on the reason for FPGA Bus
> to exist.  There may be >1 FPGA in a system.  I want the FPGA Bus to bring
> together the bridges and manager that are associated with a certain FPGA.
> This allows the system designer to specify which FPGA is getting programmed
> with which image/hardware.  So at minimum, we need some way of associating
> an FPGA Bus with an FPGA Manager.

I see your argument for the FPGA bus. I agree that we need to
distinguish different FPGAs, and we need a way to associate an area
with a manager (and potentially bridges).

> As far as the target path is concerned, in the case of no bridges, we could
> have the overlay target the FPGA Bus instead of the FPGA Manager.  That may
> be more logical.  This would just be a documentation change; I think
> fpga-area.c will work OK if you specify the FPGA bus as your target (the
> manager still has to be a child of the bus so the bus knows what manager
> to use).

Could the bus not just use a phandle to the manager? Or the area a
phandle to the bus? That way one could have potentially disjoint groups.
Say I have a SPI device that is in an FPGA area. With our current system,
I'd have an FPGA Manager that needs to be a child of the bus, which in
turn is a child of the SPI controller, with the memory addresses being
addresses on the SoC's memory bus:

spi_ctrl@deadbeef {
	fpga_bus@0 {
		fpgamgr@f8007000 {
			mgr regs etc...
			... and now the SPI slaves ...
			slave@42 {
			};
		};
	};
};

(or something like that, maybe I'm confused)

With my proposal, it looks like it would work with any bus (maybe then
areas would have to register with the manager or something like that...)

spi_ctrl@deadbeef {
	fpga_area {
		fpga-mgr = <>;
		slave0@42 {
		};
	};
};

I keep bringing up SPI because it's another bus with addresses, not
because I think we should particularly optimize for that use case ;-)

Cheers,

Moritz

PS: Feel free to 'break' my suggestion above. It's just an idea


[PATCH v1] block: fix bio splitting on max sectors

2016-01-22 Thread Ming Lei
After commit e36f62042880(block: split bios to maxpossible length),
a bio can be split in the middle of a vector entry, which makes it
easy to split out a bio whose size isn't aligned to the logical block
size, especially when the block size is larger than 512 bytes.

This patch fixes the issue by making the max io size aligned
to logical block size.
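The rounding done by get_max_io_size() can be checked in isolation. The sketch below is a userspace restatement, not kernel code: align_max_sectors() is a hypothetical helper that takes the sector limit (in 512-byte units) and the logical block size in bytes, which it assumes is a power of two of at least 512.

```c
#include <assert.h>

/*
 * Hypothetical userspace restatement of get_max_io_size(): round a
 * sector count (512-byte units) down to a multiple of the logical
 * block size. lbs_bytes is assumed to be a power of two, >= 512.
 */
static unsigned align_max_sectors(unsigned sectors, unsigned lbs_bytes)
{
	unsigned mask = lbs_bytes - 1;

	/* precondition: power-of-two block size of at least one sector */
	assert(lbs_bytes >= 512 && (lbs_bytes & mask) == 0);

	/* mask >> 9 turns the byte mask into a sector mask */
	sectors &= ~(mask >> 9);
	return sectors;
}
```

With a 4096-byte logical block size, a 255-sector limit rounds down to 248 sectors (31 blocks of 4 KiB), while a 512-byte block size leaves it unchanged. A limit smaller than one logical block rounds to 0, which appears to be the case the patch's `if (sectors) goto split;` guards against.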

Fixes: e36f62042880(block: split bios to maxpossible length)
Reported-by: Stefan Haberland 
Cc: Keith Busch 
Suggested-by: Linus Torvalds 
Signed-off-by: Ming Lei 
---
V1:
- avoid double shift as suggested by Linus
- compute 'max_sectors' once as suggested by Keith

 block/blk-merge.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 1699df5..888a7fe 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -70,6 +70,18 @@ static struct bio *blk_bio_write_same_split(struct request_queue *q,
return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
 }
 
+static inline unsigned get_max_io_size(struct request_queue *q,
+  struct bio *bio)
+{
+   unsigned sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
+   unsigned mask = queue_logical_block_size(q) - 1;
+
+   /* aligned to logical block size */
+   sectors &= ~(mask >> 9);
+
+   return sectors;
+}
+
 static struct bio *blk_bio_segment_split(struct request_queue *q,
 struct bio *bio,
 struct bio_set *bs,
@@ -81,6 +93,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
unsigned front_seg_size = bio->bi_seg_front_size;
bool do_split = true;
struct bio *new = NULL;
+   const unsigned max_sectors = get_max_io_size(q, bio);
 
bio_for_each_segment(bv, bio, iter) {
/*
@@ -90,20 +103,19 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
if (bvprvp && bvec_gap_to_prev(q, bvprvp, bv.bv_offset))
goto split;
 
-   if (sectors + (bv.bv_len >> 9) >
-   blk_max_size_offset(q, bio->bi_iter.bi_sector)) {
+   if (sectors + (bv.bv_len >> 9) > max_sectors) {
/*
 * Consider this a new segment if we're splitting in
 * the middle of this vector.
 */
if (nsegs < queue_max_segments(q) &&
-   sectors < blk_max_size_offset(q,
-   bio->bi_iter.bi_sector)) {
+   sectors < max_sectors) {
nsegs++;
-   sectors = blk_max_size_offset(q,
-   bio->bi_iter.bi_sector);
+   sectors = max_sectors;
}
-   goto split;
+   if (sectors)
+   goto split;
+   /* Make this single bvec as the 1st segment */
}
 
if (bvprvp && blk_queue_cluster(q)) {
-- 
1.9.1



[PATCH 3.13.y-ckt 004/108] atl1c: Improve driver not to do order 4 GFP_ATOMIC allocation

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Pavel Machek 

[ Upstream commit f2a3771ae8aca879c32336c76ad05a017629bae2 ]

The atl1c driver does an order-4 allocation with GFP_ATOMIC
priority. That often breaks networking after resume. Switch to
GFP_KERNEL. Still not ideal, but should be significantly better.

atl1c_setup_ring_resources() is called from .open() function, and
already uses GFP_KERNEL, so this change is safe.

Signed-off-by: Pavel Machek 
Acked-by: Michal Hocko 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index 2980175..da0617c 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -1018,13 +1018,12 @@ static int atl1c_setup_ring_resources(struct atl1c_adapter *adapter)
sizeof(struct atl1c_recv_ret_status) * rx_desc_count +
8 * 4;
 
-   ring_header->desc = pci_alloc_consistent(pdev, ring_header->size,
-   &ring_header->dma);
+   ring_header->desc = dma_zalloc_coherent(&pdev->dev, ring_header->size,
+   &ring_header->dma, GFP_KERNEL);
if (unlikely(!ring_header->desc)) {
-   dev_err(&pdev->dev, "pci_alloc_consistend failed\n");
+   dev_err(&pdev->dev, "could not get memory for DMA buffer\n");
goto err_nomem;
}
-   memset(ring_header->desc, 0, ring_header->size);
/* init TPD ring */
 
tpd_ring[0].dma = roundup(ring_header->dma, 8);
-- 
1.9.1



[PATCH 3.13.y-ckt 003/108] gre6: allow to update all parameters via rtnl

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Nicolas Dichtel 

[ Upstream commit 6a61d4dbf4f54b5683e0f1e58d873cecca7cb977 ]

Parameters were updated only if the kernel was unable to find the tunnel
with the new parameters, i.e. only if core parameters were updated (keys,
addr, link, type).
Now it's possible to update ttl, hoplimit, flowinfo and flags.

Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: Nicolas Dichtel 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 net/ipv6/ip6_gre.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 841abf1..7075205 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1561,13 +1561,11 @@ static int ip6gre_changelink(struct net_device *dev, struct nlattr *tb[],
return -EEXIST;
} else {
t = nt;
-
-   ip6gre_tunnel_unlink(ign, t);
-   ip6gre_tnl_change(t, &p, !tb[IFLA_MTU]);
-   ip6gre_tunnel_link(ign, t);
-   netdev_state_change(dev);
}
 
+   ip6gre_tunnel_unlink(ign, t);
+   ip6gre_tnl_change(t, &p, !tb[IFLA_MTU]);
+   ip6gre_tunnel_link(ign, t);
return 0;
 }
 
-- 
1.9.1



[PATCH 3.13.y-ckt 005/108] sctp: use the same clock as if sock source timestamps were on

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Marcelo Ricardo Leitner 

[ Upstream commit cb5e173ed7c03a0d4630ce68a95a186cce3cc872 ]

SCTP sends a cookie in INIT ACK chunks that contains a timestamp, for
detecting stale cookies. This cookie is echoed back to the server by the
client and then that timestamp is checked.

The problem is that if the listening socket is using packet timestamping,
the cookie is encoded with a ktime_get() value but checked against
ktime_get_real(), as done by __net_timestamp().

The fix is for SCTP to also use ktime_get_real(), so we can compare bananas
with bananas later, whether or not packet timestamping was enabled.
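The clock mismatch can be reproduced in userspace, assuming CLOCK_MONOTONIC and CLOCK_REALTIME are reasonable stand-ins for ktime_get() and ktime_get_real(). cookie_expired() is a hypothetical helper: it stamps an expiration with one clock and checks it against another, as the unpatched code effectively did when packet timestamping was enabled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Read the given clock as a nanosecond count. */
static int64_t now_ns(clockid_t clk)
{
	struct timespec ts;
	int rc = clock_gettime(clk, &ts);

	assert(rc == 0);
	(void)rc;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical cookie check: the expiration is computed from stamp_clk
 * when the cookie is packed, then compared with check_clk on receipt.
 */
static bool cookie_expired(clockid_t stamp_clk, clockid_t check_clk,
			   int64_t life_ns)
{
	int64_t expiration = now_ns(stamp_clk) + life_ns;

	return expiration < now_ns(check_clk);
}
```

The two clocks count from different starting points (on Linux, CLOCK_MONOTONIC roughly from boot, CLOCK_REALTIME from the Unix epoch), so a cookie stamped with the monotonic clock and checked against wall-clock time looks expired almost immediately. Stamping and checking with the same clock avoids that.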

Fixes: 52db882f3fc2 ("net: sctp: migrate cookie life from timeval to ktime")
Signed-off-by: Marcelo Ricardo Leitner 
Acked-by: Vlad Yasevich 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 net/sctp/sm_make_chunk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 72cf5a3..5c9f4ab 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1651,7 +1651,7 @@ static sctp_cookie_param_t *sctp_pack_cookie(const struct sctp_endpoint *ep,
 
/* Set an expiration time for the cookie.  */
cookie->c.expiration = ktime_add(asoc->cookie_life,
-ktime_get());
+ktime_get_real());
 
/* Copy the peer's init packet.  */
memcpy(&cookie->c.peer_init[0], init_chunk->chunk_hdr,
@@ -1779,7 +1779,7 @@ no_hmac:
if (sock_flag(ep->base.sk, SOCK_TIMESTAMP))
kt = skb_get_ktime(skb);
else
-   kt = ktime_get();
+   kt = ktime_get_real();
 
if (!asoc && ktime_compare(bear_cookie->expiration, kt) < 0) {
/*
-- 
1.9.1



[PATCH 3.13.y-ckt 001/108] ARC: Fix silly typo in MAINTAINERS file

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Vineet Gupta 

commit 30b9dbee895ff0d5cbf155bd1ef3f0f5992bca6f upstream.

Signed-off-by: Kamal Mostafa 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 511bf41..24ee99d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8301,7 +8301,7 @@ F:include/linux/swiotlb.h
 
 SYNOPSYS ARC ARCHITECTURE
 M: Vineet Gupta 
-L: linux-snps-...@lists.infraded.org
+L: linux-snps-...@lists.infradead.org
 S: Supported
 F: arch/arc/
 F: Documentation/devicetree/bindings/arc/
-- 
1.9.1



[PATCH 3.13.y-ckt 007/108] ipv6: sctp: clone options to avoid use after free

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Eric Dumazet 

[ Upstream commit 9470e24f35ab81574da54e69df90c1eb4a96b43f ]

SCTP is lacking proper np->opt cloning at accept() time.

TCP and DCCP use the ipv6_dup_options() helper; do the same
in SCTP.

We might later factor this code into a common helper to avoid
future mistakes.

Reported-by: Dmitry Vyukov 
Signed-off-by: Eric Dumazet 
Acked-by: Vlad Yasevich 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 net/sctp/ipv6.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index 7567e6f..526e88c 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -638,6 +638,7 @@ static struct sock *sctp_v6_create_accept_sk(struct sock *sk,
struct sock *newsk;
struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
struct sctp6_sock *newsctp6sk;
+   struct ipv6_txoptions *opt;
 
newsk = sk_alloc(sock_net(sk), PF_INET6, GFP_KERNEL, sk->sk_prot);
if (!newsk)
@@ -657,6 +658,13 @@ static struct sock *sctp_v6_create_accept_sk(struct sock *sk,
 
memcpy(newnp, np, sizeof(struct ipv6_pinfo));
 
+   rcu_read_lock();
+   opt = rcu_dereference(np->opt);
+   if (opt)
+   opt = ipv6_dup_options(newsk, opt);
+   RCU_INIT_POINTER(newnp->opt, opt);
+   rcu_read_unlock();
+
/* Initialize sk's sport, dport, rcv_saddr and daddr for getsockname()
 * and getpeername().
 */
-- 
1.9.1



[PATCH 3.13.y-ckt 006/108] sctp: update the netstamp_needed counter when copying sockets

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Marcelo Ricardo Leitner 

[ Upstream commit 01ce63c90170283a9855d1db4fe81934dddce648 ]

Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy
related to disabling sock timestamp.

When SCTP accepts an association or peels one off, it copies sock flags
but forgets to call net_enable_timestamp() if a packet timestamping flag
was copied, leading to extra calls to net_disable_timestamp() whenever
such clones are closed.

The fix is to call net_enable_timestamp() whenever we copy a sock with
that flag on, like tcp does.
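The reference-count imbalance can be modeled with a plain counter. Everything below is hypothetical: netstamp_needed is an ordinary int standing in for the kernel's counter, and enable_timestamp(), copy_sock() and close_sock() only mimic net_enable_timestamp(), sctp_copy_sock() and the disable path taken on close.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global netstamp_needed counter. */
static int netstamp_needed;

struct fake_sock {
	bool timestamp_flag;	/* stands in for the SK_FLAGS_TIMESTAMP bits */
};

/* Setting the flag takes one reference on the counter. */
static void enable_timestamp(struct fake_sock *sk)
{
	assert(!sk->timestamp_flag);
	sk->timestamp_flag = true;
	netstamp_needed++;
}

/* Copy flags to a clone; fix_applied mirrors the patch's extra call. */
static void copy_sock(struct fake_sock *newsk, const struct fake_sock *sk,
		      bool fix_applied)
{
	newsk->timestamp_flag = sk->timestamp_flag;
	if (fix_applied && newsk->timestamp_flag)
		netstamp_needed++;
}

/* Closing a socket with the flag set drops one reference. */
static void close_sock(struct fake_sock *sk)
{
	if (sk->timestamp_flag) {
		sk->timestamp_flag = false;
		netstamp_needed--;	/* going negative models the imbalance */
	}
}
```

Without the extra increment, closing both the listener and its clone leaves the counter at -1, which models the unbalanced net_disable_timestamp() call; with it, the counter returns to 0.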

Reported-by: Dmitry Vyukov 
Signed-off-by: Marcelo Ricardo Leitner 
Acked-by: Vlad Yasevich 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 include/net/sock.h | 2 ++
 net/core/sock.c| 2 --
 net/sctp/socket.c  | 3 +++
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2ffc5be..42377ef4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -695,6 +695,8 @@ enum sock_flags {
SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */
 };
 
+#define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
+
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
 {
nsk->sk_flags = osk->sk_flags;
diff --git a/net/core/sock.c b/net/core/sock.c
index fa5f321..a5b42eb 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -422,8 +422,6 @@ static void sock_warn_obsolete_bsdism(const char *name)
}
 }
 
-#define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
-
 static void sock_disable_timestamp(struct sock *sk, unsigned long flags)
 {
if (sk->sk_flags & flags) {
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 7fe4dec..57d255b 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -6966,6 +6966,9 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
newinet->mc_ttl = 1;
newinet->mc_index = 0;
newinet->mc_list = NULL;
+
+   if (newsk->sk_flags & SK_FLAGS_TIMESTAMP)
+   net_enable_timestamp();
 }
 
 static inline void sctp_copy_descendant(struct sock *sk_to,
-- 
1.9.1



[PATCH 3.13.y-ckt 010/108] pptp: verify sockaddr_len in pptp_bind() and pptp_connect()

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: WANG Cong 

[ Upstream commit 09ccfd238e5a0e670d8178cf50180ea81ae09ae1 ]

Reported-by: Dmitry Vyukov 
Signed-off-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 drivers/net/ppp/pptp.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
index 1dc628f..0710214 100644
--- a/drivers/net/ppp/pptp.c
+++ b/drivers/net/ppp/pptp.c
@@ -420,6 +420,9 @@ static int pptp_bind(struct socket *sock, struct sockaddr *uservaddr,
struct pptp_opt *opt = &po->proto.pptp;
int error = 0;
 
+   if (sockaddr_len < sizeof(struct sockaddr_pppox))
+   return -EINVAL;
+
lock_sock(sk);
 
opt->src_addr = sp->sa_addr.pptp;
@@ -441,6 +444,9 @@ static int pptp_connect(struct socket *sock, struct sockaddr *uservaddr,
struct flowi4 fl4;
int error = 0;
 
+   if (sockaddr_len < sizeof(struct sockaddr_pppox))
+   return -EINVAL;
+
if (sp->sa_protocol != PX_PROTO_PPTP)
return -EINVAL;
 
-- 
1.9.1



[PATCH 3.13.y-ckt 011/108] bluetooth: Validate socket address length in sco_sock_bind().

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: "David S. Miller" 

[ Upstream commit 5233252fce714053f0151680933571a2da9cbfb4 ]

Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 net/bluetooth/sco.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index 316dd4e..cd7d93d 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -459,6 +459,9 @@ static int sco_sock_bind(struct socket *sock, struct sockaddr *addr, int addr_le
if (!addr || addr->sa_family != AF_BLUETOOTH)
return -EINVAL;
 
+   if (addr_len < sizeof(struct sockaddr_sco))
+   return -EINVAL;
+
lock_sock(sk);
 
if (sk->sk_state != BT_OPEN) {
-- 
1.9.1



[PATCH 3.13.y-ckt 008/108] net: add validation for the socket syscall protocol argument

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Hannes Frederic Sowa 

[ Upstream commit 79462ad02e861803b3840cc782248c7359451cd9 ]

郭永刚 reported that one could simply crash the kernel as root by
using a simple program:

int socket_fd;
struct sockaddr_in addr;
addr.sin_port = 0;
addr.sin_addr.s_addr = INADDR_ANY;
addr.sin_family = 10;

socket_fd = socket(10,3,0x4000);
connect(socket_fd , &addr,16);

AF_INET and AF_INET6 sockets actually only support 8-bit protocol
identifiers. inet_sock's skc_protocol field is sized accordingly, so
larger protocol identifiers simply have their higher bits cut off; for
0x4000 this stores a zero in the protocol fields.

This could lead to, e.g., a NULL function pointer call: as a result of
the cut-off, inet_num is zero and we call down to inet_autobind, which
is NULL for raw sockets.
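The truncation is easy to see with an 8-bit bitfield. This sketch is hypothetical: struct fake_sock only mimics the width of sk_protocol, and validate_protocol() restates the range check the patch adds to ax25_create(), dn_create() and irda_create() (inet_create() and inet6_create() compare against IPPROTO_MAX instead).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SK_PROTOCOL_MAX ((uint8_t)~0U)	/* 255, as defined in the patch */

/* Hypothetical model of the 8-bit sk_protocol bitfield. */
struct fake_sock {
	unsigned int sk_protocol : 8;
};

/* Store an int protocol number; the high bits are silently dropped. */
static unsigned int store_protocol(int protocol)
{
	struct fake_sock sk;

	sk.sk_protocol = (unsigned int)protocol;
	return sk.sk_protocol;
}

/* The check the patch adds before a socket is created. */
static int validate_protocol(int protocol)
{
	if (protocol < 0 || protocol > SK_PROTOCOL_MAX)
		return -EINVAL;
	return 0;
}
```

Storing 0x4000 in the 8-bit field yields 0, the zero protocol value described in the report, whereas the added check rejects such values with -EINVAL before they reach the socket.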

kernel: Call Trace:
kernel:  [] ? inet_autobind+0x2e/0x70
kernel:  [] inet_dgram_connect+0x54/0x80
kernel:  [] SYSC_connect+0xd9/0x110
kernel:  [] ? ptrace_notify+0x5b/0x80
kernel:  [] ? syscall_trace_enter_phase2+0x108/0x200
kernel:  [] SyS_connect+0xe/0x10
kernel:  [] tracesys_phase2+0x84/0x89

I found no particular commit which introduced this problem.

CVE: CVE-2015-8543
Cc: Cong Wang 
Reported-by: 郭永刚 
Signed-off-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
[ kamal: backport to 3.13-stable: hardcoded U8_MAX value ]
Signed-off-by: Kamal Mostafa 
---
 include/net/sock.h | 1 +
 net/ax25/af_ax25.c | 3 +++
 net/decnet/af_decnet.c | 3 +++
 net/ipv4/af_inet.c | 3 +++
 net/ipv6/af_inet6.c| 3 +++
 net/irda/af_irda.c | 3 +++
 6 files changed, 16 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 42377ef4..6367a0d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -369,6 +369,7 @@ struct sock {
sk_no_check  : 2,
sk_userlocks : 4,
sk_protocol  : 8,
+#define SK_PROTOCOL_MAX ((u8)~0U)
sk_type  : 16;
kmemcheck_bitfield_end(flags);
int sk_wmem_queued;
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 7bb1605..8967279 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -806,6 +806,9 @@ static int ax25_create(struct net *net, struct socket *sock, int protocol,
struct sock *sk;
ax25_cb *ax25;
 
+   if (protocol < 0 || protocol > SK_PROTOCOL_MAX)
+   return -EINVAL;
+
if (!net_eq(net, &init_net))
return -EAFNOSUPPORT;
 
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index dd4d506..c030d5c 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -677,6 +677,9 @@ static int dn_create(struct net *net, struct socket *sock, int protocol,
 {
struct sock *sk;
 
+   if (protocol < 0 || protocol > SK_PROTOCOL_MAX)
+   return -EINVAL;
+
if (!net_eq(net, &init_net))
return -EAFNOSUPPORT;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7dd59a8..24a8918 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -263,6 +263,9 @@ static int inet_create(struct net *net, struct socket *sock, int protocol,
int try_loading_module = 0;
int err;
 
+   if (protocol < 0 || protocol >= IPPROTO_MAX)
+   return -EINVAL;
+
sock->state = SS_UNCONNECTED;
 
/* Look for the requested type/protocol pair. */
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 4fbdb70..d064527 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -110,6 +110,9 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol,
int try_loading_module = 0;
int err;
 
+   if (protocol < 0 || protocol >= IPPROTO_MAX)
+   return -EINVAL;
+
/* Look for the requested type/protocol pair. */
 lookup_protocol:
err = -ESOCKTNOSUPPORT;
diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index de7db23..d70e530 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1105,6 +1105,9 @@ static int irda_create(struct net *net, struct socket *sock, int protocol,
 
IRDA_DEBUG(2, "%s()\n", __func__);
 
+   if (protocol < 0 || protocol > SK_PROTOCOL_MAX)
+   return -EINVAL;
+
if (net != &init_net)
return -EAFNOSUPPORT;
 
-- 
1.9.1



[PATCH 3.13.y-ckt 015/108] tools: Add a "make all" rule

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Kamal Mostafa 

commit f6ba98c5dc78708cb7fd29950c4a50c4c7e88f95 upstream.

Signed-off-by: Kamal Mostafa 
Acked-by: Pavel Machek 
Cc: Jiri Olsa 
Cc: Jonathan Cameron 
Cc: Pali Rohar 
Cc: Roberta Dobrescu 
Link: 
http://lkml.kernel.org/r/1447280736-2161-2-git-send-email-ka...@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo 
[ kamal: backport to 3.13-stable: build all tools for this version ]
Signed-off-by: Kamal Mostafa 
---
 tools/Makefile | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/Makefile b/tools/Makefile
index a9b0200..0f28b09 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -23,6 +23,10 @@ help:
@echo '  from the kernel command line to build and install one of'
@echo '  the tools above'
@echo ''
+   @echo '  $$ make tools/all'
+   @echo ''
+   @echo '  builds all tools.'
+   @echo ''
@echo '  $$ make tools/install'
@echo ''
@echo '  installs all tools.'
@@ -54,6 +58,11 @@ turbostat x86_energy_perf_policy: FORCE
 tmon: FORCE
$(call descend,thermal/$@)
 
+all: cgroup cpupower firewire lguest \
+   perf selftests turbostat usb \
+   virtio vm net x86_energy_perf_policy \
+   tmon
+
 cpupower_install:
$(call descend,power/$(@:_install=),install)
 
-- 
1.9.1



[PATCH 3.13.y-ckt 009/108] sh_eth: fix kernel oops in skb_put()

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Sergei Shtylyov 

[ Upstream commit 248be83dcb3feb3f6332eb3d010a016402138484 ]

In a low memory situation the following kernel oops occurs:

Unable to handle kernel NULL pointer dereference at virtual address 0050
pgd = 8490c000
[0050] *pgd=4651e831, *pte=, *ppte=
Internal error: Oops: 17 [#1] PREEMPT ARM
Modules linked in:
CPU: 0Not tainted  (3.4-at16 #9)
PC is at skb_put+0x10/0x98
LR is at sh_eth_poll+0x2c8/0xa10
pc : [<8035f780>]lr : [<8028bf50>]psr: 6113
sp : 84eb1a90  ip : 84eb1ac8  fp : 84eb1ac4
r10: 003f  r9 : 05ea  r8 : 
r7 :   r6 : 940453b0  r5 : 0003  r4 : 9381b180
r3 :   r2 :   r1 : 05ea  r0 : 
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 4248c059  DAC: 0015
Process klogd (pid: 2046, stack limit = 0x84eb02e8)
[...]

This is because netdev_alloc_skb() fails and 'mdp->rx_skbuff[entry]' is left
NULL, but sh_eth_rx() later uses it without checking. Add such a check...

Reported-by: Yasushi SHOJI 
Signed-off-by: Sergei Shtylyov 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 drivers/net/ethernet/renesas/sh_eth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 8d002f1..d04c457 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1338,6 +1338,7 @@ static int sh_eth_rx(struct net_device *ndev, u32 intr_status, int *quota)
if (mdp->cd->shift_rd0)
desc_status >>= 16;
 
+   skb = mdp->rx_skbuff[entry];
if (desc_status & (RD_RFS1 | RD_RFS2 | RD_RFS3 | RD_RFS4 |
   RD_RFS5 | RD_RFS6 | RD_RFS10)) {
ndev->stats.rx_errors++;
@@ -1353,12 +1354,11 @@ static int sh_eth_rx(struct net_device *ndev, u32 intr_status, int *quota)
ndev->stats.rx_missed_errors++;
if (desc_status & RD_RFS10)
ndev->stats.rx_over_errors++;
-   } else {
+   } else  if (skb) {
if (!mdp->cd->hw_swap)
sh_eth_soft_swap(
phys_to_virt(ALIGN(rxdesc->addr, 4)),
pkt_len + 2);
-   skb = mdp->rx_skbuff[entry];
mdp->rx_skbuff[entry] = NULL;
if (mdp->cd->rpadir)
skb_reserve(skb, NET_IP_ALIGN);
-- 
1.9.1



[PATCH 3.13.y-ckt 012/108] af_unix: Revert 'lock_interruptible' in stream receive code

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Rainer Weikusat 

[ Upstream commit 3822b5c2fc62e3de8a0f33806ff279fb7df92432 ]

With b3ca9b02b00704053a38bfe4c31dbbb9c13595d0, the AF_UNIX SOCK_STREAM
receive code was changed from using mutex_lock(&u->readlock) to
mutex_lock_interruptible(&u->readlock) to prevent signals from being
delayed for an indefinite time if a thread sleeping on the mutex
happened to be selected for handling the signal. But this was never a
problem with the stream receive code (as opposed to its datagram
counterpart), as it never went to sleep waiting for new messages with the
mutex held and thus wouldn't cause secondary readers to block on the
mutex waiting for the sleeping primary reader. As the interruptible
locking makes the code more complicated in exchange for no benefit,
change it back to using mutex_lock.

Signed-off-by: Rainer Weikusat 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 net/unix/af_unix.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 9ce79ed..31b88dc 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2088,14 +2088,7 @@ static int unix_stream_recvmsg(struct kiocb *iocb, struct socket *sock,
memset(&tmp_scm, 0, sizeof(tmp_scm));
}
 
-   err = mutex_lock_interruptible(&u->readlock);
-   if (unlikely(err)) {
-   /* recvmsg() in non blocking mode is supposed to return -EAGAIN
-* sk_rcvtimeo is not honored by mutex_lock_interruptible()
-*/
-   err = noblock ? -EAGAIN : -ERESTARTSYS;
-   goto out;
-   }
+   mutex_lock(&u->readlock);
 
if (flags & MSG_PEEK)
skip = sk_peek_offset(sk, flags);
@@ -2136,12 +2129,12 @@ again:
 
timeo = unix_stream_data_wait(sk, timeo, last);
 
-   if (signal_pending(current)
-   ||  mutex_lock_interruptible(&u->readlock)) {
+   if (signal_pending(current)) {
err = sock_intr_errno(timeo);
goto out;
}
 
+   mutex_lock(&u->readlock);
continue;
  unlock:
unix_state_unlock(sk);
-- 
1.9.1



[PATCH 3.13.y-ckt 017/108] net: ipmr: fix static mfc/dev leaks on table destruction

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Nikolay Aleksandrov 

commit 0e615e9601a15efeeb8942cf7cd4dadba0c8c5a7 upstream.

When destroying an mrt table, the static mfc entries and the static
devices are kept, which leads to devices that can never be destroyed
(because of the refcnt taken) and to leaked memory, for example:
unreferenced object 0x880034c144c0 (size 192):
  comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
  hex dump (first 32 bytes):
98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff  .S.4.S.4
ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00  
  backtrace:
[] kmemleak_alloc+0x4e/0xb0
[] kmem_cache_alloc+0x190/0x300
[] ip_mroute_setsockopt+0x5cb/0x910
[] do_ip_setsockopt.isra.11+0x105/0xff0
[] ip_setsockopt+0x30/0xa0
[] raw_setsockopt+0x33/0x90
[] sock_common_setsockopt+0x14/0x20
[] SyS_setsockopt+0x71/0xc0
[] entry_SYSCALL_64_fastpath+0x16/0x7a
[] 0x

Make sure that everything is cleaned on netns destruction.
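The effect of the new `all` argument can be sketched outside the kernel. struct fake_mfc and FAKE_MFC_STATIC are hypothetical stand-ins for mfc_cache entries and MFC_STATIC; clean_table() follows only the patched loop's filtering logic, not its locking, RCU or netlink notifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAKE_MFC_STATIC 0x1	/* hypothetical stand-in for MFC_STATIC */

struct fake_mfc {
	unsigned int flags;
	bool live;
};

/*
 * Remove entries as the patched mroute_clean_tables() loop does:
 * static entries are skipped unless "all" is set. Returns the number
 * of entries removed.
 */
static size_t clean_table(struct fake_mfc *tbl, size_t n, bool all)
{
	size_t removed = 0;

	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].live)
			continue;
		if (!all && (tbl[i].flags & FAKE_MFC_STATIC))
			continue;
		tbl[i].live = false;
		removed++;
	}
	return removed;
}
```

With `all == false` (the mrtsock_destruct() path), static entries survive; with `all == true` (the ipmr_free_table() path), they are removed as well.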

Signed-off-by: Nikolay Aleksandrov 
Reviewed-by: Cong Wang 
Signed-off-by: David S. Miller 
Signed-off-by: Kamal Mostafa 
---
 net/ipv4/ipmr.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index a99f914..2f8de5f 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -136,7 +136,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
  struct mfc_cache *c, struct rtmsg *rtm);
 static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
 int cmd);
-static void mroute_clean_tables(struct mr_table *mrt);
+static void mroute_clean_tables(struct mr_table *mrt, bool all);
 static void ipmr_expire_process(unsigned long arg);
 
 #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
@@ -348,7 +348,7 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
 static void ipmr_free_table(struct mr_table *mrt)
 {
del_timer_sync(&mrt->ipmr_expire_timer);
-   mroute_clean_tables(mrt);
+   mroute_clean_tables(mrt, true);
kfree(mrt);
 }
 
@@ -1199,7 +1199,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
  * Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct mr_table *mrt)
+static void mroute_clean_tables(struct mr_table *mrt, bool all)
 {
int i;
LIST_HEAD(list);
@@ -1208,8 +1208,9 @@ static void mroute_clean_tables(struct mr_table *mrt)
/* Shut down all active vif entries */
 
for (i = 0; i < mrt->maxvif; i++) {
-   if (!(mrt->vif_table[i].flags & VIFF_STATIC))
-   vif_delete(mrt, i, 0, &list);
+   if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
+   continue;
+   vif_delete(mrt, i, 0, &list);
}
unregister_netdevice_many(&list);
 
@@ -1217,7 +1218,7 @@ static void mroute_clean_tables(struct mr_table *mrt)
 
for (i = 0; i < MFC_LINES; i++) {
list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[i], list) {
-   if (c->mfc_flags & MFC_STATIC)
+   if (!all && (c->mfc_flags & MFC_STATIC))
continue;
list_del_rcu(&c->list);
mroute_netlink_event(mrt, c, RTM_DELROUTE);
@@ -1252,7 +1253,7 @@ static void mrtsock_destruct(struct sock *sk)
NETCONFA_IFINDEX_ALL,
net->ipv4.devconf_all);
RCU_INIT_POINTER(mrt->mroute_sk, NULL);
-   mroute_clean_tables(mrt);
+   mroute_clean_tables(mrt, false);
}
}
rtnl_unlock();
-- 
1.9.1



[PATCH 3.13.y-ckt 016/108] efi: Disable interrupts around EFI calls, not in the epilog/prolog calls

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Ingo Molnar 

commit 23a0d4e8fa6d3a1d7fb819f79bcc0a3739c30ba9 upstream.

Tapasweni Pathak reported that we do a kmalloc() in efi_call_phys_prolog()
on x86-64 while having interrupts disabled, which is a big no-no, as
kmalloc() can sleep.

Solve this by removing the irq disabling from the prolog/epilog calls
around EFI calls: it's unnecessary, as in this stage we are single
threaded in the boot thread, and we don't ever execute this from
interrupt contexts.

Reported-by: Tapasweni Pathak 
Signed-off-by: Ingo Molnar 
Signed-off-by: Matt Fleming 
[ luis: backported to 3.10: adjusted context ]
Signed-off-by: Luis Henriques 
Signed-off-by: Kamal Mostafa 
---
 arch/x86/platform/efi/efi.c|  7 +++
 arch/x86/platform/efi/efi_32.c | 11 +++
 arch/x86/platform/efi/efi_64.c |  3 ---
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 30075f9..a697fa5 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -244,12 +244,19 @@ static efi_status_t __init phys_efi_set_virtual_address_map(
efi_memory_desc_t *virtual_map)
 {
efi_status_t status;
+   unsigned long flags;
 
efi_call_phys_prelog();
+
+   /* Disable interrupts around EFI calls: */
+   local_irq_save(flags);
status = efi_call_phys4(efi_phys.set_virtual_address_map,
memory_map_size, descriptor_size,
descriptor_version, virtual_map);
+   local_irq_restore(flags);
+
efi_call_phys_epilog();
+
return status;
 }
 
diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
index 40e4469..bebbee0 100644
--- a/arch/x86/platform/efi/efi_32.c
+++ b/arch/x86/platform/efi/efi_32.c
@@ -33,19 +33,16 @@
 
 /*
  * To make EFI call EFI runtime service in physical addressing mode we need
- * prelog/epilog before/after the invocation to disable interrupt, to
- * claim EFI runtime service handler exclusively and to duplicate a memory in
- * low memory space say 0 - 3G.
+ * prolog/epilog before/after the invocation to claim the EFI runtime service
+ * handler exclusively and to duplicate a memory mapping in low memory space,
+ * say 0 - 3G.
  */
 
-static unsigned long efi_rt_eflags;
 
 void efi_call_phys_prelog(void)
 {
struct desc_ptr gdt_descr;
 
-   local_irq_save(efi_rt_eflags);
-
load_cr3(initial_page_table);
__flush_tlb_all();
 
@@ -64,6 +61,4 @@ void efi_call_phys_epilog(void)
 
load_cr3(swapper_pg_dir);
__flush_tlb_all();
-
-   local_irq_restore(efi_rt_eflags);
 }
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 39a0e7f..2f6c1a9 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -40,7 +40,6 @@
 #include 
 
 static pgd_t *save_pgd __initdata;
-static unsigned long efi_flags __initdata;
 
 static void __init early_code_mapping_set_exec(int executable)
 {
@@ -66,7 +65,6 @@ void __init efi_call_phys_prelog(void)
int n_pgds;
 
early_code_mapping_set_exec(1);
-   local_irq_save(efi_flags);
 
n_pgds = DIV_ROUND_UP((max_pfn << PAGE_SHIFT), PGDIR_SIZE);
save_pgd = kmalloc(n_pgds * sizeof(pgd_t), GFP_KERNEL);
@@ -90,7 +88,6 @@ void __init efi_call_phys_epilog(void)
set_pgd(pgd_offset_k(pgd * PGDIR_SIZE), save_pgd[pgd]);
kfree(save_pgd);
__flush_tlb_all();
-   local_irq_restore(efi_flags);
early_code_mapping_set_exec(0);
 }
 
-- 
1.9.1



[PATCH 3.13.y-ckt 019/108] usb: gadget: pxa27x: fix suspend callback

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Felipe Balbi 

commit 391e6dcb37857d5659b53def2f41e2f56850d33c upstream.

pxa27x disconnects pullups on suspend but doesn't
notify the gadget driver about it, so the gadget driver
can't disable the endpoints it was using.

This causes problems on resume because the gadget core
will think the endpoints are still enabled and will just
ignore the following usb_ep_enable().

Fix this problem by calling
gadget_driver->disconnect().
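The resume failure can be sketched with a toy model. The `model_*` names are hypothetical and do not correspond to the real gadget API; `model_ep_enable()` simply encodes the behaviour described above, where an enable request is ignored for an endpoint the core still believes is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one gadget endpoint (not the real struct usb_ep). */
struct model_ep {
    bool enabled;
    int  enable_calls_honoured;
};

/* An enable request on an endpoint the core still believes is enabled
 * is silently ignored, as the commit message describes. */
static void model_ep_enable(struct model_ep *ep)
{
    if (ep->enabled)
        return;
    ep->enabled = true;
    ep->enable_calls_honoured++;
}

static void model_ep_disable(struct model_ep *ep)
{
    ep->enabled = false;
}

/* The gadget driver's disconnect() callback tears down its endpoints. */
static void model_gadget_disconnect(struct model_ep *ep)
{
    model_ep_disable(ep);
}

/* Suspend drops the pullups; with the fix the driver is also notified. */
static void model_udc_suspend(struct model_ep *ep, bool notify_driver)
{
    if (notify_driver)
        model_gadget_disconnect(ep);
}
```

Without the notification, the enable issued after resume is swallowed; with it, the endpoint is genuinely re-enabled.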

Tested-by: Robert Jarzmik 
Signed-off-by: Felipe Balbi 
[ luis: backported to 3.16:
  - file rename: drivers/usb/gadget/udc/pxa27x_udc.c ->
drivers/usb/gadget/pxa27x_udc.c ]
Signed-off-by: Luis Henriques 
Signed-off-by: Kamal Mostafa 
---
 drivers/usb/gadget/pxa27x_udc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/gadget/pxa27x_udc.c b/drivers/usb/gadget/pxa27x_udc.c
index 3c97da7..22b0e20 100644
--- a/drivers/usb/gadget/pxa27x_udc.c
+++ b/drivers/usb/gadget/pxa27x_udc.c
@@ -2555,6 +2555,9 @@ static int pxa_udc_suspend(struct platform_device *_dev, pm_message_t state)
udc->pullup_resume = udc->pullup_on;
dplus_pullup(udc, 0);
 
+   if (udc->driver)
+   udc->driver->disconnect(&udc->gadget);
+
return 0;
 }
 
-- 
1.9.1



[PATCH 3.13.y-ckt 014/108] KVM: x86: Reload pit counters for all channels when restoring state

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Andrew Honig 

commit 0185604c2d82c560dab2f2933a18f797e74ab5a8 upstream.

Currently, if userspace restores the PIT counters with a count of 0
on channels 1 or 2 and the guest attempts to read the count on those
channels, KVM will perform a modulo by 0 and crash, because only
channel 0's count is reloaded via kvm_pit_load_count().  Reload all
three channels so that 0 values are converted to 65536, as per the spec.
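The failure mode and the fix can be sketched with a simplified model of one PIT channel. `struct model_pit_channel`, `model_load_count()` and `model_read_count()` are hypothetical; the modulo stands in for the count arithmetic the commit says divides by the stored count, and the 0-to-65536 rule is the one the message cites from the spec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of one i8254 PIT channel. */
struct model_pit_channel {
    uint32_t count;     /* reload value used for the arithmetic below */
};

/* A programmed count of 0 means 65536 (0x10000). */
static void model_load_count(struct model_pit_channel *c, uint32_t val)
{
    c->count = (val == 0) ? 0x10000 : val;
}

/* Counter value after 'ticks' input-clock ticks in a periodic mode.
 * If a restored count of 0 were stored without going through
 * model_load_count(), this would be a modulo by zero. */
static uint32_t model_read_count(const struct model_pit_channel *c,
                                 uint64_t ticks)
{
    return c->count - (uint32_t)(ticks % c->count);
}
```

Loading every channel, as the patched loop over `i < 3` does, is what guarantees that no channel is left holding a raw 0.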

This is CVE-2015-7513.

Signed-off-by: Andy Honig 
Signed-off-by: Paolo Bonzini 
Cc: Moritz Muehlenhoff 
[ luis: backported to 3.16: adjusted context ]
Signed-off-by: Luis Henriques 
Signed-off-by: Kamal Mostafa 
---
 arch/x86/kvm/x86.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f83ea4..20ced12 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3505,10 +3505,12 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
int r = 0;
+   int i;
 
mutex_lock(&kvm->arch.vpit->pit_state.lock);
memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-   kvm_pit_load_count(kvm, 0, ps->channels[0].count, 0);
+   for (i = 0; i < 3; i++)
+   kvm_pit_load_count(kvm, i, ps->channels[i].count, 0);
mutex_unlock(&kvm->arch.vpit->pit_state.lock);
return r;
 }
@@ -3529,6 +3531,7 @@ static int kvm_vm_ioctl_get_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps)
 static int kvm_vm_ioctl_set_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps)
 {
int r = 0, start = 0;
+   int i;
u32 prev_legacy, cur_legacy;
mutex_lock(&kvm->arch.vpit->pit_state.lock);
prev_legacy = kvm->arch.vpit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY;
@@ -3538,7 +3541,8 @@ static int kvm_vm_ioctl_set_pit2(struct kvm *kvm, struct kvm_pit_state2 *ps)
memcpy(&kvm->arch.vpit->pit_state.channels, &ps->channels,
   sizeof(kvm->arch.vpit->pit_state.channels));
kvm->arch.vpit->pit_state.flags = ps->flags;
-   kvm_pit_load_count(kvm, 0, kvm->arch.vpit->pit_state.channels[0].count, start);
+   for (i = 0; i < 3; i++)
+   kvm_pit_load_count(kvm, i, kvm->arch.vpit->pit_state.channels[i].count, start);
mutex_unlock(&kvm->arch.vpit->pit_state.lock);
return r;
 }
-- 
1.9.1



[PATCH 3.13.y-ckt 026/108] drm/ttm: Fixed a read/write lock imbalance

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Thomas Hellstrom 

commit 025af189fb44250206dd8a32fa4a682392af3301 upstream.

In ttm_write_lock(), the uninterruptible path should call
__ttm_write_lock() not __ttm_read_lock().  This fixes a vmwgfx hang
on F23 start up.
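Why waiting on the read predicate breaks a write lock can be sketched with a simplified lock model. `struct model_lock` and the `model_*` functions are hypothetical, and the `rw` encoding (readers counted upward, -1 for a writer, 0 for free) is an assumption of this sketch rather than a statement about TTM's internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model: rw > 0 counts readers, rw == -1 means a
 * writer holds it, rw == 0 means free. */
struct model_lock {
    int rw;
};

/* Read predicate: succeeds unless a writer holds the lock. */
static bool model_try_read(struct model_lock *l)
{
    if (l->rw < 0)
        return false;
    l->rw++;
    return true;
}

/* Write predicate: succeeds only when the lock is completely free. */
static bool model_try_write(struct model_lock *l)
{
    if (l->rw != 0)
        return false;
    l->rw = -1;
    return true;
}

/* Write unlock: the writer assumes it was the sole holder. */
static void model_write_unlock(struct model_lock *l)
{
    l->rw = 0;
}
```

A "writer" that waited on the read predicate only holds a shared reference, so other readers still get in, and its unlock then wipes out their counts: the imbalance the subject line refers to.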

syeh: Extracted this from one of Thomas' internal patches.

Signed-off-by: Thomas Hellstrom 
Reviewed-by: Sinclair Yeh 
Signed-off-by: Kamal Mostafa 
---
 drivers/gpu/drm/ttm/ttm_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_lock.c b/drivers/gpu/drm/ttm/ttm_lock.c
index 3daa9a3..9b7cb11 100644
--- a/drivers/gpu/drm/ttm/ttm_lock.c
+++ b/drivers/gpu/drm/ttm/ttm_lock.c
@@ -180,7 +180,7 @@ int ttm_write_lock(struct ttm_lock *lock, bool interruptible)
spin_unlock(&lock->lock);
}
} else
-   wait_event(lock->queue, __ttm_read_lock(lock));
+   wait_event(lock->queue, __ttm_write_lock(lock));
 
return ret;
 }
-- 
1.9.1



[PATCH 3.13.y-ckt 013/108] KEYS: Fix race between read and revoke

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: David Howells 

commit b4a1b4f5047e4f54e194681125c74c0aa64d637d upstream.

This fixes CVE-2015-7550.

There's a race between keyctl_read() and keyctl_revoke().  If the revoke
happens between keyctl_read() checking the validity of a key and the key's
semaphore being taken, then the key type read method will see a revoked key.

This causes a problem for the user-defined key type because it assumes in
its read method that there will always be a payload in a non-revoked key
and doesn't check for a NULL pointer.

Fix this by making keyctl_read() check the validity of a key after taking
the semaphore instead of before.

I think the bug was introduced with the original keyrings code.

This was discovered by a multithreaded test program generated by syzkaller
(http://github.com/google/syzkaller).  Here's a cleaned up version:

#include <stdint.h>
#include <pthread.h>
#include <keyutils.h>
void *thr0(void *arg)
{
    key_serial_t key = (unsigned long)arg;
    keyctl_revoke(key);
    return 0;
}
void *thr1(void *arg)
{
    key_serial_t key = (unsigned long)arg;
    char buffer[16];
    keyctl_read(key, buffer, 16);
    return 0;
}
int main()
{
    key_serial_t key = add_key("user", "%", "foo", 3, KEY_SPEC_USER_KEYRING);
    pthread_t th[5];
    pthread_create(&th[0], 0, thr0, (void *)(unsigned long)key);
    pthread_create(&th[1], 0, thr1, (void *)(unsigned long)key);
    pthread_create(&th[2], 0, thr0, (void *)(unsigned long)key);
    pthread_create(&th[3], 0, thr1, (void *)(unsigned long)key);
    pthread_join(th[0], 0);
    pthread_join(th[1], 0);
    pthread_join(th[2], 0);
    pthread_join(th[3], 0);
    return 0;
}

Build as:

cc -o keyctl-race keyctl-race.c -lkeyutils -lpthread

Run as:

while keyctl-race; do :; done

as it may need several iterations to crash the kernel.  The crash can be
summarised as:

BUG: unable to handle kernel NULL pointer dereference at 0010
IP: [] user_read+0x56/0xa3
...
Call Trace:
 [] keyctl_read_key+0xb6/0xd7
 [] SyS_keyctl+0x83/0xe0
 [] entry_SYSCALL_64_fastpath+0x12/0x6f

Reported-by: Dmitry Vyukov 
Signed-off-by: David Howells 
Tested-by: Dmitry Vyukov 
Signed-off-by: James Morris 
Signed-off-by: Kamal Mostafa 
---
 security/keys/keyctl.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index cee72ce..cb1eef9 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -744,16 +744,16 @@ long keyctl_read_key(key_serial_t keyid, char __user *buffer, size_t buflen)
 
/* the key is probably readable - now try to read it */
 can_read_key:
-   ret = key_validate(key);
-   if (ret == 0) {
-   ret = -EOPNOTSUPP;
-   if (key->type->read) {
-   /* read the data with the semaphore held (since we
-* might sleep) */
-   down_read(&key->sem);
+   ret = -EOPNOTSUPP;
+   if (key->type->read) {
+   /* Read the data with the semaphore held (since we might sleep)
+* to protect against the key being updated or revoked.
+*/
+   down_read(&key->sem);
+   ret = key_validate(key);
+   if (ret == 0)
ret = key->type->read(key, buffer, buflen);
-   up_read(&key->sem);
-   }
+   up_read(&key->sem);
}
 
 error2:
-- 
1.9.1



[PATCH 3.13.y-ckt 018/108] fuse: break infinite loop in fuse_fill_write_pages()

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Roman Gushchin 

commit 3ca8138f014a913f98e6ef40e939868e1e9ea876 upstream.

I got a report about an unkillable task eating CPU. Further
investigation showed that the problem is in the fuse_fill_write_pages()
function. If the iov's first segment has zero length, we get an infinite
loop, because we never reach the iov_iter_advance() call.

Fix this by calling iov_iter_advance() before repeating an attempt to
copy data from userspace.

A similar problem is described in 124d3b7041f ("fix writev regression:
pan hanging unkillable and un-straceable"). If a zero-length segment
is followed by a segment with an invalid address,
iov_iter_fault_in_readable() checks only the first (zero-length) segment,
iov_iter_copy_from_user_atomic() skips it, fails at the second and
returns zero -> goto again without skipping the zero-length segment.

The patch calls iov_iter_advance() before goto again: we'll skip the
zero-length segment at the second iteration and
iov_iter_fault_in_readable() will detect the invalid address.
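The effect of moving iov_iter_advance() ahead of the `!tmp` check can be sketched with a simplified iterator. `struct model_seg`, `struct model_iter`, `model_copy()`, `model_advance()` and `model_fill()` are hypothetical; the model covers only the zero-length-first-segment case, not the invalid-address case, and relies on the property the description states: advancing by 0 still steps past a zero-length segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for an iovec array and iov_iter. */
struct model_seg  { const char *base; size_t len; };
struct model_iter { const struct model_seg *segs; size_t nr, idx, off; };

/* Copy from the current segment only; a zero-length segment yields 0. */
static size_t model_copy(const struct model_iter *it, char *dst, size_t max)
{
    if (it->idx >= it->nr)
        return 0;
    size_t n = it->segs[it->idx].len - it->off;
    if (n > max)
        n = max;
    memcpy(dst, it->segs[it->idx].base + it->off, n);
    return n;
}

/* Advance by 'bytes'; even an advance of 0 skips exhausted/empty segments. */
static void model_advance(struct model_iter *it, size_t bytes)
{
    it->off += bytes;
    while (it->idx < it->nr && it->off >= it->segs[it->idx].len) {
        it->off -= it->segs[it->idx].len;
        it->idx++;
    }
}

/* Fill loop with the fix applied: advance before the !tmp retry, so an
 * empty first segment is skipped instead of being retried forever. */
static size_t model_fill(struct model_iter *it, char *dst, size_t want)
{
    size_t count = 0;

    while (count < want && it->idx < it->nr) {
        size_t tmp = model_copy(it, dst + count, want - count);
        model_advance(it, tmp);
        if (!tmp)
            continue;           /* the "goto again" path */
        count += tmp;
    }
    return count;
}
```

With the advance placed after the `!tmp` check instead, the first iteration would copy 0 bytes from the empty segment and retry the same segment indefinitely.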

Special thanks to Konstantin Khlebnikov, who helped a lot with the commit
description.

Cc: Andrew Morton 
Cc: Maxim Patlasov 
Cc: Konstantin Khlebnikov 
Signed-off-by: Roman Gushchin 
Signed-off-by: Miklos Szeredi 
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Signed-off-by: Kamal Mostafa 
---
 fs/fuse/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7e70506..f536000 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -993,6 +993,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_req *req,
 
mark_page_accessed(page);
 
+   iov_iter_advance(ii, tmp);
if (!tmp) {
unlock_page(page);
page_cache_release(page);
@@ -1005,7 +1006,6 @@ static ssize_t fuse_fill_write_pages(struct fuse_req *req,
req->page_descs[req->num_pages].length = tmp;
req->num_pages++;
 
-   iov_iter_advance(ii, tmp);
count += tmp;
pos += tmp;
offset += tmp;
-- 
1.9.1



[PATCH 3.13.y-ckt 023/108] USB: serial: Another Infineon flash loader USB ID

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Jonas Jonsson 

commit a0e80fbd56b4573de997c9a088a33abbc1121400 upstream.

The flash loader has been seen on a Telit UE910 modem. The flash loader
is a bit special: it presents both an ACM and a CDC Data interface, but
only the latter is useful. Unless a magic string is sent to the device,
it will disappear and the regular modem device will appear instead.
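How an interface-class match narrows the binding can be sketched with a simplified ID entry. `struct model_id` and `model_match()` are hypothetical and ignore the other match flags the real `struct usb_device_id` supports; the class codes 0x02 (Communications) and 0x0a (CDC Data) are the standard USB values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CLASS_COMM     0x02   /* same value as USB_CLASS_COMM */
#define MODEL_CLASS_CDC_DATA 0x0a   /* same value as USB_CLASS_CDC_DATA */

/* Hypothetical, simplified match entry. */
struct model_id {
    uint16_t vendor, product;
    bool     match_if_class;        /* like USB_DEVICE_INTERFACE_CLASS() */
    uint8_t  if_class;
};

/* Vendor/product must match; the interface class is checked only when
 * the entry asks for it. */
static bool model_match(const struct model_id *id, uint16_t vid,
                        uint16_t pid, uint8_t if_class)
{
    if (id->vendor != vid || id->product != pid)
        return false;
    return !id->match_if_class || id->if_class == if_class;
}
```

An entry built this way for 058b:0041 accepts the CDC Data interface but not the ACM control interface of the same device.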

Signed-off-by: Jonas Jonsson 
Tested-by: Daniele Palmas 
Signed-off-by: Johan Hovold 
Signed-off-by: Kamal Mostafa 
---
 drivers/usb/serial/usb-serial-simple.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/usb-serial-simple.c b/drivers/usb/serial/usb-serial-simple.c
index 147f019..2008329 100644
--- a/drivers/usb/serial/usb-serial-simple.c
+++ b/drivers/usb/serial/usb-serial-simple.c
@@ -48,6 +48,7 @@ DEVICE(funsoft, FUNSOFT_IDS);
 
 /* Infineon Flashloader driver */
 #define FLASHLOADER_IDS()  \
+   { USB_DEVICE_INTERFACE_CLASS(0x058b, 0x0041, USB_CLASS_CDC_DATA) }, \
{ USB_DEVICE(0x8087, 0x0716) }
 DEVICE(flashloader, FLASHLOADER_IDS);
 
-- 
1.9.1



[PATCH 3.13.y-ckt 022/108] USB: cdc_acm: Ignore Infineon Flash Loader utility

2016-01-22 Thread Kamal Mostafa
3.13.11-ckt33 -stable review patch.  If anyone has any objections, please let me know.

---8<

From: Jonas Jonsson 

commit f33a7f72e5fc033daccbb8d4753d7c5c41a4d67b upstream.

Some modems, such as the Telit UE910, are using an Infineon Flash Loader
utility. It has two interfaces, 2/2/0 (Abstract Modem) and 10/0/0 (CDC
Data). The latter can be used as a serial interface to upgrade the
firmware of the modem. However, that isn't possible when the cdc-acm
driver takes control of the device.

The following is an explanation of the behaviour by Daniele Palmas during
discussion on linux-usb.

"This is what happens when the device is turned on (without modifying
the drivers):

[155492.352031] usb 1-3: new high-speed USB device number 27 using ehci-pci
[155492.485429] usb 1-3: config 1 interface 0 altsetting 0 endpoint 0x81 has an invalid bInterval 255, changing to 11
[155492.485436] usb 1-3: New USB device found, idVendor=058b, idProduct=0041
[155492.485439] usb 1-3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[155492.485952] cdc_acm 1-3:1.0: ttyACM0: USB ACM device

This is the flashing device that is caught by the cdc-acm driver. Once
the ttyACM appears, the application starts sending a magic string
(simple write on the file descriptor) to keep the device in flashing
mode. If this magic string is not properly received in a certain time
interval, the modem goes on in normal operative mode:

[155493.748094] usb 1-3: USB disconnect, device number 27
[155494.916025] usb 1-3: new high-speed USB device number 28 using ehci-pci
[155495.059978] usb 1-3: New USB device found, idVendor=1bc7, idProduct=0021
[155495.059983] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[155495.059986] usb 1-3: Product: 6 CDC-ACM + 1 CDC-ECM
[155495.059989] usb 1-3: Manufacturer: Telit
[155495.059992] usb 1-3: SerialNumber: 359658044004697
[155495.138958] cdc_acm 1-3:1.0: ttyACM0: USB ACM device
[155495.140832] cdc_acm 1-3:1.2: ttyACM1: USB ACM device
[155495.142827] cdc_acm 1-3:1.4: ttyACM2: USB ACM device
[155495.144462] cdc_acm 1-3:1.6: ttyACM3: USB ACM device
[155495.145967] cdc_acm 1-3:1.8: ttyACM4: USB ACM device
[155495.147588] cdc_acm 1-3:1.10: ttyACM5: USB ACM device
[155495.154322] cdc_ether 1-3:1.12 wwan0: register 'cdc_ether' at usb-:00:1a.7-3, Mobile Broadband Network Device, 00:00:11:12:13:14

Using the cdc-acm driver, the string, though being sent in the same way
than using the usb-serial-simple driver (I can confirm that the data is
passing properly since I used an hw usb sniffer), does not make the
device to stay in flashing mode."

Signed-off-by: Jonas Jonsson 
Tested-by: Daniele Palmas 
Signed-off-by: Johan Hovold 
Signed-off-by: Kamal Mostafa 
---
 drivers/usb/class/cdc-acm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 06cdbb4..87f216d 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1719,6 +1719,11 @@ static const struct usb_device_id acm_ids[] = {
},
 #endif
 
+   /* Exclude Infineon Flash Loader utility */
+   { USB_DEVICE(0x058b, 0x0041),
+   .driver_info = IGNORE_DEVICE,
+   },
+
/* control interfaces without any protocol set */
{ USB_INTERFACE_INFO(USB_CLASS_COMM, USB_CDC_SUBCLASS_ACM,
USB_CDC_PROTO_NONE) },
-- 
1.9.1



[PATCH] usb: dwc2: host: Properly set the HFIR

2016-01-22 Thread Douglas Anderson
According to the most up-to-date version of the dwc2 databook, the FRINT
field of the HFIR register should be programmed to:
* 125 us * (PHY clock freq for HS) - 1
* 1000 us * (PHY clock freq for FS/LS) - 1

This is opposed to older versions of the doc that claimed it should be:
* 125 us * (PHY clock freq for HS)
* 1000 us * (PHY clock freq for FS/LS)

In case you didn't spot it, the difference is the "- 1".

Let's add the "- 1" to match the newest user manual.  It's presumed that
the "- 1" should have always been there and that this was always a
documentation error.  If some hardware needs the "- 1" and other
hardware doesn't, we'll have to add a configuration parameter for it in
the future.
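The FRINT formula can be checked with a little arithmetic. `frint_value()` and `frame_ps()` are hypothetical helpers; the second assumes, as the "- 1" implies, that the frame timer spans FRINT + 1 PHY clocks, and the 30 MHz clock used in the checks is an assumption, not a value given in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FRINT per the newer databook text:
 *   HS:    125 us  * PHY clock (MHz) - 1
 *   FS/LS: 1000 us * PHY clock (MHz) - 1 */
static uint32_t frint_value(bool high_speed, uint32_t clock_mhz)
{
    return (high_speed ? 125u : 1000u) * clock_mhz - 1;
}

/* (Micro)frame length in picoseconds, assuming the timer counts
 * FRINT + 1 PHY clocks per frame. */
static uint64_t frame_ps(uint32_t frint, uint32_t clock_mhz)
{
    return (uint64_t)(frint + 1) * 1000000u / clock_mhz;
}
```

With a 30 MHz clock, the old high-speed value (125 * 30 = 3750) makes each 125 us microframe one PHY clock, about 33 ns, too long; over the 32 microframes in 4 ms that accumulates to roughly 1 us, the same order as the drift the commit reports for the unpatched driver.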

I checked things before and after this patch on rk3288 using a Total
Phase Beagle 5000 analyzer.

Before this patch, a low speed mouse shows constant Frame Timing Jitter
errors.  After this patch errors have gone away.

Before this patch SOF packets move forward about 1 us per 4 ms.  After
this patch the SOF packets move backward about 1 us per 255 ms.  Some
specific SOF timestamps from the analyzer are below.

Before:
  6.603.790
  6.603.916
  6.604.041
  6.604.166
  ...
  6.607.541
  6.607.667
  6.607.792
  6.607.917
  ...
  6.611.417
  6.611.543
  6.611.668
  6.611.793

After:
  6.215.159
  6.215.284
  6.215.408
  6.215.533
  6.215.658
  ...
  6.470.658
  6.470.783
  6.470.907
  ...
  6.726.032
  6.726.157
  6.725.281
  6.725.406

Signed-off-by: Douglas Anderson 
---
 drivers/usb/dwc2/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/dwc2/core.c b/drivers/usb/dwc2/core.c
index 39a0fa8a4c0a..c7798476e25b 100644
--- a/drivers/usb/dwc2/core.c
+++ b/drivers/usb/dwc2/core.c
@@ -2251,10 +2251,10 @@ u32 dwc2_calc_frame_interval(struct dwc2_hsotg *hsotg)
 
if ((hprt0 & HPRT0_SPD_MASK) >> HPRT0_SPD_SHIFT == HPRT0_SPD_HIGH_SPEED)
/* High speed case */
-   return 125 * clock;
+   return 125 * clock - 1;
else
/* FS/LS case */
-   return 1000 * clock;
+   return 1000 * clock - 1;
 }
 
 /**
-- 
2.7.0.rc3.207.g0ac5344


