32-bit powerpc, aty128fb: vmap allocation for size 135168 failed

2017-08-17 Thread Meelis Roos
I was trying 4.13.0-rc5-00075-gac9a40905a61 on my PowerMac G4 with 1G 
RAM and after some time of sddm respawning and X trying to restart, 
dmesg is full of messages about vmap allocation failures.

Maybe the aty128fb is leaking ROM allocations or something like that?
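If it is leaking, the live vmap areas should show it. A quick way to
check (a sketch; needs root, and assumes /proc/vmallocinfo is available
on this kernel):

# awk '{print $3}' /proc/vmallocinfo | sort | uniq -c | sort -rn | head
# grep -i rom /proc/vmallocinfo

A leak would show up as an ever-growing count for one caller, e.g.
repeated pci_map_rom/ioremap entries.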

sddm has been crashing earlier too but I have not investigated it yet.
Right after reboot the messages are about the ATI ROM contents:

Aug 17 23:53:57 pohl kernel: [ 2940.146546] aty128fb :00:10.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x
Aug 17 23:54:02 pohl kernel: [ 2944.804838] aty128fb :00:10.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x1110
Aug 17 23:54:06 pohl kernel: [ 2948.992457] sddm[14039]: unhandled signal 11 at 0030 nip 0030 lr 0f55f858 code 30001

Then it changes to groups like this:
Aug 17 23:54:29 pohl kernel: [ 2971.514484] sddm[14093]: unhandled signal 11 at 0090 nip 0090 lr 0f55f858 code 30001
Aug 17 23:54:30 pohl kernel: [ 2972.994486] aty128fb :00:10.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x
Aug 17 23:54:30 pohl kernel: [ 2973.040595] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
Aug 17 23:54:33 pohl kernel: [ 2976.245220] aty128fb :00:10.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0x
Aug 17 23:54:33 pohl kernel: [ 2976.295452] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size

And finally it becomes just vmalloc errors, no ATI ROM messages at all:

[32075.316981] sddm[14563]: unhandled signal 11 at 0050 nip 0050 lr 0f55f858 code 30001
[32076.766965] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32076.788476] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32079.124735] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32079.146326] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32081.305352] sddm[14590]: unhandled signal 11 at 0050 nip 0050 lr 0f55f858 code 30001
[32082.768060] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32082.789530] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32085.125847] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32085.147228] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32087.311193] sddm[14617]: unhandled signal 11 at 00430068 nip 00430068 lr 0f55f858 code 30001
[32088.767983] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32088.789536] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32091.108732] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32091.130348] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32093.285222] sddm[14644]: unhandled signal 11 at 0050 nip 0050 lr 0f55f858 code 30001
[32094.767678] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32094.789329] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32097.128241] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32097.149745] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32099.293082] sddm[14671]: unhandled signal 11 at 0030 nip 0030 lr 0f55f858 code 30001
[32100.768030] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32100.789505] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size
[32103.124223] vmap allocation for size 135168 failed: use vmalloc=<size> to increase size
[32103.145881] vmap allocation for size 3149824 failed: use vmalloc=<size> to increase size


# cat /proc/meminfo 
MemTotal:        1031440 kB
MemFree:          135948 kB
MemAvailable:     889288 kB
Buffers:          215448 kB
Cached:           573780 kB
SwapCached:           44 kB
Active:           430480 kB
Inactive:         392416 kB
Active(anon):      14984 kB
Inactive(anon):    92276 kB
Active(file):     415496 kB
Inactive(file):   300140 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:        262144 kB
HighFree:          22140 kB
LowTotal:         769296 kB
LowFree:          113808 kB
SwapTotal:        848984 kB
SwapFree:         848464 kB
Dirty:               180 kB
Writeback:             0 kB
AnonPages:         33640 kB
Mapped:            84392 kB
Shmem:             73592 kB
Slab:              66972 kB
SReclaimable:      54952 kB
SUnreclaim:        12020 kB
KernelStack:         680 kB
PageTables:         1344 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1364704 kB
Committed_AS:     351992 kB
VmallocTotal:     211808 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB



-- 
Meelis Roos (mr...@linux.ee)


Re: [RFC v7 25/25] powerpc: Enable pkey subsystem

2017-08-17 Thread Michael Ellerman
Ram Pai  writes:
> On Thu, Aug 17, 2017 at 05:30:27PM -0300, Thiago Jung Bauermann wrote:
>> Ram Pai  writes:
>> > On Thu, Aug 10, 2017 at 06:27:34PM -0300, Thiago Jung Bauermann wrote:
>> >> Ram Pai  writes:
>> >> > @@ -227,6 +229,24 @@ static inline void pkey_mm_init(struct mm_struct 
>> >> > *mm)
>> >> > mm->context.execute_only_pkey = -1;
>> >> >  }
>> >> >
>> >> > +static inline void pkey_mmu_values(int total_data, int total_execute)
>> >> > +{
>> >> > +   /*
>> >> > +* since any pkey can be used for data or execute, we
>> >> > +* will  just  treat all keys as equal and track them
>> >> > +* as one entity.
>> >> > +*/
>> >> > +   pkeys_total = total_data + total_execute;
>> >> > +}
>> >> 
>> >> Right now this works because the firmware reports 0 execute keys in the
>> >> device tree, but if (when?) it is fixed to report 32 execute keys as
>> >> well as 32 data keys (which are the same keys), any place using
>> >> pkeys_total expecting it to mean the number of keys that are available
>> >> will be broken. This includes pkey_initialize and mm_pkey_is_allocated.
>> >
> >> > Good point. We should just ignore total_execute. It should
>> > be the same value as total_data on the latest platforms.
>> > On older platforms it will continue to be zero.
>> 
>> Indeed. There should just be a special case to disable execute
>> protection for P7.
>
> Ok. We should disable execute protection for P7 and earlier generations of 
> CPU.

You should do what the device tree says you can do.

If it says there are no execute keys then you shouldn't touch the IAMR.

If you don't want to handle the case where there are 0 execute keys but
some data keys then you should do:

  total_keys = min(data_keys, exec_keys);
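(As a sketch of what that could look like, assuming the counts come from
an "ibm,processor-storage-keys" style device tree property holding a pair
of cells <#data-keys, #execute-keys>; the property name and layout are
this series' convention, not something to take as final:)

	#include <linux/of.h>
	#include <linux/kernel.h>

	/* sketch only: derive the usable key count from the device tree */
	static int scan_pkey_feature(void)
	{
		struct device_node *cpu;
		u32 vals[2];
		int ret;

		cpu = of_find_node_by_type(NULL, "cpu");
		if (!cpu)
			return 0;

		ret = of_property_read_u32_array(cpu,
				"ibm,processor-storage-keys", vals, 2);
		of_node_put(cpu);
		if (ret)
			return 0;	/* no keys advertised */

		/* only count keys usable for both data and execute */
		return min(vals[0], vals[1]);
	}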


cheers


Re: [RFC v7 24/25] powerpc: Deliver SEGV signal on pkey violation

2017-08-17 Thread Michael Ellerman
Ram Pai  writes:
> On Fri, Aug 11, 2017 at 08:26:30PM +1000, Michael Ellerman wrote:
>> Thiago Jung Bauermann  writes:
>> 
>> > Ram Pai  writes:
>> >
>> >> The value of the AMR register at the time of exception
>> >> is made available in gp_regs[PT_AMR] of the siginfo.
...
>> 
>> I don't understand why we are putting it in there at all?
>> 
>> Is there some special handling of the actual register on signals? I
>> haven't seen it. In which case the process can get the value of AMR by
>> reading the register. ??
>
> The value of the AMR register at the time of the key-exception may not be
> the same when the signal handler is invoked.

Why not?

cheers
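(For context, a hypothetical sketch of how a handler would consume the
proposed slot; PT_AMR comes from this series and is not a mainline
define:)

	#include <signal.h>
	#include <ucontext.h>

	static void segv_handler(int sig, siginfo_t *info, void *ctx)
	{
		ucontext_t *uc = ctx;
		/* AMR as captured at the time of the key fault (per this
		 * series); reading the live SPR here could observe a
		 * different value, which is Ram's point above. */
		unsigned long amr = uc->uc_mcontext.gp_regs[PT_AMR];
		(void)amr;	/* e.g. log it or adjust the faulting key */
		(void)sig; (void)info;
	}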


Re: WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111 __blk_mq_run_hw_queue+0x1d8/0x1f0

2017-08-17 Thread Abdul Haleem
On Thu, 2017-08-17 at 14:18 -0500, Brian King wrote:
> On 08/17/2017 10:32 AM, Bart Van Assche wrote:
> > On Wed, 2017-08-16 at 15:10 -0500, Brian King wrote:
> >> On 08/16/2017 01:15 PM, Bart Van Assche wrote:
> >>> On Wed, 2017-08-16 at 23:37 +0530, Abdul Haleem wrote:
>  Linux-next booted with the below warnings on powerpc
> 
>  [ ... ]
> 
>  boot warnings:
>  --
>  kvm: exiting hardware virtualization
>  [ cut here ]
>  WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111 __blk_mq_run_hw_queue+0x1d8/0x1f0
>  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
>  Call Trace:
>  [c0037990] [c088f7b0] __blk_mq_delay_run_hw_queue+0x1f0/0x210
>  [c00379d0] [c088fcb8] blk_mq_start_hw_queue+0x58/0x80
>  [c00379f0] [c088fd40] blk_mq_start_hw_queues+0x60/0xb0
>  [c0037a30] [c0ae2b54] scsi_kick_queue+0x34/0xa0
>  [c0037a50] [c0ae2f70] scsi_run_queue+0x3b0/0x660
>  [c0037ac0] [c0ae7ed4] scsi_run_host_queues+0x64/0xc0
>  [c0037b00] [c0ae7f64] scsi_unblock_requests+0x34/0x60
>  [c0037b20] [c0b14998] ipr_ioa_bringdown_done+0xf8/0x3a0
>  [c0037bc0] [c0b12528] ipr_reset_ioa_job+0xd8/0x170
>  [c0037c00] [c0b18790] ipr_reset_timer_done+0x110/0x160
>  [c0037c50] [c024db50] call_timer_fn+0xa0/0x3a0
>  [c0037ce0] [c024e058] expire_timers+0x1b8/0x350
>  [c0037d50] [c024e2f0] run_timer_softirq+0x100/0x3e0
>  [c0037df0] [c0162edc] __do_softirq+0x20c/0x620
>  [c0037ee0] [c0163a80] irq_exit+0x230/0x290
>  [c0037f10] [c001d770] __do_irq+0x170/0x410
>  [c0037f90] [c003ea20] call_do_irq+0x14/0x24
>  [c007f84e3a70] [c001dae0] do_IRQ+0xd0/0x190
>  [c007f84e3ac0] [c0008c58] hardware_interrupt_common+0x158/0x160
> >>>
> >>> Hello Brian,
> >>>
> >>> In the MAINTAINERS file I found the following:
> >>>
> >>> IBM Power Linux RAID adapter
> >>> M:  Brian King 
> >>> S:  Supported
> >>> F:  drivers/scsi/ipr.*
> >>>
> >>> Is that information up-to-date? Do you agree that the above message
> >>> indicates a bug in the ipr driver?
> >>
> >> Yes. Can you try with this patch that is in 4.13/scsi-fixes:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.13/scsi-fixes&id=b0e17a9b0df29590c45dfb296f541270a5941f41
> > 

Hi Brian,

The patch fixes the warning. Thanks for the fix.

Tested-by: Abdul Haleem 

-- 
Regards

Abdul Haleem
IBM Linux Technology Centre





[PATCH 2/2] kvm/xive: Add missing barriers and document them

2017-08-17 Thread Benjamin Herrenschmidt
This adds missing memory barriers to order updates/tests of the
virtual CPPR and MFRR, fixing a bug that caused guest IPIs to
occasionally get lost.

While at it, also document all the barriers in this file.
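For reference, the two-sided ordering this enforces, reduced to a
minimal C11 sketch (field names follow the patch; the conditions are
simplified pseudo-logic, not the actual KVM code):

	#include <stdatomic.h>

	atomic_int mfrr, cppr;

	void h_ipi(int new_mfrr)		/* CPU A */
	{
		atomic_store(&mfrr, new_mfrr);
		atomic_thread_fence(memory_order_seq_cst);	/* the added mb() */
		if (new_mfrr < atomic_load(&cppr))
			/* shoot the IPI */;
	}

	void h_cppr(int new_cppr)		/* CPU B */
	{
		atomic_store(&cppr, new_cppr);
		atomic_thread_fence(memory_order_seq_cst);	/* the added smp_mb() */
		if (atomic_load(&mfrr) < new_cppr)
			/* push the pending IPI to HW */;
	}

With both fences in place, at least one side is guaranteed to observe
the other's store, so an IPI that became deliverable cannot be missed
by both paths.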

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/kvm/book3s_xive_template.c | 57 +++--
 1 file changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive_template.c b/arch/powerpc/kvm/book3s_xive_template.c
index 150be86b1018..d1ed2c41b5d2 100644
--- a/arch/powerpc/kvm/book3s_xive_template.c
+++ b/arch/powerpc/kvm/book3s_xive_template.c
@@ -17,6 +17,12 @@ static void GLUE(X_PFX,ack_pending)(struct kvmppc_xive_vcpu *xc)
u16 ack;
 
/*
+* Ensure any previous store to CPPR is ordered vs.
+* the subsequent loads from PIPR or ACK.
+*/
+   eieio();
+
+   /*
 * DD1 bug workaround: If PIPR is less favored than CPPR
 * ignore the interrupt or we might incorrectly lose an IPB
 * bit.
@@ -244,6 +250,11 @@ static u32 GLUE(X_PFX,scan_interrupts)(struct kvmppc_xive_vcpu *xc,
/*
 * If we found an interrupt, adjust what the guest CPPR should
 * be as if we had just fetched that interrupt from HW.
+*
+* Note: This can only make xc->cppr smaller as the previous
+* loop will only exit with hirq != 0 if prio is lower than
+* the current xc->cppr. Thus we don't need to re-check xc->mfrr
+* for pending IPIs.
 */
if (hirq)
xc->cppr = prio;
@@ -390,6 +401,12 @@ X_STATIC int GLUE(X_PFX,h_cppr)(struct kvm_vcpu *vcpu, unsigned long cppr)
xc->cppr = cppr;
 
/*
+* Order the above update of xc->cppr with the subsequent
+* read of xc->mfrr inside push_pending_to_hw()
+*/
+   smp_mb();
+
+   /*
 * We are masking less, we need to look for pending things
 * to deliver and set VP pending bits accordingly to trigger
 * a new interrupt otherwise we might miss MFRR changes for
@@ -429,21 +446,37 @@ X_STATIC int GLUE(X_PFX,h_eoi)(struct kvm_vcpu *vcpu, unsigned long xirr)
 * used to signal MFRR changes is EOId when fetched from
 * the queue.
 */
-   if (irq == XICS_IPI || irq == 0)
+   if (irq == XICS_IPI || irq == 0) {
+   /*
+* This barrier orders the setting of xc->cppr vs.
+* subsequent test of xc->mfrr done inside
+* scan_interrupts and push_pending_to_hw
+*/
+   smp_mb();
goto bail;
+   }
 
/* Find interrupt source */
sb = kvmppc_xive_find_source(xive, irq, );
if (!sb) {
pr_devel(" source not found !\n");
rc = H_PARAMETER;
+   /* Same as above */
+   smp_mb();
goto bail;
}
state = >irq_state[src];
kvmppc_xive_select_irq(state, _num, );
 
state->in_eoi = true;
-   mb();
+
+   /*
+* This barrier orders both setting of in_eoi above vs,
+* subsequent test of guest_priority, and the setting
+* of xc->cppr vs. subsequent test of xc->mfrr done inside
+* scan_interrupts and push_pending_to_hw
+*/
+   smp_mb();
 
 again:
if (state->guest_priority == MASKED) {
@@ -470,6 +503,14 @@ X_STATIC int GLUE(X_PFX,h_eoi)(struct kvm_vcpu *vcpu, unsigned long xirr)
 
}
 
+   /*
+* This barrier orders the above guest_priority check
+* and spin_lock/unlock with clearing in_eoi below.
+*
+* It also has to be a full mb() as it must ensure
+* the MMIOs done in source_eoi() are completed before
+* state->in_eoi is visible.
+*/
mb();
state->in_eoi = false;
 bail:
@@ -504,6 +545,18 @@ X_STATIC int GLUE(X_PFX,h_ipi)(struct kvm_vcpu *vcpu, unsigned long server,
/* Locklessly write over MFRR */
xc->mfrr = mfrr;
 
+   /*
+* The load of xc->cppr below and the subsequent MMIO store
+* to the IPI must happen after the above mfrr update is
+* globally visible so that:
+*
+* - Synchronize with another CPU doing an H_EOI or a H_CPPR
+*   updating xc->cppr then reading xc->mfrr.
+*
+* - The target of the IPI sees the xc->mfrr update
+*/
+   mb();
+
/* Shoot the IPI if most favored than target cppr */
if (mfrr < xc->cppr)
__x_writeq(0, __x_trig_page(>vp_ipi_data));



[PATCH 1/2] kvm/xive: Workaround P9 DD1.0 bug with IPB bit loss

2017-08-17 Thread Benjamin Herrenschmidt
Thankfully it only happens when manually manipulating CPPR, which
is quite rare.

Signed-off-by: Benjamin Herrenschmidt 
---
 arch/powerpc/kvm/book3s_xive_template.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_xive_template.c b/arch/powerpc/kvm/book3s_xive_template.c
index 4636ca6e7d38..150be86b1018 100644
--- a/arch/powerpc/kvm/book3s_xive_template.c
+++ b/arch/powerpc/kvm/book3s_xive_template.c
@@ -16,7 +16,16 @@ static void GLUE(X_PFX,ack_pending)(struct kvmppc_xive_vcpu *xc)
u8 cppr;
u16 ack;
 
-   /* XXX DD1 bug workaround: Check PIPR vs. CPPR first ! */
+   /*
+* DD1 bug workaround: If PIPR is less favored than CPPR
+* ignore the interrupt or we might incorrectly lose an IPB
+* bit.
+*/
+   if (cpu_has_feature(CPU_FTR_POWER9_DD1)) {
+   u8 pipr = __x_readb(__x_tima + TM_QW1_OS + TM_PIPR);
+   if (pipr >= xc->hw_cppr)
+   return;
+   }
 
/* Perform the acknowledge OS to register cycle. */
ack = be16_to_cpu(__x_readw(__x_tima + TM_SPC_ACK_OS_REG));



Re: [RFC v7 26/25] mm/mprotect, powerpc/mm/pkeys, x86/mm/pkeys: Add sysfs interface

2017-08-17 Thread Ram Pai
On Fri, Aug 11, 2017 at 02:34:43PM -0300, Thiago Jung Bauermann wrote:
> Expose useful information for programs using memory protection keys.
> Provide implementation for powerpc and x86.
> 
> On a powerpc system with pkeys support, here is what is shown:
> 
> $ head /sys/kernel/mm/protection_keys/*
> ==> /sys/kernel/mm/protection_keys/disable_execute_supported <==
> true

We should not just call out disable_execute_supported.
disable_access_supported and disable_write_supported should also 
be called out.

> 
> ==> /sys/kernel/mm/protection_keys/total_keys <==
> 32
> 

> ==> /sys/kernel/mm/protection_keys/usable_keys <==
> 30

This is a little nebulous.  It depends on how we define
'usable'.  Is it the number of keys that are available
to the app?  If that is the case, the value is dynamic:
sometimes the OS steals one key for the execute-only key,
and anything that is dynamic is inherently racy.
So I think we should define 'usable' as the guaranteed number
of keys available to the app, and display a value that is
one less than what is available.

In the above example the value should be 29.

RP



Re: [RFC v7 25/25] powerpc: Enable pkey subsystem

2017-08-17 Thread Ram Pai
On Thu, Aug 17, 2017 at 05:30:27PM -0300, Thiago Jung Bauermann wrote:
> 
> Ram Pai  writes:
> 
> > On Thu, Aug 10, 2017 at 06:27:34PM -0300, Thiago Jung Bauermann wrote:
> >> 
> >> Ram Pai  writes:
> >> > --- a/arch/powerpc/include/asm/cputable.h
> >> > +++ b/arch/powerpc/include/asm/cputable.h
> >> > @@ -214,6 +214,7 @@ enum {
> >> >  #define CPU_FTR_DAWR
> >> > LONG_ASM_CONST(0x0400)
> >> >  #define CPU_FTR_DABRX   
> >> > LONG_ASM_CONST(0x0800)
> >> >  #define CPU_FTR_PMAO_BUG
> >> > LONG_ASM_CONST(0x1000)
> >> > +#define CPU_FTR_PKEY
> >> > LONG_ASM_CONST(0x2000)
> >> >  #define CPU_FTR_POWER9_DD1  
> >> > LONG_ASM_CONST(0x4000)
> >> >
> >> >  #ifndef __ASSEMBLY__
> >> > @@ -452,7 +453,7 @@ enum {
> >> >  CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
> >> >  CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | 
> >> > CPU_FTR_POPCNTD | \
> >> >  CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | \
> >> > -CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX)
> >> > +CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | 
> >> > CPU_FTR_PKEY)
> >> 
> >> P7 supports protection keys for data access (AMR) but not for
> >> instruction access (IAMR), right? There's nothing in the code making
> >> this distinction, so either CPU_FTR_PKEY shouldn't be enabled in P7 or
> >> separate feature bits for AMR and IAMR should be used and checked before
> >> trying to access the IAMR.
> >
> > didn't David say P7 supports both? P6, I think, only supports data.
> > My pkey tests have passed on P7.
> 
> He said that P7 was the first processor to support 32 keys, but if you
> look at the Virtual Page Class Key Protection section in ISA 2.06,
> there's no IAMR.
> 
> There was a bug in the code where init_iamr was calling write_amr
> instead of write_iamr, perhaps that's why it worked when you tested on P7?
> 
> >> 
> >> >  #define CPU_FTRS_POWER8 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
> >> >  CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
> >> >  CPU_FTR_MMCRA | CPU_FTR_SMT | \
> >> > @@ -462,7 +463,7 @@ enum {
> >> >  CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | 
> >> > CPU_FTR_POPCNTD | \
> >> >  CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | 
> >> > CPU_FTR_VMX_COPY | \
> >> >  CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
> >> > -CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP)
> >> > +CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_PKEY)
> >> >  #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
> >> >  #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
> >> >  #define CPU_FTRS_POWER9 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
> >> > @@ -474,7 +475,8 @@ enum {
> >> >  CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | 
> >> > CPU_FTR_POPCNTD | \
> >> >  CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
> >> >  CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
> >> > -CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300)
> >> > +CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | \
> >> > +CPU_FTR_PKEY)
> >> >  #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
> >> >   (~CPU_FTR_SAO))
> >> >  #define CPU_FTRS_CELL   (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
> >> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
> >> > b/arch/powerpc/include/asm/mmu_context.h
> >> > index a1cfcca..acd59d8 100644
> >> > --- a/arch/powerpc/include/asm/mmu_context.h
> >> > +++ b/arch/powerpc/include/asm/mmu_context.h
> >> > @@ -188,6 +188,7 @@ static inline bool arch_vma_access_permitted(struct 
> >> > vm_area_struct *vma,
> >> >
> >> >  #define pkey_initialize()
> >> >  #define pkey_mm_init(mm)
> >> > +#define pkey_mmu_values(total_data, total_execute)
> >> >
> >> >  static inline int vma_pkey(struct vm_area_struct *vma)
> >> >  {
> >> > diff --git a/arch/powerpc/include/asm/pkeys.h 
> >> > b/arch/powerpc/include/asm/pkeys.h
> >> > index ba7bff6..e61ed6c 100644
> >> > --- a/arch/powerpc/include/asm/pkeys.h
> >> > +++ b/arch/powerpc/include/asm/pkeys.h
> >> > @@ -1,6 +1,8 @@
> >> >  #ifndef _ASM_PPC64_PKEYS_H
> >> >  #define _ASM_PPC64_PKEYS_H
> >> >
> >> > +#include 
> >> > +
> >> >  extern bool pkey_inited;
> >> >  extern int pkeys_total; /* total pkeys as per device tree */
> >> >  extern u32 initial_allocation_mask;/* bits set for reserved keys */
> >> > @@ -227,6 +229,24 @@ static inline void pkey_mm_init(struct mm_struct 
> >> > *mm)
> >> >  mm->context.execute_only_pkey = -1;
> >> >  }
> >> >
> >> > +static inline void pkey_mmu_values(int total_data, int total_execute)
> >> > +{
> >> > +/*
> >> > + * since any pkey can be used for data or 

Re: [RFC v6 21/62] powerpc: introduce execute-only pkey

2017-08-17 Thread Ram Pai
On Thu, Aug 17, 2017 at 04:35:55PM -0700, Ram Pai wrote:
> On Wed, Aug 02, 2017 at 07:40:46PM +1000, Michael Ellerman wrote:
> > Thiago Jung Bauermann  writes:
> > 
> > > Michael Ellerman  writes:
> > >
> > >> Thiago Jung Bauermann  writes:
> > >>> Ram Pai  writes:
> > >> ...
> >  +
> >  +  /* We got one, store it and use it from here on out */
> >  +  if (need_to_set_mm_pkey)
> >  +  mm->context.execute_only_pkey = execute_only_pkey;
> >  +  return execute_only_pkey;
> >  +}
> > >>>
> > >>> If you follow the code flow in __execute_only_pkey, the AMR and UAMOR
> > >>> are read 3 times in total, and AMR is written twice. IAMR is read and
> > >>> written twice. Since they are SPRs and access to them is slow (or isn't
> > >>> it?),
> > >>
> > >> SPRs read/writes are slow, but they're not *that* slow in comparison to
> > >> a system call (which I think is where this code is being called?).
> > >
> > > Yes, this code runs on mprotect and mmap syscalls if the memory is
> > > requested to have execute but not read nor write permissions.
> > 
> > Yep. That's not in the fast path for key usage, ie. the fast path is
> > userspace changing the AMR itself, and the overhead of a syscall is
> > already hundreds of cycles.
> > 
> > >> So we should try to avoid too many SPR read/writes, but at the same time
> > >> we can accept more than the minimum if it makes the code much easier to
> > >> follow.
> > >
> > > Ok. Ram had asked me to suggest a way to optimize the SPR reads and
> > > writes and I came up with the patch below. Do you think it's worth it?
> > 
> > At a glance no I don't think it is. Sorry you spent that much time on it.
> > 
> > I think we can probably reduce the number of SPR accesses without
> > needing to go to that level of complexity.
> > 
> > But don't throw the patch away, I may eat my words once I have the full
> > series applied and am looking at it hard - at the moment I'm just
> > reviewing the patches piecemeal as I get time.
> 

Thiago's patch does save some cycles. I don't feel like throwing his
work away. I agree, it should be considered after applying all the patches.
 
RP

-- 
Ram Pai



Re: [RFT PATCH] tpm: ibmvtpm: simplify crq initialization and document crq format

2017-08-17 Thread msuchanek
ping?

On Fri, 24 Feb 2017 20:35:16 +0100
Michal Suchanek  wrote:

> The crq is passed in registers and is the same on BE and LE hosts.
> However, the current implementation allocates a structure on-stack to
> represent the crq, initializes the members swapping them to BE, and
> loads the structure swapping it from BE. This is pointless and causes
> GCC warnings about uninitialized members. Get rid of the structure and
> the warnings.
> 
> Signed-off-by: Michal Suchanek 
> Reviewed-by: Jarkko Sakkinen 
> ---
> v2
> 
> fix typos and spelling in comments
> ---
>  drivers/char/tpm/tpm_ibmvtpm.c | 96 ++++++++++++++++++++++++------------------
>  1 file changed, 60 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index 1b9d61ffe991..89027339d55f 100644
> --- a/drivers/char/tpm/tpm_ibmvtpm.c
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -39,19 +39,63 @@ static struct vio_device_id tpm_ibmvtpm_device_table[] = {
>  MODULE_DEVICE_TABLE(vio, tpm_ibmvtpm_device_table);
>  /**
> + *
> + * ibmvtpm_send_crq_word - Send a CRQ request
> + * @vdev:	vio device struct
> + * @w1:	pre-constructed first word of tpm crq (second word is reserved)
> + *
> + * Return:
> + *	0 - Success
> + *	Non-zero - Failure
> + */
> +static int ibmvtpm_send_crq_word(struct vio_dev *vdev, u64 w1)
> +{
> +	return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, w1, 0);
> +}
> +
> +/**
> + *
>   * ibmvtpm_send_crq - Send a CRQ request
>   *
>   * @vdev:	vio device struct
> - * @w1:	first word
> - * @w2:	second word
> + * @valid:	Valid field
> + * @msg:	Type field
> + * @len:	Length field
> + * @data:	Data field
> + *
> + * The ibmvtpm crq is defined as follows:
> + *
> + * Byte  |   0   |   1   |   2   |   3   |   4   |   5   |   6   |   7
> + * -----------------------------------------------------------------------
> + * Word0 | Valid | Type  |     Length    |              Data
> + * -----------------------------------------------------------------------
> + * Word1 |                            Reserved
> + * -----------------------------------------------------------------------
> + *
> + * Which matches the following structure (on bigendian host):
> + *
> + * struct ibmvtpm_crq {
> + * u8 valid;
> + * u8 msg;
> + * __be16 len;
> + * __be32 data;
> + * __be64 reserved;
> + * } __attribute__((packed, aligned(8)));
> + *
> + * However, the value is passed in a register so just compute the numeric
> + * value to load into the register avoiding byteswap altogether. Endian
> + * only affects memory loads and stores - registers are internally
> + * represented the same.
>   *
>   * Return:
> - *	0 -Sucess
> + *	0 (H_SUCCESS) - Success
>   *	Non-zero - Failure
>   */
> -static int ibmvtpm_send_crq(struct vio_dev *vdev, u64 w1, u64 w2)
> +static int ibmvtpm_send_crq(struct vio_dev *vdev,
> +		u8 valid, u8 msg, u16 len, u32 data)
>  {
> -	return plpar_hcall_norets(H_SEND_CRQ, vdev->unit_address, w1, w2);
> +	u64 w1 = ((u64)valid << 56) | ((u64)msg << 48) | ((u64)len << 32) |
> +		(u64)data;
> +	return ibmvtpm_send_crq_word(vdev, w1);
>  }
>  
>  /**
> @@ -109,8 +153,6 @@ static int tpm_ibmvtpm_recv(struct tpm_chip *chip, u8 *buf, size_t count)
>  static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
>  {
>  	struct ibmvtpm_dev *ibmvtpm = dev_get_drvdata(&chip->dev);
> -	struct ibmvtpm_crq crq;
> -	__be64 *word = (__be64 *)&crq;
>  	int rc, sig;
>  
>   if (!ibmvtpm->rtce_buf) {
> @@ -137,10 +179,6 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
>  	spin_lock(&ibmvtpm->rtce_lock);
>  	ibmvtpm->res_len = 0;
>  	memcpy((void *)ibmvtpm->rtce_buf, (void *)buf, count);
> -	crq.valid = (u8)IBMVTPM_VALID_CMD;
> -	crq.msg = (u8)VTPM_TPM_COMMAND;
> -	crq.len = cpu_to_be16(count);
> -	crq.data = cpu_to_be32(ibmvtpm->rtce_dma_handle);
>  
>  	/*
>  	 * set the processing flag before the Hcall, since we may get the
> @@ -148,8 +186,9 @@ static int tpm_ibmvtpm_send(struct tpm_chip *chip, u8 *buf, size_t count)
>  	 */
>  	ibmvtpm->tpm_processing_cmd = true;
>  
> -	rc = ibmvtpm_send_crq(ibmvtpm->vdev, be64_to_cpu(word[0]),
> -			      be64_to_cpu(word[1]));
> +	rc = ibmvtpm_send_crq(ibmvtpm->vdev,
> +			IBMVTPM_VALID_CMD, VTPM_TPM_COMMAND,
> +			count, ibmvtpm->rtce_dma_handle);
>  	if (rc != H_SUCCESS) {
>  		dev_err(ibmvtpm->dev, "tpm_ibmvtpm_send failed rc=%d\n", rc);
>  		rc = 0;
> @@ -182,15 +221,10 @@ static u8 tpm_ibmvtpm_status(struct tpm_chip *chip)
>   */
>  static int ibmvtpm_crq_get_rtce_size(struct ibmvtpm_dev *ibmvtpm)
>  {
> -	struct ibmvtpm_crq crq;
> -	u64 *buf = (u64 *)&crq;
>  	int rc;
>  
> -	crq.valid = 

[PATCH v2 20/20] powerpc/mm: Add speculative page fault

2017-08-17 Thread Laurent Dufour
This patch enables the speculative page fault on the PowerPC
architecture.

This will try a speculative page fault without holding the mmap_sem;
if it returns with VM_FAULT_RETRY, the mmap_sem is acquired and the
traditional page fault processing is done.

Support is only provided for BOOK3S_64 currently because:
- it requires CONFIG_PPC_STD_MMU because of checks done in
  set_access_flags_filter()
- it requires BOOK3S because we can't support book3e_hugetlb_preload()
  called by update_mmu_cache()

Signed-off-by: Laurent Dufour 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  5 +
 arch/powerpc/mm/fault.c  | 30 +++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 818a58fc3f4f..897f8b9f67e6 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -313,6 +313,11 @@ extern unsigned long pci_io_base;
 /* Advertise support for _PAGE_SPECIAL */
 #define __HAVE_ARCH_PTE_SPECIAL
 
+/* Advertise that we call the Speculative Page Fault handler */
+#if defined(CONFIG_PPC_BOOK3S_64)
+#define __HAVE_ARCH_CALL_SPF
+#endif
+
 #ifndef __ASSEMBLY__
 
 /*
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 4c422632047b..7b3cc4c30eab 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -291,9 +291,36 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
if (is_write && is_user)
store_update_sp = store_updates_sp(regs);
 
-   if (is_user)
+   if (is_user) {
flags |= FAULT_FLAG_USER;
 
+#if defined(__HAVE_ARCH_CALL_SPF)
+   /* let's try a speculative page fault without grabbing the
+* mmap_sem.
+*/
+
+   /*
+* flags is set later based on the VMA's flags, for the common
+* speculative service, we need some flags to be set.
+*/
+   if (is_write)
+   flags |= FAULT_FLAG_WRITE;
+
+   fault = handle_speculative_fault(mm, address, flags);
+   if (!(fault & VM_FAULT_RETRY || fault & VM_FAULT_ERROR)) {
+   perf_sw_event(PERF_COUNT_SW_SPF_DONE, 1,
+ regs, address);
+   goto done;
+   }
+
+   /*
+* Resetting flags since the following code assumes
+* FAULT_FLAG_WRITE is not set.
+*/
+   flags &= ~FAULT_FLAG_WRITE;
+#endif /* defined(__HAVE_ARCH_CALL_SPF) */
+   }
+
/* When running in the kernel we expect faults to occur only to
 * addresses in user space.  All other faults represent errors in the
 * kernel and should generate an OOPS.  Unfortunately, in the case of an
@@ -479,6 +506,7 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
rc = 0;
}
 
+done:
/*
 * Major/minor page fault accounting.
 */
-- 
2.7.4



[PATCH v2 19/20] x86/mm: Add speculative pagefault handling

2017-08-17 Thread Laurent Dufour
From: Peter Zijlstra 

Try a speculative fault before acquiring mmap_sem; if it returns with
VM_FAULT_RETRY, continue with the mmap_sem acquisition and do the
traditional fault.

Signed-off-by: Peter Zijlstra (Intel) 

[Clearing of FAULT_FLAG_ALLOW_RETRY is now done in
 handle_speculative_fault()]
[Retry with usual fault path in the case VM_ERROR is returned by
 handle_speculative_fault(). This allows signal to be delivered]
Signed-off-by: Laurent Dufour 
---
 arch/x86/include/asm/pgtable_types.h |  7 +++
 arch/x86/mm/fault.c  | 19 +++
 2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index bf9638e1ee42..4fd2693a037e 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -234,6 +234,13 @@ enum page_cache_mode {
 #define PGD_IDENT_ATTR  0x001  /* PRESENT (no other attributes) */
 #endif
 
+/*
+ * Advertise that we call the Speculative Page Fault handler.
+ */
+#ifdef CONFIG_X86_64
+#define __HAVE_ARCH_CALL_SPF
+#endif
+
 #ifdef CONFIG_X86_32
 # include 
 #else
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2a1fa10c6a98..4c070b9a4362 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1365,6 +1365,24 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
if (error_code & PF_INSTR)
flags |= FAULT_FLAG_INSTRUCTION;
 
+#ifdef __HAVE_ARCH_CALL_SPF
+   if (error_code & PF_USER) {
+   fault = handle_speculative_fault(mm, address, flags);
+
+   /*
+* We also check against VM_FAULT_ERROR because we have to
+* raise a signal by calling later mm_fault_error() which
+* requires the vma pointer to be set. So in that case,
+* we fall through the normal path.
+*/
+   if (!(fault & VM_FAULT_RETRY || fault & VM_FAULT_ERROR)) {
+   perf_sw_event(PERF_COUNT_SW_SPF_DONE, 1,
+ regs, address);
+   goto done;
+   }
+   }
+#endif /* __HAVE_ARCH_CALL_SPF */
+
/*
 * When running in the kernel we expect faults to occur only to
 * addresses in user space.  All other faults represent errors in
@@ -1474,6 +1492,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code,
return;
}
 
+done:
/*
 * Major/minor page fault accounting. If any of the events
 * returned VM_FAULT_MAJOR, we account it as a major fault.
-- 
2.7.4



[PATCH v2 18/20] perf tools: Add support for the SPF perf event

2017-08-17 Thread Laurent Dufour
Add support for the new speculative faults event.

Signed-off-by: Laurent Dufour 
---
 tools/include/uapi/linux/perf_event.h | 1 +
 tools/perf/util/evsel.c   | 1 +
 tools/perf/util/parse-events.c| 4 
 tools/perf/util/parse-events.l| 1 +
 tools/perf/util/python.c  | 1 +
 5 files changed, 8 insertions(+)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index b1c0b187acfe..3043ec0988e9 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -111,6 +111,7 @@ enum perf_sw_ids {
PERF_COUNT_SW_EMULATION_FAULTS  = 8,
PERF_COUNT_SW_DUMMY = 9,
PERF_COUNT_SW_BPF_OUTPUT= 10,
+   PERF_COUNT_SW_SPF_DONE  = 11,
 
PERF_COUNT_SW_MAX,  /* non-ABI */
 };
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 413f74df08de..660a7038198b 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -426,6 +426,7 @@ const char *perf_evsel__sw_names[PERF_COUNT_SW_MAX] = {
"alignment-faults",
"emulation-faults",
"dummy",
+   "speculative-faults",
 };
 
 static const char *__perf_evsel__sw_name(u64 config)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 01e779b91c8e..ef8ef30d39c3 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -135,6 +135,10 @@ struct event_symbol event_symbols_sw[PERF_COUNT_SW_MAX] = {
.symbol = "bpf-output",
.alias  = "",
},
+   [PERF_COUNT_SW_SPF_DONE] = {
+   .symbol = "speculative-faults",
+   .alias  = "spf",
+   },
 };
 
 #define __PERF_EVENT_FIELD(config, name) \
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 660fca05bc93..5cb78f004737 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -274,6 +274,7 @@
 alignment-faults		{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
 emulation-faults		{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
 dummy				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
 bpf-output			{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
+speculative-faults|spf		{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_SPF_DONE); }
 
/*
 * We have to handle the kernel PMU event cycles-ct/cycles-t/mem-loads/mem-stores separately.
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index c129e99114ae..1ee06e47d9dc 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -1141,6 +1141,7 @@ static struct {
PERF_CONST(COUNT_SW_ALIGNMENT_FAULTS),
PERF_CONST(COUNT_SW_EMULATION_FAULTS),
PERF_CONST(COUNT_SW_DUMMY),
+   PERF_CONST(COUNT_SW_SPF_DONE),
 
PERF_CONST(SAMPLE_IP),
PERF_CONST(SAMPLE_TID),
-- 
2.7.4
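(Usage sketch: with this and the matching kernel patch applied, the new
event should be countable like any other software event; the names below
are exactly those added above:)

$ perf stat -a -e speculative-faults sleep 10
$ perf stat -e spf ./workload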



[PATCH v2 17/20] perf: Add a speculative page fault sw event

2017-08-17 Thread Laurent Dufour
Add a new software event to count succeeded speculative page faults.

Signed-off-by: Laurent Dufour 
---
 include/uapi/linux/perf_event.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b1c0b187acfe..3043ec0988e9 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -111,6 +111,7 @@ enum perf_sw_ids {
PERF_COUNT_SW_EMULATION_FAULTS  = 8,
PERF_COUNT_SW_DUMMY = 9,
PERF_COUNT_SW_BPF_OUTPUT= 10,
+   PERF_COUNT_SW_SPF_DONE  = 11,
 
PERF_COUNT_SW_MAX,  /* non-ABI */
 };
-- 
2.7.4



[PATCH v2 16/20] mm: Adding speculative page fault failure trace events

2017-08-17 Thread Laurent Dufour
This patch adds a set of new trace events to collect the speculative
page fault failure events.
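With this applied, the new failure points can be watched like any other
tracepoints (a sketch, assuming tracefs is mounted in the usual place;
the "pagefault" group name comes from TRACE_SYSTEM below):

# echo 1 > /sys/kernel/debug/tracing/events/pagefault/enable
# cat /sys/kernel/debug/tracing/trace_pipe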

Signed-off-by: Laurent Dufour 
---
 include/trace/events/pagefault.h | 87 
 mm/memory.c  | 68 ---
 2 files changed, 141 insertions(+), 14 deletions(-)
 create mode 100644 include/trace/events/pagefault.h

diff --git a/include/trace/events/pagefault.h b/include/trace/events/pagefault.h
new file mode 100644
index ..d7d56f8102d1
--- /dev/null
+++ b/include/trace/events/pagefault.h
@@ -0,0 +1,87 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pagefault
+
+#if !defined(_TRACE_PAGEFAULT_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PAGEFAULT_H
+
+#include 
+#include 
+
+DECLARE_EVENT_CLASS(spf,
+
+   TP_PROTO(unsigned long caller,
+struct vm_area_struct *vma, unsigned long address),
+
+   TP_ARGS(caller, vma, address),
+
+   TP_STRUCT__entry(
+   __field(unsigned long, caller)
+   __field(unsigned long, vm_start)
+   __field(unsigned long, vm_end)
+   __field(unsigned long, address)
+   ),
+
+   TP_fast_assign(
+   __entry->caller = caller;
+   __entry->vm_start   = vma->vm_start;
+   __entry->vm_end = vma->vm_end;
+   __entry->address= address;
+   ),
+
+   TP_printk("ip:%lx vma:%lx-%lx address:%lx",
+ __entry->caller, __entry->vm_start, __entry->vm_end,
+ __entry->address)
+);
+
+DEFINE_EVENT(spf, spf_pte_lock,
+
+   TP_PROTO(unsigned long caller,
+struct vm_area_struct *vma, unsigned long address),
+
+   TP_ARGS(caller, vma, address)
+);
+
+DEFINE_EVENT(spf, spf_vma_changed,
+
+   TP_PROTO(unsigned long caller,
+struct vm_area_struct *vma, unsigned long address),
+
+   TP_ARGS(caller, vma, address)
+);
+
+DEFINE_EVENT(spf, spf_vma_dead,
+
+   TP_PROTO(unsigned long caller,
+struct vm_area_struct *vma, unsigned long address),
+
+   TP_ARGS(caller, vma, address)
+);
+
+DEFINE_EVENT(spf, spf_vma_noanon,
+
+   TP_PROTO(unsigned long caller,
+struct vm_area_struct *vma, unsigned long address),
+
+   TP_ARGS(caller, vma, address)
+);
+
+DEFINE_EVENT(spf, spf_vma_notsup,
+
+   TP_PROTO(unsigned long caller,
+struct vm_area_struct *vma, unsigned long address),
+
+   TP_ARGS(caller, vma, address)
+);
+
+DEFINE_EVENT(spf, spf_vma_access,
+
+   TP_PROTO(unsigned long caller,
+struct vm_area_struct *vma, unsigned long address),
+
+   TP_ARGS(caller, vma, address)
+);
+
+#endif /* _TRACE_PAGEFAULT_H */
+
+/* This part must be outside protection */
+#include 
diff --git a/mm/memory.c b/mm/memory.c
index 8c701e4f59d3..549d23583f53 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -79,6 +79,9 @@
 
 #include "internal.h"
 
+#define CREATE_TRACE_POINTS
+#include 
+
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
 #endif
@@ -2296,15 +2299,20 @@ static bool pte_spinlock(struct vm_fault *vmf)
}
 
local_irq_disable();
-   if (vma_has_changed(vmf))
+   if (vma_has_changed(vmf)) {
+   trace_spf_vma_changed(_RET_IP_, vmf->vma, vmf->address);
goto out;
+   }
 
vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
-   if (unlikely(!spin_trylock(vmf->ptl)))
+   if (unlikely(!spin_trylock(vmf->ptl))) {
+   trace_spf_pte_lock(_RET_IP_, vmf->vma, vmf->address);
goto out;
+   }
 
if (vma_has_changed(vmf)) {
spin_unlock(vmf->ptl);
+   trace_spf_vma_changed(_RET_IP_, vmf->vma, vmf->address);
goto out;
}
 
@@ -2334,8 +2342,10 @@ static bool pte_map_lock(struct vm_fault *vmf)
 * block on the PTL and thus we're safe.
 */
local_irq_disable();
-   if (vma_has_changed(vmf))
+   if (vma_has_changed(vmf)) {
+   trace_spf_vma_changed(_RET_IP_, vmf->vma, vmf->address);
goto out;
+   }
 
/*
 * Same as pte_offset_map_lock() except that we call
@@ -2348,11 +2358,13 @@ static bool pte_map_lock(struct vm_fault *vmf)
pte = pte_offset_map(vmf->pmd, vmf->address);
if (unlikely(!spin_trylock(ptl))) {
pte_unmap(pte);
+   trace_spf_pte_lock(_RET_IP_, vmf->vma, vmf->address);
goto out;
}
 
if (vma_has_changed(vmf)) {
pte_unmap_unlock(pte, ptl);
+   trace_spf_vma_changed(_RET_IP_, vmf->vma, vmf->address);
goto out;
}
 
@@ -3989,27 +4001,40 @@ int handle_speculative_fault(struct mm_struct *mm, unsigned long address,
 * Validate the VMA found by the lockless 

[PATCH v2 15/20] mm: Try spin lock in speculative path

2017-08-17 Thread Laurent Dufour
There is a deadlock when a CPU is doing a speculative page fault and
another one is calling do_munmap().

The deadlock occurs because the speculative path tries to spinlock the
pte while interrupts are disabled. The other CPU in the unmap path has
locked the pte and is waiting for all CPUs to invalidate the TLB. As
the CPU doing the speculative fault has interrupts disabled, it can't
invalidate the TLB, and it can't get the lock either.

Since we are in a speculative path, we can race with other mm actions,
so let's assume that the lock may not get acquired and fail the
speculative page fault.

Here are the stacks captured during the deadlock:

CPU 0
native_flush_tlb_others+0x7c/0x260
flush_tlb_mm_range+0x6a/0x220
tlb_flush_mmu_tlbonly+0x63/0xc0
unmap_page_range+0x897/0x9d0
? unmap_single_vma+0x7d/0xe0
? release_pages+0x2b3/0x360
unmap_single_vma+0x7d/0xe0
unmap_vmas+0x51/0xa0
unmap_region+0xbd/0x130
do_munmap+0x279/0x460
SyS_munmap+0x53/0x70

CPU 1
do_raw_spin_lock+0x14e/0x160
_raw_spin_lock+0x5d/0x80
? pte_map_lock+0x169/0x1b0
pte_map_lock+0x169/0x1b0
handle_pte_fault+0xbf2/0xd80
? trace_hardirqs_on+0xd/0x10
handle_speculative_fault+0x272/0x280
handle_speculative_fault+0x5/0x280
__do_page_fault+0x187/0x580
trace_do_page_fault+0x52/0x260
do_async_page_fault+0x19/0x70
async_page_fault+0x28/0x30

Signed-off-by: Laurent Dufour 
---
 mm/memory.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 0ba14a5797b2..8c701e4f59d3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2300,7 +2300,8 @@ static bool pte_spinlock(struct vm_fault *vmf)
goto out;
 
vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
-   spin_lock(vmf->ptl);
+   if (unlikely(!spin_trylock(vmf->ptl)))
+   goto out;
 
if (vma_has_changed(vmf)) {
spin_unlock(vmf->ptl);
@@ -2336,8 +2337,20 @@ static bool pte_map_lock(struct vm_fault *vmf)
if (vma_has_changed(vmf))
goto out;
 
-   pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
- vmf->address, );
+   /*
+* Same as pte_offset_map_lock() except that we call
+* spin_trylock() in place of spin_lock() to avoid race with
+* unmap path which may have the lock and wait for this CPU
+* to invalidate TLB but this CPU has irq disabled.
+* Since we are in a speculative path, accept that it could fail
+*/
+   ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+   pte = pte_offset_map(vmf->pmd, vmf->address);
+   if (unlikely(!spin_trylock(ptl))) {
+   pte_unmap(pte);
+   goto out;
+   }
+
if (vma_has_changed(vmf)) {
pte_unmap_unlock(pte, ptl);
goto out;
-- 
2.7.4



[PATCH v2 14/20] mm: Provide speculative fault infrastructure

2017-08-17 Thread Laurent Dufour
From: Peter Zijlstra 

Provide infrastructure to do a speculative fault (not holding
mmap_sem).

The not holding of mmap_sem means we can race against VMA
change/removal and page-table destruction. We use the SRCU VMA freeing
to keep the VMA around. We use the VMA seqcount to detect change
(including unmapping / page-table deletion) and we use gup_fast() style
page-table walking to deal with page-table races.

Once we've obtained the page and are ready to update the PTE, we
validate if the state we started the fault with is still valid, if
not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
PTE and we're done.

Signed-off-by: Peter Zijlstra (Intel) 

[Manage the newly introduced pte_spinlock() for speculative page
 fault to fail if the VMA is touched in our back]
[Rename vma_is_dead() to vma_has_changed() and declare it here]
[Call p4d_alloc() as it is safe since pgd is valid]
[Call pud_alloc() as it is safe since p4d is valid]
[Set fe.sequence in __handle_mm_fault()]
[Abort speculative path when handle_userfault() has to be called]
[Add additional VMA's flags checks in handle_speculative_fault()]
[Clear FAULT_FLAG_ALLOW_RETRY in handle_speculative_fault()]
[Don't set vmf->pte and vmf->ptl if pte_map_lock() failed]
[Remove warning comment about waiting for !seq&1 since we don't want
 to wait]
[Remove warning about no huge page support, mention it explicitly]
[Don't call do_fault() in the speculative path as __do_fault() calls
 vma->vm_ops->fault() which may want to release mmap_sem]
[Only vm_fault pointer argument for vma_has_changed()]
[Fix check against huge page, calling pmd_trans_huge()]
[Introduce __HAVE_ARCH_CALL_SPF to declare the SPF handler only when
 architecture is supporting it]
[Use READ_ONCE() when reading VMA's fields in the speculative path]
[Explicitly check for __HAVE_ARCH_PTE_SPECIAL as we can't support for
 processing done in vm_normal_page()]
[Check that vma->anon_vma is already set when starting the speculative
 path]
[Check for memory policy as we can't support MPOL_INTERLEAVE case due to
 the processing done in mpol_misplaced()]
[Don't support VMA growing up or down]
[Move check on vm_sequence just before calling handle_pte_fault()]
Signed-off-by: Laurent Dufour 
---
 include/linux/hugetlb_inline.h |   2 +-
 include/linux/mm.h |   5 +
 include/linux/pagemap.h|   4 +-
 mm/internal.h  |  14 +++
 mm/memory.c| 237 -
 5 files changed, 254 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
index a4e7ca0f3585..6cfdfca4cc2a 100644
--- a/include/linux/hugetlb_inline.h
+++ b/include/linux/hugetlb_inline.h
@@ -7,7 +7,7 @@
 
 static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma)
 {
-   return !!(vma->vm_flags & VM_HUGETLB);
+   return !!(READ_ONCE(vma->vm_flags) & VM_HUGETLB);
 }
 
 #else
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0f4ddd72b172..0fe0811d304f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -315,6 +315,7 @@ struct vm_fault {
 	gfp_t gfp_mask;	/* gfp mask to be used for allocations */
pgoff_t pgoff;  /* Logical page offset based on vma */
unsigned long address;  /* Faulting virtual address */
+   unsigned int sequence;
pmd_t *pmd; /* Pointer to pmd entry matching
 * the 'address' */
pud_t *pud; /* Pointer to pud entry matching
@@ -1297,6 +1298,10 @@ int invalidate_inode_page(struct page *page);
 #ifdef CONFIG_MMU
 extern int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
unsigned int flags);
+#ifdef __HAVE_ARCH_CALL_SPF
+extern int handle_speculative_fault(struct mm_struct *mm,
+   unsigned long address, unsigned int flags);
+#endif /* __HAVE_ARCH_CALL_SPF */
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
unsigned long address, unsigned int fault_flags,
bool *unlocked);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 79b36f57c3ba..3a9735dfa6b6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -443,8 +443,8 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
pgoff_t pgoff;
if (unlikely(is_vm_hugetlb_page(vma)))
return linear_hugepage_index(vma, address);
-   pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
-   pgoff += vma->vm_pgoff;
+   pgoff = (address - READ_ONCE(vma->vm_start)) >> PAGE_SHIFT;
+   pgoff += READ_ONCE(vma->vm_pgoff);
return pgoff;
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 736540f15936..9d6347e35747 100644
--- a/mm/internal.h
+++ b/mm/internal.h

[PATCH v2 13/20] mm: Introduce __page_add_new_anon_rmap()

2017-08-17 Thread Laurent Dufour
When dealing with the speculative page fault handler, we may race with a
VMA being split or merged. In this case the vma->vm_start and vma->vm_end
fields may not match the address the page fault is occurring at.

This can only happen when the VMA is split, but in that case the
anon_vma pointer of the new VMA will be the same as the original one,
because in __split_vma the new->anon_vma is set to src->anon_vma when
*new = *vma.

So even if the VMA boundaries are not correct, the anon_vma pointer is
still valid.

If the VMA has been merged, then the VMA in which it has been merged
must have the same anon_vma pointer, otherwise the merge can't be done.

So in all cases we know that the anon_vma is valid, since we have
checked before starting the speculative page fault that the anon_vma
pointer is valid for this VMA, and since there is an anon_vma this
means that at one time a page has been backed, and before the VMA is
cleaned the page table lock would have to be grabbed to clear the PTE,
and the anon_vma field is checked once the PTE is locked.

This patch introduces a new __page_add_new_anon_rmap() service which
doesn't check the VMA boundaries, and creates a new inline one which
does the check.

When called from a page fault handler, if this is not a speculative one,
there is a guarantee that vm_start and vm_end match the faulting address,
so this check is useless. In the context of the speculative page fault
handler, this check may be wrong but anon_vma is still valid as explained
above.

Signed-off-by: Laurent Dufour 
---
 include/linux/rmap.h | 12 ++--
 mm/memory.c  |  8 
 mm/rmap.c|  5 ++---
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 43ef2c30cb0f..f5cd4dbc78b0 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -170,8 +170,16 @@ void page_add_anon_rmap(struct page *, struct vm_area_struct *,
 		unsigned long, bool);
 void do_page_add_anon_rmap(struct page *, struct vm_area_struct *,
   unsigned long, int);
-void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
-   unsigned long, bool);
+void __page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
+ unsigned long, bool);
+static inline void page_add_new_anon_rmap(struct page *page,
+ struct vm_area_struct *vma,
+ unsigned long address, bool compound)
+{
+   VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+   __page_add_new_anon_rmap(page, vma, address, compound);
+}
+
 void page_add_file_rmap(struct page *, bool);
 void page_remove_rmap(struct page *, bool);
 
diff --git a/mm/memory.c b/mm/memory.c
index 9f9e5bb7a556..51bc8315281e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2376,7 +2376,7 @@ static int wp_page_copy(struct vm_fault *vmf)
 * thread doing COW.
 */
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
-   page_add_new_anon_rmap(new_page, vma, vmf->address, false);
+   __page_add_new_anon_rmap(new_page, vma, vmf->address, false);
mem_cgroup_commit_charge(new_page, memcg, false, false);
__lru_cache_add_active_or_unevictable(new_page, vmf->vma_flags);
/*
@@ -2847,7 +2847,7 @@ int do_swap_page(struct vm_fault *vmf)
mem_cgroup_commit_charge(page, memcg, true, false);
activate_page(page);
} else { /* ksm created a completely new copy */
-   page_add_new_anon_rmap(page, vma, vmf->address, false);
+   __page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false, false);
__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
}
@@ -2985,7 +2985,7 @@ static int do_anonymous_page(struct vm_fault *vmf)
}
 
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-   page_add_new_anon_rmap(page, vma, vmf->address, false);
+   __page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false, false);
__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
 setpte:
@@ -3237,7 +3237,7 @@ int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
/* copy-on-write page */
if (write && !(vmf->vma_flags & VM_SHARED)) {
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
-   page_add_new_anon_rmap(page, vma, vmf->address, false);
+   __page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false, false);
__lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
} else {
diff --git a/mm/rmap.c b/mm/rmap.c
index c1286d47aa1f..0c9f8ded669a 

[PATCH v2 12/20] mm: Introduce __vm_normal_page()

2017-08-17 Thread Laurent Dufour
When dealing with the speculative fault path we should use the VMA's
field values cached in the vm_fault structure.

Currently vm_normal_page() is using the pointer to the VMA to fetch the
vm_flags value. This patch provides a new __vm_normal_page() which
receives the vm_flags value as a parameter.

Note: The speculative path is only turned on for architectures providing
support for the special PTE flag, so only the first block of
vm_normal_page() is used during the speculative path.

Signed-off-by: Laurent Dufour 
---
 mm/memory.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ad7b6372d302..9f9e5bb7a556 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -820,8 +820,9 @@ static void print_bad_pte(struct vm_area_struct *vma, unsigned long addr,
 #else
 # define HAVE_PTE_SPECIAL 0
 #endif
-struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
-   pte_t pte)
+static struct page *__vm_normal_page(struct vm_area_struct *vma,
+unsigned long addr,
+pte_t pte, unsigned long vma_flags)
 {
unsigned long pfn = pte_pfn(pte);
 
@@ -830,7 +831,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
goto check_pfn;
if (vma->vm_ops && vma->vm_ops->find_special_page)
return vma->vm_ops->find_special_page(vma, addr);
-   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
+   if (vma_flags & (VM_PFNMAP | VM_MIXEDMAP))
return NULL;
if (!is_zero_pfn(pfn))
print_bad_pte(vma, addr, pte, NULL);
@@ -839,8 +840,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 
/* !HAVE_PTE_SPECIAL case follows: */
 
-   if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
-   if (vma->vm_flags & VM_MIXEDMAP) {
+   if (unlikely(vma_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
+   if (vma_flags & VM_MIXEDMAP) {
if (!pfn_valid(pfn))
return NULL;
goto out;
@@ -849,7 +850,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
off = (addr - vma->vm_start) >> PAGE_SHIFT;
if (pfn == vma->vm_pgoff + off)
return NULL;
-   if (!is_cow_mapping(vma->vm_flags))
+   if (!is_cow_mapping(vma_flags))
return NULL;
}
}
@@ -870,6 +871,13 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
return pfn_to_page(pfn);
 }
 
+struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
+   pte_t pte)
+{
+   return __vm_normal_page(vma, addr, pte, vma->vm_flags);
+}
+
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
pmd_t pmd)
@@ -2548,7 +2556,8 @@ static int do_wp_page(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
 
-   vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
+   vmf->page = __vm_normal_page(vma, vmf->address, vmf->orig_pte,
+vmf->vma_flags);
if (!vmf->page) {
/*
 * VM_MIXEDMAP !pfn_valid() case, or VM_SOFTDIRTY clear on a
@@ -3575,7 +3584,7 @@ static int do_numa_page(struct vm_fault *vmf)
ptep_modify_prot_commit(vma->vm_mm, vmf->address, vmf->pte, pte);
update_mmu_cache(vma, vmf->address, vmf->pte);
 
-   page = vm_normal_page(vma, vmf->address, pte);
+   page = __vm_normal_page(vma, vmf->address, pte, vmf->vma_flags);
if (!page) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
return 0;
-- 
2.7.4



[PATCH v2 11/20] mm: Introduce __maybe_mkwrite()

2017-08-17 Thread Laurent Dufour
The current maybe_mkwrite() is getting passed the pointer to the vma
structure to fetch the vm_flags field.

When dealing with the speculative page fault handler, it is better to
rely on the cached vm_flags value stored in the vm_fault structure.

This patch introduces a __maybe_mkwrite() service which can be called by
passing the value of the vm_flags field.

There is no functional change expected for the other callers of
maybe_mkwrite().

Signed-off-by: Laurent Dufour 
---
 include/linux/mm.h | 9 +++--
 mm/memory.c| 6 +++---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 43d313ff3a5b..0f4ddd72b172 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -668,13 +668,18 @@ void free_compound_page(struct page *page);
  * pte_mkwrite.  But get_user_pages can cause write faults for mappings
  * that do not have writing enabled, when used by access_process_vm.
  */
-static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
+static inline pte_t __maybe_mkwrite(pte_t pte, unsigned long vma_flags)
 {
-   if (likely(vma->vm_flags & VM_WRITE))
+   if (likely(vma_flags & VM_WRITE))
pte = pte_mkwrite(pte);
return pte;
 }
 
+static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+   return __maybe_mkwrite(pte, vma->vm_flags);
+}
+
 int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg,
struct page *page);
 int finish_fault(struct vm_fault *vmf);
diff --git a/mm/memory.c b/mm/memory.c
index c6b18cc87e90..ad7b6372d302 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2269,7 +2269,7 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
 
flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
entry = pte_mkyoung(vmf->orig_pte);
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+   entry = __maybe_mkwrite(pte_mkdirty(entry), vmf->vma_flags);
if (ptep_set_access_flags(vma, vmf->address, vmf->pte, entry, 1))
update_mmu_cache(vma, vmf->address, vmf->pte);
pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -2359,8 +2359,8 @@ static int wp_page_copy(struct vm_fault *vmf)
inc_mm_counter_fast(mm, MM_ANONPAGES);
}
flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte));
-   entry = mk_pte(new_page, vma->vm_page_prot);
-   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+   entry = mk_pte(new_page, vmf->vma_page_prot);
+   entry = __maybe_mkwrite(pte_mkdirty(entry), vmf->vma_flags);
/*
 * Clear the pte entry and flush it first, before updating the
 * pte with the new entry. This will avoid a race condition
-- 
2.7.4



[PATCH v2 10/20] mm: Introduce __lru_cache_add_active_or_unevictable

2017-08-17 Thread Laurent Dufour
The speculative page fault handler, which runs without holding the
mmap_sem, calls lru_cache_add_active_or_unevictable(), but the vm_flags
are not guaranteed to remain constant.
Introduce __lru_cache_add_active_or_unevictable(), which takes the vma
flags value as a parameter instead of the vma pointer.

Signed-off-by: Laurent Dufour 
---
 include/linux/swap.h | 11 +--
 mm/memory.c  |  8 
 mm/swap.c| 12 ++--
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index d83d28e53e62..fdea932fe10f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -285,8 +285,15 @@ extern void swap_setup(void);
 
 extern void add_page_to_unevictable_list(struct page *page);
 
-extern void lru_cache_add_active_or_unevictable(struct page *page,
-   struct vm_area_struct *vma);
+extern void __lru_cache_add_active_or_unevictable(struct page *page,
+   unsigned long vma_flags);
+
+static inline void lru_cache_add_active_or_unevictable(struct page *page,
+   struct vm_area_struct *vma)
+{
+   return __lru_cache_add_active_or_unevictable(page, vma->vm_flags);
+}
+
 
 /* linux/mm/vmscan.c */
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
diff --git a/mm/memory.c b/mm/memory.c
index 535282b3..c6b18cc87e90 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2370,7 +2370,7 @@ static int wp_page_copy(struct vm_fault *vmf)
ptep_clear_flush_notify(vma, vmf->address, vmf->pte);
page_add_new_anon_rmap(new_page, vma, vmf->address, false);
mem_cgroup_commit_charge(new_page, memcg, false, false);
-   lru_cache_add_active_or_unevictable(new_page, vma);
+   __lru_cache_add_active_or_unevictable(new_page, vmf->vma_flags);
/*
 * We call the notify macro here because, when using secondary
 * mmu page tables (such as kvm shadow page tables), we want the
@@ -2840,7 +2840,7 @@ int do_swap_page(struct vm_fault *vmf)
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false, false);
-   lru_cache_add_active_or_unevictable(page, vma);
+   __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
}
 
swap_free(entry);
@@ -2978,7 +2978,7 @@ static int do_anonymous_page(struct vm_fault *vmf)
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false, false);
-   lru_cache_add_active_or_unevictable(page, vma);
+   __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
 setpte:
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
 
@@ -3230,7 +3230,7 @@ int alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup 
*memcg,
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
page_add_new_anon_rmap(page, vma, vmf->address, false);
mem_cgroup_commit_charge(page, memcg, false, false);
-   lru_cache_add_active_or_unevictable(page, vma);
+   __lru_cache_add_active_or_unevictable(page, vmf->vma_flags);
} else {
inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
page_add_file_rmap(page, false);
diff --git a/mm/swap.c b/mm/swap.c
index 60b1d2a75852..ece0826a205b 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -470,21 +470,21 @@ void add_page_to_unevictable_list(struct page *page)
 }
 
 /**
- * lru_cache_add_active_or_unevictable
- * @page:  the page to be added to LRU
- * @vma:   vma in which page is mapped for determining reclaimability
+ * __lru_cache_add_active_or_unevictable
+ * @page:  the page to be added to LRU
+ * @vma_flags:  vm_flags of the vma the page is mapped in, for determining reclaimability
  *
  * Place @page on the active or unevictable LRU list, depending on its
  * evictability.  Note that if the page is not evictable, it goes
  * directly back onto it's zone's unevictable list, it does NOT use a
  * per cpu pagevec.
  */
-void lru_cache_add_active_or_unevictable(struct page *page,
-struct vm_area_struct *vma)
+void __lru_cache_add_active_or_unevictable(struct page *page,
+  unsigned long vma_flags)
 {
VM_BUG_ON_PAGE(PageLRU(page), page);
 
-   if (likely((vma->vm_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
+   if (likely((vma_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED)) {
SetPageActive(page);
lru_cache_add(page);
return;
-- 
2.7.4



[PATCH v2 09/20] mm/migrate: Pass vm_fault pointer to migrate_misplaced_page()

2017-08-17 Thread Laurent Dufour
migrate_misplaced_page() is only called during page fault handling, so
it is better to pass a pointer to the struct vm_fault instead of the vma.

This way, the speculative page fault path can use the saved
vma->vm_flags.

Signed-off-by: Laurent Dufour 
---
 include/linux/migrate.h | 4 ++--
 mm/memory.c | 2 +-
 mm/migrate.c| 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3e0d405dc842..65357105cbab 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -108,14 +108,14 @@ static inline void __ClearPageMovable(struct page *page)
 #ifdef CONFIG_NUMA_BALANCING
 extern bool pmd_trans_migrating(pmd_t pmd);
 extern int migrate_misplaced_page(struct page *page,
- struct vm_area_struct *vma, int node);
+ struct vm_fault *vmf, int node);
 #else
 static inline bool pmd_trans_migrating(pmd_t pmd)
 {
return false;
 }
 static inline int migrate_misplaced_page(struct page *page,
-struct vm_area_struct *vma, int node)
+struct vm_fault *vmf, int node)
 {
return -EAGAIN; /* can't migrate now */
 }
diff --git a/mm/memory.c b/mm/memory.c
index 68e4fdcce692..535282b3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3616,7 +3616,7 @@ static int do_numa_page(struct vm_fault *vmf)
}
 
/* Migrate to the requested node */
-   migrated = migrate_misplaced_page(page, vma, target_nid);
+   migrated = migrate_misplaced_page(page, vmf, target_nid);
if (migrated) {
page_nid = target_nid;
flags |= TNF_MIGRATED;
diff --git a/mm/migrate.c b/mm/migrate.c
index d68a41da6abb..354f74f7dad3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1847,7 +1847,7 @@ bool pmd_trans_migrating(pmd_t pmd)
  * node. Caller is expected to have an elevated reference count on
  * the page that will be dropped by this function before returning.
  */
-int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
+int migrate_misplaced_page(struct page *page, struct vm_fault *vmf,
   int node)
 {
pg_data_t *pgdat = NODE_DATA(node);
@@ -1860,7 +1860,7 @@ int migrate_misplaced_page(struct page *page, struct 
vm_area_struct *vma,
 * with execute permissions as they are probably shared libraries.
 */
if (page_mapcount(page) != 1 && page_is_file_cache(page) &&
-   (vma->vm_flags & VM_EXEC))
+   (vmf->vma_flags & VM_EXEC))
goto out;
 
/*
-- 
2.7.4



[PATCH v2 08/20] mm: Protect SPF handler against anon_vma changes

2017-08-17 Thread Laurent Dufour
The speculative page fault handler must be protected against anon_vma
changes. This is because page_add_new_anon_rmap() is called during the
speculative path.

In addition, don't try the speculative page fault if the VMA doesn't
have an anon_vma structure allocated, because its allocation should be
protected by the mmap_sem.

In __vma_adjust() when importer->anon_vma is set, there is no need to
protect against speculative page faults since speculative page fault
is aborted if the vma->anon_vma is not set.

When calling page_add_new_anon_rmap(), vma->anon_vma is necessarily
valid since we checked for it when locking the pte, and the anon_vma is
only removed once the pte is unlocked. So even if the speculative page
fault handler is running concurrently with do_unmap(), we are safe: the
pte is locked in unmap_region() - through unmap_vmas() - and the
anon_vma is unlinked later, while the vma sequence counter is updated in
unmap_page_range() before the pte is locked and again in
free_pgtables(), so the change will be detected when the pte is locked.

Signed-off-by: Laurent Dufour 
---
 mm/memory.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index da3bd07bb052..68e4fdcce692 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -615,7 +615,9 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
 * Hide vma from rmap and truncate_pagecache before freeing
 * pgtables
 */
+   write_seqcount_begin(&vma->vm_sequence);
unlink_anon_vmas(vma);
+   write_seqcount_end(&vma->vm_sequence);
unlink_file_vma(vma);
 
if (is_vm_hugetlb_page(vma)) {
@@ -629,7 +631,9 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
   && !is_vm_hugetlb_page(next)) {
vma = next;
next = vma->vm_next;
+   write_seqcount_begin(&vma->vm_sequence);
unlink_anon_vmas(vma);
+   write_seqcount_end(&vma->vm_sequence);
unlink_file_vma(vma);
}
free_pgd_range(tlb, addr, vma->vm_end,
-- 
2.7.4



[PATCH v2 07/20] mm: Cache some VMA fields in the vm_fault structure

2017-08-17 Thread Laurent Dufour
When handling a speculative page fault, the vma->vm_flags and
vma->vm_page_prot fields are read once the page table lock is released,
so there is no longer any guarantee that these fields have not changed
behind our back. They are therefore saved in the vm_fault structure
before the VMA is checked for changes.

This patch also sets the fields in hugetlb_no_page() and
__collapse_huge_page_swapin() even if they are not needed by the callee.

Signed-off-by: Laurent Dufour 
---
 include/linux/mm.h |  6 ++
 mm/hugetlb.c   |  2 ++
 mm/khugepaged.c|  2 ++
 mm/memory.c| 38 --
 4 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8763ec96dc78..43d313ff3a5b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -345,6 +345,12 @@ struct vm_fault {
 * page table to avoid allocation from
 * atomic context.
 */
+   /*
+* These entries are required when handling speculative page fault.
+* This way the page handling is done using consistent field values.
+*/
+   unsigned long vma_flags;
+   pgprot_t vma_page_prot;
 };
 
 /* page entry size for vm->huge_fault() */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 31e207cb399b..55201b98133e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3676,6 +3676,8 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
.vma = vma,
.address = address,
.flags = flags,
+   .vma_flags = vma->vm_flags,
+   .vma_page_prot = vma->vm_page_prot,
/*
 * Hard to debug if it ends up being
 * used by a callee that assumes
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 56dd994c05d0..0525a0e74535 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -881,6 +881,8 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
.flags = FAULT_FLAG_ALLOW_RETRY,
.pmd = pmd,
.pgoff = linear_page_index(vma, address),
+   .vma_flags = vma->vm_flags,
+   .vma_page_prot = vma->vm_page_prot,
};
 
/* we only decide to swapin, if there is enough young ptes */
diff --git a/mm/memory.c b/mm/memory.c
index 4a2736fe2ef6..da3bd07bb052 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2417,7 +2417,7 @@ static int wp_page_copy(struct vm_fault *vmf)
 * Don't let another task, with possibly unlocked vma,
 * keep the mlocked page.
 */
-   if (page_copied && (vma->vm_flags & VM_LOCKED)) {
+   if (page_copied && (vmf->vma_flags & VM_LOCKED)) {
lock_page(old_page);/* LRU manipulation */
if (PageMlocked(old_page))
munlock_vma_page(old_page);
@@ -2451,7 +2451,7 @@ static int wp_page_copy(struct vm_fault *vmf)
  */
 int finish_mkwrite_fault(struct vm_fault *vmf)
 {
-   WARN_ON_ONCE(!(vmf->vma->vm_flags & VM_SHARED));
+   WARN_ON_ONCE(!(vmf->vma_flags & VM_SHARED));
if (!pte_map_lock(vmf))
return VM_FAULT_RETRY;
/*
@@ -2553,7 +2553,7 @@ static int do_wp_page(struct vm_fault *vmf)
 * We should not cow pages in a shared writeable mapping.
 * Just mark the pages writable and/or call ops->pfn_mkwrite.
 */
-   if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
+   if ((vmf->vma_flags & (VM_WRITE|VM_SHARED)) ==
 (VM_WRITE|VM_SHARED))
return wp_pfn_shared(vmf);
 
@@ -2600,7 +2600,7 @@ static int do_wp_page(struct vm_fault *vmf)
return VM_FAULT_WRITE;
}
unlock_page(vmf->page);
-   } else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
+   } else if (unlikely((vmf->vma_flags & (VM_WRITE|VM_SHARED)) ==
(VM_WRITE|VM_SHARED))) {
return wp_page_shared(vmf);
}
@@ -2817,7 +2817,7 @@ int do_swap_page(struct vm_fault *vmf)
 
inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS);
-   pte = mk_pte(page, vma->vm_page_prot);
+   pte = mk_pte(page, vmf->vma_page_prot);
if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) {
pte = maybe_mkwrite(pte_mkdirty(pte), vma);
vmf->flags &= ~FAULT_FLAG_WRITE;
@@ -2841,7 +2841,7 @@ int do_swap_page(struct vm_fault *vmf)
 
swap_free(entry);
if (mem_cgroup_swap_full(page) ||
-   (vma->vm_flags & 

[PATCH v2 06/20] mm: RCU free VMAs

2017-08-17 Thread Laurent Dufour
From: Peter Zijlstra 

Manage the VMAs with SRCU such that we can do a lockless VMA lookup.

We put the fput(vma->vm_file) in the SRCU callback, this keeps files
valid during speculative faults, this is possible due to the delayed
fput work by Al Viro -- do we need srcu_barrier() in unmount
someplace?

We guard the mm_rb tree with a seqlock (this could be a seqcount but
we'd have to disable preemption around the write side in order to make
the retry loop in __read_seqcount_begin() work) such that we can know
if the rb tree walk was correct. We cannot trust the result of a
lockless tree walk in the face of concurrent tree rotations; although
we can trust the termination of such walks -- tree rotations
guarantee the end result is a tree again after all.

Furthermore, we rely on the WMB implied by the
write_seqlock/count_begin() to separate the VMA initialization and the
publishing stores, analogous to the RELEASE in rcu_assign_pointer().
We also rely on the RMB from read_seqretry() to separate the vma load
from further loads like the smp_read_barrier_depends() in regular
RCU.

We must not touch the vmacache while doing SRCU lookups as that is not
properly serialized against changes. We update gap information after
publishing the VMA, but A) we don't use that and B) the seqlock
read side would fix that anyhow.

We clear vma->vm_rb for nodes removed from the vma tree such that we
can easily detect such 'dead' nodes, we rely on the WMB from
write_sequnlock() to separate the tree removal and clearing the node.

Provide find_vma_srcu() which wraps the required magic.
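
The mm/mmap.c hunk containing the lookup itself is truncated in this
archive; as a rough sketch (assuming the rb-tree walk is factored out
into a lockless __find_vma() helper), the wrapper is expected to look
like:

    /* Sketch only: retry the lockless rb-tree walk until mm_seq
     * confirms that no tree rotation raced with us. The caller must
     * be inside an srcu_read_lock(&vma_srcu) section. */
    struct vm_area_struct *find_vma_srcu(struct mm_struct *mm,
                                         unsigned long addr)
    {
            struct vm_area_struct *vma;
            unsigned int seq;

            WARN_ON_ONCE(!srcu_read_lock_held(&vma_srcu));

            do {
                    seq = read_seqbegin(&mm->mm_seq);
                    vma = __find_vma(mm, addr);
            } while (read_seqretry(&mm->mm_seq, seq));

            return vma;
    }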

Signed-off-by: Peter Zijlstra (Intel) 

[Remove the warnings in description about the SRCU global lock which
 has been removed now]
[Rename vma_is_dead() to vma_has_changed() and move its adding to the next
 patch]
Signed-off-by: Laurent Dufour 
---
 include/linux/mm_types.h |   2 +
 kernel/fork.c|   1 +
 mm/init-mm.c |   1 +
 mm/internal.h|   5 +++
 mm/mmap.c| 100 +++
 5 files changed, 83 insertions(+), 26 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 642aad26b32f..f3851b250fde 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -343,6 +343,7 @@ struct vm_area_struct {
 #endif
struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
seqcount_t vm_sequence;
+   struct rcu_head vm_rcu_head;
 } __randomize_layout;
 
 struct core_thread {
@@ -360,6 +361,7 @@ struct kioctx_table;
 struct mm_struct {
struct vm_area_struct *mmap;/* list of VMAs */
struct rb_root mm_rb;
+   seqlock_t mm_seq;
u32 vmacache_seqnum;   /* per-thread vmacache */
 #ifdef CONFIG_MMU
unsigned long (*get_unmapped_area) (struct file *filp,
diff --git a/kernel/fork.c b/kernel/fork.c
index e075b7780421..f28aa54c668c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -791,6 +791,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm->mmap = NULL;
mm->mm_rb = RB_ROOT;
mm->vmacache_seqnum = 0;
+   seqlock_init(&mm->mm_seq);
atomic_set(&mm->mm_users, 1);
atomic_set(&mm->mm_count, 1);
init_rwsem(&mm->mmap_sem);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 975e49f00f34..2b1fa061684f 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -16,6 +16,7 @@
 
 struct mm_struct init_mm = {
.mm_rb  = RB_ROOT,
+   .mm_seq = __SEQLOCK_UNLOCKED(init_mm.mm_seq),
.pgd= swapper_pg_dir,
.mm_users   = ATOMIC_INIT(2),
.mm_count   = ATOMIC_INIT(1),
diff --git a/mm/internal.h b/mm/internal.h
index 4ef49fc55e58..736540f15936 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -40,6 +40,11 @@ void page_writeback_init(void);
 
 int do_swap_page(struct vm_fault *vmf);
 
+extern struct srcu_struct vma_srcu;
+
+extern struct vm_area_struct *find_vma_srcu(struct mm_struct *mm,
+   unsigned long addr);
+
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
unsigned long floor, unsigned long ceiling);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index b480043e38fb..34a7f1bdffe4 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -159,6 +159,23 @@ void unlink_file_vma(struct vm_area_struct *vma)
}
 }
 
+DEFINE_SRCU(vma_srcu);
+
+static void __free_vma(struct rcu_head *head)
+{
+   struct vm_area_struct *vma =
+   container_of(head, struct vm_area_struct, vm_rcu_head);
+
+   if (vma->vm_file)
+   fput(vma->vm_file);
+   kmem_cache_free(vm_area_cachep, vma);
+}
+
+static void free_vma(struct vm_area_struct *vma)
+{
+   call_srcu(&vma_srcu, &vma->vm_rcu_head, __free_vma);
+}
+
 /*
  * Close a vm structure and free it, returning the next.
  */
@@ -169,10 +186,8 @@ static struct vm_area_struct 

Re: [PATCH kernel] PCI: Disable IOV before pcibios_sriov_disable()

2017-08-17 Thread Alexey Kardashevskiy
On 11/08/17 18:19, Alexey Kardashevskiy wrote:
> From: Gavin Shan 
> 
> The PowerNV platform is the only user of pcibios_sriov_disable().
> The IOV BAR could be shifted by pci_iov_update_resource(). The
> warning message in the function is printed if the IOV capability
> is in the enabled (PCI_SRIOV_CTRL_VFE && PCI_SRIOV_CTRL_MSE) state.
> 
> This is the backtrace of what is happening:
>pci_disable_sriov
>sriov_disable
>pnv_pci_sriov_disable
>pnv_pci_vf_resource_shift
>pci_update_resource
>pci_iov_update_resource
> 
> This fixes the issue by disabling IOV capability before calling
> pcibios_sriov_disable(). With it, the disabling path matches
> the enabling path: pcibios_sriov_enable() is called before the
> IOV capability is enabled.
> 
> Cc: shan.ga...@gmail.com
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Reported-by: Carol L Soto 
> Signed-off-by: Gavin Shan 
> Tested-by: Carol L Soto 
> Signed-off-by: Alexey Kardashevskiy 
> ---
> 
> This is a repost. Since Gavin left the team, I am trying to push it out.
> The previous conversation is here: https://patchwork.ozlabs.org/patch/732653/
> 
> Two questions were raised then. I'll try to comment on this below.

Bjorn, ping? Thanks.

> 
>> 1) "res" is already in the resource tree, so we shouldn't be changing
>>   its start address, because that may make the tree inconsistent,
>>   e.g., the resource may no longer be completely contained in its
>>   parent, it may conflict with a sibling, etc.
> 
> We should not, yes. But...
> 
> At the boot time IOV BAR gets as much MMIO space as it can possibly use.
> (Embarassingly I cannot trace where this is coming from, 8GB is selected
> via pci_assign_unassigned_root_bus_resources() path somehow).
> For example, it is 256*32MB=8GB where 256 is maximum PEs number and 32MB
> is a PF/VF BAR size. Whatever shifting we do afterwards, the boundaries of
> that 8GB area do not change and we test it in pnv_pci_vf_resource_shift():
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/platforms/powernv/pci-ioda.c#n987
> 
>> 2) If we update "res->start", shouldn't we update "res->end"
>>   correspondingly?
> 
> We have to update the PF's IOV BAR address as we allocate PEs dynamically
> and we do not know in advance where our VF numbers start in that
> 8GB window. So we change the IOV BAR start. Changing the end may make it
> look more like there is a free area to use, but in reality it won't be
> any more usable than the area we "release" by shifting the start address.
> 
> We could probably move that M64 MMIO window by the same delta in
> opposite direction so the IOV BAR start address would remain the same
> but its VF#0 would be mapped to let's say PF#5. I am just afraid there
> is an alignment requirement for these M64 window start address; and this
> would be even more tricky to manage.
> 
> We could also create reserved areas for the amount of space "released" by
> moving the start address, not sure how though.
> 
> So how do we proceed with this particular patch now? Thanks.
> ---
>  drivers/pci/iov.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 120485d6f352..ac41c8be9200 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -331,7 +331,6 @@ static int sriov_enable(struct pci_dev *dev, int 
> nr_virtfn)
>   while (i--)
>   pci_iov_remove_virtfn(dev, i, 0);
>  
> - pcibios_sriov_disable(dev);
>  err_pcibios:
>   iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
>   pci_cfg_access_lock(dev);
> @@ -339,6 +338,8 @@ static int sriov_enable(struct pci_dev *dev, int 
> nr_virtfn)
>   ssleep(1);
>   pci_cfg_access_unlock(dev);
>  
> + pcibios_sriov_disable(dev);
> +
>   if (iov->link != dev->devfn)
>   sysfs_remove_link(&dev->dev.kobj, "dep_link");
>  
> @@ -357,14 +358,14 @@ static void sriov_disable(struct pci_dev *dev)
>   for (i = 0; i < iov->num_VFs; i++)
>   pci_iov_remove_virtfn(dev, i, 0);
>  
> - pcibios_sriov_disable(dev);
> -
>   iov->ctrl &= ~(PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
>   pci_cfg_access_lock(dev);
>   pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, iov->ctrl);
>   ssleep(1);
>   pci_cfg_access_unlock(dev);
>  
> + pcibios_sriov_disable(dev);
> +
>   if (iov->link != dev->devfn)
>   sysfs_remove_link(&dev->dev.kobj, "dep_link");
>  
> 


-- 
Alexey


[PATCH v2 05/20] mm: Protect VMA modifications using VMA sequence count

2017-08-17 Thread Laurent Dufour
The VMA sequence count has been introduced to allow fast detection of
VMA modification when running a page fault handler without holding
the mmap_sem.

This patch provides protection against the VMA modifications done in:
- madvise()
- mremap()
- mpol_rebind_policy()
- vma_replace_policy()
- change_prot_numa()
- mlock(), munlock()
- mprotect()
- mmap_region()
- collapse_huge_page()
- userfaultfd registering services

In addition, VMA fields which will be read during the speculative fault
path need to be written using WRITE_ONCE to prevent the writes from
being torn and intermediate values from being seen by other CPUs.
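
A minimal sketch of the intended pairing (the reader side belongs to the
speculative path added by later patches in this series):

    /* Writer, mmap_sem held for write: publish the new flags untorn. */
    write_seqcount_begin(&vma->vm_sequence);
    WRITE_ONCE(vma->vm_flags, vma->vm_flags | VM_LOCKED);
    write_seqcount_end(&vma->vm_sequence);

    /* Speculative reader: one untorn load, validated afterwards
     * against the VMA sequence count. */
    unsigned long flags = READ_ONCE(vma->vm_flags);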

Signed-off-by: Laurent Dufour 
---
 fs/proc/task_mmu.c |  5 -
 fs/userfaultfd.c   | 17 +
 mm/khugepaged.c|  3 +++
 mm/madvise.c   |  6 +-
 mm/mempolicy.c | 51 ++-
 mm/mlock.c | 13 -
 mm/mmap.c  | 17 ++---
 mm/mprotect.c  |  4 +++-
 mm/mremap.c|  7 +++
 9 files changed, 87 insertions(+), 36 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fe8f3265e877..e682179edaae 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1067,8 +1067,11 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
goto out_mm;
}
for (vma = mm->mmap; vma; vma = vma->vm_next) {
-   vma->vm_flags &= ~VM_SOFTDIRTY;
+   write_seqcount_begin(&vma->vm_sequence);
+   WRITE_ONCE(vma->vm_flags,
+  vma->vm_flags & ~VM_SOFTDIRTY);
vma_set_page_prot(vma);
+   write_seqcount_end(&vma->vm_sequence);
}
downgrade_write(&mm->mmap_sem);
break;
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index b0d5897bc4e6..77b1e025c88e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -612,8 +612,11 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs)
 
octx = vma->vm_userfaultfd_ctx.ctx;
if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) {
+   write_seqcount_begin(&vma->vm_sequence);
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
-   vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING);
+   WRITE_ONCE(vma->vm_flags,
+  vma->vm_flags & ~(VM_UFFD_WP | VM_UFFD_MISSING));
+   write_seqcount_end(&vma->vm_sequence);
return 0;
}
 
@@ -838,8 +841,10 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
vma = prev;
else
prev = vma;
-   vma->vm_flags = new_flags;
+   write_seqcount_begin(&vma->vm_sequence);
+   WRITE_ONCE(vma->vm_flags, new_flags);
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
+   write_seqcount_end(&vma->vm_sequence);
}
up_write(&mm->mmap_sem);
mmput(mm);
@@ -1357,8 +1362,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 * the next vma was merged into the current one and
 * the current one has not been updated yet.
 */
-   vma->vm_flags = new_flags;
+   write_seqcount_begin(&vma->vm_sequence);
+   WRITE_ONCE(vma->vm_flags, new_flags);
vma->vm_userfaultfd_ctx.ctx = ctx;
+   write_seqcount_end(&vma->vm_sequence);
 
skip:
prev = vma;
@@ -1515,8 +1522,10 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 * the next vma was merged into the current one and
 * the current one has not been updated yet.
 */
-   vma->vm_flags = new_flags;
+   write_seqcount_begin(&vma->vm_sequence);
+   WRITE_ONCE(vma->vm_flags, new_flags);
vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
+   write_seqcount_end(&vma->vm_sequence);
 
skip:
prev = vma;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c01f177a1120..56dd994c05d0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1005,6 +1005,7 @@ static void collapse_huge_page(struct mm_struct *mm,
if (mm_find_pmd(mm, address) != pmd)
goto out;
 
+   write_seqcount_begin(&vma->vm_sequence);
anon_vma_lock_write(vma->anon_vma);
 
pte = pte_offset_map(pmd, address);
@@ -1040,6 +1041,7 @@ static void collapse_huge_page(struct mm_struct *mm,
pmd_populate(mm, pmd, pmd_pgtable(_pmd));
 

[PATCH v2 04/20] mm: VMA sequence count

2017-08-17 Thread Laurent Dufour
From: Peter Zijlstra 

Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
counts such that we can easily test if a VMA is changed.

The unmap_page_range() one allows us to make assumptions about
page-tables; when we find the seqcount hasn't changed we can assume
page-tables are still valid.

The flip side is that we cannot distinguish between a vma_adjust() and
the unmap_page_range() -- where with the former we could have
re-checked the vma bounds against the address.
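
For context, the read side this counter enables looks roughly like the
following sketch; the vma_has_changed() helper mentioned in a later
changelog is assumed to wrap this pattern:

    /* Sketch: snapshot the count, do the speculative work, re-check. */
    unsigned int seq = raw_read_seqcount(&vma->vm_sequence);
    if (seq & 1)
            goto fallback;  /* writer in progress */
    /* ... walk the page tables and prepare the new PTE ... */
    if (read_seqcount_retry(&vma->vm_sequence, seq))
            goto fallback;  /* the VMA changed under us */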

Signed-off-by: Peter Zijlstra (Intel) 

[Port to 4.12 kernel]
[Fix lock dependency between mapping->i_mmap_rwsem and vma->vm_sequence]
Signed-off-by: Laurent Dufour 
---
 include/linux/mm_types.h |  1 +
 mm/memory.c  |  2 ++
 mm/mmap.c| 21 ++---
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3cadee0a3508..642aad26b32f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -342,6 +342,7 @@ struct vm_area_struct {
struct mempolicy *vm_policy;/* NUMA policy for the VMA */
 #endif
struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+   seqcount_t vm_sequence;
 } __randomize_layout;
 
 struct core_thread {
diff --git a/mm/memory.c b/mm/memory.c
index fa598889eb0e..4a2736fe2ef6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1408,6 +1408,7 @@ void unmap_page_range(struct mmu_gather *tlb,
unsigned long next;
 
BUG_ON(addr >= end);
+   write_seqcount_begin(&vma->vm_sequence);
tlb_start_vma(tlb, vma);
pgd = pgd_offset(vma->vm_mm, addr);
do {
@@ -1417,6 +1418,7 @@ void unmap_page_range(struct mmu_gather *tlb,
next = zap_p4d_range(tlb, vma, pgd, addr, next, details);
} while (pgd++, addr = next, addr != end);
tlb_end_vma(tlb, vma);
+   write_seqcount_end(&vma->vm_sequence);
 }
 
 
diff --git a/mm/mmap.c b/mm/mmap.c
index f19efcf75418..140b22136cb7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -557,6 +557,8 @@ void __vma_link_rb(struct mm_struct *mm, struct vm_area_struct *vma,
else
mm->highest_vm_end = vm_end_gap(vma);
 
+   seqcount_init(&vma->vm_sequence);
+
/*
 * vma->vm_prev wasn't known when we followed the rbtree to find the
 * correct insertion point for that vma. As a result, we could not
@@ -798,6 +800,11 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
}
 
+   write_seqcount_begin(&vma->vm_sequence);
+   if (next && next != vma)
+   write_seqcount_begin_nested(&next->vm_sequence,
+   SINGLE_DEPTH_NESTING);
+
anon_vma = vma->anon_vma;
if (!anon_vma && adjust_next)
anon_vma = next->anon_vma;
@@ -902,6 +909,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
mm->map_count--;
mpol_put(vma_policy(next));
kmem_cache_free(vm_area_cachep, next);
+   write_seqcount_end(&next->vm_sequence);
/*
 * In mprotect's case 6 (see comments on vma_merge),
 * we must remove another next too. It would clutter
@@ -931,11 +939,14 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
if (remove_next == 2) {
remove_next = 1;
end = next->vm_end;
+   write_seqcount_end(&next->vm_sequence);
goto again;
-   }
-   else if (next)
+   } else if (next) {
+   if (next != vma)
+   write_seqcount_begin_nested(&next->vm_sequence,
+   SINGLE_DEPTH_NESTING);
vma_gap_update(next);
-   else {
+   } else {
/*
 * If remove_next == 2 we obviously can't
 * reach this path.
@@ -961,6 +972,10 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long 
start,
if (insert && file)
uprobe_mmap(insert);
 
+   if (next && next != vma)
+   write_seqcount_end(>vm_sequence);
+   write_seqcount_end(&next->vm_sequence);
+   write_seqcount_end(&vma->vm_sequence);
validate_mm(mm);
 
return 0;
-- 
2.7.4



[PATCH v2 03/20] mm: Introduce pte_spinlock for FAULT_FLAG_SPECULATIVE

2017-08-17 Thread Laurent Dufour
When handling a page fault without holding the mmap_sem, the fetch of
the pte lock pointer and the locking will have to be done while ensuring
that the VMA is not modified behind our back.

So move the fetch and locking operations into a dedicated function.
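
In this patch pte_spinlock() always succeeds; as a sketch of where this
is heading, the speculative variant added later in the series is
expected to use spin_trylock() (the cover letter mentions this is needed
to avoid deadlocking against a CPU requesting a TLB invalidate while
holding the PTE lock; vma_has_changed() is an assumed helper from a
later patch):

    static bool pte_spinlock(struct vm_fault *vmf)
    {
            vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
            if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
                    spin_lock(vmf->ptl);
                    return true;
            }
            /* Speculative: never block on the PTL. */
            if (!spin_trylock(vmf->ptl))
                    return false;
            if (vma_has_changed(vmf)) {
                    spin_unlock(vmf->ptl);
                    return false;
            }
            return true;
    }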

Signed-off-by: Laurent Dufour 
---
 mm/memory.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3ed1b00ca841..fa598889eb0e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2269,6 +2269,13 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
 }
 
+static bool pte_spinlock(struct vm_fault *vmf)
+{
+   vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
+   spin_lock(vmf->ptl);
+   return true;
+}
+
 static bool pte_map_lock(struct vm_fault *vmf)
 {
vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl);
@@ -3543,8 +3550,8 @@ static int do_numa_page(struct vm_fault *vmf)
 * validation through pte_unmap_same(). It's of NUMA type but
 * the pfn may be screwed if the read is non atomic.
 */
-   vmf->ptl = pte_lockptr(vma->vm_mm, vmf->pmd);
-   spin_lock(vmf->ptl);
+   if (!pte_spinlock(vmf))
+   return VM_FAULT_RETRY;
if (unlikely(!pte_same(*vmf->pte, vmf->orig_pte))) {
pte_unmap_unlock(vmf->pte, vmf->ptl);
goto out;
@@ -3736,8 +3743,8 @@ static int handle_pte_fault(struct vm_fault *vmf)
if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
return do_numa_page(vmf);
 
-   vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
-   spin_lock(vmf->ptl);
+   if (!pte_spinlock(vmf))
+   return VM_FAULT_RETRY;
entry = vmf->orig_pte;
if (unlikely(!pte_same(*vmf->pte, entry)))
goto unlock;
-- 
2.7.4



[PATCH v2 02/20] mm: Prepare for FAULT_FLAG_SPECULATIVE

2017-08-17 Thread Laurent Dufour
From: Peter Zijlstra 

When speculating faults (without holding mmap_sem) we need to validate
that the vma against which we loaded pages is still valid when we're
ready to install the new PTE.

Therefore, replace the pte_offset_map_lock() calls that (re)take the
PTL with pte_map_lock() which can fail in case we find the VMA changed
since we started the fault.
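
Here pte_map_lock() trivially succeeds; a hedged sketch of the failing,
speculative variant that later patches are expected to introduce
(vma_has_changed() is again an assumed helper):

    static bool pte_map_lock(struct vm_fault *vmf)
    {
            if (!(vmf->flags & FAULT_FLAG_SPECULATIVE)) {
                    vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm,
                                                   vmf->pmd, vmf->address,
                                                   &vmf->ptl);
                    return true;
            }
            /* Speculative: trylock, and bail out on any VMA change. */
            if (vma_has_changed(vmf))
                    return false;
            vmf->ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
            vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
            if (!spin_trylock(vmf->ptl)) {
                    pte_unmap(vmf->pte);
                    return false;
            }
            if (vma_has_changed(vmf)) {
                    pte_unmap_unlock(vmf->pte, vmf->ptl);
                    return false;
            }
            return true;
    }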

Signed-off-by: Peter Zijlstra (Intel) 

[Port to 4.12 kernel]
[Remove the comment about the fault_env structure which has been
 implemented as the vm_fault structure in the kernel]
Signed-off-by: Laurent Dufour 
---
 include/linux/mm.h |  1 +
 mm/memory.c| 55 ++
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..8763ec96dc78 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -286,6 +286,7 @@ extern pgprot_t protection_map[16];
 #define FAULT_FLAG_USER 0x40   /* The fault originated in userspace */
 #define FAULT_FLAG_REMOTE  0x80   /* faulting for non current tsk/mm */
 #define FAULT_FLAG_INSTRUCTION  0x100  /* The fault was during an instruction fetch */
+#define FAULT_FLAG_SPECULATIVE 0x200   /* Speculative fault, not holding mmap_sem */
 
 #define FAULT_FLAG_TRACE \
{ FAULT_FLAG_WRITE, "WRITE" }, \
diff --git a/mm/memory.c b/mm/memory.c
index 36609c082256..3ed1b00ca841 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2269,6 +2269,12 @@ static inline void wp_page_reuse(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
 }
 
+static bool pte_map_lock(struct vm_fault *vmf)
+{
+   vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl);
+   return true;
+}
+
 /*
  * Handle the case of a page which we actually need to copy to a new page.
  *
@@ -2296,6 +2302,7 @@ static int wp_page_copy(struct vm_fault *vmf)
const unsigned long mmun_start = vmf->address & PAGE_MASK;
const unsigned long mmun_end = mmun_start + PAGE_SIZE;
struct mem_cgroup *memcg;
+   int ret = VM_FAULT_OOM;
 
if (unlikely(anon_vma_prepare(vma)))
goto oom;
@@ -2323,7 +2330,11 @@ static int wp_page_copy(struct vm_fault *vmf)
/*
 * Re-check the pte - we dropped the lock
 */
-   vmf->pte = pte_offset_map_lock(mm, vmf->pmd, vmf->address, &vmf->ptl);
+   if (!pte_map_lock(vmf)) {
+   mem_cgroup_cancel_charge(new_page, memcg, false);
+   ret = VM_FAULT_RETRY;
+   goto oom_free_new;
+   }
if (likely(pte_same(*vmf->pte, vmf->orig_pte))) {
if (old_page) {
if (!PageAnon(old_page)) {
@@ -2411,7 +2422,7 @@ static int wp_page_copy(struct vm_fault *vmf)
 oom:
if (old_page)
put_page(old_page);
-   return VM_FAULT_OOM;
+   return ret;
 }
 
 /**
@@ -2432,8 +2443,8 @@ static int wp_page_copy(struct vm_fault *vmf)
 int finish_mkwrite_fault(struct vm_fault *vmf)
 {
WARN_ON_ONCE(!(vmf->vma->vm_flags & VM_SHARED));
-   vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, vmf->address,
-  &vmf->ptl);
+   if (!pte_map_lock(vmf))
+   return VM_FAULT_RETRY;
/*
 * We might have raced with another page fault while we released the
 * pte_offset_map_lock.
@@ -2551,8 +2562,11 @@ static int do_wp_page(struct vm_fault *vmf)
get_page(vmf->page);
pte_unmap_unlock(vmf->pte, vmf->ptl);
lock_page(vmf->page);
-   vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-   vmf->address, &vmf->ptl);
+   if (!pte_map_lock(vmf)) {
+   unlock_page(vmf->page);
+   put_page(vmf->page);
+   return VM_FAULT_RETRY;
+   }
if (!pte_same(*vmf->pte, vmf->orig_pte)) {
unlock_page(vmf->page);
pte_unmap_unlock(vmf->pte, vmf->ptl);
@@ -2710,8 +2724,10 @@ int do_swap_page(struct vm_fault *vmf)
 * Back out if somebody else faulted in this pte
 * while we released the pte lock.
 */
-   vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
-   vmf->address, &vmf->ptl);
+   if (!pte_map_lock(vmf)) {
+   delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
+   return VM_FAULT_RETRY;
+   }
if (likely(pte_same(*vmf->pte, vmf->orig_pte)))
ret = VM_FAULT_OOM;

[PATCH v2 01/20] mm: Dont assume page-table invariance during faults

2017-08-17 Thread Laurent Dufour
From: Peter Zijlstra 

One of the side effects of speculating on faults (without holding
mmap_sem) is that we can race with free_pgtables() and therefore we
cannot assume the page-tables will stick around.

Remove the reliance on the pte pointer.
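
The same-PTE check does not disappear entirely; it is effectively
subsumed by the locked re-check pattern the next patch introduces, along
the lines of:

    /* Sketch: callers re-validate under the PTL instead. */
    if (!pte_map_lock(vmf))
            return VM_FAULT_RETRY;
    if (!pte_same(*vmf->pte, vmf->orig_pte)) {
            /* Raced with another fault: back out. */
            pte_unmap_unlock(vmf->pte, vmf->ptl);
            return 0;
    }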

Signed-off-by: Peter Zijlstra (Intel) 
---
 mm/memory.c | 27 ---
 1 file changed, 27 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index e158f7ac6730..36609c082256 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2131,30 +2131,6 @@ int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
 }
 EXPORT_SYMBOL_GPL(apply_to_page_range);
 
-/*
- * handle_pte_fault chooses page fault handler according to an entry which was
- * read non-atomically.  Before making any commitment, on those architectures
- * or configurations (e.g. i386 with PAE) which might give a mix of unmatched
- * parts, do_swap_page must check under lock before unmapping the pte and
- * proceeding (but do_wp_page is only called after already making such a check;
- * and do_anonymous_page can safely check later on).
- */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-   pte_t *page_table, pte_t orig_pte)
-{
-   int same = 1;
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
-   if (sizeof(pte_t) > sizeof(unsigned long)) {
-   spinlock_t *ptl = pte_lockptr(mm, pmd);
-   spin_lock(ptl);
-   same = pte_same(*page_table, orig_pte);
-   spin_unlock(ptl);
-   }
-#endif
-   pte_unmap(page_table);
-   return same;
-}
-
 static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
 {
debug_dma_assert_idle(src);
@@ -2711,9 +2687,6 @@ int do_swap_page(struct vm_fault *vmf)
int exclusive = 0;
int ret = 0;
 
-   if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
-   goto out;
-
entry = pte_to_swp_entry(vmf->orig_pte);
if (unlikely(non_swap_entry(entry))) {
if (is_migration_entry(entry)) {
-- 
2.7.4



[PATCH v2 00/20] Speculative page faults

2017-08-17 Thread Laurent Dufour
This is a port on kernel 4.13 of the work done by Peter Zijlstra to
handle page fault without holding the mm semaphore [1].

The idea is to try to handle user space page faults without holding the
mmap_sem. This should allow better concurrency for massively threaded
processes, since the page fault handler will not wait for other threads'
memory layout changes to be done, assuming that the change is done in
another part of the process's memory space. This type of page fault is
named speculative page fault. If the speculative page fault fails
because concurrency is detected or because the underlying PMD or PTE
tables are not yet allocated, its processing fails and a classic page
fault is then tried.

The speculative page fault (SPF) has to look for the VMA matching the fault
address without holding the mmap_sem, so the VMA list is now managed using
SRCU, allowing lockless walking. The only impact would be the deferred file
dereferencing in the case of a file mapping, since the file pointer is
released once the SRCU cleanup is done.  This series relies on the change
done recently by Paul McKenney in SRCU which now runs a callback per CPU
instead of per SRCU structure [1].

The VMA's attributes checked during the speculative page fault processing
have to be protected against parallel changes. This is done by using a per
VMA sequence lock. This sequence lock allows the speculative page fault
handler to fast check for parallel changes in progress and to abort the
speculative page fault in that case.

Once the VMA is found, the speculative page fault handler checks the
VMA's attributes to verify that the page fault can be handled this way.
The VMA is protected through a sequence lock which allows fast detection
of concurrent VMA changes. If such a change is detected, the speculative
page fault is aborted and a *classic* page fault is tried.  VMA sequence
locks are taken when VMA attributes which are checked during the page
fault are modified.

When the PTE is fetched, the VMA is checked to see if it has been changed;
so once the page table is locked, the VMA is known to be valid, and any
other change touching this PTE would need to take the page table lock,
so no parallel change is possible at this time.
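
Putting the pieces together, the speculative fast path is expected to
have roughly this shape (a sketch, not the literal code of the series;
handle_speculative_fault() is the entry point added by later patches):

    int handle_speculative_fault(struct mm_struct *mm,
                                 unsigned long address, unsigned int flags)
    {
            struct vm_area_struct *vma;
            unsigned int seq;
            int ret = VM_FAULT_RETRY;
            int idx;

            idx = srcu_read_lock(&vma_srcu);
            vma = find_vma_srcu(mm, address);
            if (!vma || address < vma->vm_start)
                    goto out;

            /* Snapshot the sequence count and the VMA fields we need. */
            seq = raw_read_seqcount(&vma->vm_sequence);
            if (seq & 1)
                    goto out;       /* writer active: use classic path */

            /* ... cache vm_flags/vm_page_prot in the vm_fault, walk the
             * page tables and install the PTE via pte_map_lock(); any
             * sequence change aborts with VM_FAULT_RETRY ... */

    out:
            srcu_read_unlock(&vma_srcu, idx);
            return ret;
    }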

Compared to Peter's initial work, this series introduces a spin_trylock
when dealing with the speculative page fault. This is required to avoid a
deadlock when handling a page fault while a TLB invalidate is requested by
another CPU holding the PTE. Another change is due to a lock dependency
issue with mapping->i_mmap_rwsem.

In addition, some VMA field values which are used once the PTE is unlocked
at the end of the page fault path are saved into the vm_fault structure so
that the values matching the VMA at the time the PTE was locked are used.

This series builds on top of v4.13-rc5 and is functional on x86 and
PowerPC.

Tests have been made using a large commercial in-memory database on a
PowerPC system with 752 CPUs using RFC v5. The results are very encouraging
since loading the 2TB database was 14% faster with the speculative page
fault.

Using the ebizzy test [3], which spawns a lot of threads, the results are
good when running on both a large and a small system. When using kernbench,
the results are quite similar, which is expected as not many multithreaded
processes are involved. But there is no performance degradation either,
which is good.

--
Benchmarks results

Note these tests have been made on top of 4.13-rc3 with the following patch
from Paul McKenney applied: 
 "srcu: Provide ordering for CPU not involved in grace period" [5]

Ebizzy:
---
The test counts the number of records per second it can manage; the
higher the better. I ran it as 'ebizzy -mTRp'. To get consistent
results I repeated the test 100 times and measured the average result,
mean deviation, max and min.

- 16 CPUs x86 VM
Records/s       4.13-rc5    4.13-rc5-spf
Average         11350.29    21760.36
Mean deviation  396.56      881.40
Max             13773       26194
Min             10567       19223

- 80 CPUs Power 8 node:
Records/s       4.13-rc5    4.13-rc5-spf
Average         33904.67    58847.91
Mean deviation  789.40      1753.19
Max             36703       68958
Min             31759       55125

The number of records per second is far better with the speculative page
fault.
The mean deviation is higher with the speculative page fault, maybe
because sometimes the faults are not handled in a speculative way, leading
to more variation.


Kernbench:
--
This test builds a 4.12 kernel using the platform default config. The
build has been run 5 times for each kernel.

- 16 CPUs x86 VM
Average Half load -j 8 Run (std deviation)
                4.13.0-rc5          4.13.0-rc5-spf
Elapsed Time    166.574 (0.340779)  145.754 (0.776325)
User Time       1080.77 (2.05871)   999.272 (4.12142)
System Time     204.594 (1.02449)

Re: [PATCH v7 3/4] lib/cmdline.c Remove quotes symmetrically.

2017-08-17 Thread msuchanek
On Thu, 17 Aug 2017 22:14:30 +0200
Michal Suchanek  wrote:

> Remove quotes from an argument value only if there is a quote on both
> sides.
> 
> Signed-off-by: Michal Suchanek 
> ---
>  arch/powerpc/kernel/fadump.c | 6 ++
>  lib/cmdline.c| 7 ++-
>  2 files changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/fadump.c
> b/arch/powerpc/kernel/fadump.c index a1614d9b8a21..d7da4ce9f7ae 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -489,10 +489,8 @@ static void __init fadump_update_params(struct
> param_info *param_info, *tgt++ = ' ';
>  
>   /* next_arg removes one leading and one trailing '"' */
> - if (*tgt == '"')
> - shortening += 1;
> - if (*(tgt + vallen + shortening) == '"')
> - shortening += 1;
> + if ((*tgt == '"') && (*(tgt + vallen + shortening) == '"'))
s/shortening/1/ (in the condition above)

in case somebody would want this patch and not the following one that
removes the code.


Re: [RFC v7 02/25] powerpc: track allocation status of all pkeys

2017-08-17 Thread Thiago Jung Bauermann

Ram Pai  writes:

> On Thu, Aug 10, 2017 at 05:25:39PM -0300, Thiago Jung Bauermann wrote:
>> 
>> Ram Pai  writes:
>> >  static inline void pkey_initialize(void)
>> >  {
>> > +  int os_reserved, i;
>> > +
>> >/* disable the pkey system till everything
>> > * is in place. A patch further down the
>> > * line will enable it.
>> > */
>> >pkey_inited = false;
>> > +
>> > +  /* Lets assume 32 keys */
>> > +  pkeys_total = 32;
>> > +
>> > +#ifdef CONFIG_PPC_4K_PAGES
>> > +  /*
>> > +   * the OS can manage only 8 pkeys
>> > +   * due to its inability to represent
>> > +   * them in the linux 4K-PTE.
>> > +   */
>> > +  os_reserved = pkeys_total-8;
>> > +#else
>> > +  os_reserved = 0;
>> > +#endif
>> > +  /*
>> > +   * Bits are in LE format.
>> > +   * NOTE: 1, 0 are reserved.
>> > +   * key 0 is the default key, which allows read/write/execute.
>> > +   * key 1 is recommended not to be used.
>> > +   * PowerISA(3.0) page 1015, programming note.
>> > +   */
>> > +  initial_allocation_mask = ~0x0;
>> > +  for (i = 2; i < (pkeys_total - os_reserved); i++)
>> > +  initial_allocation_mask &= ~(0x1 << i);
>> >  }
>> >  #endif /*_ASM_PPC64_PKEYS_H */
>> 
>> In v6, key 31 was also reserved, but it's not in this version. Is this
>> intentional?
>
> On powernv platform, there is no hypervisor and hence the hypervisor
> will not reserve key 31 for its own use. Whereas on a PAPR guest
> the hypervisor takes away key 31.  
>
> It's not possible to determine at compile time which keys are used
> or not. Hence the above code.  pkeys_total is 32 in this patch,
> but will be set to whatever value the device tree tells us. That will
> be done in a subsequent patch.

You're right. At the time I made that comment I didn't realize that the
hypervisor would subtract its reserved key from the device property.

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [RFC v7 25/25] powerpc: Enable pkey subsystem

2017-08-17 Thread Thiago Jung Bauermann

Ram Pai  writes:

> On Thu, Aug 10, 2017 at 06:27:34PM -0300, Thiago Jung Bauermann wrote:
>> 
>> Ram Pai  writes:
>> > --- a/arch/powerpc/include/asm/cputable.h
>> > +++ b/arch/powerpc/include/asm/cputable.h
>> > @@ -214,6 +214,7 @@ enum {
>> >  #define CPU_FTR_DAWR  
>> > LONG_ASM_CONST(0x0400)
>> >  #define CPU_FTR_DABRX 
>> > LONG_ASM_CONST(0x0800)
>> >  #define CPU_FTR_PMAO_BUG  LONG_ASM_CONST(0x1000)
>> > +#define CPU_FTR_PKEY  
>> > LONG_ASM_CONST(0x2000)
>> >  #define CPU_FTR_POWER9_DD1
>> > LONG_ASM_CONST(0x4000)
>> >
>> >  #ifndef __ASSEMBLY__
>> > @@ -452,7 +453,7 @@ enum {
>> >CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
>> >CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
>> >CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | \
>> > -  CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX)
>> > +  CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | CPU_FTR_PKEY)
>> 
>> P7 supports protection keys for data access (AMR) but not for
>> instruction access (IAMR), right? There's nothing in the code making
>> this distinction, so either CPU_FTR_PKEY shouldn't be enabled in P7 or
>> separate feature bits for AMR and IAMR should be used and checked before
>> trying to access the IAMR.
>
> didn't David say P7 supports both? P6, I think, only supports data.
> My pkey tests have passed on P7.

He said that P7 was the first processor to support 32 keys, but if you
look at the Virtual Page Class Key Protection section in ISA 2.06,
there's no IAMR.

There was a bug in the code where init_iamr was calling write_amr
instead of write_iamr, perhaps that's why it worked when you tested on P7?

>> 
>> >  #define CPU_FTRS_POWER8 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
>> >CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
>> >CPU_FTR_MMCRA | CPU_FTR_SMT | \
>> > @@ -462,7 +463,7 @@ enum {
>> >CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
>> >CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
>> >CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
>> > -  CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP)
>> > +  CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_PKEY)
>> >  #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
>> >  #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
>> >  #define CPU_FTRS_POWER9 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
>> > @@ -474,7 +475,8 @@ enum {
>> >CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
>> >CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
>> >CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
>> > -  CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300)
>> > +  CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | \
>> > +  CPU_FTR_PKEY)
>> >  #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
>> > (~CPU_FTR_SAO))
>> >  #define CPU_FTRS_CELL (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
>> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
>> > b/arch/powerpc/include/asm/mmu_context.h
>> > index a1cfcca..acd59d8 100644
>> > --- a/arch/powerpc/include/asm/mmu_context.h
>> > +++ b/arch/powerpc/include/asm/mmu_context.h
>> > @@ -188,6 +188,7 @@ static inline bool arch_vma_access_permitted(struct 
>> > vm_area_struct *vma,
>> >
>> >  #define pkey_initialize()
>> >  #define pkey_mm_init(mm)
>> > +#define pkey_mmu_values(total_data, total_execute)
>> >
>> >  static inline int vma_pkey(struct vm_area_struct *vma)
>> >  {
>> > diff --git a/arch/powerpc/include/asm/pkeys.h 
>> > b/arch/powerpc/include/asm/pkeys.h
>> > index ba7bff6..e61ed6c 100644
>> > --- a/arch/powerpc/include/asm/pkeys.h
>> > +++ b/arch/powerpc/include/asm/pkeys.h
>> > @@ -1,6 +1,8 @@
>> >  #ifndef _ASM_PPC64_PKEYS_H
>> >  #define _ASM_PPC64_PKEYS_H
>> >
>> > +#include 
>> > +
>> >  extern bool pkey_inited;
>> >  extern int pkeys_total; /* total pkeys as per device tree */
>> >  extern u32 initial_allocation_mask;/* bits set for reserved keys */
>> > @@ -227,6 +229,24 @@ static inline void pkey_mm_init(struct mm_struct *mm)
>> >mm->context.execute_only_pkey = -1;
>> >  }
>> >
>> > +static inline void pkey_mmu_values(int total_data, int total_execute)
>> > +{
>> > +  /*
>> > +   * since any pkey can be used for data or execute, we
>> > +   * will  just  treat all keys as equal and track them
>> > +   * as one entity.
>> > +   */
>> > +  pkeys_total = total_data + total_execute;
>> > +}
>> 
>> Right now this works because the firmware reports 0 execute keys in the
>> device tree, but if (when?) it is fixed to report 32 execute keys as
>> well as 32 data keys (which are the same keys), any place using
>> pkeys_total expecting it to mean the number of keys that are available

[PATCH v7 4/4] boot/param: add pointer to next argument to unknown parameter callback

2017-08-17 Thread Michal Suchanek
The fadump parameter processing re-does the logic of next_arg quote
stripping to determine where the argument ends. Pass a pointer to the
next argument instead to make this more robust.
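
As an illustration (values hypothetical): for a command line fragment
'fadump_extra_args="nr_cpus=1" quiet', next_arg() returns param pointing
at 'fadump_extra_args', val at 'nr_cpus=1' with the quotes stripped in
place, and next at 'quiet':

    fadump_extra_args="nr_cpus=1" quiet
    ^param            ^val        ^next
    |---- (next - 1) - param ----|

The extent '(next - 1) - param' therefore still covers the original
argument including any stripped quotes, which is exactly what fadump
needs in order to rewrite the string in place without second-guessing
the quote stripping.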

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/fadump.c  | 13 +
 arch/powerpc/mm/hugetlbpage.c |  4 ++--
 include/linux/moduleparam.h   |  2 +-
 init/main.c   | 12 ++--
 kernel/module.c   |  4 ++--
 kernel/params.c   | 19 +++
 lib/dynamic_debug.c   |  2 +-
 7 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index d7da4ce9f7ae..6ef96711ee9a 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -474,13 +474,14 @@ struct param_info {
 };
 
 static void __init fadump_update_params(struct param_info *param_info,
-   char *param, char *val)
+   char *param, char *val, char *next)
 {
ptrdiff_t param_offset = param - param_info->tmp_cmdline;
size_t vallen = val ? strlen(val) : 0;
char *tgt = param_info->cmdline + param_offset +
FADUMP_EXTRA_ARGS_LEN - param_info->shortening;
-   int shortening = 0;
+   int shortening = ((next - 1) - (param))
+   - (FADUMP_EXTRA_ARGS_LEN + 1 + vallen);
 
if (!val)
return;
@@ -488,10 +489,6 @@ static void __init fadump_update_params(struct param_info *param_info,
/* remove '=' */
*tgt++ = ' ';
 
-   /* next_arg removes one leading and one trailing '"' */
-   if ((*tgt == '"') && (*(tgt + vallen + shortening) == '"'))
-   shortening += 2;
-
/* remove one leading and one trailing quote if both are present */
if ((val[0] == '"') && (val[vallen - 1] == '"')) {
shortening += 2;
@@ -517,7 +514,7 @@ static void __init fadump_update_params(struct param_info *param_info,
  * to enforce the parameters passed through it
  */
 static int __init fadump_rework_cmdline_params(char *param, char *val,
-  const char *unused, void *arg)
+   char *next, const char *unused, void *arg)
 {
struct param_info *param_info = (struct param_info *)arg;
 
@@ -525,7 +522,7 @@ static int __init fadump_rework_cmdline_params(char *param, char *val,
 strlen(FADUMP_EXTRA_ARGS_PARAM) - 1))
return 0;
 
-   fadump_update_params(param_info, param, val);
+   fadump_update_params(param_info, param, val, next);
 
return 0;
 }
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index e1bf5ca397fe..3a4cce552906 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -268,8 +268,8 @@ int alloc_bootmem_huge_page(struct hstate *hstate)
 
 unsigned long gpage_npages[MMU_PAGE_COUNT];
 
-static int __init do_gpage_early_setup(char *param, char *val,
-  const char *unused, void *arg)
+static int __init do_gpage_early_setup(char *param, char *val, char *unused1,
+  const char *unused2, void *arg)
 {
static phys_addr_t size;
unsigned long npages;
diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 1ee7b30dafec..fec05a186c08 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -326,7 +326,7 @@ extern char *parse_args(const char *name,
  s16 level_min,
  s16 level_max,
  void *arg,
- int (*unknown)(char *param, char *val,
+ int (*unknown)(char *param, char *val, char *next,
 const char *doing, void *arg));
 
 /* Called by module remove. */
diff --git a/init/main.c b/init/main.c
index 052481fbe363..920c3564b2f0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -239,7 +239,7 @@ static int __init loglevel(char *str)
 early_param("loglevel", loglevel);
 
 /* Change NUL term back to "=", to make "param" the whole string. */
-static int __init repair_env_string(char *param, char *val,
+static int __init repair_env_string(char *param, char *val, char *unused2,
const char *unused, void *arg)
 {
if (val) {
@@ -257,7 +257,7 @@ static int __init repair_env_string(char *param, char *val,
 }
 
 /* Anything after -- gets handed straight to init. */
-static int __init set_init_arg(char *param, char *val,
+static int __init set_init_arg(char *param, char *val, char *unused2,
   const char *unused, void *arg)
 {
unsigned int i;
@@ -265,7 +265,7 @@ static int __init set_init_arg(char *param, char *val,
if (panic_later)
return 0;
 
-   repair_env_string(param, val, unused, NULL);
+   

[PATCH v7 3/4] lib/cmdline.c Remove quotes symmetrically.

2017-08-17 Thread Michal Suchanek
Remove quotes from an argument value only if there is a quote on both sides.
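
An illustrative (hypothetical) case: with an unterminated quote such as

    foo="bar

the old code stripped the leading quote from the value even though no
trailing quote existed, leaving 'bar' instead of '"bar'. With this
change the value is only unquoted when a matching pair is present.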

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/fadump.c | 6 ++
 lib/cmdline.c| 7 ++-
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index a1614d9b8a21..d7da4ce9f7ae 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -489,10 +489,8 @@ static void __init fadump_update_params(struct param_info *param_info,
*tgt++ = ' ';
 
/* next_arg removes one leading and one trailing '"' */
-   if (*tgt == '"')
-   shortening += 1;
-   if (*(tgt + vallen + shortening) == '"')
-   shortening += 1;
+   if ((*tgt == '"') && (*(tgt + vallen + shortening) == '"'))
+   shortening += 2;
 
/* remove one leading and one trailing quote if both are present */
if ((val[0] == '"') && (val[vallen - 1] == '"')) {
diff --git a/lib/cmdline.c b/lib/cmdline.c
index 4c0888c4a68d..01e701b2afe8 100644
--- a/lib/cmdline.c
+++ b/lib/cmdline.c
@@ -227,14 +227,11 @@ char *next_arg(char *args, char **param, char **val)
*val = args + equals + 1;
 
/* Don't include quotes in value. */
-   if (**val == '"') {
+   if ((**val == '"') && (args[i-1] == '"')) {
(*val)++;
-   if (args[i-1] == '"')
-   args[i-1] = '\0';
+   args[i-1] = '\0';
}
}
-   if (quoted && args[i-1] == '"')
-   args[i-1] = '\0';
 
if (args[i]) {
args[i] = '\0';
-- 
2.10.2



[PATCH v7 2/4] powerpc/fadump: update documentation about 'fadump_extra_args=' parameter

2017-08-17 Thread Michal Suchanek
From: Hari Bathini 

With the introduction of 'fadump_extra_args=' parameter to pass additional
parameters to fadump (capture) kernel, update documentation about it.

Signed-off-by: Hari Bathini 
Signed-off-by: Michal Suchanek 
---
 Documentation/powerpc/firmware-assisted-dump.txt | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt
index bdd344aa18d9..2df88524d2c7 100644
--- a/Documentation/powerpc/firmware-assisted-dump.txt
+++ b/Documentation/powerpc/firmware-assisted-dump.txt
@@ -162,7 +162,19 @@ How to enable firmware-assisted dump (fadump):
 
 1. Set config option CONFIG_FA_DUMP=y and build kernel.
 2. Boot into linux kernel with 'fadump=on' kernel cmdline option.
-3. Optionally, user can also set 'crashkernel=' kernel cmdline
+3. A user can pass additional command line parameters as a space
+   separated quoted list through 'fadump_extra_args=' parameter,
+   to be enforced when fadump is active. For example, parameter
+   'fadump_extra_args="nr_cpus=1 numa=off udev.children-max=2"'
+   will be changed to 'fadump_extra_args nr_cpus=1 numa=off
+   udev.children-max=2' in-place when fadump is active. This
+   parameter has no effect when fadump is not active. Multiple
+   instances of 'fadump_extra_args=' can be passed. This provision
+   can be used to reduce memory consumption during dump capture by
+   disabling unwarranted resources/subsystems like CPUs, NUMA
+   and such. Value with spaces can be passed as
+   'fadump_extra_args=""parameter="value with spaces"""'
+4. Optionally, user can also set 'crashkernel=' kernel cmdline
to specify size of the memory to reserve for boot memory dump
preservation.
 
@@ -172,6 +184,12 @@ NOTE: 1. 'fadump_reserve_mem=' parameter has been 
deprecated. Instead
   2. If firmware-assisted dump fails to reserve memory then it
  will fallback to existing kdump mechanism if 'crashkernel='
  option is set at kernel cmdline.
+  3. Special parameters like '--' passed inside fadump_extra_args are also
+ just left in-place. So, the user is advised to consider this while
+ specifying such parameters. It may be required to quote the argument
+ to fadump_extra_args when the bootloader uses double-quotes as
+ argument delimiter as well, e.g.
+append = " fadump_extra_args=\"nr_cpus=1 numa=off udev.children-max=2\""
 
 Sysfs/debugfs files:
 
-- 
2.10.2



[PATCH v7 1/4] powerpc/fadump: reduce memory consumption for capture kernel

2017-08-17 Thread Michal Suchanek
From: Hari Bathini 

With fadump (dump capture) kernel booting like a regular kernel, it needs
almost the same amount of memory to boot as the production kernel, which is
unwarranted for a dump capture kernel. But with no option to disable some
of the unnecessary subsystems in the fadump kernel, that much memory is wasted
on fadump, depriving the production kernel of that memory.

Introduce kernel parameter 'fadump_extra_args=' that would take regular
parameters as a space separated quoted string, to be enforced when fadump
is active. This 'fadump_extra_args=' parameter can be leveraged to pass
parameters like nr_cpus=1, cgroup_disable=memory and numa=off, to disable
unwarranted resources/subsystems.

Also, ensure the log "Firmware-assisted dump is active" is printed early
in the boot process to put the subsequent fadump messages in context.

Suggested-by: Michael Ellerman 
Signed-off-by: Hari Bathini 
Signed-off-by: Michal Suchanek 
---
Changes from v6:
Correct and simplify quote handling. Ideally I would like to extend
parse_args to give the length of the original quoted value to the callback.
However, parse_args removes at most one double-quote from the start and
one from the end, so that is easy to detect. Otherwise all other users
would have to be updated to discard the new argument.
---
 arch/powerpc/include/asm/fadump.h |   2 +
 arch/powerpc/kernel/fadump.c  | 109 --
 arch/powerpc/kernel/prom.c|   7 +++
 3 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h 
b/arch/powerpc/include/asm/fadump.h
index ce88bbe1d809..98ae00943fb3 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -208,11 +208,13 @@ extern int early_init_dt_scan_fw_dump(unsigned long node,
const char *uname, int depth, void *data);
 extern int fadump_reserve_mem(void);
 extern int setup_fadump(void);
+extern void enforce_fadump_extra_args(char *cmdline);
 extern int is_fadump_active(void);
 extern void crash_fadump(struct pt_regs *, const char *);
 extern void fadump_cleanup(void);
 
 #else  /* CONFIG_FA_DUMP */
+static inline void enforce_fadump_extra_args(char *cmdline) { }
 static inline int is_fadump_active(void) { return 0; }
 static inline void crash_fadump(struct pt_regs *regs, const char *str) { }
 #endif
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index dc0c49cfd90a..a1614d9b8a21 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -78,8 +78,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 * dump data waiting for us.
 */
fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
-   if (fdm_active)
+   if (fdm_active) {
+   pr_info("Firmware-assisted dump is active.\n");
fw_dump.dump_active = 1;
+   }
 
/* Get the sizes required to store dump data for the firmware provided
 * dump sections.
@@ -332,8 +334,11 @@ int __init fadump_reserve_mem(void)
 {
unsigned long base, size, memory_boundary;
 
-   if (!fw_dump.fadump_enabled)
+   if (!fw_dump.fadump_enabled) {
+   if (fw_dump.dump_active)
+   pr_warn("Firmware-assisted dump was active but kernel 
booted with fadump disabled!\n");
return 0;
+   }
 
if (!fw_dump.fadump_supported) {
printk(KERN_INFO "Firmware-assisted dump is not supported on"
@@ -373,7 +378,6 @@ int __init fadump_reserve_mem(void)
memory_boundary = memblock_end_of_DRAM();
 
if (fw_dump.dump_active) {
-   printk(KERN_INFO "Firmware-assisted dump is active.\n");
/*
 * If last boot has crashed then reserve all the memory
 * above boot_memory_size so that we don't touch it until
@@ -460,6 +464,105 @@ static int __init early_fadump_reserve_mem(char *p)
 }
 early_param("fadump_reserve_mem", early_fadump_reserve_mem);
 
+#define FADUMP_EXTRA_ARGS_PARAM	"fadump_extra_args="
+#define FADUMP_EXTRA_ARGS_LEN  (strlen(FADUMP_EXTRA_ARGS_PARAM) - 1)
+
+struct param_info {
+   char*cmdline;
+   char*tmp_cmdline;
+   int  shortening;
+};
+
+static void __init fadump_update_params(struct param_info *param_info,
+   char *param, char *val)
+{
+   ptrdiff_t param_offset = param - param_info->tmp_cmdline;
+   size_t vallen = val ? strlen(val) : 0;
+   char *tgt = param_info->cmdline + param_offset +
+   FADUMP_EXTRA_ARGS_LEN - param_info->shortening;
+   int shortening = 0;
+
+   if (!val)
+   return;
+
+   /* remove '=' */
+   *tgt++ = ' ';
+
+   /* next_arg removes one leading and one trailing '"' */
+   if 
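
A minimal userspace sketch of the in-place rewrite fadump_update_params
performs, per the commit message (our simplification: the real code
compacts the command line and tracks the shortening rather than leaving
blanks):

#include <stdio.h>
#include <string.h>

static void rewrite_extra_args(char *cmdline)
{
	static const char param[] = "fadump_extra_args=";
	char *p = strstr(cmdline, param);
	char *val, *end;

	if (!p)
		return;

	val = p + strlen(param);
	val[-1] = ' ';		/* the '=' becomes ' ' */

	/* blank one leading and one trailing '"' if both are present */
	if (*val == '"' && (end = strchr(val + 1, '"')) != NULL) {
		*val = ' ';
		*end = ' ';
	}
}

int main(void)
{
	char cmdline[] =
		"root=/dev/sda fadump_extra_args=\"nr_cpus=1 numa=off\"";

	rewrite_extra_args(cmdline);
	puts(cmdline);	/* root=/dev/sda fadump_extra_args  nr_cpus=1 numa=off */
	return 0;
}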

Re: [PATCH] tpm: vtpm: constify vio_device_id

2017-08-17 Thread Jason Gunthorpe
On Thu, Aug 17, 2017 at 11:04:21PM +0530, Arvind Yadav wrote:
> vio_device_id are not supposed to change at runtime. All functions
> working with vio_device_id provided by  work with
> const vio_device_id. So mark the non-const structs as const.
> 
> Signed-off-by: Arvind Yadav 

Reviewed-by: Jason Gunthorpe 

>  drivers/char/tpm/tpm_ibmvtpm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
> index f01d083..d2ce46b 100644
> +++ b/drivers/char/tpm/tpm_ibmvtpm.c
> @@ -32,7 +32,7 @@
>  
>  static const char tpm_ibmvtpm_driver_name[] = "tpm_ibmvtpm";
>  
> -static struct vio_device_id tpm_ibmvtpm_device_table[] = {
> +static const struct vio_device_id tpm_ibmvtpm_device_table[] = {
>   { "IBM,vtpm", "IBM,vtpm"},
>   { "", "" }
>  };


Re: [RFC Part1 PATCH v3 12/17] x86/mm: DMA support for SEV memory encryption

2017-08-17 Thread Tom Lendacky



On 8/6/2017 10:48 PM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:52PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

DMA access to memory mapped as encrypted while SEV is active can not be
encrypted during device write or decrypted during device read.


Yeah, definitely rewrite that sentence.


Heh, yup.




In order
for DMA to properly work when SEV is active, the SWIOTLB bounce buffers
must be used.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/mm/mem_encrypt.c | 86 +++
  lib/swiotlb.c |  5 +--
   2 files changed, 89 insertions(+), 2 deletions(-)


...


@@ -202,6 +280,14 @@ void __init mem_encrypt_init(void)
/* Call into SWIOTLB to update the SWIOTLB DMA buffers */
swiotlb_update_mem_attributes();
  
+	/*
+	 * With SEV, DMA operations cannot use encryption. New DMA ops
+	 * are required in order to mark the DMA areas as decrypted or
+	 * to use bounce buffers.
+	 */
+	if (sev_active())
+		dma_ops = &sme_dma_ops;


Well, we do differentiate between SME and SEV and the check is
sev_active but the ops are called sme_dma_ops. Call them sev_dma_ops
instead for less confusion.


Yup, will do.

Thanks,
Tom





Re: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-17 Thread Alex Williamson
On Thu, 17 Aug 2017 10:56:35 +
David Laight  wrote:

> From: Alex Williamson
> > Sent: 16 August 2017 17:56  
> ...
> > Firmware pissing match...  Processors running with 8k or less page size
> > fall within the recommendations of the PCI spec for register alignment
> > of MMIO regions of the device and this whole problem becomes less of an
> > issue.  
> 
> Actually if qemu is causing the MSI-X table accesses to fault, why doesn't
> it just lie to the guest about the physical address of the MSI-X table?
> Then mmio access to anything in the same physical page will just work.

That's an interesting idea, but now you need to add a BAR for the
virtualized vector table, but you'll also need to support extending a
BAR because there won't necessarily be a BAR available to add.  Of
course PCI requires natural alignment of BARs, thus an extra few bytes
on the end doubles the BAR size.  So also hope that if we need to
extend a BAR that there's a relatively small one available.  In either
case you're changing the layout of the device from what the driver might
expect.  We try pretty hard with device assignment to leave things in
the same place as they appear on bare metal, perhaps removing things,
but not actually moving things.  It might work in the majority of
cases, but it seems a bit precarious overall.  Thanks,

Alex


Re: [RFC Part1 PATCH v3 11/17] x86/mm, resource: Use PAGE_KERNEL protection for ioremap of memory pages

2017-08-17 Thread Tom Lendacky

On 8/1/2017 11:02 PM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:51PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

In order for memory pages to be properly mapped when SEV is active, we
need to use the PAGE_KERNEL protection attribute as the base protection.
This will ensure that memory mappings of, e.g., ACPI tables, receive the
proper mapping attributes.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/mm/ioremap.c  | 28 
  include/linux/ioport.h |  3 +++
  kernel/resource.c  | 17 +
  3 files changed, 48 insertions(+)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index c0be7cf..7b27332 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -69,6 +69,26 @@ static int __ioremap_check_ram(unsigned long start_pfn, 
unsigned long nr_pages,
return 0;
  }
  
+static int __ioremap_res_desc_other(struct resource *res, void *arg)

+{
+   return (res->desc != IORES_DESC_NONE);
+}
+
+/*
+ * This function returns true if the target memory is marked as
+ * IORESOURCE_MEM and IORESOURCE_BUSY and described as other than
+ * IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
+ */
+static bool __ioremap_check_if_mem(resource_size_t addr, unsigned long size)
+{
+   u64 start, end;
+
+   start = (u64)addr;
+   end = start + size - 1;
+
+   return (walk_mem_res(start, end, NULL, __ioremap_res_desc_other) == 1);
+}
+
  /*
   * Remap an arbitrary physical address space into the kernel virtual
   * address space. It transparently creates kernel huge I/O mapping when
@@ -146,7 +166,15 @@ static void __iomem *__ioremap_caller(resource_size_t 
phys_addr,
pcm = new_pcm;
}
  
+	/*

+* If the page being mapped is in memory and SEV is active then
+* make sure the memory encryption attribute is enabled in the
+* resulting mapping.
+*/
prot = PAGE_KERNEL_IO;
+   if (sev_active() && __ioremap_check_if_mem(phys_addr, size))
+   prot = pgprot_encrypted(prot);


Hmm, so this function already does walk_system_ram_range() a bit
earlier and now on SEV systems we're going to do it again. Can we make
walk_system_ram_range() return a distinct value for SEV systems and act
accordingly in __ioremap_caller() instead of repeating the operation?

It looks to me like we could...


Let me look into this.  I can probably come up with something that does
the walk once.

Thanks,
Tom





Re: WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111 __blk_mq_run_hw_queue+0x1d8/0x1f0

2017-08-17 Thread Brian King
On 08/17/2017 10:32 AM, Bart Van Assche wrote:
> On Wed, 2017-08-16 at 15:10 -0500, Brian King wrote:
>> On 08/16/2017 01:15 PM, Bart Van Assche wrote:
>>> On Wed, 2017-08-16 at 23:37 +0530, Abdul Haleem wrote:
 Linux-next booted with the below warnings on powerpc

 [ ... ]

 boot warnings:
 --
 kvm: exiting hardware virtualization
 [ cut here ]
 WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111 __blk_mq_run_hw_queue+0x1d8/0x1f0
 Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
 Call Trace:
 [c0037990] [c088f7b0] __blk_mq_delay_run_hw_queue+0x1f0/0x210
 [c00379d0] [c088fcb8] blk_mq_start_hw_queue+0x58/0x80
 [c00379f0] [c088fd40] blk_mq_start_hw_queues+0x60/0xb0
 [c0037a30] [c0ae2b54] scsi_kick_queue+0x34/0xa0
 [c0037a50] [c0ae2f70] scsi_run_queue+0x3b0/0x660
 [c0037ac0] [c0ae7ed4] scsi_run_host_queues+0x64/0xc0
 [c0037b00] [c0ae7f64] scsi_unblock_requests+0x34/0x60
 [c0037b20] [c0b14998] ipr_ioa_bringdown_done+0xf8/0x3a0
 [c0037bc0] [c0b12528] ipr_reset_ioa_job+0xd8/0x170
 [c0037c00] [c0b18790] ipr_reset_timer_done+0x110/0x160
 [c0037c50] [c024db50] call_timer_fn+0xa0/0x3a0
 [c0037ce0] [c024e058] expire_timers+0x1b8/0x350
 [c0037d50] [c024e2f0] run_timer_softirq+0x100/0x3e0
 [c0037df0] [c0162edc] __do_softirq+0x20c/0x620
 [c0037ee0] [c0163a80] irq_exit+0x230/0x290
 [c0037f10] [c001d770] __do_irq+0x170/0x410
 [c0037f90] [c003ea20] call_do_irq+0x14/0x24
 [c007f84e3a70] [c001dae0] do_IRQ+0xd0/0x190
 [c007f84e3ac0] [c0008c58] hardware_interrupt_common+0x158/0x160
>>>
>>> Hello Brian,
>>>
>>> In the MAINTAINERS file I found the following:
>>>
>>> IBM Power Linux RAID adapter
>>> M:  Brian King 
>>> S:  Supported
>>> F:  drivers/scsi/ipr.*
>>>
>>> Is that information up-to-date? Do you agree that the above message 
>>> indicates
>>> a bug in the ipr driver?
>>
>> Yes. Can you try with this patch that is in 4.13/scsi-fixes:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.13/scsi-fixes=b0e17a9b0df29590c45dfb296f541270a5941f41
> 
> Hello Brian,
> 
> Sorry but I don't have access to a setup on which I can test the ipr driver 
> ...

Understood. That request was intended for Abdul, who reported the issue.
I should have just cc'ed you...

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



Re: [RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

2017-08-17 Thread Tom Lendacky

On 8/17/2017 1:55 PM, Tom Lendacky wrote:

On 7/28/2017 10:23 AM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:49PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

The walk_iomem_res_desc(), walk_system_ram_res() and 
walk_system_ram_range()

functions each have much of the same code.  Create a new function that
consolidates the common code from these functions in one place to reduce
the amount of duplicated code.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  kernel/resource.c | 53 
++---

  1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f044..7b20b3e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource 
*res, unsigned long desc,

  res->start = p->start;
  if (res->end > p->end)
  res->end = p->end;
+res->desc = p->desc;
  return 0;


I must be going blind: where are we using that res->desc?


I think that was left-over from the initial consolidation work I was
doing.  I'll remove it.


I spoke too soon...  I use it in a later patch as part of a callback.
But instead of putting it here, I'll add it to the patch that actually
needs it.

Thanks,
Tom





+static int __walk_iomem_res_desc(struct resource *res, unsigned long 
desc,

+ bool first_level_children_only,


Btw, that variable name is insanely long.


I know, but I'm maintaining consistency with the name that was already
present vs. changing it.



The rest looks ok to me, thanks for the cleanup!


Thanks,
Tom





Re: [RFC Part1 PATCH v3 09/17] resource: Consolidate resource walking code

2017-08-17 Thread Tom Lendacky

On 7/28/2017 10:23 AM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:49PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

The walk_iomem_res_desc(), walk_system_ram_res() and walk_system_ram_range()
functions each have much of the same code.  Create a new function that
consolidates the common code from these functions in one place to reduce
the amount of duplicated code.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  kernel/resource.c | 53 ++---
  1 file changed, 26 insertions(+), 27 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 9b5f044..7b20b3e 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -397,9 +397,30 @@ static int find_next_iomem_res(struct resource *res, 
unsigned long desc,
res->start = p->start;
if (res->end > p->end)
res->end = p->end;
+   res->desc = p->desc;
return 0;


I must be going blind: where are we using that res->desc?


I think that was left-over from the initial consolidation work I was
doing.  I'll remove it.




+static int __walk_iomem_res_desc(struct resource *res, unsigned long desc,
+bool first_level_children_only,


Btw, that variable name is insanely long.


I know, but I'm maintaining consistency with the name that was already
present vs. changing it.



The rest looks ok to me, thanks for the cleanup!


Thanks,
Tom





Re: [RFC Part1 PATCH v3 08/17] x86/efi: Access EFI data as encrypted when SEV is active

2017-08-17 Thread Tom Lendacky

On 7/28/2017 5:31 AM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:48PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

EFI data is encrypted when the kernel is run under SEV. Update the
page table references to be sure the EFI memory areas are accessed
encrypted.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/platform/efi/efi_64.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 12e8388..1ecb3f6 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -32,6 +32,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -369,7 +370,10 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, 
unsigned num_pages)
 * as trim_bios_range() will reserve the first page and isolate it away
 * from memory allocators anyway.
 */
-   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, _PAGE_RW)) {
+   pf = _PAGE_RW;
+   if (sev_active())
+   pf |= _PAGE_ENC;


\n here


+   if (kernel_map_pages_in_pgd(pgd, 0x0, 0x0, 1, pf)) {
pr_err("Failed to create 1:1 mapping for the first page!\n");
return 1;
}
@@ -412,6 +416,9 @@ static void __init __map_region(efi_memory_desc_t *md, u64 
va)
if (!(md->attribute & EFI_MEMORY_WB))
flags |= _PAGE_PCD;
  
+	if (sev_active())

+   flags |= _PAGE_ENC;
+
pfn = md->phys_addr >> PAGE_SHIFT;
if (kernel_map_pages_in_pgd(pgd, pfn, va, md->num_pages, flags))
pr_warn("Error mapping PA 0x%llx -> VA 0x%llx!\n",
@@ -511,6 +518,9 @@ static int __init efi_update_mappings(efi_memory_desc_t 
*md, unsigned long pf)
pgd_t *pgd = efi_pgd;
int err1, err2;
  
+	if (sev_active())

+   pf |= _PAGE_ENC;


Move this assignment to the caller efi_update_mem_attr() where pf is being
set...


Will do.




+
/* Update the 1:1 mapping */
pfn = md->phys_addr >> PAGE_SHIFT;
err1 = kernel_map_pages_in_pgd(pgd, pfn, md->phys_addr, md->num_pages, 
pf);
@@ -589,6 +599,9 @@ void __init efi_runtime_update_mappings(void)
(md->type != EFI_RUNTIME_SERVICES_CODE))
pf |= _PAGE_RW;
  
+		if (sev_active())

+   pf |= _PAGE_ENC;


... just like here.


Yup.

Thanks,
Tom




+
efi_update_mappings(md, pf);


In general, I'm not totally excited about that sprinkling of if
(sev_active())... :-\



Re: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

2017-08-17 Thread Tom Lendacky

On 7/28/2017 3:47 AM, David Laight wrote:

From: Borislav Petkov

Sent: 27 July 2017 15:59
On Mon, Jul 24, 2017 at 02:07:47PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

The current code checks only for sme_active() when determining whether
to perform the encryption attribute change.  Include sev_active() in this
check so that memory attribute changes can occur under SME and SEV.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/mm/pageattr.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index dfb7d65..b726b23 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int 
numpages, bool enc)
unsigned long start;
int ret;

-   /* Nothing to do if the SME is not active */
-   if (!sme_active())
+   /* Nothing to do if SME and SEV are not active */
+   if (!sme_active() && !sev_active())


This is the second place which does

if (!SME && !SEV)

I wonder if, instead of sprinking those, we should have a

if (mem_enc_active())

or so which unifies all those memory encryption logic tests and makes
the code more straightforward for readers who don't have to pay
attention to SME vs SEV ...


If any of the code paths are 'hot' it would make sense to be checking
a single memory location.


The function would check a single variable/memory location and making it
an inline function would accomplish that.

Thanks,
Tom



David
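
A standalone sketch of the helper proposed above (mem_enc_active is the
name Boris suggests; the stubs below are ours, standing in for the real
sme_active()/sev_active()):

#include <stdbool.h>
#include <stdio.h>

static bool sme_enabled, sev_enabled;	/* stand-ins for kernel state */

static bool sme_active(void) { return sme_enabled; }
static bool sev_active(void) { return sev_enabled; }

/* One inline test covering both memory-encryption modes, so hot paths
 * effectively read a single flag instead of sprinkling two checks.
 */
static inline bool mem_enc_active(void)
{
	return sme_active() || sev_active();
}

int main(void)
{
	sev_enabled = true;
	printf("mem_enc_active: %d\n", mem_enc_active());
	return 0;
}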



Re: [PATCH v6 1/2] powerpc/fadump: reduce memory consumption for capture kernel

2017-08-17 Thread Hari Bathini

Hello Michal,


Thanks for the review..


On Tuesday 15 August 2017 04:26 PM, Michal Suchánek wrote:

Hello,

sorry about the late reply.

Looks like I had too much faith in the parse_args sanity.

Looking closely, the parsing happens in next_arg and only the outermost
quotes are removed.

So presumably >>foo="bar baz"<< gives >>bar baz<< as value and
  >>foo=bar" baz"<< gives >>bar" baz<< as value.


Yeah, with no such thing as nested quotes, it can get tricky if
quoted params are put inside fadump_extra_args= (fadump_extra_args="a "b c" d e" f g)



And presumably you can do fadump_extra_args="par1=val1 par2=val2
par3=val3" and fadump_extra_args=""par="value with
spaces""" (each parameter which needs a space in a separate
fadump_extra_args parameter) provided you remove the outermost quotes
in the fadump_extra_args handler as well.

Wanted to run some tests but did not get around to do it yet.

On Sat, 29 Jul 2017 02:27:22 +0530
Hari Bathini  wrote:


With fadump (dump capture) kernel booting like a regular kernel, it
almost needs the same amount of memory to boot as the production
kernel, which is unwarranted for a dump capture kernel. But with no
option to disable some of the unnecessary subsystems in fadump
kernel, that much memory is wasted on fadump, depriving the
production kernel of that memory.

Introduce kernel parameter 'fadump_extra_args=' that would take
regular parameters as a space separated quoted string, to be enforced
when fadump is active. This 'fadump_extra_args=' parameter can be
leveraged to pass parameters like nr_cpus=1, cgroup_disable=memory
and numa=off, to disable unwarranted resources/subsystems.

Also, ensure the log "Firmware-assisted dump is active" is printed
early in the boot process to put the subsequent fadump messages in
context.

Suggested-by: Michael Ellerman 
Signed-off-by: Hari Bathini 
---

Changes from v5:
* Using 'fadump_extra_args=' instead of 'fadump_append=' to pass
   additional parameters, to be enforced when fadump is active.
* Using space-separated quoted list as syntax for 'fadump_extra_args='
   parameter.


  arch/powerpc/include/asm/fadump.h |2 +
  arch/powerpc/kernel/fadump.c  |  125 +-
  arch/powerpc/kernel/prom.c|7 ++
  3 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h
b/arch/powerpc/include/asm/fadump.h index ce88bbe..98ae009 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -208,11 +208,13 @@ extern int early_init_dt_scan_fw_dump(unsigned
long node, const char *uname, int depth, void *data);
  extern int fadump_reserve_mem(void);
  extern int setup_fadump(void);
+extern void enforce_fadump_extra_args(char *cmdline);
  extern int is_fadump_active(void);
  extern void crash_fadump(struct pt_regs *, const char *);
  extern void fadump_cleanup(void);
  
  #else	/* CONFIG_FA_DUMP */

+static inline void enforce_fadump_extra_args(char *cmdline) { }
  static inline int is_fadump_active(void) { return 0; }
  static inline void crash_fadump(struct pt_regs *regs, const char
*str) { } #endif
diff --git a/arch/powerpc/kernel/fadump.c
b/arch/powerpc/kernel/fadump.c index dc0c49c..d8cb829 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -78,8 +78,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node,
 * dump data waiting for us.
 */
	fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL);
-   if (fdm_active)
+   if (fdm_active) {
+   pr_info("Firmware-assisted dump is active.\n");
fw_dump.dump_active = 1;
+   }
  
  	/* Get the sizes required to store dump data for the firmware provided
  	 * dump sections.
@@ -332,8 +334,11 @@ int __init fadump_reserve_mem(void)
  {
unsigned long base, size, memory_boundary;
  
-	if (!fw_dump.fadump_enabled)

+   if (!fw_dump.fadump_enabled) {
+   if (fw_dump.dump_active)
+   pr_warn("Firmware-assisted dump was active
but kernel booted with fadump disabled!\n"); return 0;
+   }
  
  	if (!fw_dump.fadump_supported) {

printk(KERN_INFO "Firmware-assisted dump is not
supported on" @@ -373,7 +378,6 @@ int __init fadump_reserve_mem(void)
memory_boundary = memblock_end_of_DRAM();
  
  	if (fw_dump.dump_active) {

-   printk(KERN_INFO "Firmware-assisted dump is active.\n");
    /*
     * If last boot has crashed then reserve all the memory
     * above boot_memory_size so that we don't touch it until
@@ -460,6 +464,121 @@ static int __init early_fadump_reserve_mem(char *p)
 }
  early_param("fadump_reserve_mem", early_fadump_reserve_mem);
  
+#define FADUMP_EXTRA_ARGS_PARAM		"fadump_extra_args="

+#define INIT_ARGS_START"-- "
+#define INIT_ARGS_START_LEN

Re: [RFC Part1 PATCH v3 07/17] x86/mm: Include SEV for encryption memory attribute changes

2017-08-17 Thread Tom Lendacky

On 7/27/2017 9:58 AM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:47PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

The current code checks only for sme_active() when determining whether
to perform the encryption attribute change.  Include sev_active() in this
check so that memory attribute changes can occur under SME and SEV.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/mm/pageattr.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index dfb7d65..b726b23 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1781,8 +1781,8 @@ static int __set_memory_enc_dec(unsigned long addr, int 
numpages, bool enc)
unsigned long start;
int ret;
  
-	/* Nothing to do if the SME is not active */

-   if (!sme_active())
+   /* Nothing to do if SME and SEV are not active */
+   if (!sme_active() && !sev_active())


This is the second place which does

if (!SME && !SEV)

I wonder if, instead of sprinking those, we should have a

if (mem_enc_active())

or so which unifies all those memory encryption logic tests and makes
the code more straightforward for readers who don't have to pay
attention to SME vs SEV ...


Yup, that will make things look cleaner and easier to understand.

Thanks,
Tom



Just a thought.



Re: [RFC Part1 PATCH v3 06/17] x86/mm: Use encrypted access of boot related data with SEV

2017-08-17 Thread Tom Lendacky

On 7/27/2017 8:31 AM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:46PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

When Secure Encrypted Virtualization (SEV) is active, boot data (such as
EFI related data, setup data) is encrypted and needs to be accessed as
such when mapped. Update the architecture override in early_memremap to
keep the encryption attribute when mapping this data.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/mm/ioremap.c | 44 
  1 file changed, 32 insertions(+), 12 deletions(-)


...


@@ -590,10 +598,15 @@ bool arch_memremap_can_ram_remap(resource_size_t 
phys_addr, unsigned long size,
if (flags & MEMREMAP_DEC)
return false;
  
-	if (memremap_is_setup_data(phys_addr, size) ||

-   memremap_is_efi_data(phys_addr, size) ||
-   memremap_should_map_decrypted(phys_addr, size))
-   return false;
+   if (sme_active()) {
+   if (memremap_is_setup_data(phys_addr, size) ||
+   memremap_is_efi_data(phys_addr, size) ||
+   memremap_should_map_decrypted(phys_addr, size))
+   return false;
+   } else if (sev_active()) {
+   if (memremap_should_map_decrypted(phys_addr, size))
+   return false;
+   }
  
  	return true;

  }


I guess this function's hind part can be simplified to:

 if (sme_active()) {
 if (memremap_is_setup_data(phys_addr, size) ||
 memremap_is_efi_data(phys_addr, size))
 return false;
 }

 return ! memremap_should_map_decrypted(phys_addr, size);
}



Ok, definitely cleaner.


@@ -608,15 +621,22 @@ pgprot_t __init 
early_memremap_pgprot_adjust(resource_size_t phys_addr,
 unsigned long size,
 pgprot_t prot)


And this one in a similar manner...


  {
-   if (!sme_active())
+   if (!sme_active() && !sev_active())
return prot;


... and you don't need that check...


-   if (early_memremap_is_setup_data(phys_addr, size) ||
-   memremap_is_efi_data(phys_addr, size) ||
-   memremap_should_map_decrypted(phys_addr, size))
-   prot = pgprot_decrypted(prot);
-   else
-   prot = pgprot_encrypted(prot);
+   if (sme_active()) {


... if you're going to do it here too.


+   if (early_memremap_is_setup_data(phys_addr, size) ||
+   memremap_is_efi_data(phys_addr, size) ||
+   memremap_should_map_decrypted(phys_addr, size))
+   prot = pgprot_decrypted(prot);
+   else
+   prot = pgprot_encrypted(prot);
+   } else if (sev_active()) {


And here.


Will do.

Thanks,
Tom




+   if (memremap_should_map_decrypted(phys_addr, size))
+   prot = pgprot_decrypted(prot);
+   else
+   prot = pgprot_encrypted(prot);
+   }




Applied "ASoC: qcom: make snd_pcm_hardware const" to the asoc tree

2017-08-17 Thread Mark Brown
The patch

   ASoC: qcom: make snd_pcm_hardware const

has been applied to the asoc tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 193e25e1fe822f30a02aa9bf99880345eb242d35 Mon Sep 17 00:00:00 2001
From: Bhumika Goyal 
Date: Thu, 17 Aug 2017 15:46:12 +0530
Subject: [PATCH] ASoC: qcom: make snd_pcm_hardware const

Make this const as it is either passed as the 2nd argument
to the function snd_soc_set_runtime_hwparams, which is const or used
in a copy operation.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
Signed-off-by: Mark Brown 
---
 sound/soc/qcom/lpass-platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/qcom/lpass-platform.c b/sound/soc/qcom/lpass-platform.c
index fb3576af7911..e1945e1772cd 100644
--- a/sound/soc/qcom/lpass-platform.c
+++ b/sound/soc/qcom/lpass-platform.c
@@ -32,7 +32,7 @@ struct lpass_pcm_data {
 #define LPASS_PLATFORM_BUFFER_SIZE (16 * 1024)
 #define LPASS_PLATFORM_PERIODS 2
 
-static struct snd_pcm_hardware lpass_platform_pcm_hardware = {
+static const struct snd_pcm_hardware lpass_platform_pcm_hardware = {
.info   =   SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID |
SNDRV_PCM_INFO_INTERLEAVED |
-- 
2.13.3
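
A small aside on why this class of constification matters (our
illustration, not from the patch): a const static table is placed in
.rodata, so the compiler rejects stray writes that would otherwise
silently corrupt it at runtime.

#include <stdio.h>

struct pcm_hw {				/* stand-in for snd_pcm_hardware */
	unsigned int info;
	unsigned int rates;
};

static const struct pcm_hw hw = { .info = 0x7, .rates = 48000 };

int main(void)
{
	/* hw.info = 0; */	/* compile error: read-only object */
	printf("rates: %u\n", hw.rates);
	return 0;
}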



Applied "ASoC: sh: make snd_pcm_hardware const" to the asoc tree

2017-08-17 Thread Mark Brown
The patch

   ASoC: sh: make snd_pcm_hardware const

has been applied to the asoc tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 5c2e035e694c8b3f0a2ddfefa102ad9d9d42 Mon Sep 17 00:00:00 2001
From: Bhumika Goyal 
Date: Thu, 17 Aug 2017 15:46:11 +0530
Subject: [PATCH] ASoC: sh: make snd_pcm_hardware const

Make these const as they are only passed as the 2nd argument to the
function snd_soc_set_runtime_hwparams, which is const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
Signed-off-by: Mark Brown 
---
 sound/soc/sh/dma-sh7760.c | 2 +-
 sound/soc/sh/fsi.c| 2 +-
 sound/soc/sh/rcar/core.c  | 2 +-
 sound/soc/sh/siu_dai.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/sh/dma-sh7760.c b/sound/soc/sh/dma-sh7760.c
index 35788a6db963..1e7d417b53ef 100644
--- a/sound/soc/sh/dma-sh7760.c
+++ b/sound/soc/sh/dma-sh7760.c
@@ -89,7 +89,7 @@ struct camelot_pcm {
 #define DMABRG_PREALLOC_BUFFER 32 * 1024
 #define DMABRG_PREALLOC_BUFFER_MAX 32 * 1024
 
-static struct snd_pcm_hardware camelot_pcm_hardware = {
+static const struct snd_pcm_hardware camelot_pcm_hardware = {
.info = (SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
diff --git a/sound/soc/sh/fsi.c b/sound/soc/sh/fsi.c
index 5d7d9fe8bc98..39ec772912d5 100644
--- a/sound/soc/sh/fsi.c
+++ b/sound/soc/sh/fsi.c
@@ -1710,7 +1710,7 @@ static const struct snd_soc_dai_ops fsi_dai_ops = {
  * pcm ops
  */
 
-static struct snd_pcm_hardware fsi_pcm_hardware = {
+static const struct snd_pcm_hardware fsi_pcm_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED  |
SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID,
diff --git a/sound/soc/sh/rcar/core.c b/sound/soc/sh/rcar/core.c
index df39831b8e8f..107133297e8d 100644
--- a/sound/soc/sh/rcar/core.c
+++ b/sound/soc/sh/rcar/core.c
@@ -843,7 +843,7 @@ static int rsnd_soc_hw_rule_channels(struct 
snd_pcm_hw_params *params,
ir, );
 }
 
-static struct snd_pcm_hardware rsnd_pcm_hardware = {
+static const struct snd_pcm_hardware rsnd_pcm_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED  |
SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID,
diff --git a/sound/soc/sh/siu_dai.c b/sound/soc/sh/siu_dai.c
index 4a22aadac294..160502947da2 100644
--- a/sound/soc/sh/siu_dai.c
+++ b/sound/soc/sh/siu_dai.c
@@ -333,7 +333,7 @@ static void siu_dai_spbstop(struct siu_port *port_info)
 /* API functions   */
 
 /* Playback and capture hardware properties are identical */
-static struct snd_pcm_hardware siu_dai_pcm_hw = {
+static const struct snd_pcm_hardware siu_dai_pcm_hw = {
.info   = SNDRV_PCM_INFO_INTERLEAVED,
.formats= SNDRV_PCM_FMTBIT_S16,
.rates  = SNDRV_PCM_RATE_8000_48000,
-- 
2.13.3



Applied "ASoC: Intel: Skylake: make snd_pcm_hardware const" to the asoc tree

2017-08-17 Thread Mark Brown
The patch

   ASoC: Intel: Skylake: make snd_pcm_hardware const

has been applied to the asoc tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 8df397ff0e281d5324d4b7a7e9fa56c4188e0a66 Mon Sep 17 00:00:00 2001
From: Bhumika Goyal 
Date: Thu, 17 Aug 2017 15:46:09 +0530
Subject: [PATCH] ASoC: Intel: Skylake: make snd_pcm_hardware const

Make this const as it is only passed as the 2nd argument to the
function snd_soc_set_runtime_hwparams, which is const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
Signed-off-by: Mark Brown 
---
 sound/soc/intel/skylake/skl-pcm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/skylake/skl-pcm.c 
b/sound/soc/intel/skylake/skl-pcm.c
index e98d8252a026..f6c9adb6faa2 100644
--- a/sound/soc/intel/skylake/skl-pcm.c
+++ b/sound/soc/intel/skylake/skl-pcm.c
@@ -33,7 +33,7 @@
 #define HDA_STEREO 2
 #define HDA_QUAD 4
 
-static struct snd_pcm_hardware azx_pcm_hw = {
+static const struct snd_pcm_hardware azx_pcm_hw = {
.info = (SNDRV_PCM_INFO_MMAP |
 SNDRV_PCM_INFO_INTERLEAVED |
 SNDRV_PCM_INFO_BLOCK_TRANSFER |
-- 
2.13.3



Applied "ASoC: Intel: Atom: make snd_pcm_hardware const" to the asoc tree

2017-08-17 Thread Mark Brown
The patch

   ASoC: Intel: Atom: make snd_pcm_hardware const

has been applied to the asoc tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From a7468e478a95db31c0f64e8497fadd9df5c49789 Mon Sep 17 00:00:00 2001
From: Bhumika Goyal 
Date: Thu, 17 Aug 2017 15:46:08 +0530
Subject: [PATCH] ASoC: Intel: Atom: make snd_pcm_hardware const

Make this const as it is only used in a copy operation.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
Signed-off-by: Mark Brown 
---
 sound/soc/intel/atom/sst-mfld-platform-pcm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/atom/sst-mfld-platform-pcm.c 
b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
index b272df5ce0e8..43e7fdd19f29 100644
--- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c
+++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
@@ -76,7 +76,7 @@ int sst_unregister_dsp(struct sst_device *dev)
 }
 EXPORT_SYMBOL_GPL(sst_unregister_dsp);
 
-static struct snd_pcm_hardware sst_platform_pcm_hw = {
+static const struct snd_pcm_hardware sst_platform_pcm_hw = {
.info = (SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_DOUBLE |
SNDRV_PCM_INFO_PAUSE |
-- 
2.13.3



Applied "ASoC: kirkwood: make snd_pcm_hardware const" to the asoc tree

2017-08-17 Thread Mark Brown
The patch

   ASoC: kirkwood: make snd_pcm_hardware const

has been applied to the asoc tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 636d7e289c0a59dd3febfe6a2e7e5bd56eaa91b8 Mon Sep 17 00:00:00 2001
From: Bhumika Goyal 
Date: Thu, 17 Aug 2017 15:46:10 +0530
Subject: [PATCH] ASoC: kirkwood: make snd_pcm_hardware const

Make this const as it is either passed as the 2nd argument
to the function snd_soc_set_runtime_hwparams, which is const or used in
a copy operation.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
Signed-off-by: Mark Brown 
---
 sound/soc/kirkwood/kirkwood-dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/kirkwood/kirkwood-dma.c 
b/sound/soc/kirkwood/kirkwood-dma.c
index dafd22e874e9..cf23af159acf 100644
--- a/sound/soc/kirkwood/kirkwood-dma.c
+++ b/sound/soc/kirkwood/kirkwood-dma.c
@@ -27,7 +27,7 @@ static struct kirkwood_dma_data *kirkwood_priv(struct 
snd_pcm_substream *subs)
return snd_soc_dai_get_drvdata(soc_runtime->cpu_dai);
 }
 
-static struct snd_pcm_hardware kirkwood_dma_snd_hw = {
+static const struct snd_pcm_hardware kirkwood_dma_snd_hw = {
.info = SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID |
-- 
2.13.3



Applied "ASoC: fsl: make snd_pcm_hardware const" to the asoc tree

2017-08-17 Thread Mark Brown
The patch

   ASoC: fsl: make snd_pcm_hardware const

has been applied to the asoc tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From f77bb3b778e85d3477a9e3e0236545663f16793e Mon Sep 17 00:00:00 2001
From: Bhumika Goyal 
Date: Thu, 17 Aug 2017 15:46:07 +0530
Subject: [PATCH] ASoC: fsl: make snd_pcm_hardware const

Make these const as they are only passed as the 2nd argument to the
function snd_soc_set_runtime_hwparams, which is const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/fsl_asrc_dma.c | 2 +-
 sound/soc/fsl/imx-pcm-fiq.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
index 2baf19608bd0..e1b97e59275a 100644
--- a/sound/soc/fsl/fsl_asrc_dma.c
+++ b/sound/soc/fsl/fsl_asrc_dma.c
@@ -20,7 +20,7 @@
 
 #define FSL_ASRC_DMABUF_SIZE   (256 * 1024)
 
-static struct snd_pcm_hardware snd_imx_hardware = {
+static const struct snd_pcm_hardware snd_imx_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_MMAP |
diff --git a/sound/soc/fsl/imx-pcm-fiq.c b/sound/soc/fsl/imx-pcm-fiq.c
index aef1f7819c40..4e5fefee111e 100644
--- a/sound/soc/fsl/imx-pcm-fiq.c
+++ b/sound/soc/fsl/imx-pcm-fiq.c
@@ -154,7 +154,7 @@ static snd_pcm_uframes_t snd_imx_pcm_pointer(struct 
snd_pcm_substream *substream
return bytes_to_frames(substream->runtime, iprtd->offset);
 }
 
-static struct snd_pcm_hardware snd_imx_hardware = {
+static const struct snd_pcm_hardware snd_imx_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_MMAP |
-- 
2.13.3



Re: [RFC v7 25/25] powerpc: Enable pkey subsystem

2017-08-17 Thread Ram Pai
On Thu, Aug 10, 2017 at 06:27:34PM -0300, Thiago Jung Bauermann wrote:
> 
> Ram Pai  writes:
> > --- a/arch/powerpc/include/asm/cputable.h
> > +++ b/arch/powerpc/include/asm/cputable.h
> > @@ -214,6 +214,7 @@ enum {
> >  #define CPU_FTR_DAWR   
> > LONG_ASM_CONST(0x0400)
> >  #define CPU_FTR_DABRX  
> > LONG_ASM_CONST(0x0800)
> >  #define CPU_FTR_PMAO_BUG   LONG_ASM_CONST(0x1000)
> > +#define CPU_FTR_PKEY   
> > LONG_ASM_CONST(0x2000)
> >  #define CPU_FTR_POWER9_DD1 LONG_ASM_CONST(0x4000)
> >
> >  #ifndef __ASSEMBLY__
> > @@ -452,7 +453,7 @@ enum {
> > CPU_FTR_DSCR | CPU_FTR_SAO  | CPU_FTR_ASYM_SMT | \
> > CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
> > CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | \
> > -   CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX)
> > +   CPU_FTR_VMX_COPY | CPU_FTR_HAS_PPR | CPU_FTR_DABRX | CPU_FTR_PKEY)
> 
> P7 supports protection keys for data access (AMR) but not for
> instruction access (IAMR), right? There's nothing in the code making
> this distinction, so either CPU_FTR_PKEY shouldn't be enabled in P7 or
> separate feature bits for AMR and IAMR should be used and checked before
> trying to access the IAMR.

Didn't David say P7 supports both? P6, I think, only supports data.
My pkey tests have passed on P7.

> 
> >  #define CPU_FTRS_POWER8 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
> > CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | CPU_FTR_ARCH_206 |\
> > CPU_FTR_MMCRA | CPU_FTR_SMT | \
> > @@ -462,7 +463,7 @@ enum {
> > CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
> > CPU_FTR_ICSWX | CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
> > CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
> > -   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP)
> > +   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_PKEY)
> >  #define CPU_FTRS_POWER8E (CPU_FTRS_POWER8 | CPU_FTR_PMAO_BUG)
> >  #define CPU_FTRS_POWER8_DD1 (CPU_FTRS_POWER8 & ~CPU_FTR_DBELL)
> >  #define CPU_FTRS_POWER9 (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
> > @@ -474,7 +475,8 @@ enum {
> > CPU_FTR_STCX_CHECKS_ADDRESS | CPU_FTR_POPCNTB | CPU_FTR_POPCNTD | \
> > CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
> > CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_DAWR | \
> > -   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300)
> > +   CPU_FTR_ARCH_207S | CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | \
> > +   CPU_FTR_PKEY)
> >  #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
> >  (~CPU_FTR_SAO))
> >  #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
> > b/arch/powerpc/include/asm/mmu_context.h
> > index a1cfcca..acd59d8 100644
> > --- a/arch/powerpc/include/asm/mmu_context.h
> > +++ b/arch/powerpc/include/asm/mmu_context.h
> > @@ -188,6 +188,7 @@ static inline bool arch_vma_access_permitted(struct 
> > vm_area_struct *vma,
> >
> >  #define pkey_initialize()
> >  #define pkey_mm_init(mm)
> > +#define pkey_mmu_values(total_data, total_execute)
> >
> >  static inline int vma_pkey(struct vm_area_struct *vma)
> >  {
> > diff --git a/arch/powerpc/include/asm/pkeys.h 
> > b/arch/powerpc/include/asm/pkeys.h
> > index ba7bff6..e61ed6c 100644
> > --- a/arch/powerpc/include/asm/pkeys.h
> > +++ b/arch/powerpc/include/asm/pkeys.h
> > @@ -1,6 +1,8 @@
> >  #ifndef _ASM_PPC64_PKEYS_H
> >  #define _ASM_PPC64_PKEYS_H
> >
> > +#include 
> > +
> >  extern bool pkey_inited;
> >  extern int pkeys_total; /* total pkeys as per device tree */
> >  extern u32 initial_allocation_mask;/* bits set for reserved keys */
> > @@ -227,6 +229,24 @@ static inline void pkey_mm_init(struct mm_struct *mm)
> > mm->context.execute_only_pkey = -1;
> >  }
> >
> > +static inline void pkey_mmu_values(int total_data, int total_execute)
> > +{
> > +   /*
> > +* since any pkey can be used for data or execute, we
> > +* will  just  treat all keys as equal and track them
> > +* as one entity.
> > +*/
> > +   pkeys_total = total_data + total_execute;
> > +}
> 
> Right now this works because the firmware reports 0 execute keys in the
> device tree, but if (when?) it is fixed to report 32 execute keys as
> well as 32 data keys (which are the same keys), any place using
> pkeys_total expecting it to mean the number of keys that are available
> will be broken. This includes pkey_initialize and mm_pkey_is_allocated.

Good point. We should just ignore total_execute. It should
be the same value as total_data on the latest platforms.
On older platforms it will continue to be zero.

> 
> Perhaps pkeys_total should use total_data as the number of keys
> supported in the system, and total_execute just as a flag to say whether
> there's a IAMR? Or, since P8 and later 

[PATCH] hwrng: pseries: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by  work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/char/hw_random/pseries-rng.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/pseries-rng.c 
b/drivers/char/hw_random/pseries-rng.c
index d9f46b4..4e2a3f6 100644
--- a/drivers/char/hw_random/pseries-rng.c
+++ b/drivers/char/hw_random/pseries-rng.c
@@ -72,7 +72,7 @@ static int pseries_rng_remove(struct vio_dev *dev)
return 0;
 }
 
-static struct vio_device_id pseries_rng_driver_ids[] = {
+static const struct vio_device_id pseries_rng_driver_ids[] = {
{ "ibm,random-v1", "ibm,random"},
{ "", "" }
 };
-- 
2.7.4



[PATCH] tpm: vtpm: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by  work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/char/tpm/tpm_ibmvtpm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c
index f01d083..d2ce46b 100644
--- a/drivers/char/tpm/tpm_ibmvtpm.c
+++ b/drivers/char/tpm/tpm_ibmvtpm.c
@@ -32,7 +32,7 @@
 
 static const char tpm_ibmvtpm_driver_name[] = "tpm_ibmvtpm";
 
-static struct vio_device_id tpm_ibmvtpm_device_table[] = {
+static const struct vio_device_id tpm_ibmvtpm_device_table[] = {
{ "IBM,vtpm", "IBM,vtpm"},
{ "", "" }
 };
-- 
2.7.4



Re: [PATCH] net: ibm: ibmvnic: constify vio_device_id

2017-08-17 Thread David Miller
From: Arvind Yadav 
Date: Thu, 17 Aug 2017 18:52:54 +0530

> vio_device_id are not supposed to change at runtime. All functions
> working with vio_device_id provided by  work with
> const vio_device_id. So mark the non-const structs as const.
> 
> Signed-off-by: Arvind Yadav 

Applied.


Re: [PATCH] net: ibm: ibmveth: constify vio_device_id

2017-08-17 Thread David Miller
From: Arvind Yadav 
Date: Thu, 17 Aug 2017 18:52:53 +0530

> vio_device_id are not supposed to change at runtime. All functions
> working with vio_device_id provided by  work with
> const vio_device_id. So mark the non-const structs as const.
> 
> Signed-off-by: Arvind Yadav 

Applied.


Re: [RFC v7 24/25] powerpc: Deliver SEGV signal on pkey violation

2017-08-17 Thread Ram Pai
On Fri, Aug 11, 2017 at 08:26:30PM +1000, Michael Ellerman wrote:
> Thiago Jung Bauermann  writes:
> 
> > Ram Pai  writes:
> >
> >> The value of the AMR register at the time of exception
> >> is made available in gp_regs[PT_AMR] of the siginfo.
> >>
> >> The value of the pkey, whose protection got violated,
> >> is made available in si_pkey field of the siginfo structure.
> >
> > Should the IAMR also be made available?
> >
> > Also, should the AMR and IAMR be accesible to userspace (e.g., to GDB)
> > via ptrace and the core file?
> 
> Yes if they're part of the thread's context they should be accessible
> via ptrace and in core files.

ok. Some more code needed. :(

> 
> >> --- a/arch/powerpc/kernel/signal_32.c
> >> +++ b/arch/powerpc/kernel/signal_32.c
> >> @@ -500,6 +500,11 @@ static int save_user_regs(struct pt_regs *regs, 
> >> struct mcontext __user *frame,
> >>   (unsigned long) >tramp[2]);
> >>}
> >>
> >> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> >> +  if (__put_user(get_paca()->paca_amr, >mc_gregs[PT_AMR]))
> >> +  return 1;
> >> +#endif /*  CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> >> +
> >>return 0;
> >>  }
> >
> > frame->mc_gregs[PT_AMR] has 32 bits, but paca_amr has 64 bits. Does this
> > work as intended?

Hmm.. I think we should just disable pkey support for 32-bit apps till
we figure out all the edge cases.

> 
> I don't understand why we are putting it in there at all?
> 
> Is there some special handling of the actual register on signals? I
> haven't seen it. In which case the process can get the value of AMR by
> reading the register. ??

The value of the AMR register at the time of the key exception may not
be the same when the signal handler is invoked.

RP



Re: [RFC v7 09/25] powerpc: store and restore the pkey state across context switches

2017-08-17 Thread Ram Pai
On Fri, Aug 11, 2017 at 04:34:19PM +1000, Michael Ellerman wrote:
> Thiago Jung Bauermann  writes:
> 
> > Ram Pai  writes:
> >> --- a/arch/powerpc/kernel/process.c
> >> +++ b/arch/powerpc/kernel/process.c
> >> @@ -42,6 +42,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>
> >>  #include 
> >>  #include 
> >> @@ -1096,6 +1097,13 @@ static inline void save_sprs(struct thread_struct 
> >> *t)
> >>t->tar = mfspr(SPRN_TAR);
> >>}
> >>  #endif
> >> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> >> +  if (arch_pkeys_enabled()) {
> >> +  t->amr = mfspr(SPRN_AMR);
> >> +  t->iamr = mfspr(SPRN_IAMR);
> >> +  t->uamor = mfspr(SPRN_UAMOR);
> >> +  }
> >> +#endif
> >>  }
> >
> > Is it worth having a flag in thread_struct saying whether it has every
> > called pkey_alloc and only do the mfsprs if it did?

Yes. This will further optimize the code; a great thing!

> 
> Yes, in fact there's a programming note in the UAMOR section of the arch
> that says exactly that.
> 
> On the write side you have to be a bit more careful. You have to make
> sure you set the UAMOR to 0 when you're switching from a process that
> has used keys to one that isn't.

Currently we save and restore AMR/IAMR/UAMOR if the OS has enabled pkeys.
This means the UAMOR will get restored to 0 if the application has not
used any keys.

But if we do optimize the code further, as suggested by Thiago, we will
have to be careful to initialize UAMOR when switching back to a task
that has not used the keys yet.

RP

> 
> cheers

-- 
Ram Pai
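
A fragment sketching the optimization discussed above (the 'used_pkeys'
flag is hypothetical; the real flag name and where it gets set would be
part of the follow-up):

#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
	/* Only tasks that ever allocated a key pay for the SPR reads;
	 * the switch-in path must still clear UAMOR when the incoming
	 * task never used keys, as Michael notes.
	 */
	if (arch_pkeys_enabled() && t->used_pkeys) {
		t->amr = mfspr(SPRN_AMR);
		t->iamr = mfspr(SPRN_IAMR);
		t->uamor = mfspr(SPRN_UAMOR);
	}
#endif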



Re: [RFC v7 02/25] powerpc: track allocation status of all pkeys

2017-08-17 Thread Ram Pai
On Fri, Aug 11, 2017 at 03:39:14PM +1000, Michael Ellerman wrote:
> Thiago Jung Bauermann  writes:
> 
> > Ram Pai  writes:
> >>  static inline void pkey_initialize(void)
> >>  {
> >> +  int os_reserved, i;
> >> +
> >>/* disable the pkey system till everything
> >> * is in place. A patch further down the
> >> * line will enable it.
> >> */
> >>pkey_inited = false;
> >> +
> >> +  /* Lets assume 32 keys */
> >> +  pkeys_total = 32;
> >> +
> >> +#ifdef CONFIG_PPC_4K_PAGES
> >> +  /*
> >> +   * the OS can manage only 8 pkeys
> >> +   * due to its inability to represent
> >> +   * them in the linux 4K-PTE.
> >> +   */
> >> +  os_reserved = pkeys_total-8;
> >> +#else
> >> +  os_reserved = 0;
> >> +#endif
> >> +  /*
> >> +   * Bits are in LE format.
> >> +   * NOTE: 1, 0 are reserved.
> >> +   * key 0 is the default key, which allows read/write/execute.
> >> +   * key 1 is recommended not to be used.
> >> +   * PowerISA(3.0) page 1015, programming note.
> >> +   */
> >> +  initial_allocation_mask = ~0x0;
> >> +  for (i = 2; i < (pkeys_total - os_reserved); i++)
> >> +  initial_allocation_mask &= ~(0x1 << i);
> >>  }
> >>  #endif /*_ASM_PPC64_PKEYS_H */
> >
> > In v6, key 31 was also reserved, but it's not in this version. Is this
> > intentional?
> 
> That whole thing could be replaced with two constants.
> 
> Except it can't, because we can't just hard code the number of keys. It
> needs to come either from the device tree or be based on the CPU we're
> running on.
> 
> > Isn't it better for this function to be in pkeys.c? Ideally, functions
> > should be in .c files not in headers unless they're very small or
> > performance sensitive IMHO.
> 
> Yes. No reason for that to be in a header AFAICS.

Yes, it can be moved into the pkeys.c file.  It was a simple function
to begin with, but not any more.

RP



Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc

2017-08-17 Thread Bart Van Assche
On Wed, 2017-08-16 at 18:18 -0500, Brian King wrote:
> On 08/16/2017 12:21 PM, Bart Van Assche wrote:
> > On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote:
> > > As of next-20170809, linux-next on powerpc boot hung with below trace
> > > message.
> > > 
> > > [ ... ]
> > > 
> > > A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq:
> > > Always unprepare ...) in the merge branch 'scsi/for-next'
> > > 
> > > System booted fine when the below commit is reverted: 
> > > 
> > > commit 270065e92c317845d69095ec8e3d18616b5b39d5
> > > Author: Bart Van Assche 
> > > Date:   Thu Aug 3 14:40:14 2017 -0700
> > > 
> > > scsi: scsi-mq: Always unprepare before requeuing a request
> > 
> > Hello Brian and Michael,
> > 
> > Do you agree that this probably indicates a bug in the PowerPC block driver
> > that is used to access the boot disk? Anyway, since a solution is not yet
> > available, I will submit a revert for this patch.
> 
> I've been looking at this a bit, and can recreate the issue, but haven't
> gotten to the root cause yet. If I do a sysrq-w while the system 
> is hung
> during boot I see this:
> 
> [   25.561523] Workqueue: events_unbound async_run_entry_fn
> [   25.561527] Call Trace:
> [   25.561529] [c001697873f0] [c00169701600] 0xc00169701600 
> (unreliable)
> [   25.561534] [c001697875c0] [c001ab78] __switch_to+0x2e8/0x430
> [   25.561539] [c00169787620] [c091ccb0] __schedule+0x310/0xa00
> [   25.561543] [c001697876f0] [c091d3e0] schedule+0x40/0xb0
> [   25.561548] [c00169787720] [c0921e40] 
> schedule_timeout+0x200/0x430
> [   25.561553] [c00169787810] [c091db10] 
> io_schedule_timeout+0x30/0x70
> [   25.561558] [c00169787840] [c091e978] 
> wait_for_common_io.constprop.3+0x178/0x280
> [   25.561563] [c001697878c0] [c047f7ec] blk_execute_rq+0x7c/0xd0
> [   25.561567] [c00169787910] [c0614cd0] scsi_execute+0x100/0x230
> [   25.561572] [c00169787990] [c060d29c] 
> scsi_report_opcode+0xbc/0x170
> [   25.561577] [c00169787a50] [d4fe6404] 
> sd_revalidate_disk+0xe04/0x1620 [sd_mod]
> [   25.561583] [c00169787b80] [d4fe6d84] 
> sd_probe_async+0xb4/0x230 [sd_mod]
> [   25.561588] [c00169787c00] [c010fc44] 
> async_run_entry_fn+0x74/0x210
> [   25.561593] [c00169787c90] [c0102f48] 
> process_one_work+0x198/0x480
> [   25.561598] [c00169787d30] [c01032b8] worker_thread+0x88/0x510
> [   25.561603] [c00169787dc0] [c010b030] kthread+0x160/0x1a0
> [   25.561608] [c00169787e30] [c000b3a4] 
> ret_from_kernel_thread+0x5c/0xb8
> 
> I was noticing that we are commonly in scsi_report_opcode. Since ipr RAID 
> arrays don't support
> the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES, I tried setting 
> sdev->no_report_opcodes = 1
> in ipr's slave configure. This seems to eliminate the boot hang for me, but 
> is only working around
> the issue. Since this command is not supported by ipr, it should return with 
> an illegal request.
> When I'm hung at this point, there is nothing outstanding to the adapter / 
> driver. I'll continue
> debugging...
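
For reference, the workaround Brian describes amounts to one line in
the slave configure callback (shown here out of context; the real
ipr_slave_configure() does considerably more):

	static int ipr_slave_configure(struct scsi_device *sdev)
	{
		/* ipr RAID arrays don't implement MAINTENANCE_IN /
		 * REPORT SUPPORTED OPERATION CODES, so stop the midlayer
		 * from issuing it via scsi_report_opcode(). */
		sdev->no_report_opcodes = 1;
		return 0;
	}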

(+linux-scsi)

Hello Brian,

Is kernel debugging enabled on your test system? Is lockdep enabled?
Anyway, stack traces like the above usually mean that a request got stuck in
a block or scsi driver (ipr in this case). Information about pending requests,
including the SCSI CDB, is available under /sys/kernel/debug/block (see also
commit 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()")).

Bart.

Re: [RFC v7 02/25] powerpc: track allocation status of all pkeys

2017-08-17 Thread Ram Pai
On Thu, Aug 10, 2017 at 05:25:39PM -0300, Thiago Jung Bauermann wrote:
> 
> Ram Pai  writes:
> >  static inline void pkey_initialize(void)
> >  {
> > +   int os_reserved, i;
> > +
> > /* disable the pkey system till everything
> >  * is in place. A patch further down the
> >  * line will enable it.
> >  */
> > pkey_inited = false;
> > +
> > +   /* Lets assume 32 keys */
> > +   pkeys_total = 32;
> > +
> > +#ifdef CONFIG_PPC_4K_PAGES
> > +   /*
> > +* the OS can manage only 8 pkeys
> > +* due to its inability to represent
> > +* them in the linux 4K-PTE.
> > +*/
> > +   os_reserved = pkeys_total-8;
> > +#else
> > +   os_reserved = 0;
> > +#endif
> > +   /*
> > +* Bits are in LE format.
> > +* NOTE: 1, 0 are reserved.
> > +* key 0 is the default key, which allows read/write/execute.
> > +* key 1 is recommended not to be used.
> > +* PowerISA(3.0) page 1015, programming note.
> > +*/
> > +   initial_allocation_mask = ~0x0;
> > +   for (i = 2; i < (pkeys_total - os_reserved); i++)
> > +   initial_allocation_mask &= ~(0x1 << i);
> >  }
> >  #endif /*_ASM_PPC64_PKEYS_H */
> 
> In v6, key 31 was also reserved, but it's not in this version. Is this
> intentional?

On the powernv platform there is no hypervisor, and hence key 31 is
not reserved for hypervisor use. Whereas on a PAPR guest the
hypervisor takes away key 31.

It's not possible to determine at compile time which keys are in use.
Hence the above code.  pkeys_total is 32 in this patch, but will be
set to whatever value the device tree tells us. That will be done in a
subsequent patch.
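
A sketch of what that subsequent patch might look like (the device
tree property name follows PAPR, but treat it as an assumption here):

	static int scan_pkey_feature(void)
	{
		int ret;
		u32 vals[2];
		struct device_node *cpu;

		cpu = of_find_node_by_type(NULL, "cpu");
		if (!cpu)
			return 0;

		ret = of_property_read_u32_array(cpu,
				"ibm,processor-storage-keys", vals, 2);
		if (ret)
			return 0;	/* property absent: no pkeys */

		/* vals[0]: keys usable for data access */
		return vals[0];
	}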


RP



Re: [PATCH v4 7/7] ima: Support module-style appended signatures for appraisal

2017-08-17 Thread Mimi Zohar

> diff --git a/security/integrity/ima/ima_appraise.c 
> b/security/integrity/ima/ima_appraise.c
> index 87d2b601cf8e..5a244ebc61d9 100644
> --- a/security/integrity/ima/ima_appraise.c
> +++ b/security/integrity/ima/ima_appraise.c
> @@ -190,6 +190,64 @@ int ima_read_xattr(struct dentry *dentry,
>   return ret;
>  }
> 
> +static void process_xattr_error(int rc, struct integrity_iint_cache *iint,
> + int opened, char const **cause,
> + enum integrity_status *status)
> +{
> + if (rc && rc != -ENODATA)
> + return;
> +
> + *cause = iint->flags & IMA_DIGSIG_REQUIRED ?
> + "IMA-signature-required" : "missing-hash";
> + *status = INTEGRITY_NOLABEL;
> +
> + if (opened & FILE_CREATED)
> + iint->flags |= IMA_NEW_FILE;
> +
> + if ((iint->flags & IMA_NEW_FILE) &&
> + !(iint->flags & IMA_DIGSIG_REQUIRED))
> + *status = INTEGRITY_PASS;
> +}
> +
> +static int appraise_modsig(struct integrity_iint_cache *iint,
> +struct evm_ima_xattr_data *xattr_value,
> +int xattr_len)
> +{
> + enum hash_algo algo;
> + const void *digest;
> + void *buf;
> + int rc, len;
> + u8 dig_len;
> +
> + rc = ima_modsig_verify(INTEGRITY_KEYRING_IMA, xattr_value);
> + if (rc)
> + return rc;
> +
> + /*
> +  * The signature is good. Now let's put the sig hash
> +  * into the iint cache so that it gets stored in the
> +  * measurement list.
> +  */
> +
> + rc = ima_get_modsig_hash(xattr_value, &algo, &digest, &dig_len);
> + if (rc)
> + return rc;
> +
> + len = sizeof(iint->ima_hash) + dig_len;
> + buf = krealloc(iint->ima_hash, len, GFP_NOFS);
> + if (!buf)
> + return -ENOMEM;
> +
> + iint->ima_hash = buf;
> + iint->flags |= IMA_DIGSIG;
> + iint->ima_hash->algo = algo;
> + iint->ima_hash->length = dig_len;
> +
> + memcpy(iint->ima_hash->digest, digest, dig_len);
> +
> + return 0;
> +}

Depending on the IMA policy, the file could already have been
measured.  That measurement list entry might include the file
signature, as stored in the xattr, in the ima-sig template data.

I think even if a measurement list entry exists, we would want an
additional measurement list entry, which includes the appended
signature in the ima-sig template data.

Mimi



Re: [v6 05/15] mm: don't accessed uninitialized struct pages

2017-08-17 Thread Michal Hocko
On Thu 17-08-17 11:28:23, Pasha Tatashin wrote:
> Hi Michal,
> 
> I've been looking through this code again, and I think your suggestion will
> work. I did not realize this iterator already exists:
> 
> for_each_free_mem_range() basically iterates through (memory && !reserved)
> 
> This is exactly what we need here. So, I will update this patch to use this
> iterator, which will simplify it.

Please have a look at
http://lkml.kernel.org/r/20170815093306.gc29...@dhcp22.suse.cz
I believe we can simply drop the check altogether.
-- 
Michal Hocko
SUSE Labs


Re: [v6 01/15] x86/mm: reserve only exiting low pages

2017-08-17 Thread Pasha Tatashin

Hi Michal,

While working on a bug reported to me by the "kernel test robot":

 unable to handle kernel NULL pointer dereference at   (null)

I found that page_to_pfn() on that configuration looks up the section
number inside the flags field of "struct page". So, reserved but
unavailable memory should have its "struct page" zeroed.


Therefore, I am going to remove this patch from my series and instead
add a new patch that iterates through the (reserved && !memory)
memblocks and zeroes the struct pages for them, since struct pages for
that memory never go through __init_single_page(), yet some of their
fields might still be accessed.
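
Roughly along these lines (the iterator over reserved-but-not-memory
ranges is an assumption; it would be introduced by the new patch):

	static void __init zero_resv_unavail(void)
	{
		phys_addr_t start, end;
		unsigned long pfn;
		u64 i;

		/* Ranges in memblock.reserved but not in memblock.memory
		 * never go through __init_single_page(), so zero their
		 * struct pages explicitly. */
		for_each_resv_unavail_range(i, &start, &end) {
			for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++)
				memset(pfn_to_page(pfn), 0,
				       sizeof(struct page));
		}
	}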


Pasha

On 08/14/2017 09:55 AM, Michal Hocko wrote:

Let's CC Hpa on this one. I am still not sure it is correct. The full
series is here
http://lkml.kernel.org/r/1502138329-123460-1-git-send-email-pasha.tatas...@oracle.com

On Mon 07-08-17 16:38:35, Pavel Tatashin wrote:

Struct pages are initialized by going through __init_single_page(). Since
the existing physical memory in memblock is represented in memblock.memory
list, struct page for every page from this list goes through
__init_single_page().

The second memblock list: memblock.reserved, manages the allocated memory.
The memory that won't be available to kernel allocator. So, every page from
this list goes through reserve_bootmem_region(), where certain struct page
fields are set, the assumption being that the struct pages have been
initialized beforehand.

In trim_low_memory_range() we unconditionally reserve memory from PFN 0, but
memblock.memory might start at a later PFN. For example, in QEMU,
e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
so PFN 0 is not on memblock.memory (and hence isn't initialized via
__init_single_page) but is on memblock.reserved (and hence we set fields in
the uninitialized struct page).

Currently, the struct page memory is always zeroed during allocation,
which prevents this problem from being detected. But, if some asserts
provided by CONFIG_DEBUG_VM_PGFLAGS are tightened, this problem may become
visible in existing kernels.

In this patchset we will stop zeroing struct page memory during allocation.
Therefore, this bug must be fixed in order to avoid random assert failures
caused by CONFIG_DEBUG_VM_PGFLAGS triggers.

The fix is to reserve memory from the first existing PFN.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
---
  arch/x86/kernel/setup.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3486d0498800..489cdc141bcb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
  
  static void __init trim_low_memory_range(void)

  {
-   memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
+   unsigned long min_pfn = find_min_pfn_with_active_regions();
+   phys_addr_t base = min_pfn << PAGE_SHIFT;
+
+   memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
  }

  /*
--
2.14.0




Re: WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111 __blk_mq_run_hw_queue+0x1d8/0x1f0

2017-08-17 Thread Bart Van Assche
On Wed, 2017-08-16 at 15:10 -0500, Brian King wrote:
> On 08/16/2017 01:15 PM, Bart Van Assche wrote:
> > On Wed, 2017-08-16 at 23:37 +0530, Abdul Haleem wrote:
> > > Linux-next booted with the below warnings on powerpc
> > > 
> > > [ ... ]
> > > 
> > > boot warnings:
> > > --
> > > kvm: exiting hardware virtualization
> > > [ cut here ]
> > > WARNING: CPU: 15 PID: 0 at block/blk-mq.c:1111 __blk_mq_run_hw_queue
> > > +0x1d8/0x1f0
> > > Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
> > > Call Trace:
> > > [c0037990] [c088f7b0] __blk_mq_delay_run_hw_queue
> > > +0x1f0/0x210
> > > [c00379d0] [c088fcb8] blk_mq_start_hw_queue+0x58/0x80
> > > [c00379f0] [c088fd40] blk_mq_start_hw_queues+0x60/0xb0
> > > [c0037a30] [c0ae2b54] scsi_kick_queue+0x34/0xa0
> > > [c0037a50] [c0ae2f70] scsi_run_queue+0x3b0/0x660
> > > [c0037ac0] [c0ae7ed4] scsi_run_host_queues+0x64/0xc0
> > > [c0037b00] [c0ae7f64] scsi_unblock_requests+0x34/0x60
> > > [c0037b20] [c0b14998] ipr_ioa_bringdown_done+0xf8/0x3a0
> > > [c0037bc0] [c0b12528] ipr_reset_ioa_job+0xd8/0x170
> > > [c0037c00] [c0b18790] ipr_reset_timer_done+0x110/0x160
> > > [c0037c50] [c024db50] call_timer_fn+0xa0/0x3a0
> > > [c0037ce0] [c024e058] expire_timers+0x1b8/0x350
> > > [c0037d50] [c024e2f0] run_timer_softirq+0x100/0x3e0
> > > [c0037df0] [c0162edc] __do_softirq+0x20c/0x620
> > > [c0037ee0] [c0163a80] irq_exit+0x230/0x290
> > > [c0037f10] [c001d770] __do_irq+0x170/0x410
> > > [c0037f90] [c003ea20] call_do_irq+0x14/0x24
> > > [c007f84e3a70] [c001dae0] do_IRQ+0xd0/0x190
> > > [c007f84e3ac0] [c0008c58] hardware_interrupt_common
> > > +0x158/0x160
> > 
> > Hello Brian,
> > 
> > In the MAINTAINERS file I found the following:
> > 
> > IBM Power Linux RAID adapter
> > M:  Brian King 
> > S:  Supported
> > F:  drivers/scsi/ipr.*
> > 
> > Is that information up-to-date? Do you agree that the above message 
> > indicates
> > a bug in the ipr driver?
> 
> Yes. Can you try with this patch that is in 4.13/scsi-fixes:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.13/scsi-fixes=b0e17a9b0df29590c45dfb296f541270a5941f41

Hello Brian,

Sorry but I don't have access to a setup on which I can test the ipr driver ...

Bart.

Re: [v6 05/15] mm: don't accessed uninitialized struct pages

2017-08-17 Thread Pasha Tatashin

Hi Michal,

I've been looking through this code again, and I think your suggestion 
will work. I did not realize this iterator already exists:


for_each_free_mem_range() basically iterates through (memory && !reserved)

This is exactly what we need here. So, I will update this patch to use 
this iterator, which will simplify it.
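
For illustration, the iterator walks exactly the (memory && !reserved)
ranges; something like this sketch (the per-page callback is
hypothetical):

	static void __init check_initialized_pages(void)
	{
		phys_addr_t start, end;
		unsigned long pfn;
		u64 i;

		for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE,
					&start, &end, NULL) {
			/* every pfn in here went through __init_single_page() */
			for (pfn = PFN_UP(start); pfn < PFN_DOWN(end); pfn++)
				check_page(pfn_to_page(pfn));	/* hypothetical */
		}
	}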


Pasha

On 08/14/2017 09:51 AM, Pasha Tatashin wrote:

mem_init()
 free_all_bootmem()
  free_low_memory_core_early()
   for_each_reserved_mem_region()
    reserve_bootmem_region()
     init_reserved_page() <- if this is deferred reserved page
      __init_single_pfn()
       __init_single_page()

So, currently, we are using the value of page->flags to figure out if
this page has been initialized while being part of a deferred page
range, but this is not going to work for this project, as we do not
zero the memory backing the struct pages, so the value of page->flags
can be anything.


True, this is the initialization part I've missed in one of the previous
patches already. Would it be possible to only iterate over !reserved
memory blocks instead? Now that we discard all the metadata later it
should be quite easy to do for_each_memblock_type, no?


Hi Michal,

Clever suggestion to add a new iterator to go through unreserved
existing memory. I do not think this iterator is available yet, so it
would need to be implemented, using a similar approach to what I have
done with a callback.


However, there is a different reason, why I took this current approach.

Daniel Jordan is working on a ktask support:
https://lkml.org/lkml/2017/7/14/666

He and I discussed how to multi-thread struct page initialization
within memory nodes using ktasks. Having this callback interface makes
that multi-threading quite easy, improving boot performance further;
with his prototype we saw 4-6x improvements (using 4-8 threads per
node), reducing the total time it takes to initialize all struct pages
on machines with terabytes of memory to less than one second.


Pasha


Re: [PATCH v4 7/7] ima: Support module-style appended signatures for appraisal

2017-08-17 Thread Mimi Zohar
On Fri, 2017-08-04 at 19:03 -0300, Thiago Jung Bauermann wrote:
> This patch introduces the modsig keyword to the IMA policy syntax to
> specify that a given hook should expect the file to have the IMA signature
> appended to it. Here is how it can be used in a rule:
> 
> appraise func=KEXEC_KERNEL_CHECK appraise_type=modsig|imasig
> 
> With this rule, IMA will accept either an appended signature or a signature
> stored in the extended attribute. In that case, it will first check whether
> there is an appended signature, and if not it will read it from the
> extended attribute.
> 
> The format of the appended signature is the same used for signed kernel
> modules. This means that the file can be signed with the scripts/sign-file
> tool, with a command line such as this:
> 
> $ sign-file sha256 privkey_ima.pem x509_ima.der vmlinux
> 
> This code only works for files that are hashed from a memory buffer, not
> for files that are read from disk at the time of hash calculation. In other
> words, only hooks that use kernel_read_file can support appended
> signatures. This means that only FIRMWARE_CHECK, KEXEC_KERNEL_CHECK,
> KEXEC_INITRAMFS_CHECK and POLICY_CHECK can be supported.
> 
> This feature warrants a separate config option because enabling it brings
> in many other config options.
> 
> Signed-off-by: Thiago Jung Bauermann 

Other than the appended signature not being properly included in the
measurement list, the patch seems to be working.  This patch is on the
rather large side. Could you go back and break this patch up into
smaller, more concise patches, with clear patch descriptions (eg.
separate code cleanup from changes, new policy option, code for
appraising an attached signature, storing the appended signature in
the measurement list, etc)?

thanks!

Mimi

> ---
>  security/integrity/ima/Kconfig|  13 +++
>  security/integrity/ima/Makefile   |   1 +
>  security/integrity/ima/ima.h  |  70 +++-
>  security/integrity/ima/ima_appraise.c | 178 
> +-
>  security/integrity/ima/ima_main.c |   7 +-
>  security/integrity/ima/ima_modsig.c   | 178 
> ++
>  security/integrity/ima/ima_policy.c   |  26 +++--
>  security/integrity/ima/ima_template_lib.c |  14 ++-
>  security/integrity/integrity.h|   4 +-
>  9 files changed, 443 insertions(+), 48 deletions(-)
> 
> diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
> index 35ef69312811..55f734a6124b 100644
> --- a/security/integrity/ima/Kconfig
> +++ b/security/integrity/ima/Kconfig
> @@ -163,6 +163,19 @@ config IMA_APPRAISE_BOOTPARAM
> This option enables the different "ima_appraise=" modes
> (eg. fix, log) from the boot command line.
> 
> +config IMA_APPRAISE_MODSIG
> + bool "Support module-style signatures for appraisal"
> + depends on IMA_APPRAISE
> + depends on INTEGRITY_ASYMMETRIC_KEYS
> + select PKCS7_MESSAGE_PARSER
> + select MODULE_SIG_FORMAT
> + default n
> + help
> +Adds support for signatures appended to files. The format of the
> +appended signature is the same used for signed kernel modules.
> +The modsig keyword can be used in the IMA policy to allow a hook
> +to accept such signatures.
> +
>  config IMA_TRUSTED_KEYRING
>   bool "Require all keys on the .ima keyring be signed (deprecated)"
>   depends on IMA_APPRAISE && SYSTEM_TRUSTED_KEYRING
> diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
> index 29f198bde02b..c72026acecc3 100644
> --- a/security/integrity/ima/Makefile
> +++ b/security/integrity/ima/Makefile
> @@ -8,5 +8,6 @@ obj-$(CONFIG_IMA) += ima.o
>  ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
>ima_policy.o ima_template.o ima_template_lib.o
>  ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
> +ima-$(CONFIG_IMA_APPRAISE_MODSIG) += ima_modsig.o
>  ima-$(CONFIG_HAVE_IMA_KEXEC) += ima_kexec.o
>  obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index d52b487ad259..5492af2cd7c7 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -190,6 +190,8 @@ enum ima_hooks {
>   __ima_hooks(__ima_hook_enumify)
>  };
> 
> +extern const char *const func_tokens[];
> +
>  /* LIM API function definitions */
>  int ima_get_action(struct inode *inode, int mask,
>  enum ima_hooks func, int *pcr);
> @@ -236,9 +238,10 @@ int ima_policy_show(struct seq_file *m, void *v);
>  #ifdef CONFIG_IMA_APPRAISE
>  int ima_appraise_measurement(enum ima_hooks func,
>struct integrity_iint_cache *iint,
> -  struct file *file, const unsigned char *filename,
> -  struct evm_ima_xattr_data *xattr_value,
> -  int xattr_len, int 

Re: [PATCH] powerpc: powernv: Fix build error on const discarding

2017-08-17 Thread Madhavan Srinivasan



On Wednesday 16 August 2017 06:04 PM, Corentin Labbe wrote:

When building a random powerpc kernel I hit this build error:
   CC  arch/powerpc/platforms/powernv/opal-imc.o
arch/powerpc/platforms/powernv/opal-imc.c: In function « 
disable_nest_pmu_counters »:
arch/powerpc/platforms/powernv/opal-imc.c:130:13: error : assignment discards « 
const » qualifier from pointer target type [-Werror=discarded-qualifiers]
l_cpumask = cpumask_of_node(nid);
  ^
This patch simply adds const to l_cpumask to fix this issue.


Reviewed-by: Madhavan Srinivasan 

Signed-off-by: Corentin Labbe 
---
  arch/powerpc/platforms/powernv/opal-imc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index b903bf5e6006..21f6531fae20 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -123,7 +123,7 @@ static int imc_pmu_create(struct device_node *parent, int 
pmu_index, int domain)
  static void disable_nest_pmu_counters(void)
  {
int nid, cpu;
-   struct cpumask *l_cpumask;
+   const struct cpumask *l_cpumask;

get_online_cpus();
for_each_online_node(nid) {




Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-08-17 Thread Michael Ellerman
"Paul E. McKenney"  writes:

> On Wed, Aug 16, 2017 at 05:56:17AM -0700, Paul E. McKenney wrote:
>> On Wed, Aug 16, 2017 at 10:43:52PM +1000, Michael Ellerman wrote:
>> > "Paul E. McKenney"  writes:
>> > ...
>> > >
>> > > commit 33103e7b1f89ef432dfe3337d2a6932cdf5c1312
>> > > Author: Paul E. McKenney 
>> > > Date:   Mon Aug 14 08:54:39 2017 -0700
>> > >
>> > > EXP: Trace tick return from tick_nohz_stop_sched_tick
>> > > 
>> > > Signed-off-by: Paul E. McKenney 
>> > >
>> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>> > > index c7a899c5ce64..7358a5073dfb 100644
>> > > --- a/kernel/time/tick-sched.c
>> > > +++ b/kernel/time/tick-sched.c
>> > > @@ -817,6 +817,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct 
>> > > tick_sched *ts,
>> > >   * (not only the tick).
>> > >   */
>> > >  ts->sleep_length = ktime_sub(dev->next_event, now);
>> > > +trace_printk("tick_nohz_stop_sched_tick: %lld\n", (tick - 
>> > > ktime_get()) / 1000);
>> > >  return tick;
>> > >  }
>> > 
>> > Should I be seeing negative values? A small sample:
>> 
>> Maybe due to hypervisor preemption delays, but I confess that I am
>> surprised to see them this large.  1,602,250,019 microseconds is something
>> like a half hour, which could result in stall warnings all by itself.

Hmm. This is a bare metal machine. So no hypervisor.

>> I will take a look!
>
> And from your ps output, PID 9 is rcu_sched, which is the RCU grace-period
> kthread that stalled.  This kthread was starved, based on this from your
> dmesg:
>
> [ 1602.067008] rcu_sched kthread starved for 2603 jiffies! g7275 c7274 f0x0 
> RCU_GP_WAIT_FQS(3) ->state=0x1
>
> The RCU_GP_WAIT_FQS says that this kthread is periodically scanning for
> idle-CPU and offline-CPU quiescent states, which means that its waits
> will be accompanied by short timeouts.  The "starved for 2603 jiffies"
> says that it has not run for one good long time.  The ->state is its
> task_struct ->state field.
>
> The immediately preceding dmesg line is as follows:
>
> [ 1602.063851]  (detected by 53, t=2603 jiffies, g=7275, c=7274, q=608)
>
> In other words, the rcu_sched grace-period kthread has been starved
> for the entire duration of the current grace period, as shown by the
> t=2603.
>
> Lets turn now to the trace output, looking for the last bit of the
> rcu_sched task's activity:
>
>rcu_sched-9 [054] d...  1576.030096: timer_start: 
> timer=c007fae1bc20 function=process_timeout expires=4295094922 
> [timeout=1] cpu=54 idx=0 flags=
> ksoftirqd/53-276   [053] ..s.  1576.030097: rcu_invoke_callback: 
> rcu_sched rhp=c00fcf8c4eb0 func=__d_free
>rcu_sched-9 [054] d...  1576.030097: rcu_utilization: Start 
> context switch
> ksoftirqd/53-276   [053] ..s.  1576.030098: rcu_invoke_callback: 
> rcu_sched rhp=c00fcff74ee0 func=proc_i_callback
>rcu_sched-9 [054] d...  1576.030098: rcu_grace_period: rcu_sched 
> 7275 cpuqs
>rcu_sched-9 [054] d...  1576.030099: rcu_utilization: End context 
> switch
>
> So this task set up a timer ("timer_start:") for one jiffy ("[timeout=1]",
> but what is with "expires=4295094922"?)

That's a good one.

I have HZ=100, and therefore:

INITIAL_JIFFIES = (1 << 32) - (300 * 100) = 4294937296

So the expires value of 4295094922 is:

4295094922 - 4294937296 = 157626

Jiffies since boot.

Or 1,576,260,000,000 ns == 1576.26 s.
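
The same arithmetic as a quick standalone check (HZ=100, as above):

	#include <stdio.h>

	int main(void)
	{
		unsigned long long initial_jiffies =
			(1ULL << 32) - 300ULL * 100ULL;	/* 4294937296 */
		unsigned long long expires = 4295094922ULL;
		unsigned long long since_boot = expires - initial_jiffies;

		/* prints: 157626 jiffies = 1576.26 s */
		printf("%llu jiffies = %.2f s\n", since_boot,
		       since_boot / 100.0);
		return 0;
	}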

> Of course, the timer will have expired in the context of some other task,
> but a search for "c007fae1bc20" (see the "timer=" in the first trace
> line above) shows nothing (to be painfully accurate, the search wraps back
> to earlier uses of this timer by rcu_sched).  So the timer never did fire.

Or it just wasn't in the trace?

I'll try and get it to trace a bit longer and see if that is helpful.

cheers


[PATCH 0/3] constify scsi vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Arvind Yadav (3):
  [PATCH 1/3] scsi: ibmvfc: constify vio_device_id
  [PATCH 2/3] scsi: ibmvscsi: constify vio_device_id
  [PATCH 3/3] scsi: ibmvscsi_tgt: constify vio_device_id

 drivers/scsi/ibmvscsi/ibmvfc.c   | 2 +-
 drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

-- 
2.7.4



[PATCH 3/3] scsi: ibmvscsi_tgt: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c 
b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
index 659ab48..7575276 100644
--- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
+++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
@@ -3956,7 +3956,7 @@ static struct class ibmvscsis_class = {
.dev_groups = ibmvscsis_dev_groups,
 };
 
-static struct vio_device_id ibmvscsis_device_table[] = {
+static const struct vio_device_id ibmvscsis_device_table[] = {
{ "v-scsi-host", "IBM,v-scsi-host" },
{ "", "" }
 };
-- 
2.7.4



[PATCH 2/3] scsi: ibmvscsi: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
index da22b36..7d156b161 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -2330,7 +2330,7 @@ static int ibmvscsi_resume(struct device *dev)
  * ibmvscsi_device_table: Used by vio.c to match devices in the device tree we 
  * support.
  */
-static struct vio_device_id ibmvscsi_device_table[] = {
+static const struct vio_device_id ibmvscsi_device_table[] = {
{"vscsi", "IBM,v-scsi"},
{ "", "" }
 };
-- 
2.7.4



[PATCH 1/3] scsi: ibmvfc: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index cc4e05b..87d83b1 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4929,7 +4929,7 @@ static unsigned long ibmvfc_get_desired_dma(struct 
vio_dev *vdev)
return pool_dma + ((512 * 1024) * driver_template.cmd_per_lun);
 }
 
-static struct vio_device_id ibmvfc_device_table[] = {
+static const struct vio_device_id ibmvfc_device_table[] = {
{"fcp", "IBM,vfc-client"},
{ "", "" }
 };
-- 
2.7.4



[PATCH] tty: hvcs: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/tty/hvc/hvcs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 79cc5be..7320190 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -675,7 +675,7 @@ static int khvcsd(void *unused)
return 0;
 }
 
-static struct vio_device_id hvcs_driver_table[] = {
+static const struct vio_device_id hvcs_driver_table[] = {
{"serial-server", "hvterm2"},
{ "", "" }
 };
-- 
2.7.4



[PATCH] tty: hvc_vio: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/tty/hvc/hvc_vio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/hvc/hvc_vio.c b/drivers/tty/hvc/hvc_vio.c
index b05dc50..39002a1 100644
--- a/drivers/tty/hvc/hvc_vio.c
+++ b/drivers/tty/hvc/hvc_vio.c
@@ -53,7 +53,7 @@
 
 static const char hvc_driver_name[] = "hvc_console";
 
-static struct vio_device_id hvc_driver_table[] = {
+static const struct vio_device_id hvc_driver_table[] = {
{"serial", "hvterm1"},
 #ifndef HVC_OLD_HVSI
{"serial", "hvterm-protocol"},
-- 
2.7.4



[PATCH] net: ibm: ibmvnic: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c 
b/drivers/net/ethernet/ibm/ibmvnic.c
index a3e6946..d5372b5 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3859,7 +3859,7 @@ static int ibmvnic_resume(struct device *dev)
return 0;
 }
 
-static struct vio_device_id ibmvnic_device_table[] = {
+static const struct vio_device_id ibmvnic_device_table[] = {
{"network", "IBM,vnic"},
{"", "" }
 };
-- 
2.7.4



[PATCH] net: ibm: ibmveth: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/net/ethernet/ibm/ibmveth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c 
b/drivers/net/ethernet/ibm/ibmveth.c
index d17c2b0..f210398 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1897,7 +1897,7 @@ static int ibmveth_resume(struct device *dev)
return 0;
 }
 
-static struct vio_device_id ibmveth_device_table[] = {
+static const struct vio_device_id ibmveth_device_table[] = {
{ "network", "IBM,l-lan"},
{ "", "" }
 };
-- 
2.7.4



[PATCH] crypto: nx: 842: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/crypto/nx/nx-842-pseries.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/nx/nx-842-pseries.c 
b/drivers/crypto/nx/nx-842-pseries.c
index cddc6d8..bf52cd1 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -1082,7 +1082,7 @@ static int nx842_remove(struct vio_dev *viodev)
return 0;
 }
 
-static struct vio_device_id nx842_vio_driver_ids[] = {
+static const struct vio_device_id nx842_vio_driver_ids[] = {
{"ibm,compression-v1", "ibm,compression"},
{"", ""},
 };
-- 
2.7.4



[PATCH] crypto: nx: constify vio_device_id

2017-08-17 Thread Arvind Yadav
vio_device_id are not supposed to change at runtime. All functions
working with vio_device_id provided by <asm/vio.h> work with
const vio_device_id. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
 drivers/crypto/nx/nx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/nx/nx.c b/drivers/crypto/nx/nx.c
index 036057a..3a5e31b 100644
--- a/drivers/crypto/nx/nx.c
+++ b/drivers/crypto/nx/nx.c
@@ -833,7 +833,7 @@ static void __exit nx_fini(void)
	vio_unregister_driver(&nx_driver.viodriver);
 }
 
-static struct vio_device_id nx_crypto_driver_ids[] = {
+static const struct vio_device_id nx_crypto_driver_ids[] = {
{ "ibm,sym-encryption-v1", "ibm,sym-encryption" },
{ "", "" }
 };
-- 
2.7.4



Re: [1/5] powerpc: Test MSR_FP and MSR_VEC when enabling/flushing VSX

2017-08-17 Thread Michael Ellerman
On Wed, 2017-08-16 at 06:01:14 UTC, Benjamin Herrenschmidt wrote:
> VSX uses a combination of the old vector registers, the old FP registers
> and new "second halves" of the FP registers.
> 
> Thus when we need to see the VSX state in the thread struct
> (flush_vsx_to_thread) or when we'll use the VSX in the kernel
> (enable_kernel_vsx) we need to ensure they are all flushed into
> the thread struct if either of them is individually enabled.
> 
> Unfortunately we only tested if the whole VSX was enabled, not
> if they were individually enabled.
> 
> Signed-off-by: Benjamin Herrenschmidt 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/5a69aec945d27e78abac9fd032533d
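
For reference, the fix amounts to testing all three MSR bits in the
VSX flush/enable paths; a sketch of the flush side (modelled on the
commit above, details hedged):

	void flush_vsx_to_thread(struct task_struct *tsk)
	{
		if (tsk->thread.regs) {
			preempt_disable();
			/* flush if any of VSX/VEC/FP is enabled,
			 * not only when MSR_VSX is set */
			if (tsk->thread.regs->msr &
			    (MSR_VSX | MSR_VEC | MSR_FP)) {
				BUG_ON(tsk != current);
				giveup_vsx(tsk);
			}
			preempt_enable();
		}
	}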

cheers


Re: [PATCH 3/6] powerpc/mm: Ensure cpumask update is ordered

2017-08-17 Thread Michael Ellerman
Nicholas Piggin  writes:

> On Mon, 24 Jul 2017 21:20:07 +1000
> Nicholas Piggin  wrote:
>
>> On Mon, 24 Jul 2017 14:28:00 +1000
>> Benjamin Herrenschmidt  wrote:
>> 
>> > There is no guarantee that the various isync's involved with
>> > the context switch will order the update of the CPU mask with
>> > the first TLB entry for the new context being loaded by the HW.
>> > 
>> > Be safe here and add a memory barrier to order any subsequent
>> > load/store which may bring entries into the TLB.
>> > 
>> > The corresponding barrier on the other side already exists as
>> > pte updates use pte_xchg() which uses __cmpxchg_u64 which has
>> > a sync after the atomic operation.
>> > 
>> > Signed-off-by: Benjamin Herrenschmidt 
>> > ---
>> >  arch/powerpc/include/asm/mmu_context.h | 1 +
>> >  1 file changed, 1 insertion(+)
>> > 
>> > diff --git a/arch/powerpc/include/asm/mmu_context.h 
>> > b/arch/powerpc/include/asm/mmu_context.h
>> > index ed9a36ee3107..ff1aeb2cd19f 100644
>> > --- a/arch/powerpc/include/asm/mmu_context.h
>> > +++ b/arch/powerpc/include/asm/mmu_context.h
>> > @@ -110,6 +110,7 @@ static inline void switch_mm_irqs_off(struct mm_struct 
>> > *prev,
>> >/* Mark this context has been used on the new CPU */
>> >if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
>> >cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
>> > +  smp_mb();
>> >new_on_cpu = true;
>> >}
>> >
>> 
>> I think this is the right thing to do, but it should be commented.
>> Is hwsync the right barrier? (i.e., it will order the page table walk)
>
> After some offline discussion, I think we have an agreement that
> this is the right barrier, as it orders with the subsequent load
> of next->context.id that the mtpid depends on (or slbmte for HPT).
>
> So we should have a comment here to that effect, and including
> the pte_xchg comments from your changelog. Some comment (at least
> refer back to here) added at pte_xchg too please.
>
> Other than that your series seems good to me if you repost it you
> can add
>
> Reviewed-by: Nicholas Piggin 
>
> This one out of the series is the bugfix so it should go to stable
> as well, right?
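
Roughly the kind of comment being asked for (wording is a sketch, not
the eventual patch):

	if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
		cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));
		/*
		 * Order the cpumask update vs. the subsequent loads that
		 * bring translations into the TLB (mtpid, or slbmte for
		 * HPT, depends on the load of next->context.id). Pairs
		 * with the sync after __cmpxchg_u64 in pte_xchg().
		 */
		smp_mb();
		new_on_cpu = true;
	}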

So I'm waiting on a v2?

cheers


Re: [PATCH] powerpc/xmon: Exclude all of xmon/ from ftrace

2017-08-17 Thread Michael Ellerman
"Naveen N. Rao"  writes:

> Hi Michael,
> Sorry -- was off since last week.

No worries.

cheers


Re: [PATCH] powerpc: powernv: Fix build error on const discarding

2017-08-17 Thread Michael Ellerman
Corentin Labbe  writes:

> When building a random powerpc kernel I hit this build error:
>   CC  arch/powerpc/platforms/powernv/opal-imc.o
> arch/powerpc/platforms/powernv/opal-imc.c: In function « 
> disable_nest_pmu_counters »:
> arch/powerpc/platforms/powernv/opal-imc.c:130:13: error : assignment discards 
> « const » qualifier from pointer target type [-Werror=discarded-qualifiers]
>l_cpumask = cpumask_of_node(nid);
>  ^
> This patch simply adds const to l_cpumask to fix this issue.

Thanks. I'm not sure why we haven't seen that.

Do you mind attaching your .config ?

cheers


[PATCH v2 1/1] Split VGA default device handler out of VGA arbiter

2017-08-17 Thread Daniel Axtens
A system without PCI legacy resources (e.g. ARM64) may find that no
default/boot VGA device has been marked, because the VGA arbiter
checks for legacy resource decoding before marking a card as default.

Split the small bit of code that does default VGA handling out from
the arbiter. Add a Kconfig option to allow the kernel to be built
with just the default handling, or the arbiter and default handling.

Add handling for devices that should be marked as default but aren't
handled by the vga arbiter by adding a late initcall and a class
enable hook. If there is no default from vgaarb then the first card
that is enabled, has a driver bound, and can decode memory or I/O
will be marked as default.
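
A condensed sketch of that fallback (helper names other than
vga_default_device()/vga_set_default_device() are assumptions):

	static void vga_default_try_device(struct pci_dev *pdev)
	{
		u16 cmd;

		if (vga_default_device())	/* already nominated */
			return;
		if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
			return;
		if (!pdev->driver)		/* no driver bound yet */
			return;
		pci_read_config_word(pdev, PCI_COMMAND, &cmd);
		if (!(cmd & (PCI_COMMAND_MEMORY | PCI_COMMAND_IO)))
			return;			/* decodes nothing */
		vga_set_default_device(pdev);
	}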

Signed-off-by: Daniel Axtens 

---

v2: Tested on:
 - x86_64 laptop
 - arm64 D05 board with hibmc card
 - qemu powerpc with tcg and bochs std-vga

I know this adds another config option and that's a bit sad, but
we can't include it unconditionally as it depends on PCI.
Suggestions welcome.
---
 arch/ia64/pci/fixup.c|   2 +-
 arch/powerpc/kernel/pci-common.c |   2 +-
 arch/x86/pci/fixup.c |   2 +-
 arch/x86/video/fbdev.c   |   2 +-
 drivers/gpu/vga/Kconfig  |  12 +++
 drivers/gpu/vga/Makefile |   1 +
 drivers/gpu/vga/vga_default.c| 159 +++
 drivers/gpu/vga/vga_switcheroo.c |   2 +-
 drivers/gpu/vga/vgaarb.c |  41 +-
 drivers/pci/pci-sysfs.c  |   2 +-
 include/linux/vga_default.h  |  44 +++
 include/linux/vgaarb.h   |  14 
 12 files changed, 225 insertions(+), 58 deletions(-)
 create mode 100644 drivers/gpu/vga/vga_default.c
 create mode 100644 include/linux/vga_default.h

diff --git a/arch/ia64/pci/fixup.c b/arch/ia64/pci/fixup.c
index 41caa99add51..b35d1cf4501a 100644
--- a/arch/ia64/pci/fixup.c
+++ b/arch/ia64/pci/fixup.c
@@ -5,7 +5,7 @@
 
 #include 
 #include 
-#include <linux/vgaarb.h>
+#include <linux/vga_default.h>
 #include 
 
 #include 
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 341a7469cab8..4fd890a51d18 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -31,7 +31,7 @@
 #include 
 #include 
 #include 
-#include <linux/vgaarb.h>
+#include <linux/vga_default.h>
 
 #include 
 #include 
diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index 11e407489db0..b1254bc09a45 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -5,7 +5,7 @@
 #include 
 #include 
 #include 
-#include <linux/vgaarb.h>
+#include <linux/vga_default.h>
 #include 
 #include 
 
diff --git a/arch/x86/video/fbdev.c b/arch/x86/video/fbdev.c
index 9fd24846d094..62cfa74ea86e 100644
--- a/arch/x86/video/fbdev.c
+++ b/arch/x86/video/fbdev.c
@@ -9,7 +9,7 @@
 #include 
 #include 
 #include 
-#include <linux/vgaarb.h>
+#include <linux/vga_default.h>
 
 int fb_is_primary_device(struct fb_info *info)
 {
diff --git a/drivers/gpu/vga/Kconfig b/drivers/gpu/vga/Kconfig
index 29437eabe095..81d4105aecf6 100644
--- a/drivers/gpu/vga/Kconfig
+++ b/drivers/gpu/vga/Kconfig
@@ -1,3 +1,14 @@
+config VGA_DEFAULT
+   bool "VGA Default Device Support" if EXPERT
+   default y
+   depends on PCI
+   help
+ Some programs find it helpful to know what VGA device is the default.
+ On platforms like x86 this means the device used by the BIOS to show
+ early boot messages. On other platforms this may be an arbitrary PCI
+ graphics card. Select this to have a default device recorded within
+ the kernel and exposed to userspace through sysfs.
+
 config VGA_ARB
bool "VGA Arbitration" if EXPERT
default y
@@ -22,6 +33,7 @@ config VGA_SWITCHEROO
depends on X86
depends on ACPI
select VGA_ARB
+   select VGA_DEFAULT
help
  Many laptops released in 2008/9/10 have two GPUs with a multiplexer
  to switch between them. This adds support for dynamic switching when
diff --git a/drivers/gpu/vga/Makefile b/drivers/gpu/vga/Makefile
index 14ca30b75d0a..1e30f90d40fb 100644
--- a/drivers/gpu/vga/Makefile
+++ b/drivers/gpu/vga/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_VGA_ARB)  += vgaarb.o
+obj-$(CONFIG_VGA_DEFAULT) += vga_default.o
 obj-$(CONFIG_VGA_SWITCHEROO) += vga_switcheroo.o
diff --git a/drivers/gpu/vga/vga_default.c b/drivers/gpu/vga/vga_default.c
new file mode 100644
index ..f6fcb0eb1507
--- /dev/null
+++ b/drivers/gpu/vga/vga_default.c
@@ -0,0 +1,159 @@
+/*
+ * vga_default.c: What is the default/boot PCI VGA device?
+ *
+ * (C) Copyright 2005 Benjamin Herrenschmidt 
+ * (C) Copyright 2007 Paulo R. Zanoni 
+ * (C) Copyright 2007, 2009 Tiago Vignatti 
+ * (C) Copyright 2017 Canonical Ltd. (Author: Daniel Axtens )
+ *
+ * (License from vgaarb.c)
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without 

[PATCH v2 0/1] Split VGA default nomination out from VGA arbiter

2017-08-17 Thread Daniel Axtens
This is approach 3, version 2, of my patch series to sort out Xorg
autoconfiguration for the Hibmc card beind a Hisilicon bridge on
arm64.

Approach 1 was a simple quirk for the card+bridge to mark it as
default. This higlighted the fact that the default card was picked by
the arbiter, which assumed legacy resources. The lack of legacy
resources leads to quirks in ppc and concerns in arm land, so a more
generic approach was desired.
Link: https://www.spinics.net/lists/linux-pci/msg62865.html

Approach 2 allowed platforms to opt in to a class enable hook that
added a card as default if there was no default. This:

 - was possibly racy as ACPI PCI init and vgaarb are both subsys
   initcalls.

 - didn't check to see if a card had a driver.

 - meant that platforms for which the vga arbiter didn't make sense
   still needed it.

Links: https://www.spinics.net/lists/linux-pci/msg63092.html
   https://www.spinics.net/lists/linux-pci/msg63083.html

This is approach 3. It pulls the default handling out of the arbiter,
into its own file and behind its own Kconfig option. It adds the extra
detection as a late initcall and an enable hook that only operates
after the initcall, so it's not racy. It checks for drivers. It means
people can turn off the vga arbiter. It works sensibly for modules
too.

v1: https://www.spinics.net/lists/linux-pci/msg63581.html

Changes in v2:

Drop all the powerpc patches.

Including just the new handler doesn't change behaviour on powerpc.

This is because - as Bjorn pointed out on v1 - I had not fully
understood how fixup_vga worked. fixup_vga is quite aggressive: if
there is no default, and it finds a VGA card, it will mark that card
as default. Later on, if it finds a card with decoding enabled, it
will update the default.

This means that if there is any vga card in the system at all, a
default will be marked. This all happens at the FIXUP_CLASS_FINAL
stage, so if there is a vga card, a default will be marked before the
late_initcall that kicks off this new discovery process. This will
completely prevent my code from firing.

Once this is merged I will discuss with the ppc folks if they want to
move to this approach or if ppc should continue to be very optimistic
about the cards it marks as default.

Regards,
Daniel


Daniel Axtens (1):
  Split VGA default device handler out of VGA arbiter

 arch/ia64/pci/fixup.c|   2 +-
 arch/powerpc/kernel/pci-common.c |   2 +-
 arch/x86/pci/fixup.c |   2 +-
 arch/x86/video/fbdev.c   |   2 +-
 drivers/gpu/vga/Kconfig  |  12 +++
 drivers/gpu/vga/Makefile |   1 +
 drivers/gpu/vga/vga_default.c| 159 +++
 drivers/gpu/vga/vga_switcheroo.c |   2 +-
 drivers/gpu/vga/vgaarb.c |  41 +-
 drivers/pci/pci-sysfs.c  |   2 +-
 include/linux/vga_default.h  |  44 +++
 include/linux/vgaarb.h   |  14 
 12 files changed, 225 insertions(+), 58 deletions(-)
 create mode 100644 drivers/gpu/vga/vga_default.c
 create mode 100644 include/linux/vga_default.h

-- 
2.11.0



RE: [RFC PATCH v5 0/5] vfio-pci: Add support for mmapping MSI-X table

2017-08-17 Thread David Laight
From: Alex Williamson
> Sent: 16 August 2017 17:56
...
> Firmware pissing match...  Processors running with 8k or less page size
> fall within the recommendations of the PCI spec for register alignment
> of MMIO regions of the device and this whole problem becomes less of an
> issue.

Actually if qemu is causing the MSI-X table accesses to fault, why doesn't
it just lie to the guest about the physical address of the MSI-X table?
Then mmio access to anything in the same physical page will just work.

It has already been pointed out that you can't actually police the
interrupts that are raised without host hardware support.

Actually, putting other vectors in the MSI-X table is boring; most
drivers will ignore unexpected interrupts.
Much more interesting are physical memory addresses and accessible IO
addresses.
Of course, a lot of boards have PCI master capability and can probably
be persuaded to do writes to specific location anyway.

David



Re: [PATCH 00/12] ALSA: make snd_pcm_hardware const

2017-08-17 Thread Takashi Iwai
On Thu, 17 Aug 2017 11:15:48 +0200,
Bhumika Goyal wrote:
> 
> Make these const.
> 
> Bhumika Goyal (12):
>   ALSA: arm: make snd_pcm_hardware const
>   ALSA: atmel: make snd_pcm_hardware const
>   ALSA: drivers: make snd_pcm_hardware const
>   ALSA: isa: make snd_pcm_hardware const
>   ALSA: mips: make snd_pcm_hardware const
>   ALSA: pci: make snd_pcm_hardware const
>   ALSA: pcmcia: make snd_pcm_hardware const
>   ALSA: ppc: make snd_pcm_hardware const
>   ALSA: sh: make snd_pcm_hardware const
>   ALSA: sparc: make snd_pcm_hardware const
>   ALSA: usb: make snd_pcm_hardware const
>   ALSA: parisc: make snd_pcm_hardware const

Applied all patches now.  Thanks.


Takashi


[PATCH 6/6] ASoC: qcom: make snd_pcm_hardware const

2017-08-17 Thread Bhumika Goyal
Make this const as it is either passed as the 2nd argument
to the function snd_soc_set_runtime_hwparams, which is const or used
in a copy operation.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
---
 sound/soc/qcom/lpass-platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/qcom/lpass-platform.c b/sound/soc/qcom/lpass-platform.c
index fb3576a..e1945e1 100644
--- a/sound/soc/qcom/lpass-platform.c
+++ b/sound/soc/qcom/lpass-platform.c
@@ -32,7 +32,7 @@ struct lpass_pcm_data {
 #define LPASS_PLATFORM_BUFFER_SIZE (16 * 1024)
 #define LPASS_PLATFORM_PERIODS 2
 
-static struct snd_pcm_hardware lpass_platform_pcm_hardware = {
+static const struct snd_pcm_hardware lpass_platform_pcm_hardware = {
.info   =   SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID |
SNDRV_PCM_INFO_INTERLEAVED |
-- 
1.9.1



[PATCH 5/6] ASoC: sh: make snd_pcm_hardware const

2017-08-17 Thread Bhumika Goyal
Make these const as they are only passed as the 2nd argument to the
function snd_soc_set_runtime_hwparams, which is const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
---
 sound/soc/sh/dma-sh7760.c | 2 +-
 sound/soc/sh/fsi.c| 2 +-
 sound/soc/sh/rcar/core.c  | 2 +-
 sound/soc/sh/siu_dai.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/sound/soc/sh/dma-sh7760.c b/sound/soc/sh/dma-sh7760.c
index 35788a6..1e7d417 100644
--- a/sound/soc/sh/dma-sh7760.c
+++ b/sound/soc/sh/dma-sh7760.c
@@ -89,7 +89,7 @@ struct camelot_pcm {
 #define DMABRG_PREALLOC_BUFFER 32 * 1024
 #define DMABRG_PREALLOC_BUFFER_MAX 32 * 1024
 
-static struct snd_pcm_hardware camelot_pcm_hardware = {
+static const struct snd_pcm_hardware camelot_pcm_hardware = {
.info = (SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
diff --git a/sound/soc/sh/fsi.c b/sound/soc/sh/fsi.c
index 7cf0edb..6d3c770 100644
--- a/sound/soc/sh/fsi.c
+++ b/sound/soc/sh/fsi.c
@@ -1710,7 +1710,7 @@ static int fsi_dai_hw_params(struct snd_pcm_substream 
*substream,
  * pcm ops
  */
 
-static struct snd_pcm_hardware fsi_pcm_hardware = {
+static const struct snd_pcm_hardware fsi_pcm_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED  |
SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID,
diff --git a/sound/soc/sh/rcar/core.c b/sound/soc/sh/rcar/core.c
index df39831..1071332 100644
--- a/sound/soc/sh/rcar/core.c
+++ b/sound/soc/sh/rcar/core.c
@@ -843,7 +843,7 @@ static int rsnd_soc_hw_rule_channels(struct 
snd_pcm_hw_params *params,
ir, );
 }
 
-static struct snd_pcm_hardware rsnd_pcm_hardware = {
+static const struct snd_pcm_hardware rsnd_pcm_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED  |
SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID,
diff --git a/sound/soc/sh/siu_dai.c b/sound/soc/sh/siu_dai.c
index 4a22aad..1605029 100644
--- a/sound/soc/sh/siu_dai.c
+++ b/sound/soc/sh/siu_dai.c
@@ -333,7 +333,7 @@ static void siu_dai_spbstop(struct siu_port *port_info)
 /* API functions   */
 
 /* Playback and capture hardware properties are identical */
-static struct snd_pcm_hardware siu_dai_pcm_hw = {
+static const struct snd_pcm_hardware siu_dai_pcm_hw = {
.info   = SNDRV_PCM_INFO_INTERLEAVED,
.formats= SNDRV_PCM_FMTBIT_S16,
.rates  = SNDRV_PCM_RATE_8000_48000,
-- 
1.9.1



[PATCH 4/6] ASoC: kirkwood: make snd_pcm_hardware const

2017-08-17 Thread Bhumika Goyal
Make this const as it is either passed as the 2nd argument
to the function snd_soc_set_runtime_hwparams, which is const or used in
a copy operation.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
---
 sound/soc/kirkwood/kirkwood-dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/kirkwood/kirkwood-dma.c 
b/sound/soc/kirkwood/kirkwood-dma.c
index dafd22e..cf23af1 100644
--- a/sound/soc/kirkwood/kirkwood-dma.c
+++ b/sound/soc/kirkwood/kirkwood-dma.c
@@ -27,7 +27,7 @@ static struct kirkwood_dma_data *kirkwood_priv(struct 
snd_pcm_substream *subs)
return snd_soc_dai_get_drvdata(soc_runtime->cpu_dai);
 }
 
-static struct snd_pcm_hardware kirkwood_dma_snd_hw = {
+static const struct snd_pcm_hardware kirkwood_dma_snd_hw = {
.info = SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_MMAP |
SNDRV_PCM_INFO_MMAP_VALID |
-- 
1.9.1



[PATCH 3/6] ASoC: Intel: Skylake: make snd_pcm_hardware const

2017-08-17 Thread Bhumika Goyal
Make this const as it is only passed as the 2nd argument to the
function snd_soc_set_runtime_hwparams, which is const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
---
 sound/soc/intel/skylake/skl-pcm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/skylake/skl-pcm.c 
b/sound/soc/intel/skylake/skl-pcm.c
index e98d825..f6c9adb 100644
--- a/sound/soc/intel/skylake/skl-pcm.c
+++ b/sound/soc/intel/skylake/skl-pcm.c
@@ -33,7 +33,7 @@
 #define HDA_STEREO 2
 #define HDA_QUAD 4
 
-static struct snd_pcm_hardware azx_pcm_hw = {
+static const struct snd_pcm_hardware azx_pcm_hw = {
.info = (SNDRV_PCM_INFO_MMAP |
 SNDRV_PCM_INFO_INTERLEAVED |
 SNDRV_PCM_INFO_BLOCK_TRANSFER |
-- 
1.9.1



[PATCH 2/6] ASoC: Intel: Atom: make snd_pcm_hardware const

2017-08-17 Thread Bhumika Goyal
Make this const as it is only used in a copy operation.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
---
 sound/soc/intel/atom/sst-mfld-platform-pcm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/intel/atom/sst-mfld-platform-pcm.c 
b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
index b272df5..43e7fdd 100644
--- a/sound/soc/intel/atom/sst-mfld-platform-pcm.c
+++ b/sound/soc/intel/atom/sst-mfld-platform-pcm.c
@@ -76,7 +76,7 @@ int sst_unregister_dsp(struct sst_device *dev)
 }
 EXPORT_SYMBOL_GPL(sst_unregister_dsp);
 
-static struct snd_pcm_hardware sst_platform_pcm_hw = {
+static const struct snd_pcm_hardware sst_platform_pcm_hw = {
.info = (SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_DOUBLE |
SNDRV_PCM_INFO_PAUSE |
-- 
1.9.1



[PATCH 1/6] ASoC: fsl: make snd_pcm_hardware const

2017-08-17 Thread Bhumika Goyal
Make these const as they are only passed as the 2nd argument to the
function snd_soc_set_runtime_hwparams, which is const.
Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
---
 sound/soc/fsl/fsl_asrc_dma.c | 2 +-
 sound/soc/fsl/imx-pcm-fiq.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc_dma.c b/sound/soc/fsl/fsl_asrc_dma.c
index 2baf196..e1b97e5 100644
--- a/sound/soc/fsl/fsl_asrc_dma.c
+++ b/sound/soc/fsl/fsl_asrc_dma.c
@@ -20,7 +20,7 @@
 
 #define FSL_ASRC_DMABUF_SIZE   (256 * 1024)
 
-static struct snd_pcm_hardware snd_imx_hardware = {
+static const struct snd_pcm_hardware snd_imx_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_MMAP |
diff --git a/sound/soc/fsl/imx-pcm-fiq.c b/sound/soc/fsl/imx-pcm-fiq.c
index aef1f78..4e5fefe 100644
--- a/sound/soc/fsl/imx-pcm-fiq.c
+++ b/sound/soc/fsl/imx-pcm-fiq.c
@@ -154,7 +154,7 @@ static snd_pcm_uframes_t snd_imx_pcm_pointer(struct 
snd_pcm_substream *substream
return bytes_to_frames(substream->runtime, iprtd->offset);
 }
 
-static struct snd_pcm_hardware snd_imx_hardware = {
+static const struct snd_pcm_hardware snd_imx_hardware = {
.info = SNDRV_PCM_INFO_INTERLEAVED |
SNDRV_PCM_INFO_BLOCK_TRANSFER |
SNDRV_PCM_INFO_MMAP |
-- 
1.9.1



[PATCH 0/6] ASoC: make snd_pcm_hardware const

2017-08-17 Thread Bhumika Goyal
Make these const. Done using Coccinelle

Bhumika Goyal (6):
  ASoC: fsl: make snd_pcm_hardware const
  ASoC: Intel: Atom: make snd_pcm_hardware const
  ASoC: Intel: Skylake: make snd_pcm_hardware const
  ASoC: kirkwood: make snd_pcm_hardware const
  ASoC: sh: make snd_pcm_hardware const
  ASoC: qcom: make snd_pcm_hardware const

 sound/soc/fsl/fsl_asrc_dma.c | 2 +-
 sound/soc/fsl/imx-pcm-fiq.c  | 2 +-
 sound/soc/intel/atom/sst-mfld-platform-pcm.c | 2 +-
 sound/soc/intel/skylake/skl-pcm.c| 2 +-
 sound/soc/kirkwood/kirkwood-dma.c| 2 +-
 sound/soc/qcom/lpass-platform.c  | 2 +-
 sound/soc/sh/dma-sh7760.c| 2 +-
 sound/soc/sh/fsi.c   | 2 +-
 sound/soc/sh/rcar/core.c | 2 +-
 sound/soc/sh/siu_dai.c   | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

-- 
1.9.1


