Re: [RFC PATCH] powerpc/64/ftrace: mprofile-kernel patch out mflr

2019-05-15 Thread Nicholas Piggin
Naveen N. Rao's on May 14, 2019 6:32 pm:
> Michael Ellerman wrote:
>> "Naveen N. Rao"  writes:
>>> Michael Ellerman wrote:
 Nicholas Piggin  writes:
> The new mprofile-kernel mcount sequence is
>
>   mflr    r0
>   bl  _mcount
>
> Dynamic ftrace patches the branch instruction with a noop, but leaves
> the mflr. mflr is executed by the branch unit that can only execute one
> per cycle on POWER9 and shared with branches, so it would be nice to
> avoid it where possible.
>
> This patch is a hacky proof of concept to nop out the mflr. Can we do
> this or are there races or other issues with it?
 
 There's a race, isn't there?
 
 We have a function foo which currently has tracing disabled, so the mflr
 and bl are nop'ed out.
 
   CPU 0CPU 1
   ==
   bl foo
   nop (ie. not mflr)
   -> interrupt
   something else   enable tracing for foo
   ...  patch mflr and branch
   <- rfi
   bl _mcount
 
 So we end up in _mcount() but with r0 not populated.
>>>
>>> Good catch! Looks like we need to patch the mflr with a "b +8" similar 
>>> to what we do in __ftrace_make_nop().
>> 
>> Would that actually make it any faster though? Nick?
> 
> Ok, how about doing this as a 2-step process?
> 1. patch 'mflr r0' with a 'b +8'
>synchronize_rcu_tasks()
> 2. convert 'b +8' to a 'nop'

Good idea. Well the mflr r0 is harmless, so you can leave that in.
You just need to ensure it's not removed before the bl is. So nop
the bl _mcount, then synchronize_rcu_tasks(), then nop the mflr?

Thanks,
Nick



Re: [PATCH 0/1] Forced-wakeup for stop lite states on Powernv

2019-05-15 Thread Gautham R Shenoy
Hello Nicholas,


On Thu, May 16, 2019 at 02:55:42PM +1000, Nicholas Piggin wrote:
> Abhishek's on May 13, 2019 7:49 pm:
> > On 05/08/2019 10:29 AM, Nicholas Piggin wrote:
> >> Abhishek Goel's on April 22, 2019 4:32 pm:
> >>> Currently, the cpuidle governors determine what idle state an idling CPU
> >>> should enter into based on heuristics that depend on the idle history on
> >>> that CPU. Given that no predictive heuristic is perfect, there are cases
> >>> where the governor predicts a shallow idle state, hoping that the CPU will
> >>> be busy soon. However, if no new workload is scheduled on that CPU in the
> >>> near future, the CPU will end up in the shallow state.
> >>>
> >>> Motivation
> >>> --
> >>> In case of POWER, this is problematic, when the predicted state in the
> >>> aforementioned scenario is a lite stop state, as such lite states will
> >>> inhibit SMT folding, thereby depriving the other threads in the core from
> >>> using the core resources.
> >>>
> >>> So we do not want to get stuck in such states for a long duration. To
> >>> address this, the cpuidle core can queue a timer corresponding to the
> >>> residency value of the next available state. This timer will forcefully
> >>> wake up the cpu. A few such iterations will essentially train the governor to
> >>> select a deeper state for that cpu, as the timer here corresponds to the
> >>> next available cpuidle state residency. Cpu will be kicked out of the lite
> >>> state and end up in a non-lite state.
> >>>
> >>> Experiment
> >>> --
> >>> I performed experiments for three scenarios to collect some data.
> >>>
> >>> case 1 :
> >>> Without this patch and without the tick retained, i.e. in an upstream
> >>> kernel, it can take more than a second to get out of stop0_lite.
> >>>
> >>> case 2 : With the tick retained in an upstream kernel -
> >>>
> >>> Generally, we have a sched tick every 4ms (CONFIG_HZ = 250). Ideally I
> >>> expected it to take 8 sched ticks to get out of stop0_lite. Experimentally,
> >>> the observation was
> >>>
> >>> ==============================================
> >>> sample    min     max     99th percentile
> >>> 20        4ms     12ms    4ms
> >>> ==============================================
> >>>
> >>> It would take at least one sched tick to get out of stop0_lite.
> >>>
> >>> case 3 :  With this patch (not stopping the tick, but explicitly
> >>>   queuing a timer)
> >>>
> >>> ==============================================
> >>> sample    min     max     99th percentile
> >>> ==============================================
> >>> 20        144us   192us   144us
> >>> ==============================================
> >>>
> >>> In this patch, we queue a timer just before entering the stop0_lite
> >>> state. The timer fires at (residency of next available state + 2 * exit
> >>> latency of next available state). Say the next state (stop0) is available
> >>> with a residency of 20us; we should get out in as little as (20+2*2)*8
> >>> [based on the formula (residency + 2 x latency) * history length]
> >>> microseconds = 192us. Ideally we would expect 8 iterations; it was
> >>> observed to get out in 6-7 iterations. Even if, say, stop2 is the next
> >>> available state (stop0 and stop1 both being unavailable), it would take
> >>> (100+2*10)*8 = 960us to get into stop2.
> >>>
> >>> So, we are able to get out of stop0_lite generally in 150us (with this
> >>> patch) as compared to 4ms (with the tick retained). As stated earlier, we
> >>> do not want to get stuck in stop0_lite as it inhibits SMT folding for the
> >>> other sibling threads, depriving them of core resources. The current patch
> >>> uses forced-wakeup only for stop0_lite, as it gives a performance benefit
> >>> (the primary reason) along with lowering power consumption. We may extend
> >>> this model to other states in the future.
> >> I still have to wonder, between our snooze loop and stop0, what does
> >> stop0_lite buy us.
> >>
> >> That said, the problem you're solving here is a generic one that all
> >> stop states have, I think. Doesn't the same thing apply going from
>> stop0 to stop5? You might underestimate the sleep time and lose power
> >> savings and therefore performance there too. Shouldn't we make it
> >> generic for all stop states?
> >>
> >> Thanks,
> >> Nick
> >>
> >>
> > When a cpu is in snooze, it takes both space and time on the core. When in
> > stop0_lite, it frees up time but it still takes space.
> 
> True, but snooze should only be taking less than 1% of front end
> cycles. I appreciate there is some non-zero difference here, I just
> wonder in practice what exactly we gain by it.

The idea behind implementing a lite-state was that on the future
platforms it can be made to wait on a flag and hence act as a
replacement for snooze. On POWER9 we don't have this feature.

The motivation 

Re: [PATCH] crypto: vmx - fix copy-paste error in CTR mode

2019-05-15 Thread Daniel Axtens
Eric Biggers  writes:

> On Thu, May 16, 2019 at 12:12:48PM +1000, Daniel Axtens wrote:
>> 
>> I'm also seeing issues with ghash with the extended tests:
>> 
>> [7.582926] alg: hash: p8_ghash test failed (wrong result) on test vector 
>> 0, cfg="random: use_final src_divs=[9.72%@+39832, 
>> 18.2%@+65504, 45.57%@alignmask+18, 
>> 15.6%@+65496, 6.83%@+65514, 1.2%@+25, 
>> > 
>> It seems to happen when one of the source divisions has nosimd and the
>> final result uses the simd finaliser, so that's interesting.
>> 
>
> The bug is that p8_ghash uses different shash_descs for the SIMD and no-SIMD
> cases.  So if you start out doing the hash in SIMD context but then switch to
> no-SIMD context or vice versa, the digest will be wrong.  Note that there can
> be an ->export() and ->import() in between, so it's not quite as obscure a
> case as one might think.

Ah cool, I was just in the process of figuring this out for myself -
always lovely to have my theory confirmed!

> To fix it I think you'll need to make p8_ghash use 'struct ghash_desc_ctx'
> just like ghash-generic so that the two code paths can share the same
> shash_desc.  That's similar to what the various SHA hash algorithms do.

This is very helpful, thank you. I guess I will do that then.

Regards,
Daniel

>
> - Eric


Re: [PATCH] powerpc/book3s/mm: Clear MMU_FTR_HPTE_TABLE when radix is enabled.

2019-05-15 Thread Nicholas Piggin
Aneesh Kumar K.V's on May 14, 2019 4:02 pm:
> Avoids confusion when printing an Oops message like the one below
> 
>  Faulting instruction address: 0xc008bdb4
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> 
> Either ibm,pa-features or ibm,powerpc-cpu-features can be used to enable the
> MMU features. We don't clear the related MMU feature bits there. We use the
> kernel command line to determine which translation mode we want to use and
> clear the HPTE or radix bit accordingly. On LPAR we do have to re-enable the
> HASH bit if the hypervisor can't do radix.

Well, we have the HPTE feature: the CPU supports hash MMU mode. It's
just that the kernel is booted in radix mode.

Could make a difference for KVM, if it will support an HPT guest or
not.

That's all highly theoretical, and we have other inconsistencies
already in this stuff; I'd just like to try to make things a bit better
in the long term.

Can we just add an early_radix_enabled() in the oops printing code
to select radix or hash MMU?

> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index b97aee03924f..0fa6cac3fe82 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -77,9 +77,6 @@ static struct page *maybe_pte_to_page(pte_t pte)
>  
>  static pte_t set_pte_filter_hash(pte_t pte)
>  {
> - if (radix_enabled())
> - return pte;
> -
>   pte = __pte(pte_val(pte) & ~_PAGE_HPTEFLAGS);
>   if (pte_looks_normal(pte) && !(cpu_has_feature(CPU_FTR_COHERENT_ICACHE) 
> ||
>  cpu_has_feature(CPU_FTR_NOEXECUTE))) {
> @@ -110,6 +107,8 @@ static pte_t set_pte_filter(pte_t pte)
>  
>   if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
>   return set_pte_filter_hash(pte);
> + else if (radix_enabled())
> + return pte;
>  
>   /* No exec permission in the first place, move on */
>   if (!pte_exec(pte) || !pte_looks_normal(pte))
> @@ -140,7 +139,7 @@ static pte_t set_access_flags_filter(pte_t pte, struct 
> vm_area_struct *vma,
>  {
>   struct page *pg;
>  
> - if (mmu_has_feature(MMU_FTR_HPTE_TABLE))
> + if (mmu_has_feature(MMU_FTR_HPTE_TABLE) || radix_enabled())
>   return pte;
>  
>   /* So here, we only care about exec faults, as we use them

These would still be good cleanup to make the HPTE_TABLE feature
independent from radix.

Thanks,
Nick



Re: [PATCH 0/1] Forced-wakeup for stop lite states on Powernv

2019-05-15 Thread Nicholas Piggin
Abhishek's on May 13, 2019 7:49 pm:
> On 05/08/2019 10:29 AM, Nicholas Piggin wrote:
>> Abhishek Goel's on April 22, 2019 4:32 pm:
> >>> Currently, the cpuidle governors determine what idle state an idling CPU
>>> should enter into based on heuristics that depend on the idle history on
>>> that CPU. Given that no predictive heuristic is perfect, there are cases
>>> where the governor predicts a shallow idle state, hoping that the CPU will
>>> be busy soon. However, if no new workload is scheduled on that CPU in the
>>> near future, the CPU will end up in the shallow state.
>>>
>>> Motivation
>>> --
>>> In case of POWER, this is problematic, when the predicted state in the
>>> aforementioned scenario is a lite stop state, as such lite states will
>>> inhibit SMT folding, thereby depriving the other threads in the core from
>>> using the core resources.
>>>
> >>> So we do not want to get stuck in such states for a long duration. To
> >>> address this, the cpuidle core can queue a timer corresponding to the
> >>> residency value of the next available state. This timer will forcefully
> >>> wake up the cpu. A few such iterations will essentially train the governor to
>>> select a deeper state for that cpu, as the timer here corresponds to the
>>> next available cpuidle state residency. Cpu will be kicked out of the lite
>>> state and end up in a non-lite state.
>>>
>>> Experiment
>>> --
>>> I performed experiments for three scenarios to collect some data.
>>>
>>> case 1 :
> >>> Without this patch and without the tick retained, i.e. in an upstream
> >>> kernel, it can take more than a second to get out of stop0_lite.
>>>
> >>> case 2 : With the tick retained in an upstream kernel -
>>>
> >>> Generally, we have a sched tick every 4ms (CONFIG_HZ = 250). Ideally I
> >>> expected it to take 8 sched ticks to get out of stop0_lite. Experimentally,
> >>> the observation was
>>>
>>> =
>>> sample  minmax   99percentile
>>> 20  4ms12ms  4ms
>>> =
>>>
> >>> It would take at least one sched tick to get out of stop0_lite.
>>>
> >>> case 3 :  With this patch (not stopping the tick, but explicitly
> >>>   queuing a timer)
>>>
>>> 
>>> sample  min max 99percentile
>>> 
>>> 20  144us   192us   144us
>>> 
>>>
> >>> In this patch, we queue a timer just before entering the stop0_lite
> >>> state. The timer fires at (residency of next available state + 2 * exit
> >>> latency of next available state). Say the next state (stop0) is available
> >>> with a residency of 20us; we should get out in as little as (20+2*2)*8
> >>> [based on the formula (residency + 2 x latency) * history length]
> >>> microseconds = 192us. Ideally we would expect 8 iterations; it was
> >>> observed to get out in 6-7 iterations. Even if, say, stop2 is the next
> >>> available state (stop0 and stop1 both being unavailable), it would take
> >>> (100+2*10)*8 = 960us to get into stop2.
>>>
> >>> So, we are able to get out of stop0_lite generally in 150us (with this
> >>> patch) as compared to 4ms (with the tick retained). As stated earlier, we
> >>> do not want to get stuck in stop0_lite as it inhibits SMT folding for the
> >>> other sibling threads, depriving them of core resources. The current patch
> >>> uses forced-wakeup only for stop0_lite, as it gives a performance benefit
> >>> (the primary reason) along with lowering power consumption. We may extend
> >>> this model to other states in the future.
>> I still have to wonder, between our snooze loop and stop0, what does
>> stop0_lite buy us.
>>
>> That said, the problem you're solving here is a generic one that all
>> stop states have, I think. Doesn't the same thing apply going from
>> stop0 to stop5? You might underestimate the sleep time and lose power
>> savings and therefore performance there too. Shouldn't we make it
>> generic for all stop states?
>>
>> Thanks,
>> Nick
>>
>>
> When a cpu is in snooze, it takes both space and time on the core. When in
> stop0_lite, it frees up time but it still takes space.

True, but snooze should only be taking less than 1% of front end
cycles. I appreciate there is some non-zero difference here, I just
wonder in practice what exactly we gain by it.

We should always have fewer states unless proven otherwise.

That said, we enable it today so I don't want to argue this point
here, because it is a different issue from your patch.

> When it is in stop0 or deeper, it frees up both space and the time slice of
> the core.
> In stop0_lite, the cpu doesn't free up the core resources and thus inhibits
> thread folding. When a cpu goes to stop0, it will free up the core
> resources, thus increasing the single thread performance of other 

Re: [PATCH] powerpc/64s: Make boot look nice(r)

2019-05-15 Thread Christophe Leroy




Le 16/05/2019 à 04:04, Nicholas Piggin a écrit :

Radix boot looks like this:

  -
  phys_mem_size = 0x2
  dcache_bsize  = 0x80
  icache_bsize  = 0x80
  cpu_features  = 0xc06f8f5fb1a7
possible= 0xfbffcf5fb1a7
always  = 0x0003800081a1
  cpu_user_features = 0xdc0065c2 0xaee0
  mmu_features  = 0xbc006041
  firmware_features = 0x1000
  hash-mmu: ppc64_pft_size= 0x0
  hash-mmu: kernel vmalloc start   = 0xc008
  hash-mmu: kernel IO start= 0xc00a
  hash-mmu: kernel vmemmap start   = 0xc00c
  -

Fix:

  -
  phys_mem_size = 0x2
  dcache_bsize  = 0x80
  icache_bsize  = 0x80
  cpu_features  = 0xc06f8f5fb1a7
possible= 0xfbffcf5fb1a7
always  = 0x0003800081a1
  cpu_user_features = 0xdc0065c2 0xaee0
  mmu_features  = 0xbc006041
  firmware_features = 0x1000
  vmalloc start = 0xc008
  IO start  = 0xc00a
  vmemmap start = 0xc00c
  -

Signed-off-by: Nicholas Piggin 


I fear your change defeats most of the purpose of commit 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20190515&id=e4dccf9092ab48a6f902003b9558c0e45d0e849a


As far as I understand, the main issue is the "hash-mmu:" prefix?
That's due to the following define in top of book3s64/hash_utils.c:

#define pr_fmt(fmt) "hash-mmu: " fmt

Could we simply undef it just before print_system_hash_info() ?
Or move print_system_hash_info() into another book3s64-specific file which 
doesn't set pr_fmt?


Christophe


---
  arch/powerpc/kernel/setup-common.c| 8 +++-
  arch/powerpc/mm/book3s64/hash_utils.c | 3 ---
  2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index aad9f5df6ab6..f2da8c809c85 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -810,9 +810,15 @@ static __init void print_system_info(void)
pr_info("mmu_features  = 0x%08x\n", cur_cpu_spec->mmu_features);
  #ifdef CONFIG_PPC64
pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features);
+#ifdef CONFIG_PPC_BOOK3S
+   pr_info("vmalloc start = 0x%lx\n", KERN_VIRT_START);
+   pr_info("IO start  = 0x%lx\n", KERN_IO_START);
+   pr_info("vmemmap start = 0x%lx\n", (unsigned long)vmemmap);
+#endif
  #endif
  
-	print_system_hash_info();
+	if (!early_radix_enabled())
+		print_system_hash_info();
 
 	if (PHYSICAL_START > 0)
 		pr_info("physical_start= 0x%llx\n",
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 919a861a8ec0..8b307b796b83 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1954,7 +1954,4 @@ void __init print_system_hash_info(void)
  
 	if (htab_hash_mask)
 		pr_info("htab_hash_mask= 0x%lx\n", htab_hash_mask);
-   pr_info("kernel vmalloc start   = 0x%lx\n", KERN_VIRT_START);
-   pr_info("kernel IO start= 0x%lx\n", KERN_IO_START);
-   pr_info("kernel vmemmap start   = 0x%lx\n", (unsigned long)vmemmap);
  }



Re: [PATCH v10 2/2] powerpc/64s: KVM update for reimplement book3s idle code in C

2019-05-15 Thread Nicholas Piggin
Paul Mackerras's on May 13, 2019 4:42 pm:
> On Sun, Apr 28, 2019 at 09:45:15PM +1000, Nicholas Piggin wrote:
>> This is the KVM update to the new idle code. A few improvements:
>> 
>> - Idle sleepers now always return to caller rather than branch out
>>   to KVM first.
>> - This allows optimisations like very fast return to caller when no
>>   state has been lost.
>> - KVM no longer requires nap_state_lost because it controls NVGPR
>>   save/restore itself on the way in and out.
>> - The heavy idle wakeup KVM request check can be moved out of the
>>   normal host idle code and into the not-performance-critical offline
>>   code.
>> - KVM nap code now returns from where it is called, which makes the
>>   flow a bit easier to follow.
> 
> One question below...
> 
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
>> b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index 58d0f1ba845d..f66191d8f841 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> ...
>> @@ -2656,6 +2662,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
>>  
>>  lis r3, LPCR_PECEDP@h   /* Do wake on privileged doorbell */
>>  
>> +/* Go back to host stack */
>> +ld  r1, HSTATE_HOST_R1(r13)
> 
> At this point we are in kvmppc_h_cede, which we branched to from
> hcall_try_real_mode, which came from the guest exit path, where we
> have already loaded r1 from HSTATE_HOST_R1(r13).  So if there is a
> path to get here with r1 not already set to HSTATE_HOST_R1(r13), then
> I missed it - please point it out to me.  Otherwise this statement
> seems superfluous.

I'm not sure why I put that there. I think you're right it could
be removed.

Thanks,
Nick


Re: [PATCH] crypto: vmx - fix copy-paste error in CTR mode

2019-05-15 Thread Eric Biggers
On Thu, May 16, 2019 at 12:12:48PM +1000, Daniel Axtens wrote:
> 
> I'm also seeing issues with ghash with the extended tests:
> 
> [7.582926] alg: hash: p8_ghash test failed (wrong result) on test vector 
> 0, cfg="random: use_final src_divs=[9.72%@+39832, 
> 18.2%@+65504, 45.57%@alignmask+18, 
> 15.6%@+65496, 6.83%@+65514, 1.2%@+25,  
> It seems to happen when one of the source divisions has nosimd and the
> final result uses the simd finaliser, so that's interesting.
> 

The bug is that p8_ghash uses different shash_descs for the SIMD and no-SIMD
cases.  So if you start out doing the hash in SIMD context but then switch to
no-SIMD context or vice versa, the digest will be wrong.  Note that there can be
an ->export() and ->import() in between, so it's not quite as obscure a case as
one might think.

To fix it I think you'll need to make p8_ghash use 'struct ghash_desc_ctx' just
like ghash-generic so that the two code paths can share the same shash_desc.
That's similar to what the various SHA hash algorithms do.

- Eric


[PATCH 2/3] powerpc/pseries: Disable PRRN memory device tree trigger

2019-05-15 Thread Tyrel Datwyler
Memory affinity updates as currently implemented have proved unstable.

This patch comments out the PRRN hook for the time being while we
investigate how to either stabilize the current implementation or find a
better approach.

Signed-off-by: Tyrel Datwyler 
---
 arch/powerpc/platforms/pseries/mobility.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c 
b/arch/powerpc/platforms/pseries/mobility.c
index 88925f8ca8a0..660a2dbc43d7 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,13 +242,15 @@ static int add_dt_node(__be32 parent_phandle, __be32 
drc_index)
 
 static void prrn_update_node(__be32 phandle)
 {
+   /* PRRN Memory Updates have proved unstable. Disable for the time being.
+*
struct pseries_hp_errorlog hp_elog;
struct device_node *dn;
 
-   /*
+*
 * If a node is found from a the given phandle, the phandle does not
 * represent the drc index of an LMB and we can ignore.
-*/
+*
dn = of_find_node_by_phandle(be32_to_cpu(phandle));
if (dn) {
of_node_put(dn);
@@ -261,6 +263,7 @@ static void prrn_update_node(__be32 phandle)
hp_elog._drc_u.drc_index = phandle;
 
handle_dlpar_errorlog(&hp_elog);
+   */
 }
 
 int pseries_devicetree_update(s32 scope)
-- 
2.18.1



[PATCH 3/3] powerpc/pseries: Don't update cpu topology after PRRN event

2019-05-15 Thread Tyrel Datwyler
When we receive a PRRN event through the event-scan interface we call
pseries_devicetree_update() to update the affinity properties in our
device tree via RTAS. Following this, our implementation attempts to both
frob the existing kernel cpu numa affinities of the live system with the
new device tree properties while also performing a full cpu hotplug
readd of the affected cpus in response to an OF property notifier
triggered by the device tree update.

This patch does away with the topology update call to frob the
associativity, since the DLPAR readd will put the cpu in the proper
numa node when it's added back.

Signed-off-by: Tyrel Datwyler 
---
 arch/powerpc/kernel/rtasd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 8a1746d755c9..d3aa3a056d8e 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -285,7 +285,6 @@ static void handle_prrn_event(s32 scope)
 * the RTAS event.
 */
pseries_devicetree_update(-scope);
-   numa_update_cpu_topology(false);
 }
 
 static void handle_rtas_event(const struct rtas_error_log *log)
-- 
2.18.1



[PATCH 1/3] powerpc/pseries: Simplify cpu readd to use drc_index

2019-05-15 Thread Tyrel Datwyler
The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
the cpu's device_node so that we can get at the ibm,my-drc-index
property. The only user of cpu readd is an OF notifier callback. This
callback already has a reference to the device_node and can therefore
retrieve the drc_index from the device_node.

This patch simplifies dlpar_cpu_readd() to take a drc_index directly and
does away with an unnecessary device_node lookup.

Signed-off-by: Tyrel Datwyler 
---
 arch/powerpc/include/asm/topology.h  |  2 +-
 arch/powerpc/mm/numa.c   |  6 +++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +-
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index f85e2b01c3df..c906d9ec9013 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -133,7 +133,7 @@ static inline void shared_proc_topology_init(void) {}
 #define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
 #define topology_core_id(cpu)  (cpu_to_core_id(cpu))
 
-int dlpar_cpu_readd(int cpu);
+int dlpar_cpu_readd(u32 drc_index);
 #endif
 #endif
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 57e64273cb33..40c0b6da12c2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1479,9 +1479,9 @@ static int dt_update_callback(struct notifier_block *nb,
case OF_RECONFIG_UPDATE_PROPERTY:
if (of_node_is_type(update->dn, "cpu") &&
!of_prop_cmp(update->prop->name, "ibm,associativity")) {
-   u32 core_id;
> -   of_property_read_u32(update->dn, "reg", &core_id);
-   rc = dlpar_cpu_readd(core_id);
+   u32 drc_index;
> +   of_property_read_u32(update->dn, "ibm,my-drc-index", &drc_index);
+   rc = dlpar_cpu_readd(drc_index);
rc = NOTIFY_OK;
}
break;
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 97feb6e79f1a..2dfa9416ce54 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -802,18 +802,10 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
return rc;
 }
 
-int dlpar_cpu_readd(int cpu)
+int dlpar_cpu_readd(u32 drc_index)
 {
-   struct device_node *dn;
-   struct device *dev;
-   u32 drc_index;
int rc;
 
-   dev = get_cpu_device(cpu);
-   dn = dev->of_node;
-
> -   rc = of_property_read_u32(dn, "ibm,my-drc-index", &drc_index);
-
rc = dlpar_cpu_remove_by_index(drc_index);
if (!rc)
rc = dlpar_cpu_add(drc_index);
-- 
2.18.1



Re: [PATCH] crypto: talitos - fix skcipher failure due to wrong output IV

2019-05-15 Thread Eric Biggers
On Wed, May 15, 2019 at 08:49:48PM +0200, Christophe Leroy wrote:
> 
> 
> Le 15/05/2019 à 16:05, Horia Geanta a écrit :
> > On 5/15/2019 3:29 PM, Christophe Leroy wrote:
> > > Selftests report the following:
> > > 
> > > [2.984845] alg: skcipher: cbc-aes-talitos encryption test failed 
> > > (wrong output IV) on test vector 0, cfg="in-place"
> > > [2.995377] : 3d af ba 42 9d 9e b4 30 b4 22 da 80 2c 9f ac 41
> > > [3.032673] alg: skcipher: cbc-des-talitos encryption test failed 
> > > (wrong output IV) on test vector 0, cfg="in-place"
> > > [3.043185] : fe dc ba 98 76 54 32 10
> > > [3.063238] alg: skcipher: cbc-3des-talitos encryption test failed 
> > > (wrong output IV) on test vector 0, cfg="in-place"
> > > [3.073818] : 7d 33 88 93 0f 93 b2 42
> > > 
> > > This above dumps show that the actual output IV is indeed the input IV.
> > > This is due to the IV not being copied back into the request.
> > > 
> > > This patch fixes that.
> > > 
> > > Signed-off-by: Christophe Leroy 
> > Reviewed-by: Horia Geantă 
> 
> It's missing a Fixes: tag and a Cc: to stable.
> 
> I'll resend tomorrow.
> 
> > 
> > While here, could you please check ecb mode (which by definition does not 
> > have
> > an IV) is behaving correctly?
> > Looking in driver_algs[] list of crypto algorithms supported by talitos,
> > ecb(aes,des,3des) are declared with ivsize != 0.
> 
> According to /proc/crypto, the tests pass for ecb.
> 

Did you try enabling CONFIG_CRYPTO_MANAGER_EXTRA_TESTS?  There is now a check
that the driver's ivsize matches the generic implementation's:

	if (ivsize != crypto_skcipher_ivsize(generic_tfm)) {
		pr_err("alg: skcipher: ivsize for %s (%u) doesn't match generic impl (%u)\n",
		       driver, ivsize, crypto_skcipher_ivsize(generic_tfm));
		err = -EINVAL;
		goto out;
	}

For ECB that means the ivsize must be 0.

AFAICS the talitos driver even accesses the IV for ECB, which is wrong; and the
only reason this isn't crashing the self-tests already is that they are confused
by the declared ivsize being nonzero so they don't pass NULL as they should.

- Eric


Re: [PATCH] crypto: vmx - fix copy-paste error in CTR mode

2019-05-15 Thread Daniel Axtens
Daniel Axtens  writes:

> Herbert Xu  writes:
>
>> On Wed, May 15, 2019 at 03:35:51AM +1000, Daniel Axtens wrote:
>>>
>>> By all means disable vmx ctr if I don't get an answer to you in a
>>> timeframe you are comfortable with, but I am going to at least try to
>>> have a look.
>>
>> I'm happy to give you guys more time.  How much time do you think
>> you will need?
>>
> Give me till the end of the week: if I haven't solved it by then I will
> probably have to give up and go on to other things anyway.

So as you've hopefully seen, I've nailed it down and posted a patch.
(http://patchwork.ozlabs.org/patch/1099934/)

I'm also seeing issues with ghash with the extended tests:

[7.582926] alg: hash: p8_ghash test failed (wrong result) on test vector 0, 
cfg="random: use_final src_divs=[9.72%@+39832, 
18.2%@+65504, 45.57%@alignmask+18, 
15.6%@+65496, 6.83%@+65514, 1.2%@+25, 
> (FWIW, it seems to happen when encoding greater than 4 but less than 8
> AES blocks - in particular with both 7 and 5 blocks encoded I can see it
> go wrong from block 4 onwards. No idea why yet, and the asm is pretty
> dense, but that's where I'm at at the moment.)
>
> Regards,
> Daniel
>
>> Thanks,
>> -- 
>> Email: Herbert Xu 
>> Home Page: http://gondor.apana.org.au/~herbert/
>> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH] powerpc/64s: Make boot look nice(r)

2019-05-15 Thread Nicholas Piggin
Radix boot looks like this:

 -
 phys_mem_size = 0x2
 dcache_bsize  = 0x80
 icache_bsize  = 0x80
 cpu_features  = 0xc06f8f5fb1a7
   possible= 0xfbffcf5fb1a7
   always  = 0x0003800081a1
 cpu_user_features = 0xdc0065c2 0xaee0
 mmu_features  = 0xbc006041
 firmware_features = 0x1000
 hash-mmu: ppc64_pft_size= 0x0
 hash-mmu: kernel vmalloc start   = 0xc008
 hash-mmu: kernel IO start= 0xc00a
 hash-mmu: kernel vmemmap start   = 0xc00c
 -

Fix:

 -
 phys_mem_size = 0x2
 dcache_bsize  = 0x80
 icache_bsize  = 0x80
 cpu_features  = 0xc06f8f5fb1a7
   possible= 0xfbffcf5fb1a7
   always  = 0x0003800081a1
 cpu_user_features = 0xdc0065c2 0xaee0
 mmu_features  = 0xbc006041
 firmware_features = 0x1000
 vmalloc start = 0xc008
 IO start  = 0xc00a
 vmemmap start = 0xc00c
 -

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup-common.c| 8 +++-
 arch/powerpc/mm/book3s64/hash_utils.c | 3 ---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index aad9f5df6ab6..f2da8c809c85 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -810,9 +810,15 @@ static __init void print_system_info(void)
pr_info("mmu_features  = 0x%08x\n", cur_cpu_spec->mmu_features);
 #ifdef CONFIG_PPC64
pr_info("firmware_features = 0x%016lx\n", powerpc_firmware_features);
+#ifdef CONFIG_PPC_BOOK3S
+   pr_info("vmalloc start = 0x%lx\n", KERN_VIRT_START);
+   pr_info("IO start  = 0x%lx\n", KERN_IO_START);
+   pr_info("vmemmap start = 0x%lx\n", (unsigned long)vmemmap);
+#endif
 #endif
 
-   print_system_hash_info();
+   if (!early_radix_enabled())
+   print_system_hash_info();
 
if (PHYSICAL_START > 0)
pr_info("physical_start= 0x%llx\n",
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 919a861a8ec0..8b307b796b83 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1954,7 +1954,4 @@ void __init print_system_hash_info(void)
 
if (htab_hash_mask)
pr_info("htab_hash_mask= 0x%lx\n", htab_hash_mask);
-   pr_info("kernel vmalloc start   = 0x%lx\n", KERN_VIRT_START);
-   pr_info("kernel IO start= 0x%lx\n", KERN_IO_START);
-   pr_info("kernel vmemmap start   = 0x%lx\n", (unsigned long)vmemmap);
 }
-- 
2.20.1



Re: [v4 PATCH 1/2] [PowerPC] Add simd.h implementation

2019-05-15 Thread Shawn Landden
On Wed, May 15, 2019 at 1:27 AM Christophe Leroy
 wrote:
> Could you please as usual list here the changes provided by each version
> to ease the review ?
A bunch of embarrassing stuff that caused it not to build on some
set-ups (the functions were under the wrong include guards), and I
added include guards on simd.h so that you can use may_use_simd() even
if you don't have the FPU enabled (ARM's simd.h does this).


Re: [RFC PATCH] powerpc/mm: Implement STRICT_MODULE_RWX

2019-05-15 Thread Russell Currey
On Wed, 2019-05-15 at 06:20 +, Christophe Leroy wrote:
> Strict module RWX is just like strict kernel RWX, but for modules -
> so
> loadable modules aren't marked both writable and executable at the
> same
> time.  This is handled by the generic code in kernel/module.c, and
> simply requires the architecture to implement the set_memory() set of
> functions, declared with ARCH_HAS_SET_MEMORY.
> 
> There's nothing other than these functions required to turn
> ARCH_HAS_STRICT_MODULE_RWX on, so turn that on too.
> 
> With STRICT_MODULE_RWX enabled, there are as many W+X pages at
> runtime
> as there are with CONFIG_MODULES=n (none), so in Russel's testing it
> works
> well on both Hash and Radix book3s64.
> 
> There's a TODO in the code for also applying the page permission
> changes
> to the backing pages in the linear mapping: this is pretty simple for
> Radix and (seemingly) a lot harder for Hash, so I've left it for now
> since there's still a notable security benefit for the patch as-is.
> 
> Technically can be enabled without STRICT_KERNEL_RWX, but
> that doesn't get you a whole lot, so we should leave it off by
> default
> until we can get STRICT_KERNEL_RWX to the point where it's enabled by
> default.
> 
> Signed-off-by: Russell Currey 
> Signed-off-by: Christophe Leroy 
> ---

Thanks for this, I figured you'd know how to make this work on 32bit
too.  I'll test on my end today.

Note that there are two Ls in my name!  To quote the great Rusty, "This
Russel disease must be stamped out before it becomes widespread".




Re: [PATCH] powerpc/mm/book3s64: Implement STRICT_MODULE_RWX

2019-05-15 Thread Russell Currey
On Tue, 2019-05-14 at 23:41 -0700, Christoph Hellwig wrote:
> > + * This program is free software; you can redistribute it and/or
> > modify it
> > + * under the terms of the GNU General Public License as published
> > by the
> > + * Free Software Foundation; either version 2 of the License, or
> > (at your
> > + * option) any later version.
> 
> This license boilerplate should not be added together with an SPDX
> tag.
> 
> > +// we need this to have a single pointer to pass into
> > apply_to_page_range()
> 
> Please use normal /* - */ style comments.

I was under the impression they're allowed (in powerpc at least, if not
the wider kernel nowadays) but happy to defer on this.



Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Aleksa Sarai
On 2019-05-15, Christian Brauner  wrote:
> On Wed, May 15, 2019 at 04:00:20PM +0200, Yann Droneaud wrote:
> > Would it be possible to create file descriptor with "restricted"
> > operation ?
> > 
> > - O_RDONLY: waiting for process completion allowed (for example)
> > - O_WRONLY: sending process signal allowed
> 
> Yes, something like this is likely going to be possible in the future.
> We had discussion around this. But mapping this to O_RDONLY and O_WRONLY
> is not the right model. It makes more sense to have specialized flags
> that restrict actions.

Not to mention that the O_* flags have silly values which we shouldn't
replicate in new syscalls IMHO.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH





Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Yann Droneaud
Hi,

Le mercredi 15 mai 2019 à 12:03 +0200, Christian Brauner a écrit :
> 
> diff --git a/kernel/pid.c b/kernel/pid.c
> index 20881598bdfa..237d18d6ecb8 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -451,6 +452,53 @@ struct pid *find_ge_pid(int nr, struct
> pid_namespace *ns)
> >   return idr_get_next(&ns->idr, &nr);
>  }
>  
> +/**
> + * pidfd_open() - Open new pid file descriptor.
> + *
> + * @pid:   pid for which to retrieve a pidfd
> + * @flags: flags to pass
> + *
> + * This creates a new pid file descriptor with the O_CLOEXEC flag set for
> + * the process identified by @pid. Currently, the process identified by
> + * @pid must be a thread-group leader. This restriction currently exists
> + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
> + * be used with CLONE_THREAD) and pidfd polling (only supports thread group
> + * leaders).
> + *

Would it be possible to create a file descriptor with "restricted"
operations?

- O_RDONLY: waiting for process completion allowed (for example)
- O_WRONLY: sending a process signal allowed

For example, a process could send a pidfd to another process over a
Unix socket, allowing it only to wait for completion, but not to send
signals?

I see the permission check is not done in pidfd_open(), so what prevents
a user from sending a signal to a process owned by another user?

If it's in pidfd_send_signal(), then passing the pidfd through
SCM_RIGHTS won't be useful if the target process is not owned by the
same user, or root.

> + * Return: On success, a cloexec pidfd is returned.
> + * On error, a negative errno number will be returned.
> + */
> +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> +{
> + int fd, ret;
> + struct pid *p;
> + struct task_struct *tsk;
> +
> + if (flags)
> + return -EINVAL;
> +
> + if (pid <= 0)
> + return -EINVAL;
> +
> + p = find_get_pid(pid);
> + if (!p)
> + return -ESRCH;
> +
> + rcu_read_lock();
> + tsk = pid_task(p, PIDTYPE_PID);
> + if (!tsk)
> + ret = -ESRCH;
> + else if (unlikely(!thread_group_leader(tsk)))
> + ret = -EINVAL;
> + else
> + ret = 0;
> + rcu_read_unlock();
> +
> + fd = ret ?: pidfd_create(p);
> + put_pid(p);
> + return fd;
> +}
> +
>  void __init pid_idr_init(void)
>  {
>   /* Verify no one has done anything silly: */

Regards.

-- 
Yann Droneaud
OPTEYA




Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Yann Droneaud
Hi,

Le mercredi 15 mai 2019 à 16:16 +0200, Christian Brauner a écrit :
> On Wed, May 15, 2019 at 04:00:20PM +0200, Yann Droneaud wrote:
> > Le mercredi 15 mai 2019 à 12:03 +0200, Christian Brauner a écrit :
> > > diff --git a/kernel/pid.c b/kernel/pid.c
> > > index 20881598bdfa..237d18d6ecb8 100644
> > > --- a/kernel/pid.c
> > > +++ b/kernel/pid.c
> > > @@ -451,6 +452,53 @@ struct pid *find_ge_pid(int nr, struct
> > > pid_namespace *ns)
> > >   return idr_get_next(&ns->idr, &nr);
> > >  }
> > >  
> > > +/**
> > > + * pidfd_open() - Open new pid file descriptor.
> > > + *
> > > + * @pid:   pid for which to retrieve a pidfd
> > > + * @flags: flags to pass
> > > + *
> > > + * This creates a new pid file descriptor with the O_CLOEXEC flag set for
> > > + * the process identified by @pid. Currently, the process identified by
> > > + * @pid must be a thread-group leader. This restriction currently exists
> > > + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
> > > + * be used with CLONE_THREAD) and pidfd polling (only supports thread 
> > > group
> > > + * leaders).
> > > + *
> > 
> > Would it be possible to create file descriptor with "restricted"
> > operation ?
> > 
> > - O_RDONLY: waiting for process completion allowed (for example)
> > - O_WRONLY: sending process signal allowed
> 
> Yes, something like this is likely going to be possible in the future.
> We had discussion around this. But mapping this to O_RDONLY and O_WRONLY
> is not the right model. It makes more sense to have specialized flags
> that restrict actions.

Yes, dedicated flags are the way to go. I've used the old open() flags
here as examples as an echo of the O_CLOEXEC flag used in the comment.

> > For example, a process could send over a Unix socket a process a pidfd,
> > allowing this to only wait for completion, but not sending signal ?
> > 
> > I see the permission check is not done in pidfd_open(), so what prevent
> > a user from sending a signal to another user owned process ?
> 
> That's supposed to be possible. You can do the same right now already
> with pids. Tools like LMK need this probably very much.
> Permission checking for signals is done at send time right now.
> And if you can't signal via a pid you can't signal via a pidfd as
> they're both subject to the same permissions checks.
> 

I would have expected it to behave like most other file descriptors,
with the permission check done at opening time, which allows a more
privileged process to open the file descriptor, then pass it to a less
privileged process, or change its own privileges with setuid() and
such. Then the less privileged process can act on behalf of the
privileged process through the file descriptor.

> > If it's in pidfd_send_signal(), then, passing the socket through
> > SCM_RIGHT won't be useful if the target process is not owned by the
> > same user, or root.
> > 

If the permission check is done at sending time, the scenario above
cannot be implemented.

Sending a pidfd through SCM_RIGHTS is then only useful if the receiver
process is equally or more privileged than the sender.

For isolation purposes, I would have expected to be able to give a
specific less privileged process the right to send a signal to a highly
privileged process through a Unix socket.

But I can't come up with a specific use case. So I dunno.

Regards.

-- 
Yann Droneaud
OPTEYA




Re: [PATCH] crypto: talitos - fix skcipher failure due to wrong output IV

2019-05-15 Thread Christophe Leroy




Le 15/05/2019 à 16:05, Horia Geanta a écrit :

On 5/15/2019 3:29 PM, Christophe Leroy wrote:

Selftests report the following:

[2.984845] alg: skcipher: cbc-aes-talitos encryption test failed (wrong output IV) on 
test vector 0, cfg="in-place"
[2.995377] : 3d af ba 42 9d 9e b4 30 b4 22 da 80 2c 9f ac 41
[3.032673] alg: skcipher: cbc-des-talitos encryption test failed (wrong output IV) on 
test vector 0, cfg="in-place"
[3.043185] : fe dc ba 98 76 54 32 10
[3.063238] alg: skcipher: cbc-3des-talitos encryption test failed (wrong output IV) 
on test vector 0, cfg="in-place"
[3.073818] : 7d 33 88 93 0f 93 b2 42

The above dumps show that the actual output IV is indeed the input IV.
This is due to the IV not being copied back into the request.

This patch fixes that.

Signed-off-by: Christophe Leroy 

Reviewed-by: Horia Geantă 


It's missing a Fixes: tag and a Cc: to stable.

I'll resend tomorrow.



While here, could you please check ecb mode (which by definition does not have
an IV) is behaving correctly?
Looking in driver_algs[] list of crypto algorithms supported by talitos,
ecb(aes,des,3des) are declared with ivsize != 0.


According to /proc/crypto, the tests pass for ecb.

Christophe



Thanks,
Horia



Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Christian Brauner
On Wed, May 15, 2019 at 05:35:15PM +0200, Oleg Nesterov wrote:
> On 05/15, Oleg Nesterov wrote:
> >
> > On 05/15, Christian Brauner wrote:
> > >
> > > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > > +{
> > > + int fd, ret;
> > > + struct pid *p;
> > > + struct task_struct *tsk;
> > > +
> > > + if (flags)
> > > + return -EINVAL;
> > > +
> > > + if (pid <= 0)
> > > + return -EINVAL;
> > > +
> > > + p = find_get_pid(pid);
> > > + if (!p)
> > > + return -ESRCH;
> > > +
> > > + rcu_read_lock();
> > > + tsk = pid_task(p, PIDTYPE_PID);
> >
> > You do not need find_get_pid() before rcu_lock and put_pid() at the end.
> > You can just do find_vpid() under rcu_read_lock().
> 
> Ah, sorry. Somehow I forgot you need to call pidfd_create(pid), you can't
> do this under rcu_read_lock().
> 
> So I was wrong, you can't avoid get/put_pid.

Yeah, I haven't made any changes yet.

Christian


Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Oleg Nesterov
On 05/15, Oleg Nesterov wrote:
>
> On 05/15, Christian Brauner wrote:
> >
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +   int fd, ret;
> > +   struct pid *p;
> > +   struct task_struct *tsk;
> > +
> > +   if (flags)
> > +   return -EINVAL;
> > +
> > +   if (pid <= 0)
> > +   return -EINVAL;
> > +
> > +   p = find_get_pid(pid);
> > +   if (!p)
> > +   return -ESRCH;
> > +
> > +   rcu_read_lock();
> > +   tsk = pid_task(p, PIDTYPE_PID);
>
> You do not need find_get_pid() before rcu_lock and put_pid() at the end.
> You can just do find_vpid() under rcu_read_lock().

Ah, sorry. Somehow I forgot you need to call pidfd_create(pid), you can't
do this under rcu_read_lock().

So I was wrong, you can't avoid get/put_pid.

Oleg.



Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Christian Brauner
On Wed, May 15, 2019 at 05:19:13PM +0200, Oleg Nesterov wrote:
> On 05/15, Christian Brauner wrote:
> >
> > On Wed, May 15, 2019 at 04:38:58PM +0200, Oleg Nesterov wrote:
> > >
> > > it seems that you can do a single check
> > >
> > >   tsk = pid_task(p, PIDTYPE_TGID);
> > >   if (!tsk)
> > >   ret = -ESRCH;
> > >
> > > this even looks more correct if we race with exec changing the leader.
> >
> > The logic here being that you can only reach the thread_group leader
> > from struct pid if PIDTYPE_PID == PIDTYPE_TGID for this struct pid?
> 
> Not exactly... it is not that PIDTYPE_PID == PIDTYPE_TGID for this pid,
> struct pid has no "type" or something like this.
> 
> The logic is that pid->tasks[PIDTYPE_XXX] is the list of tasks which use
> this pid as their "XXX" type.
> 
> For example, clone(CLONE_THREAD) creates a pid which has a single non-
> empty list, pid->tasks[PIDTYPE_PID]. This pid can't be used as TGID or
> SID.
> 
> So if pid_task(PIDTYPE_TGID) returns non-NULL we know that this pid was
> used for a group-leader, see copy_process() which does

Ah, this was what I was asking myself when I worked on thread-specific
signal sending. This clarifies quite a lot of things!

Though I wonder how one reliably gets the PGID or SID from a
PIDTYPE_PID.

> 
>   if (thread_group_leader(p))
>   attach_pid(p, PIDTYPE_TGID);
> 
> 
> If we race with exec which changes the leader pid_task(TGID) can return
> the old leader. We do not care, but this means that we should not check
> thread_group_leader().

Nice!

Thank you, Oleg! :)
Christian


Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Oleg Nesterov
On 05/15, Christian Brauner wrote:
>
> On Wed, May 15, 2019 at 04:38:58PM +0200, Oleg Nesterov wrote:
> >
> > it seems that you can do a single check
> >
> > tsk = pid_task(p, PIDTYPE_TGID);
> > if (!tsk)
> > ret = -ESRCH;
> >
> > this even looks more correct if we race with exec changing the leader.
>
> The logic here being that you can only reach the thread_group leader
> from struct pid if PIDTYPE_PID == PIDTYPE_TGID for this struct pid?

Not exactly... it is not that PIDTYPE_PID == PIDTYPE_TGID for this pid,
struct pid has no "type" or something like this.

The logic is that pid->tasks[PIDTYPE_XXX] is the list of tasks which use
this pid as their "XXX" type.

For example, clone(CLONE_THREAD) creates a pid which has a single non-
empty list, pid->tasks[PIDTYPE_PID]. This pid can't be used as TGID or
SID.

So if pid_task(PIDTYPE_TGID) returns non-NULL we know that this pid was
used for a group-leader, see copy_process() which does

if (thread_group_leader(p))
attach_pid(p, PIDTYPE_TGID);


If we race with exec which changes the leader pid_task(TGID) can return
the old leader. We do not care, but this means that we should not check
thread_group_leader().

Oleg.



Re: [PATCH stable 4.4] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Christophe Leroy




Le 15/05/2019 à 16:16, Greg KH a écrit :

On Wed, May 15, 2019 at 01:30:42PM +, Christophe Leroy wrote:

[Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]

On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()

There is the following note in front of early_init():
  * Note that the kernel may be running at an address which is different
  * from the address that it was linked at, so we must use RELOC/PTRRELOC
  * to access static data (including strings).  -- paulus

Therefore init_mem_is_free must be accessed with PTRRELOC()

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
Signed-off-by: Christophe Leroy 

---
Can't apply the upstream commit as such due to several other unrelated stuff
like for instance STRICT_KERNEL_RWX which are missing.
So instead, using same approach as for commit 
252eb55816a6f69ef9464cad303cdb3326cdc61d

Removed the Fixes: tag as I don't know yet the commit Id of the fixed commit on 
4.4 branch.
---
  arch/powerpc/lib/code-patching.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Now added, thanks.



Thanks,

However, you took the commit log from the upstream commit, which doesn't
correspond exactly to the change being done here and described in the
backport patch.


Christophe


Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Christian Brauner
On Wed, May 15, 2019 at 04:38:58PM +0200, Oleg Nesterov wrote:
> On 05/15, Christian Brauner wrote:
> >
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +   int fd, ret;
> > +   struct pid *p;
> > +   struct task_struct *tsk;
> > +
> > +   if (flags)
> > +   return -EINVAL;
> > +
> > +   if (pid <= 0)
> > +   return -EINVAL;
> > +
> > +   p = find_get_pid(pid);
> > +   if (!p)
> > +   return -ESRCH;
> > +
> > +   rcu_read_lock();
> > +   tsk = pid_task(p, PIDTYPE_PID);
> 
> You do not need find_get_pid() before rcu_lock and put_pid() at the end.
> You can just do find_vpid() under rcu_read_lock().

Will do.

> 
> > +   if (!tsk)
> > +   ret = -ESRCH;
> > +   else if (unlikely(!thread_group_leader(tsk)))
> > +   ret = -EINVAL;
> 
> it seems that you can do a single check
> 
>   tsk = pid_task(p, PIDTYPE_TGID);
>   if (!tsk)
>   ret = -ESRCH;
> 
> this even looks more correct if we race with exec changing the leader.

The logic here being that you can only reach the thread_group leader
from struct pid if PIDTYPE_PID == PIDTYPE_TGID for this struct pid?

Thanks, Oleg.
Christian


Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Oleg Nesterov
On 05/15, Christian Brauner wrote:
>
> +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> +{
> + int fd, ret;
> + struct pid *p;
> + struct task_struct *tsk;
> +
> + if (flags)
> + return -EINVAL;
> +
> + if (pid <= 0)
> + return -EINVAL;
> +
> + p = find_get_pid(pid);
> + if (!p)
> + return -ESRCH;
> +
> + rcu_read_lock();
> + tsk = pid_task(p, PIDTYPE_PID);

You do not need find_get_pid() before rcu_lock and put_pid() at the end.
You can just do find_vpid() under rcu_read_lock().

> + if (!tsk)
> + ret = -ESRCH;
> + else if (unlikely(!thread_group_leader(tsk)))
> + ret = -EINVAL;

it seems that you can do a single check

tsk = pid_task(p, PIDTYPE_TGID);
if (!tsk)
ret = -ESRCH;

this even looks more correct if we race with exec changing the leader.

Oleg.



Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Christian Brauner
On Wed, May 15, 2019 at 04:00:20PM +0200, Yann Droneaud wrote:
> Hi,
> 
> Le mercredi 15 mai 2019 à 12:03 +0200, Christian Brauner a écrit :
> > 
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index 20881598bdfa..237d18d6ecb8 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -451,6 +452,53 @@ struct pid *find_ge_pid(int nr, struct
> > pid_namespace *ns)
> > return idr_get_next(&ns->idr, &nr);
> >  }
> >  
> > +/**
> > + * pidfd_open() - Open new pid file descriptor.
> > + *
> > + * @pid:   pid for which to retrieve a pidfd
> > + * @flags: flags to pass
> > + *
> > + * This creates a new pid file descriptor with the O_CLOEXEC flag set for
> > + * the process identified by @pid. Currently, the process identified by
> > + * @pid must be a thread-group leader. This restriction currently exists
> > + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
> > + * be used with CLONE_THREAD) and pidfd polling (only supports thread group
> > + * leaders).
> > + *
> 
> Would it be possible to create file descriptor with "restricted"
> operation ?
> 
> - O_RDONLY: waiting for process completion allowed (for example)
> - O_WRONLY: sending process signal allowed

Yes, something like this is likely going to be possible in the future.
We had discussion around this. But mapping this to O_RDONLY and O_WRONLY
is not the right model. It makes more sense to have specialized flags
that restrict actions.

> 
> For example, a process could send over a Unix socket a process a pidfd,
> allowing this to only wait for completion, but not sending signal ?
> 
> I see the permission check is not done in pidfd_open(), so what prevent
> a user from sending a signal to another user owned process ?

That's supposed to be possible. You can do the same right now already
with pids. Tools like LMK need this probably very much.
Permission checking for signals is done at send time right now.
And if you can't signal via a pid you can't signal via a pidfd as
they're both subject to the same permissions checks.

> 
> If it's in pidfd_send_signal(), then, passing the socket through
> SCM_RIGHT won't be useful if the target process is not owned by the
> same user, or root.
> 
> > + * Return: On success, a cloexec pidfd is returned.
> > + * On error, a negative errno number will be returned.
> > + */
> > +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
> > +{
> > +   int fd, ret;
> > +   struct pid *p;
> > +   struct task_struct *tsk;
> > +
> > +   if (flags)
> > +   return -EINVAL;
> > +
> > +   if (pid <= 0)
> > +   return -EINVAL;
> > +
> > +   p = find_get_pid(pid);
> > +   if (!p)
> > +   return -ESRCH;
> > +
> > +   rcu_read_lock();
> > +   tsk = pid_task(p, PIDTYPE_PID);
> > +   if (!tsk)
> > +   ret = -ESRCH;
> > +   else if (unlikely(!thread_group_leader(tsk)))
> > +   ret = -EINVAL;
> > +   else
> > +   ret = 0;
> > +   rcu_read_unlock();
> > +
> > +   fd = ret ?: pidfd_create(p);
> > +   put_pid(p);
> > +   return fd;
> > +}
> > +
> >  void __init pid_idr_init(void)
> >  {
> > /* Verify no one has done anything silly: */
> 
> Regards.
> 
> -- 
> Yann Droneaud
> OPTEYA
> 
> 


Re: [PATCH stable 4.4] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Greg KH
On Wed, May 15, 2019 at 01:30:42PM +, Christophe Leroy wrote:
> [Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]
> 
> On powerpc32, patch_instruction() is called by apply_feature_fixups()
> which is called from early_init()
> 
> There is the following note in front of early_init():
>  * Note that the kernel may be running at an address which is different
>  * from the address that it was linked at, so we must use RELOC/PTRRELOC
>  * to access static data (including strings).  -- paulus
> 
> Therefore init_mem_is_free must be accessed with PTRRELOC()
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
> Signed-off-by: Christophe Leroy 
> 
> ---
> Can't apply the upstream commit as such due to several other unrelated stuff
> like for instance STRICT_KERNEL_RWX which are missing.
> So instead, using same approach as for commit 
> 252eb55816a6f69ef9464cad303cdb3326cdc61d
> 
> Removed the Fixes: tag as I don't know yet the commit Id of the fixed commit 
> on 4.4 branch.
> ---
>  arch/powerpc/lib/code-patching.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Now added, thanks.

greg k-h


Re: [PATCH] crypto: talitos - fix skcipher failure due to wrong output IV

2019-05-15 Thread Horia Geanta
On 5/15/2019 3:29 PM, Christophe Leroy wrote:
> Selftests report the following:
> 
> [2.984845] alg: skcipher: cbc-aes-talitos encryption test failed (wrong 
> output IV) on test vector 0, cfg="in-place"
> [2.995377] : 3d af ba 42 9d 9e b4 30 b4 22 da 80 2c 9f ac 41
> [3.032673] alg: skcipher: cbc-des-talitos encryption test failed (wrong 
> output IV) on test vector 0, cfg="in-place"
> [3.043185] : fe dc ba 98 76 54 32 10
> [3.063238] alg: skcipher: cbc-3des-talitos encryption test failed (wrong 
> output IV) on test vector 0, cfg="in-place"
> [3.073818] : 7d 33 88 93 0f 93 b2 42
> 
> The above dumps show that the actual output IV is indeed the input IV.
> This is due to the IV not being copied back into the request.
> 
> This patch fixes that.
> 
> Signed-off-by: Christophe Leroy 
Reviewed-by: Horia Geantă 

While here, could you please check ecb mode (which by definition does not have
an IV) is behaving correctly?
Looking in driver_algs[] list of crypto algorithms supported by talitos,
ecb(aes,des,3des) are declared with ivsize != 0.

Thanks,
Horia


[PATCH v2] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool

2019-05-15 Thread Mathieu Malaterre
In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9
option") the following piece of code was added:

   smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);

Since GCC 8 this triggers the following warning about incompatible
function types:

  arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible 
function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void 
*)' [-Werror=cast-function-type]

Since the warning is there for a reason, and should not be hidden behind
a cast, provide an intermediate callback function to avoid the warning.

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Suggested-by: Christoph Hellwig 
Cc: Michael Neuling 
Signed-off-by: Mathieu Malaterre 
---
v2: do not hide warning using a hack

 arch/powerpc/kernel/hw_breakpoint.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index f70fb89dbf60..969092d84a2f 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -386,6 +386,11 @@ void hw_breakpoint_pmu_read(struct perf_event *bp)
 bool dawr_force_enable;
 EXPORT_SYMBOL_GPL(dawr_force_enable);
 
+static void set_dawr_cb(void *info)
+{
+   set_dawr(info);
+}
+
 static ssize_t dawr_write_file_bool(struct file *file,
const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -405,7 +410,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
 
/* If we are clearing, make sure all CPUs have the DAWR cleared */
if (!dawr_force_enable)
-   smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);
+   smp_call_function(set_dawr_cb, &null_brk, 0);
 
return rc;
 }
-- 
2.20.1



Re: [PATCH] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool

2019-05-15 Thread Mathieu Malaterre
Hi Christoph,

On Wed, May 15, 2019 at 3:14 PM Christoph Hellwig  wrote:
>
> On Wed, May 15, 2019 at 02:09:42PM +0200, Mathieu Malaterre wrote:
> > In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9
> > option") the following piece of code was added:
> >
> >smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);
> >
> > Since GCC 8 this triggers the following warning about incompatible
> > function types:
>
> And the warning is there for a reason, and should not be hidden
> behind a cast.  This should instead be fixed by something like this:


OK, thanks for the quick feedback, will send a v2 asap.

> diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
> b/arch/powerpc/kernel/hw_breakpoint.c
> index da307dd93ee3..a26b67a1be83 100644
> --- a/arch/powerpc/kernel/hw_breakpoint.c
> +++ b/arch/powerpc/kernel/hw_breakpoint.c
> @@ -384,6 +384,12 @@ void hw_breakpoint_pmu_read(struct perf_event *bp)
>  bool dawr_force_enable;
>  EXPORT_SYMBOL_GPL(dawr_force_enable);
>
> +
> +static void set_dawr_cb(void *info)
> +{
> +   set_dawr(info);
> +}
> +
>  static ssize_t dawr_write_file_bool(struct file *file,
> const char __user *user_buf,
> size_t count, loff_t *ppos)
> @@ -403,7 +409,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
>
> /* If we are clearing, make sure all CPUs have the DAWR cleared */
> if (!dawr_force_enable)
> -   smp_call_function((smp_call_func_t)set_dawr, &null_brk, 0);
> +   smp_call_function(set_dawr_cb, &null_brk, 0);
>
> return rc;
>  }


[Bug 203609] New: Build error: implicit declaration of function 'cpu_mitigations_off'

2019-05-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=203609

Bug ID: 203609
   Summary: Build error: implicit declaration of function
'cpu_mitigations_off'
   Product: Platform Specific/Hardware
   Version: 2.5
Kernel Version: 4.19.43 and 4.14.119
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: PPC-64
  Assignee: platform_ppc...@kernel-bugs.osdl.org
  Reporter: ja...@bluehome.net
Regression: No

Created attachment 282765
  --> https://bugzilla.kernel.org/attachment.cgi?id=282765&action=edit
Build log

This just showed up in 4.19.43 and 4.14.119. 4.19.42 and 4.14.118 were fine.
I'm building with GCC 8.3 for ppc64el. 4.19.43 and 4.14.119 also build fine for
32- and 64-bit x86.

arch/powerpc/kernel/security.c: In function 'setup_barrier_nospec':
arch/powerpc/kernel/security.c:59:21: error: implicit declaration of function
'cpu_mitigations_off' [-Werror=implicit-function-declaration]
  if (!no_nospec && !cpu_mitigations_off())
 ^~~
cc1: all warnings being treated as errors
scripts/Makefile.build:303: recipe for target 'arch/powerpc/kernel/security.o'
failed
make[1]: *** [arch/powerpc/kernel/security.o] Error 1
Makefile:1051: recipe for target 'arch/powerpc/kernel' failed
make: *** [arch/powerpc/kernel] Error 2


Re: [PATCH stable 4.9] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Christophe Leroy




Le 15/05/2019 à 15:08, Greg KH a écrit :

On Wed, May 15, 2019 at 02:35:36PM +0200, Christophe Leroy wrote:



Le 15/05/2019 à 10:29, Greg KH a écrit :

On Wed, May 15, 2019 at 06:40:47AM +, Christophe Leroy wrote:

[Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]

On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()

There is the following note in front of early_init():
   * Note that the kernel may be running at an address which is different
   * from the address that it was linked at, so we must use RELOC/PTRRELOC
   * to access static data (including strings).  -- paulus

Therefore init_mem_is_free must be accessed with PTRRELOC()

Fixes: 1c38a84d4586 ("powerpc: Avoid code patching freed init sections")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
Signed-off-by: Christophe Leroy 

---
Can't apply the upstream commit as such due to several other unrelated stuff
like for instance STRICT_KERNEL_RWX which are missing.
So instead, using same approach as for commit 
252eb55816a6f69ef9464cad303cdb3326cdc61d


Now queued up, thanks.



Should go to 4.4 as well since the commit it fixes is now queued for 4.4
([PATCH 4.4 056/266] powerpc: Avoid code patching freed init sections)


Ok, can someone send me a backport that actually applies there?



Done

Christophe


[PATCH stable 4.4] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Christophe Leroy
[Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]

On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()

There is the following note in front of early_init():
 * Note that the kernel may be running at an address which is different
 * from the address that it was linked at, so we must use RELOC/PTRRELOC
 * to access static data (including strings).  -- paulus

Therefore init_mem_is_free must be accessed with PTRRELOC()

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
Signed-off-by: Christophe Leroy 

---
Can't apply the upstream commit as such due to several other unrelated stuff
like for instance STRICT_KERNEL_RWX which are missing.
So instead, using same approach as for commit 
252eb55816a6f69ef9464cad303cdb3326cdc61d

Removed the Fixes: tag as I don't know yet the commit Id of the fixed commit on 
4.4 branch.
---
 arch/powerpc/lib/code-patching.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 2604192c0719..65ce778aee46 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -28,7 +28,7 @@ int patch_instruction(unsigned int *addr, unsigned int instr)
int err;
 
/* Make sure we aren't patching a freed init section */
-   if (init_mem_is_free && is_init(addr)) {
+   if (*PTRRELOC(&init_mem_is_free) && is_init(addr)) {
pr_debug("Skipping init section patching addr: 0x%px\n", addr);
return 0;
}
-- 
2.13.3



[RFC PATCH 5/5] powerpc/64s/radix: iomap use huge page mappings

2019-05-15 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/pgtable.h |  8 +++
 arch/powerpc/mm/pgtable_64.c | 54 +---
 2 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 7dede2e34b70..93b8a99df88e 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -274,6 +274,14 @@ extern unsigned long __vmalloc_end;
 #define VMALLOC_START  __vmalloc_start
 #define VMALLOC_END__vmalloc_end
 
+static inline unsigned int ioremap_max_order(void)
+{
+   if (radix_enabled())
+   return PUD_SHIFT;
+   return 7 + PAGE_SHIFT; /* default from linux/vmalloc.h */
+}
+#define IOREMAP_MAX_ORDER ({ ioremap_max_order();})
+
 extern unsigned long __kernel_virt_start;
 extern unsigned long __kernel_virt_size;
 extern unsigned long __kernel_io_start;
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index d2d976ff8a0e..f660116251e6 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -112,7 +112,7 @@ unsigned long ioremap_bot = IOREMAP_BASE;
  * __ioremap_at - Low level function to establish the page tables
  *for an IO mapping
  */
-void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
+static void __iomem * hash__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
 {
unsigned long i;
 
@@ -120,6 +120,50 @@ void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_
if (pgprot_val(prot) & H_PAGE_4K_PFN)
return NULL;
 
+   for (i = 0; i < size; i += PAGE_SIZE)
+   if (map_kernel_page((unsigned long)ea + i, pa + i, prot))
+   return NULL;
+
+   return (void __iomem *)ea;
+}
+
+static int radix__ioremap_page_range(unsigned long addr, unsigned long end,
+  phys_addr_t phys_addr, pgprot_t prot)
+{
+   while (addr != end) {
+   if (!(addr & ~PUD_MASK) && !(phys_addr & ~PUD_MASK) &&
+   end - addr >= PUD_SIZE) {
+   if (radix__map_kernel_page(addr, phys_addr, prot, PUD_SIZE))
+   return -ENOMEM;
+   addr += PUD_SIZE;
+   phys_addr += PUD_SIZE;
+
+   } else if (!(addr & ~PMD_MASK) && !(phys_addr & ~PMD_MASK) &&
+   end - addr >= PMD_SIZE) {
+   if (radix__map_kernel_page(addr, phys_addr, prot, PMD_SIZE))
+   return -ENOMEM;
+   addr += PMD_SIZE;
+   phys_addr += PMD_SIZE;
+
+   } else {
+   if (radix__map_kernel_page(addr, phys_addr, prot, PAGE_SIZE))
+   return -ENOMEM;
+   addr += PAGE_SIZE;
+   phys_addr += PAGE_SIZE;
+   }
+   }
+   return 0;
+}
+
+static void __iomem * radix__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
+{
+   if (radix__ioremap_page_range((unsigned long)ea, (unsigned long)ea + size, pa, prot))
+   return NULL;
+   return ea;
+}
+
+void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_t prot)
+{
if ((ea + size) >= (void *)IOREMAP_END) {
pr_warn("Outside the supported range\n");
return NULL;
@@ -129,11 +173,9 @@ void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, pgprot_
WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
WARN_ON(size & ~PAGE_MASK);
 
-   for (i = 0; i < size; i += PAGE_SIZE)
-   if (map_kernel_page((unsigned long)ea + i, pa + i, prot))
-   return NULL;
-
-   return (void __iomem *)ea;
+   if (radix_enabled())
+   return radix__ioremap_at(pa, ea, size, prot);
+   return hash__ioremap_at(pa, ea, size, prot);
 }
 
 /**
-- 
2.20.1



[RFC PATCH 4/5] powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP

2019-05-15 Thread Nicholas Piggin
This does not actually enable huge vmap mappings, because powerpc/64
ioremap does not call ioremap_page_range, but it is required before
implementing huge mappings in ioremap, because the generic vunmap code
needs to cope with them.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig |  1 +
 arch/powerpc/mm/book3s64/radix_pgtable.c | 93 
 2 files changed, 94 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d7996cfaceca..ffac84600e0e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -166,6 +166,7 @@ config PPC
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
select HAVE_ARCH_AUDITSYSCALL
+   select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN  if PPC32
select HAVE_ARCH_KGDB
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index c9bcf428dd2b..3bc9ade56277 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -1122,3 +1122,96 @@ void radix__ptep_modify_prot_commit(struct vm_area_struct *vma,
 
set_pte_at(mm, addr, ptep, pte);
 }
+
+int __init arch_ioremap_pud_supported(void)
+{
+   return radix_enabled();
+}
+
+int __init arch_ioremap_pmd_supported(void)
+{
+   return radix_enabled();
+}
+
+int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+{
+   return 0;
+}
+
+int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot)
+{
+   pte_t *ptep = (pte_t *)pud;
+   pte_t new_pud = pfn_pte(__phys_to_pfn(addr), prot);
+
+   set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pud);
+
+   return 1;
+}
+
+int pud_clear_huge(pud_t *pud)
+{
+   if (pud_huge(*pud)) {
+   pud_clear(pud);
+   return 1;
+   }
+
+   return 0;
+}
+
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
+{
+   pmd_t *pmd;
+   int i;
+
+   pmd = (pmd_t *)pud_page_vaddr(*pud);
+   pud_clear(pud);
+
+   flush_tlb_kernel_range(addr, addr + PUD_SIZE);
+
+   for (i = 0; i < PTRS_PER_PMD; i++) {
+   if (!pmd_none(pmd[i])) {
+   pte_t *pte;
+   pte = (pte_t *)pmd_page_vaddr(pmd[i]);
+
+   pte_free_kernel(&init_mm, pte);
+   }
+   }
+
+   pmd_free(&init_mm, pmd);
+
+   return 1;
+}
+
+int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot)
+{
+   pte_t *ptep = (pte_t *)pmd;
+   pte_t new_pmd = pfn_pte(__phys_to_pfn(addr), prot);
+
+   set_pte_at(&init_mm, 0 /* radix unused */, ptep, new_pmd);
+
+   return 1;
+}
+
+int pmd_clear_huge(pmd_t *pmd)
+{
+   if (pmd_huge(*pmd)) {
+   pmd_clear(pmd);
+   return 1;
+   }
+
+   return 0;
+}
+
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
+{
+   pte_t *pte;
+
+   pte = (pte_t *)pmd_page_vaddr(*pmd);
+   pmd_clear(pmd);
+
+   flush_tlb_kernel_range(addr, addr + PMD_SIZE);
+
+   pte_free_kernel(&init_mm, pte);
+
+   return 1;
+}
-- 
2.20.1



[RFC PATCH 3/5] mm/vmalloc: Hugepage vmalloc mappings

2019-05-15 Thread Nicholas Piggin
This appears to help cached git diff performance by about 5% on a
POWER9 (with 32MB dentry cache hash).

  Profiling git diff dTLB misses with a vanilla kernel:

  81.75%  git  [kernel.vmlinux][k] __d_lookup_rcu
   7.21%  git  [kernel.vmlinux][k] strncpy_from_user
   1.77%  git  [kernel.vmlinux][k] find_get_entry
   1.59%  git  [kernel.vmlinux][k] kmem_cache_free

40,168  dTLB-miss
   0.100342754 seconds time elapsed

After this patch (and the subsequent powerpc HUGE_VMAP patches), the
dentry cache hash gets mapped with 2MB pages:

 2,987  dTLB-miss
   0.095933138 seconds time elapsed

The elapsed time improvement isn't very scientific but seems consistent;
TLB misses certainly improve by an order of magnitude. My laptop
takes a lot of misses here too, so x86 would be interesting to test,
I think it should just work there.

---
 include/linux/vmalloc.h |  1 +
 mm/vmalloc.c| 87 +++--
 2 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c6eebb839552..029635560306 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -42,6 +42,7 @@ struct vm_struct {
unsigned long   size;
unsigned long   flags;
struct page **pages;
+   unsigned intpage_shift;
unsigned intnr_pages;
phys_addr_t phys_addr;
const void  *caller;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e5e9e1fcac01..c9ba88768bca 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -216,32 +216,34 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
  * Ie. pte at addr+N*PAGE_SIZE shall point to pfn corresponding to pages[N]
  */
 static int vmap_page_range_noflush(unsigned long start, unsigned long end,
-  pgprot_t prot, struct page **pages)
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
 {
-   pgd_t *pgd;
-   unsigned long next;
unsigned long addr = start;
-   int err = 0;
-   int nr = 0;
+   unsigned int i, nr = (end - start) >> (PAGE_SHIFT + page_shift);
 
-   BUG_ON(addr >= end);
-   pgd = pgd_offset_k(addr);
-   do {
-   next = pgd_addr_end(addr, end);
-   err = vmap_p4d_range(pgd, addr, next, prot, pages, &nr);
+   for (i = 0; i < nr; i++) {
+   int err;
+
+   err = ioremap_page_range(addr,
+   addr + (PAGE_SIZE << page_shift),
+   __pa(page_address(pages[i])), prot);
if (err)
return err;
-   } while (pgd++, addr = next, addr != end);
+
+   addr += PAGE_SIZE << page_shift;
+   }
 
return nr;
 }
 
 static int vmap_page_range(unsigned long start, unsigned long end,
-  pgprot_t prot, struct page **pages)
+  pgprot_t prot, struct page **pages,
+  unsigned int page_shift)
 {
int ret;
 
-   ret = vmap_page_range_noflush(start, end, prot, pages);
+   ret = vmap_page_range_noflush(start, end, prot, pages, page_shift);
flush_cache_vmap(start, end);
return ret;
 }
@@ -1189,7 +1191,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro
addr = va->va_start;
mem = (void *)addr;
}
-   if (vmap_page_range(addr, addr + size, prot, pages) < 0) {
+   if (vmap_page_range(addr, addr + size, prot, pages, 0) < 0) {
vm_unmap_ram(mem, count);
return NULL;
}
@@ -1305,7 +1307,7 @@ void __init vmalloc_init(void)
 int map_kernel_range_noflush(unsigned long addr, unsigned long size,
 pgprot_t prot, struct page **pages)
 {
-   return vmap_page_range_noflush(addr, addr + size, prot, pages);
+   return vmap_page_range_noflush(addr, addr + size, prot, pages, 0);
 }
 
 /**
@@ -1352,7 +1354,7 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
unsigned long end = addr + get_vm_area_size(area);
int err;
 
-   err = vmap_page_range(addr, end, prot, pages);
+   err = vmap_page_range(addr, end, prot, pages, 0);
 
return err > 0 ? 0 : err;
 }
@@ -1395,8 +1397,9 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
return NULL;
 
if (flags & VM_IOREMAP)
-   align = 1ul << clamp_t(int, get_count_order_long(size),
-  PAGE_SHIFT, IOREMAP_MAX_ORDER);
+   align = max(align,
+   1ul << clamp_t(int, get_count_order_long(size),
+  PAGE_SHIFT, IOREMAP_MAX_ORDER));
 
 

[RFC PATCH 2/5] mm: large system hash avoid vmap for non-NUMA machines when hashdist

2019-05-15 Thread Nicholas Piggin
alloc_large_system_hash() currently always uses vmalloc when hashdist is true. When
there is only 1 online node and size <= MAX_ORDER, vmalloc can be
avoided.

Signed-off-by: Nicholas Piggin 
---
 mm/page_alloc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1683d54d6405..1312d4db5602 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7978,7 +7978,8 @@ void *__init alloc_large_system_hash(const char *tablename,
else
table = memblock_alloc_raw(size,
   SMP_CACHE_BYTES);
-   } else if (get_order(size) >= MAX_ORDER || hashdist) {
+   } else if (get_order(size) >= MAX_ORDER ||
+   (hashdist && num_online_nodes() > 1)) {
table = __vmalloc(size, gfp_flags, PAGE_KERNEL);
} else {
/*
-- 
2.20.1



[RFC PATCH 1/5] mm: large system hash use vmalloc for size > MAX_ORDER when !hashdist

2019-05-15 Thread Nicholas Piggin
The kernel currently clamps large system hashes to MAX_ORDER when
hashdist is not set, which is rather arbitrary.

vmalloc space is limited on 32-bit machines, but this shouldn't
result in much more of it being used, since such machines have
correspondingly small physical memory.

Signed-off-by: Nicholas Piggin 
---
 mm/page_alloc.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59661106da16..1683d54d6405 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7978,7 +7978,7 @@ void *__init alloc_large_system_hash(const char *tablename,
else
table = memblock_alloc_raw(size,
   SMP_CACHE_BYTES);
-   } else if (hashdist) {
+   } else if (get_order(size) >= MAX_ORDER || hashdist) {
table = __vmalloc(size, gfp_flags, PAGE_KERNEL);
} else {
/*
@@ -7986,10 +7986,8 @@ void *__init alloc_large_system_hash(const char *tablename,
 * some pages at the end of hash table which
 * alloc_pages_exact() automatically does
 */
-   if (get_order(size) < MAX_ORDER) {
-   table = alloc_pages_exact(size, gfp_flags);
-   kmemleak_alloc(table, size, 1, gfp_flags);
-   }
+   table = alloc_pages_exact(size, gfp_flags);
+   kmemleak_alloc(table, size, 1, gfp_flags);
}
} while (!table && size > PAGE_SIZE && --log2qty);
 
-- 
2.20.1



Re: [PATCH] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool

2019-05-15 Thread Christoph Hellwig
On Wed, May 15, 2019 at 02:09:42PM +0200, Mathieu Malaterre wrote:
> In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9
> option") the following piece of code was added:
> 
>smp_call_function((smp_call_func_t)set_dawr, &dawr_brk, 0);
> 
> Since GCC 8 this triggers the following warning about incompatible
> function types:

And the warning is there for a reason, and should not be hidden
behind a cast.  This should instead be fixed by something like this:

diff --git a/arch/powerpc/kernel/hw_breakpoint.c b/arch/powerpc/kernel/hw_breakpoint.c
index da307dd93ee3..a26b67a1be83 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -384,6 +384,12 @@ void hw_breakpoint_pmu_read(struct perf_event *bp)
 bool dawr_force_enable;
 EXPORT_SYMBOL_GPL(dawr_force_enable);
 
+
+static void set_dawr_cb(void *info)
+{
+   set_dawr(info);
+}
+
 static ssize_t dawr_write_file_bool(struct file *file,
const char __user *user_buf,
size_t count, loff_t *ppos)
@@ -403,7 +409,7 @@ static ssize_t dawr_write_file_bool(struct file *file,
 
/* If we are clearing, make sure all CPUs have the DAWR cleared */
if (!dawr_force_enable)
-   smp_call_function((smp_call_func_t)set_dawr, &dawr_brk, 0);
+   smp_call_function(set_dawr_cb, &dawr_brk, 0);
 
return rc;
 }


Re: [PATCH stable 4.9] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Greg KH
On Wed, May 15, 2019 at 02:35:36PM +0200, Christophe Leroy wrote:
> 
> 
> Le 15/05/2019 à 10:29, Greg KH a écrit :
> > On Wed, May 15, 2019 at 06:40:47AM +, Christophe Leroy wrote:
> > > [Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]
> > > 
> > > On powerpc32, patch_instruction() is called by apply_feature_fixups()
> > > which is called from early_init()
> > > 
> > > There is the following note in front of early_init():
> > >   * Note that the kernel may be running at an address which is different
> > >   * from the address that it was linked at, so we must use RELOC/PTRRELOC
> > >   * to access static data (including strings).  -- paulus
> > > 
> > > Therefore init_mem_is_free must be accessed with PTRRELOC()
> > > 
> > > Fixes: 1c38a84d4586 ("powerpc: Avoid code patching freed init sections")
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
> > > Signed-off-by: Christophe Leroy 
> > > 
> > > ---
> > > Can't apply the upstream commit as such due to several other unrelated 
> > > stuff
> > > like for instance STRICT_KERNEL_RWX which are missing.
> > > So instead, using same approach as for commit 
> > > 252eb55816a6f69ef9464cad303cdb3326cdc61d
> > 
> > Now queued up, thanks.
> > 
> 
> Should go to 4.4 as well since the commit it fixes is now queued for 4.4
> ([PATCH 4.4 056/266] powerpc: Avoid code patching freed init sections)

Ok, can someone send me a backport that actually applies there?

thanks,

greg k-h


Re: [PATCH stable 4.9] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Christophe Leroy




Le 15/05/2019 à 10:29, Greg KH a écrit :

On Wed, May 15, 2019 at 06:40:47AM +, Christophe Leroy wrote:

[Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]

On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()

There is the following note in front of early_init():
  * Note that the kernel may be running at an address which is different
  * from the address that it was linked at, so we must use RELOC/PTRRELOC
  * to access static data (including strings).  -- paulus

Therefore init_mem_is_free must be accessed with PTRRELOC()

Fixes: 1c38a84d4586 ("powerpc: Avoid code patching freed init sections")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
Signed-off-by: Christophe Leroy 

---
Can't apply the upstream commit as such due to several other unrelated stuff
like for instance STRICT_KERNEL_RWX which are missing.
So instead, using same approach as for commit 
252eb55816a6f69ef9464cad303cdb3326cdc61d


Now queued up, thanks.



Should go to 4.4 as well since the commit it fixes is now queued for 4.4 
([PATCH 4.4 056/266] powerpc: Avoid code patching freed init sections)


Christophe


Re: [PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Geert Uytterhoeven
On Wed, May 15, 2019 at 12:04 PM Christian Brauner  wrote:
> This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
> pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
> process that is created via traditional fork()/clone() calls that is only
> referenced by a PID:
>
> int pidfd = pidfd_open(1234, 0);
> ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
>
> With the introduction of pidfds through CLONE_PIDFD it is possible to
> create pidfds at process creation time.
> However, a lot of processes get created with traditional PID-based calls
> such as fork() or clone() (without CLONE_PIDFD). For these processes a
> caller can currently not create a pollable pidfd. This is a huge problem
> for Android's low memory killer (LMK) and service managers such as systemd.
> Both are examples of tools that want to make use of pidfds to get reliable
> notification of process exit for non-parents (pidfd polling) and race-free
> signal sending (pidfd_send_signal()). They intend to switch to this API for
> process supervision/management as soon as possible. Having no way to get
> pollable pidfds from PID-only processes is one of the biggest blockers for
> them in adopting this API. With pidfd_open() making it possible to retrieve
> pidfds for PID-based processes, we enable them to adopt this API.
>
> In line with Arnd's recent changes to consolidate syscall numbers across
> architectures, I have added the pidfd_open() syscall to all architectures
> at the same time.
>
> Signed-off-by: Christian Brauner 

>  arch/m68k/kernel/syscalls/syscall.tbl   |  1 +

Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH 4.4 084/266] powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup

2019-05-15 Thread Greg Kroah-Hartman
From: Diana Craciun 

commit 039daac5526932ec731e4499613018d263af8b3e upstream.

Fixed the following build warning:
powerpc-linux-gnu-ld: warning: orphan section `__btb_flush_fixup' from
`arch/powerpc/kernel/head_44x.o' being placed in section
`__btb_flush_fixup'.

Signed-off-by: Diana Craciun 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/powerpc/kernel/head_booke.h |   18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -31,6 +31,16 @@
  */
 #define THREAD_NORMSAVE(offset)(THREAD_NORMSAVES + (offset * 4))
 
+#ifdef CONFIG_PPC_FSL_BOOK3E
+#define BOOKE_CLEAR_BTB(reg)   \
+START_BTB_FLUSH_SECTION\
+   BTB_FLUSH(reg)  \
+END_BTB_FLUSH_SECTION
+#else
+#define BOOKE_CLEAR_BTB(reg)
+#endif
+
+
#define NORMAL_EXCEPTION_PROLOG(intno)  \
mtspr   SPRN_SPRG_WSCRATCH0, r10;   /* save one register */  \
mfspr   r10, SPRN_SPRG_THREAD;   \
@@ -42,9 +52,7 @@
andi.   r11, r11, MSR_PR;   /* check whether user or kernel*/\
mr  r11, r1; \
beq 1f;  \
-START_BTB_FLUSH_SECTION\
-   BTB_FLUSH(r11)  \
-END_BTB_FLUSH_SECTION  \
+   BOOKE_CLEAR_BTB(r11)\
/* if from user, start at top of this thread's kernel stack */   \
lwz r11, THREAD_INFO-THREAD(r10);\
ALLOC_STACK_FRAME(r11, THREAD_SIZE); \
@@ -130,9 +138,7 @@ END_BTB_FLUSH_SECTION  \
stw r9,_CCR(r8);/* save CR on stack*/\
mfspr   r11,exc_level_srr1; /* check whether user or kernel*/\
DO_KVM  BOOKE_INTERRUPT_##intno exc_level_srr1;  \
-START_BTB_FLUSH_SECTION\
-   BTB_FLUSH(r10)  \
-END_BTB_FLUSH_SECTION  \
+   BOOKE_CLEAR_BTB(r10)\
andi.   r11,r11,MSR_PR;  \
mfspr   r11,SPRN_SPRG_THREAD;   /* if from user, start at top of   */\
lwz r11,THREAD_INFO-THREAD(r11); /* this thread's kernel stack */\




[PATCH 4.4 083/266] powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms

2019-05-15 Thread Greg Kroah-Hartman
From: Diana Craciun 

commit c28218d4abbf4f2035495334d8bfcba64bda4787 upstream.

Used barrier_nospec to sanitize the syscall table.

Signed-off-by: Diana Craciun 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/powerpc/kernel/entry_32.S |   10 ++
 1 file changed, 10 insertions(+)

--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * MSR_KERNEL is > 0x1 on 4xx/Book-E since it include MSR_CE.
@@ -340,6 +341,15 @@ syscall_dotrace_cont:
ori r10,r10,sys_call_table@l
slwi    r0,r0,2
bge-    66f
+
+   barrier_nospec_asm
+   /*
+* Prevent the load of the handler below (based on the user-passed
+* system call number) being speculatively executed until the test
+* against NR_syscalls and branch to .66f above has
+* committed.
+*/
+
lwzx    r10,r10,r0  /* Fetch system call handler [ptr] */
mtlr    r10
addi    r9,r1,STACK_FRAME_OVERHEAD




[PATCH 4.4 082/266] powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)

2019-05-15 Thread Greg Kroah-Hartman
From: Diana Craciun 

commit 7fef436295bf6c05effe682c8797dfcb0deb112a upstream.

In order to protect against speculation attacks on
indirect branches, the branch predictor is flushed at
kernel entry to protect for the following situations:
- userspace process attacking another userspace process
- userspace process attacking the kernel
Basically when the privilege level changes (i.e. the kernel
is entered), the branch predictor state is flushed.

Signed-off-by: Diana Craciun 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/powerpc/kernel/head_booke.h |6 ++
 arch/powerpc/kernel/head_fsl_booke.S |   15 +++
 2 files changed, 21 insertions(+)

--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -42,6 +42,9 @@
andi.   r11, r11, MSR_PR;   /* check whether user or kernel*/\
mr  r11, r1; \
beq 1f;  \
+START_BTB_FLUSH_SECTION\
+   BTB_FLUSH(r11)  \
+END_BTB_FLUSH_SECTION  \
/* if from user, start at top of this thread's kernel stack */   \
lwz r11, THREAD_INFO-THREAD(r10);\
ALLOC_STACK_FRAME(r11, THREAD_SIZE); \
@@ -127,6 +130,9 @@
stw r9,_CCR(r8);/* save CR on stack*/\
mfspr   r11,exc_level_srr1; /* check whether user or kernel*/\
DO_KVM  BOOKE_INTERRUPT_##intno exc_level_srr1;  \
+START_BTB_FLUSH_SECTION\
+   BTB_FLUSH(r10)  \
+END_BTB_FLUSH_SECTION  \
andi.   r11,r11,MSR_PR;  \
mfspr   r11,SPRN_SPRG_THREAD;   /* if from user, start at top of   */\
lwz r11,THREAD_INFO-THREAD(r11); /* this thread's kernel stack */\
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -451,6 +451,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
mfcr    r13
stw r13, THREAD_NORMSAVE(3)(r10)
DO_KVM  BOOKE_INTERRUPT_DTLB_MISS SPRN_SRR1
+START_BTB_FLUSH_SECTION
+   mfspr r11, SPRN_SRR1
+   andi. r10,r11,MSR_PR
+   beq 1f
+   BTB_FLUSH(r10)
+1:
+END_BTB_FLUSH_SECTION
mfspr   r10, SPRN_DEAR  /* Get faulting address */
 
/* If we are faulting a kernel address, we have to use the
@@ -545,6 +552,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
mfcr    r13
stw r13, THREAD_NORMSAVE(3)(r10)
DO_KVM  BOOKE_INTERRUPT_ITLB_MISS SPRN_SRR1
+START_BTB_FLUSH_SECTION
+   mfspr r11, SPRN_SRR1
+   andi. r10,r11,MSR_PR
+   beq 1f
+   BTB_FLUSH(r10)
+1:
+END_BTB_FLUSH_SECTION
+
mfspr   r10, SPRN_SRR0  /* Get faulting address */
 
/* If we are faulting a kernel address, we have to use the




[PATCH 4.4 081/266] powerpc/fsl: Emulate SPRN_BUCSR register

2019-05-15 Thread Greg Kroah-Hartman
From: Diana Craciun 

commit 98518c4d8728656db349f875fcbbc7c126d4c973 upstream.

In order to flush the branch predictor the guest kernel performs
writes to the BUCSR register which is hypervisor privileged. However,
the branch predictor is flushed at each KVM entry, so the branch
predictor has been already flushed, so just return as soon as possible
to guest.

Signed-off-by: Diana Craciun 
[mpe: Tweak comment formatting]
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/powerpc/kvm/e500_emulate.c |7 +++
 1 file changed, 7 insertions(+)

--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -277,6 +277,13 @@ int kvmppc_core_emulate_mtspr_e500(struc
vcpu->arch.pwrmgtcr0 = spr_val;
break;
 
+   case SPRN_BUCSR:
+   /*
+* If we are here, it means that we have already flushed the
+* branch predictor, so just return to guest.
+*/
+   break;
+
/* extra exceptions */
 #ifdef CONFIG_SPE_POSSIBLE
case SPRN_IVOR32:




[PATCH 4.4 080/266] powerpc/fsl: Flush branch predictor when entering KVM

2019-05-15 Thread Greg Kroah-Hartman
From: Diana Craciun 

commit e7aa61f47b23afbec41031bc47ca8d6cb6516abc upstream.

Switching from the guest to host is another place
where the speculative accesses can be exploited.
Flush the branch predictor when entering KVM.

Signed-off-by: Diana Craciun 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/powerpc/kvm/bookehv_interrupts.S |4 
 1 file changed, 4 insertions(+)

--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -75,6 +75,10 @@
PPC_LL  r1, VCPU_HOST_STACK(r4)
PPC_LL  r2, HOST_R2(r1)
 
+START_BTB_FLUSH_SECTION
+   BTB_FLUSH(r10)
+END_BTB_FLUSH_SECTION
+
mfspr   r10, SPRN_PID
lwz r8, VCPU_HOST_PID(r4)
PPC_LL  r11, VCPU_SHARED(r4)




[PATCH 4.4 079/266] powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used

2019-05-15 Thread Greg Kroah-Hartman
From: Diana Craciun 

commit 3bc8ea8603ae4c1e09aca8de229ad38b8091fcb3 upstream.

If the user chooses not to use the mitigations, replace
the code sequence with nops.

Signed-off-by: Diana Craciun 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman 
---
 arch/powerpc/kernel/setup_32.c |1 +
 arch/powerpc/kernel/setup_64.c |1 +
 2 files changed, 2 insertions(+)

--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -323,6 +323,7 @@ void __init setup_arch(char **cmdline_p)
if ( ppc_md.progress ) ppc_md.progress("arch: exit", 0x3eab);
 
setup_barrier_nospec();
+   setup_spectre_v2();
 
paging_init();
 
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -737,6 +737,7 @@ void __init setup_arch(char **cmdline_p)
ppc_md.setup_arch();
 
setup_barrier_nospec();
+   setup_spectre_v2();
 
paging_init();
 




[PATCH] crypto: talitos - fix skcipher failure due to wrong output IV

2019-05-15 Thread Christophe Leroy
Selftests report the following:

[2.984845] alg: skcipher: cbc-aes-talitos encryption test failed (wrong output IV) on test vector 0, cfg="in-place"
[2.995377] : 3d af ba 42 9d 9e b4 30 b4 22 da 80 2c 9f ac 41
[3.032673] alg: skcipher: cbc-des-talitos encryption test failed (wrong output IV) on test vector 0, cfg="in-place"
[3.043185] : fe dc ba 98 76 54 32 10
[3.063238] alg: skcipher: cbc-3des-talitos encryption test failed (wrong output IV) on test vector 0, cfg="in-place"
[3.073818] : 7d 33 88 93 0f 93 b2 42

The above dumps show that the actual output IV is indeed the input IV.
This is due to the IV not being copied back into the request.

This patch fixes that.

Signed-off-by: Christophe Leroy 
---
 drivers/crypto/talitos.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 1d429fc073d1..f443cbe7da80 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -1637,11 +1637,15 @@ static void ablkcipher_done(struct device *dev,
int err)
 {
struct ablkcipher_request *areq = context;
+   struct crypto_ablkcipher *cipher = crypto_ablkcipher_reqtfm(areq);
+   struct talitos_ctx *ctx = crypto_ablkcipher_ctx(cipher);
+   unsigned int ivsize = crypto_ablkcipher_ivsize(cipher);
struct talitos_edesc *edesc;
 
edesc = container_of(desc, struct talitos_edesc, desc);
 
common_nonsnoop_unmap(dev, edesc, areq);
+   memcpy(areq->info, ctx->iv, ivsize);
 
kfree(edesc);
 
-- 
2.13.3



Re: [RFC PATCH] powerpc/64/ftrace: mprofile-kernel patch out mflr

2019-05-15 Thread Michael Ellerman
"Naveen N. Rao"  writes:
> Michael Ellerman wrote:
>> "Naveen N. Rao"  writes:
>>> Michael Ellerman wrote:
 Nicholas Piggin  writes:
> The new mprofile-kernel mcount sequence is
>
>   mflr    r0
>   bl  _mcount
>
> Dynamic ftrace patches the branch instruction with a noop, but leaves
> the mflr. mflr is executed by the branch unit that can only execute one
> per cycle on POWER9 and shared with branches, so it would be nice to
> avoid it where possible.
>
> This patch is a hacky proof of concept to nop out the mflr. Can we do
> this or are there races or other issues with it?
 
 There's a race, isn't there?
 
 We have a function foo which currently has tracing disabled, so the mflr
 and bl are nop'ed out.
 
   CPU 0CPU 1
   ==
   bl foo
   nop (ie. not mflr)
   -> interrupt
   something else   enable tracing for foo
   ...  patch mflr and branch
   <- rfi
   bl _mcount
 
 So we end up in _mcount() but with r0 not populated.
>>>
>>> Good catch! Looks like we need to patch the mflr with a "b +8" similar 
>>> to what we do in __ftrace_make_nop().
>> 
>> Would that actually make it any faster though? Nick?
>
> Ok, how about doing this as a 2-step process?
> 1. patch 'mflr r0' with a 'b +8'
>synchronize_rcu_tasks()
> 2. convert 'b +8' to a 'nop'

I think that would work, if I understand synchronize_rcu_tasks().

I worry that it will make the enable/disable expensive. But could be
worth trying.

cheers


[PATCH] powerpc: Include header file to fix a warning

2019-05-15 Thread Mathieu Malaterre
Make sure to include  to provide the following prototype:
__find_linux_pte.

Remove the following warning treated as error (W=1):

  arch/powerpc/mm/pgtable.c:316:8: error: no previous prototype for 
'__find_linux_pte' [-Werror=missing-prototypes]

Fixes: 0caed4de502c ("powerpc/mm: move __find_linux_pte() out of hugetlbpage.c")
Cc: Christophe Leroy 
Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/mm/pgtable.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index db4a6253df92..2aa042193ace 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static inline int is_exec_fault(void)
 {
-- 
2.20.1



[PATCH] powerpc: silence a -Wcast-function-type warning in dawr_write_file_bool

2019-05-15 Thread Mathieu Malaterre
In commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9
option") the following piece of code was added:

   smp_call_function((smp_call_func_t)set_dawr, _brk, 0);

Since GCC 8 this triggers the following warning about incompatible
function types:

  arch/powerpc/kernel/hw_breakpoint.c:408:21: error: cast between incompatible 
function types from 'int (*)(struct arch_hw_breakpoint *)' to 'void (*)(void 
*)' [-Werror=cast-function-type]

Cast the function through an intermediate (void *) to make the compiler
lose knowledge of the actual type.
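The intermediate-(void *) trick can be demonstrated outside the kernel; the names below are made up for illustration, and only the cast pattern matches the patch:

```c
/* GCC >= 8 warns with -Wcast-function-type for a direct cast because
 * int (*)(int *) and void (*)(void *) are incompatible; casting
 * through (void *) first drops the function-type information, so no
 * warning is emitted. */
typedef void (*call_fn)(void *);

static int set_flag(int *flag)
{
	*flag = 1;
	return 0;
}

/* direct cast: warns under -Werror=cast-function-type with GCC >= 8
 *   call_fn f = (call_fn)set_flag;
 * two-step cast: silences the warning */
static call_fn cast_through_void(void)
{
	return (call_fn)(void *)set_flag;
}

/* Calling through the converted pointer is technically undefined
 * behaviour in ISO C, but works on the ABIs the kernel targets, which
 * is what the smp_call_function() call relies on. */
static int demo(void)
{
	int flag = 0;
	call_fn f = cast_through_void();

	f(&flag);	/* invokes set_flag(&flag); return value discarded */
	return flag;
}
```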

Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option")
Cc: Michael Neuling 
Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/kernel/hw_breakpoint.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/hw_breakpoint.c 
b/arch/powerpc/kernel/hw_breakpoint.c
index f70fb89dbf60..baeb4c58de3b 100644
--- a/arch/powerpc/kernel/hw_breakpoint.c
+++ b/arch/powerpc/kernel/hw_breakpoint.c
@@ -405,7 +405,8 @@ static ssize_t dawr_write_file_bool(struct file *file,
 
/* If we are clearing, make sure all CPUs have the DAWR cleared */
if (!dawr_force_enable)
-   smp_call_function((smp_call_func_t)set_dawr, _brk, 0);
+   smp_call_function((smp_call_func_t)(void *)set_dawr,
+ _brk, 0);
 
return rc;
 }
-- 
2.20.1



Re: [PATCH] powerpc: Remove double free

2019-05-15 Thread Greg Kroah-Hartman
On Wed, May 15, 2019 at 11:26:03AM +0200, Christophe Leroy wrote:
> kobject_put() released index_dir->kobj

Yes, but what is that kobject enclosed in?

> but who will release 'index' ?

The final kobject_put() will do that, see cacheinfo_create_index_dir()
for the details.

And please do not top-post, you lost all context.

greg k-h


[PATCH 4.4 248/266] x86/speculation: Support mitigations= cmdline option

2019-05-15 Thread Greg Kroah-Hartman
From: Josh Poimboeuf 

commit d68be4c4d31295ff6ae34a8ddfaa4c1a8ff42812 upstream.

Configure x86 runtime CPU speculation bug mitigations in accordance with
the 'mitigations=' cmdline option.  This affects Meltdown, Spectre v2,
Speculative Store Bypass, and L1TF.

The default behavior is unchanged.

Signed-off-by: Josh Poimboeuf 
Signed-off-by: Thomas Gleixner 
Tested-by: Jiri Kosina  (on x86)
Reviewed-by: Jiri Kosina 
Cc: Borislav Petkov 
Cc: "H . Peter Anvin" 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Jiri Kosina 
Cc: Waiman Long 
Cc: Andrea Arcangeli 
Cc: Jon Masters 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: linux-s...@vger.kernel.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-a...@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: Tyler Hicks 
Cc: Linus Torvalds 
Cc: Randy Dunlap 
Cc: Steven Price 
Cc: Phil Auld 
Link: 
https://lkml.kernel.org/r/6616d0ae169308516cfdf5216bedd169f8a8291b.1555085500.git.jpoim...@redhat.com
[bwh: Backported to 4.4:
 - Drop the auto,nosmt option and the l1tf mitigation selection, which we can't
   support
 - Adjust filenames, context]
Signed-off-by: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman 
---
 Documentation/kernel-parameters.txt |   14 +-
 arch/x86/kernel/cpu/bugs.c  |6 --
 arch/x86/mm/kaiser.c|4 +++-
 3 files changed, 16 insertions(+), 8 deletions(-)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2174,15 +2174,19 @@ bytes respectively. Such letter suffixes
http://repo.or.cz/w/linux-2.6/mini2440.git
 
mitigations=
-   Control optional mitigations for CPU vulnerabilities.
-   This is a set of curated, arch-independent options, each
-   of which is an aggregation of existing arch-specific
-   options.
+   [X86] Control optional mitigations for CPU
+   vulnerabilities.  This is a set of curated,
+   arch-independent options, each of which is an
+   aggregation of existing arch-specific options.
 
off
Disable all optional CPU mitigations.  This
improves system performance, but it may also
expose users to several CPU vulnerabilities.
+   Equivalent to: nopti [X86]
+  nospectre_v2 [X86]
+  spectre_v2_user=off [X86]
+  spec_store_bypass_disable=off 
[X86]
 
auto (default)
Mitigate all CPU vulnerabilities, but leave SMT
@@ -2190,7 +2194,7 @@ bytes respectively. Such letter suffixes
users who don't want to be surprised by SMT
getting disabled across kernel upgrades, or who
have other ways of avoiding SMT-based attacks.
-   This is the default behavior.
+   Equivalent to: (default behavior)
 
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -479,7 +479,8 @@ static enum spectre_v2_mitigation_cmd __
char arg[20];
int ret, i;
 
-   if (cmdline_find_option_bool(boot_command_line, "nospectre_v2"))
+   if (cmdline_find_option_bool(boot_command_line, "nospectre_v2") ||
+   cpu_mitigations_off())
return SPECTRE_V2_CMD_NONE;
 
ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, 
sizeof(arg));
@@ -743,7 +744,8 @@ static enum ssb_mitigation_cmd __init ss
char arg[20];
int ret, i;
 
-   if (cmdline_find_option_bool(boot_command_line, 
"nospec_store_bypass_disable")) {
+   if (cmdline_find_option_bool(boot_command_line, 
"nospec_store_bypass_disable") ||
+   cpu_mitigations_off()) {
return SPEC_STORE_BYPASS_CMD_NONE;
} else {
ret = cmdline_find_option(boot_command_line, 
"spec_store_bypass_disable",
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #undef pr_fmt
 #define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt
@@ -297,7 +298,8 @@ void __init kaiser_check_boottime_disabl
goto skip;
}
 
-   if (cmdline_find_option_bool(boot_command_line, "nopti"))
+   if (cmdline_find_option_bool(boot_command_line, "nopti") ||
+   cpu_mitigations_off())
goto disable;
 
 skip:




[PATCH 4.4 247/266] cpu/speculation: Add mitigations= cmdline option

2019-05-15 Thread Greg Kroah-Hartman
From: Josh Poimboeuf 

commit 98af8452945c55652de68536afdde3b520fec429 upstream.

Keeping track of the number of mitigations for all the CPU speculation
bugs has become overwhelming for many users.  It's getting more and more
complicated to decide which mitigations are needed for a given
architecture.  Complicating matters is the fact that each arch tends to
have its own custom way to mitigate the same vulnerability.

Most users fall into a few basic categories:

a) they want all mitigations off;

b) they want all reasonable mitigations on, with SMT enabled even if
   it's vulnerable; or

c) they want all reasonable mitigations on, with SMT disabled if
   vulnerable.

Define a set of curated, arch-independent options, each of which is an
aggregation of existing options:

- mitigations=off: Disable all mitigations.

- mitigations=auto: [default] Enable all the default mitigations, but
  leave SMT enabled, even if it's vulnerable.

- mitigations=auto,nosmt: Enable all the default mitigations, disabling
  SMT if needed by a mitigation.

Currently, these options are placeholders which don't actually do
anything.  They will be fleshed out in upcoming patches.

Signed-off-by: Josh Poimboeuf 
Signed-off-by: Thomas Gleixner 
Tested-by: Jiri Kosina  (on x86)
Reviewed-by: Jiri Kosina 
Cc: Borislav Petkov 
Cc: "H . Peter Anvin" 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Cc: Jiri Kosina 
Cc: Waiman Long 
Cc: Andrea Arcangeli 
Cc: Jon Masters 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Martin Schwidefsky 
Cc: Heiko Carstens 
Cc: linux-s...@vger.kernel.org
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-a...@vger.kernel.org
Cc: Greg Kroah-Hartman 
Cc: Tyler Hicks 
Cc: Linus Torvalds 
Cc: Randy Dunlap 
Cc: Steven Price 
Cc: Phil Auld 
Link: 
https://lkml.kernel.org/r/b07a8ef9b7c5055c3a4637c87d07c296d5016fe0.1555085500.git.jpoim...@redhat.com
[bwh: Backported to 4.4:
 - Drop the auto,nosmt option which we can't support
 - Adjust filename]
Signed-off-by: Ben Hutchings 
Signed-off-by: Greg Kroah-Hartman 
---
 Documentation/kernel-parameters.txt |   19 +++
 include/linux/cpu.h |   17 +
 kernel/cpu.c|   13 +
 3 files changed, 49 insertions(+)

--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2173,6 +2173,25 @@ bytes respectively. Such letter suffixes
in the "bleeding edge" mini2440 support kernel at
http://repo.or.cz/w/linux-2.6/mini2440.git
 
+   mitigations=
+   Control optional mitigations for CPU vulnerabilities.
+   This is a set of curated, arch-independent options, each
+   of which is an aggregation of existing arch-specific
+   options.
+
+   off
+   Disable all optional CPU mitigations.  This
+   improves system performance, but it may also
+   expose users to several CPU vulnerabilities.
+
+   auto (default)
+   Mitigate all CPU vulnerabilities, but leave SMT
+   enabled, even if it's vulnerable.  This is for
+   users who don't want to be surprised by SMT
+   getting disabled across kernel upgrades, or who
+   have other ways of avoiding SMT-based attacks.
+   This is the default behavior.
+
mminit_loglevel=
[KNL] When CONFIG_DEBUG_MEMORY_INIT is set, this
parameter allows control of the logging verbosity for
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -296,4 +296,21 @@ bool cpu_wait_death(unsigned int cpu, in
 bool cpu_report_death(void);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 
+/*
+ * These are used for a global "mitigations=" cmdline option for toggling
+ * optional CPU mitigations.
+ */
+enum cpu_mitigations {
+   CPU_MITIGATIONS_OFF,
+   CPU_MITIGATIONS_AUTO,
+};
+
+extern enum cpu_mitigations cpu_mitigations;
+
+/* mitigations=off */
+static inline bool cpu_mitigations_off(void)
+{
+   return cpu_mitigations == CPU_MITIGATIONS_OFF;
+}
+
 #endif /* _LINUX_CPU_H_ */
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -842,3 +842,16 @@ void init_cpu_online(const struct cpumas
 {
cpumask_copy(to_cpumask(cpu_online_bits), src);
 }
+
+enum cpu_mitigations cpu_mitigations = CPU_MITIGATIONS_AUTO;
+
+static int __init mitigations_parse_cmdline(char *arg)
+{
+   if (!strcmp(arg, "off"))
+   cpu_mitigations = CPU_MITIGATIONS_OFF;
+   else if (!strcmp(arg, "auto"))
+   cpu_mitigations = CPU_MITIGATIONS_AUTO;
+
+   return 0;
+}
+early_param("mitigations", mitigations_parse_cmdline);
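The parsing above can be modelled as a standalone user-space sketch (same logic as the patch, with the early_param plumbing replaced by a plain function):

```c
/* Standalone model of mitigations_parse_cmdline(): unknown values fall
 * through and leave the default (auto) in place, exactly as in the
 * patch. */
#include <string.h>

enum cpu_mitigations {
	CPU_MITIGATIONS_OFF,
	CPU_MITIGATIONS_AUTO,
};

static enum cpu_mitigations parse_mitigations(const char *arg)
{
	enum cpu_mitigations m = CPU_MITIGATIONS_AUTO;	/* default */

	if (!strcmp(arg, "off"))
		m = CPU_MITIGATIONS_OFF;
	else if (!strcmp(arg, "auto"))
		m = CPU_MITIGATIONS_AUTO;
	/* anything else: keep the default, like the early_param handler */
	return m;
}

/* mitigations=off check, mirroring cpu_mitigations_off() */
static int mitigations_off(enum cpu_mitigations m)
{
	return m == CPU_MITIGATIONS_OFF;
}
```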

Re: [PATCH] powerpc/pseries: Fix xive=off command line

2019-05-15 Thread Sasha Levin
Hi,

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag,
fixing commit: eac1e731b59e powerpc/xive: guest exploitation of the XIVE 
interrupt controller.

The bot has tested the following trees: v5.1.1, v5.0.15, v4.19.42, v4.14.118.

v5.1.1: Build OK!
v5.0.15: Build OK!
v4.19.42: Failed to apply! Possible dependencies:
8ca2d5151e7f ("powerpc/prom_init: Move a few remaining statics to 
appropriate sections")
c886087caee7 ("powerpc/prom_init: Move prom_radix_disable to __prombss")

v4.14.118: Failed to apply! Possible dependencies:
028555a590d6 ("powerpc/xive: fix hcall H_INT_RESET to support long busy 
delays")
7a22d6321c3d ("powerpc/mm/radix: Update command line parsing for 
disable_radix")
8ca2d5151e7f ("powerpc/prom_init: Move a few remaining statics to 
appropriate sections")
c886087caee7 ("powerpc/prom_init: Move prom_radix_disable to __prombss")


How should we proceed with this patch?

--
Thanks,
Sasha


[PATCH 2/2] tests: add pidfd_open() tests

2019-05-15 Thread Christian Brauner
This adds testing for the new pidfd_open() syscall. Specifically, we test:
- that no invalid flags can be passed to pidfd_open()
- that no invalid pid can be passed to pidfd_open()
- that a pidfd can be retrieved with pidfd_open()
- that the retrieved pidfd references the correct pid

Signed-off-by: Christian Brauner 
Cc: Arnd Bergmann 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Jann Horn 
Cc: David Howells 
Cc: "Michael Kerrisk (man-pages)" 
Cc: Andy Lutomirsky 
Cc: Andrew Morton 
Cc: Oleg Nesterov 
Cc: Aleksa Sarai 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
---
 tools/testing/selftests/pidfd/Makefile|   2 +-
 tools/testing/selftests/pidfd/pidfd.h |  57 ++
 .../testing/selftests/pidfd/pidfd_open_test.c | 170 ++
 tools/testing/selftests/pidfd/pidfd_test.c|  41 +
 4 files changed, 229 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/pidfd/pidfd.h
 create mode 100644 tools/testing/selftests/pidfd/pidfd_open_test.c

diff --git a/tools/testing/selftests/pidfd/Makefile 
b/tools/testing/selftests/pidfd/Makefile
index deaf8073bc06..b36c0be70848 100644
--- a/tools/testing/selftests/pidfd/Makefile
+++ b/tools/testing/selftests/pidfd/Makefile
@@ -1,6 +1,6 @@
 CFLAGS += -g -I../../../../usr/include/
 
-TEST_GEN_PROGS := pidfd_test
+TEST_GEN_PROGS := pidfd_test pidfd_open_test
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/pidfd/pidfd.h 
b/tools/testing/selftests/pidfd/pidfd.h
new file mode 100644
index ..8452e910463f
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __PIDFD_H
+#define __PIDFD_H
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../kselftest.h"
+
+/*
+ * The kernel reserves 300 pids via RESERVED_PIDS in kernel/pid.c
+ * That means, when it wraps around any pid < 300 will be skipped.
+ * So we need to use a pid > 300 in order to test recycling.
+ */
+#define PID_RECYCLE 1000
+
+/*
+ * Define a few custom error codes for the child process to clearly indicate
+ * what is happening. This way we can tell the difference between a system
+ * error, a test error, etc.
+ */
+#define PIDFD_PASS 0
+#define PIDFD_FAIL 1
+#define PIDFD_ERROR 2
+#define PIDFD_SKIP 3
+#define PIDFD_XFAIL 4
+
+int wait_for_pid(pid_t pid)
+{
+   int status, ret;
+
+again:
+   ret = waitpid(pid, , 0);
+   if (ret == -1) {
+   if (errno == EINTR)
+   goto again;
+
+   return -1;
+   }
+
+   if (!WIFEXITED(status))
+   return -1;
+
+   return WEXITSTATUS(status);
+}
+
+
+#endif /* __PIDFD_H */
diff --git a/tools/testing/selftests/pidfd/pidfd_open_test.c 
b/tools/testing/selftests/pidfd/pidfd_open_test.c
new file mode 100644
index ..9b073c1ac618
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd_open_test.c
@@ -0,0 +1,170 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pidfd.h"
+#include "../kselftest.h"
+
+static inline int sys_pidfd_open(pid_t pid, unsigned int flags)
+{
+   return syscall(__NR_pidfd_open, pid, flags);
+}
+
+static int safe_int(const char *numstr, int *converted)
+{
+   char *err = NULL;
+   long sli;
+
+   errno = 0;
+   sli = strtol(numstr, , 0);
+   if (errno == ERANGE && (sli == LONG_MAX || sli == LONG_MIN))
+   return -ERANGE;
+
+   if (errno != 0 && sli == 0)
+   return -EINVAL;
+
+   if (err == numstr || *err != '\0')
+   return -EINVAL;
+
+   if (sli > INT_MAX || sli < INT_MIN)
+   return -ERANGE;
+
+   *converted = (int)sli;
+   return 0;
+}
+
+static int char_left_gc(const char *buffer, size_t len)
+{
+   size_t i;
+
+   for (i = 0; i < len; i++) {
+   if (buffer[i] == ' ' ||
+   buffer[i] == '\t')
+   continue;
+
+   return i;
+   }
+
+   return 0;
+}
+
+static int char_right_gc(const char *buffer, size_t len)
+{
+   int i;
+
+   for (i = len - 1; i >= 0; i--) {
+   if (buffer[i] == ' '  ||
+   buffer[i] == '\t' ||
+   buffer[i] == '\n' ||
+   buffer[i] == '\0')
+   continue;
+
+   return i + 1;
+   }
+
+   return 0;
+}
+
+static char *trim_whitespace_in_place(char *buffer)
+{
+   buffer += char_left_gc(buffer, strlen(buffer));
+   buffer[char_right_gc(buffer, strlen(buffer))] = '\0';
+   return buffer;
+}
+
+static pid_t get_pid_from_fdinfo_file(int pidfd, const char *key, size_t 
keylen)
+{
+   int ret;
+   char 

[PATCH 1/2] pid: add pidfd_open()

2019-05-15 Thread Christian Brauner
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that was created via traditional fork()/clone() calls and is only
referenced by a PID:

int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);

With the introduction of pidfds through CLONE_PIDFD it is possible to
create pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a huge problem
for Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this API. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes, we enable them to adopt this API.

In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.

Signed-off-by: Christian Brauner 
Cc: Arnd Bergmann 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: Thomas Gleixner 
Cc: Jann Horn 
Cc: David Howells 
Cc: Andy Lutomirsky 
Cc: Andrew Morton 
Cc: Oleg Nesterov 
Cc: Aleksa Sarai 
Cc: Linus Torvalds 
Cc: Al Viro 
Cc: linux-...@vger.kernel.org
---
 arch/alpha/kernel/syscalls/syscall.tbl  |  1 +
 arch/arm64/include/asm/unistd32.h   |  2 +
 arch/ia64/kernel/syscalls/syscall.tbl   |  1 +
 arch/m68k/kernel/syscalls/syscall.tbl   |  1 +
 arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
 arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
 arch/parisc/kernel/syscalls/syscall.tbl |  1 +
 arch/powerpc/kernel/syscalls/syscall.tbl|  1 +
 arch/s390/kernel/syscalls/syscall.tbl   |  1 +
 arch/sh/kernel/syscalls/syscall.tbl |  1 +
 arch/sparc/kernel/syscalls/syscall.tbl  |  1 +
 arch/x86/entry/syscalls/syscall_32.tbl  |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl  |  1 +
 arch/xtensa/kernel/syscalls/syscall.tbl |  1 +
 include/linux/pid.h |  1 +
 include/linux/syscalls.h|  1 +
 include/uapi/asm-generic/unistd.h   |  4 +-
 kernel/fork.c   |  2 +-
 kernel/pid.c| 48 +
 19 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/arch/alpha/kernel/syscalls/syscall.tbl 
b/arch/alpha/kernel/syscalls/syscall.tbl
index 165f268beafc..ddc3c93ad7a7 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -467,3 +467,4 @@
 535common  io_uring_setup  sys_io_uring_setup
 536common  io_uring_enter  sys_io_uring_enter
 537common  io_uring_register   sys_io_uring_register
+538common  pidfd_open  sys_pidfd_open
diff --git a/arch/arm64/include/asm/unistd32.h 
b/arch/arm64/include/asm/unistd32.h
index 23f1a44acada..350e2049b4a9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -874,6 +874,8 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
 __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
 #define __NR_io_uring_register 427
 __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
+#define __NR_pidfd_open 428
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
 
 /*
  * Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl 
b/arch/ia64/kernel/syscalls/syscall.tbl
index 56e3d0b685e1..7115f6dd347a 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -348,3 +348,4 @@
 425common  io_uring_setup  sys_io_uring_setup
 426common  io_uring_enter  sys_io_uring_enter
 427common  io_uring_register   sys_io_uring_register
+428common  pidfd_open  sys_pidfd_open
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl 
b/arch/m68k/kernel/syscalls/syscall.tbl
index df4ec3ec71d1..44bf12b16ffe 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -427,3 +427,4 @@
 425common  io_uring_setup  sys_io_uring_setup
 426common  io_uring_enter  sys_io_uring_enter
 427common  io_uring_register   sys_io_uring_register
+428common  pidfd_open  sys_pidfd_open
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl 
b/arch/microblaze/kernel/syscalls/syscall.tbl
index 4964947732af..0d32e5152dc0 

[PATCH] crypto: vmx - CTR: always increment IV as quadword

2019-05-15 Thread Daniel Axtens
The kernel self-tests picked up an issue with CTR mode:
alg: skcipher: p8_aes_ctr encryption test failed (wrong result) on test vector 
3, cfg="uneven misaligned splits, may sleep"

Test vector 3 has an all-ones IV ending in ...FD, so
after 3 increments it should wrap around to 0.

In the aesp8-ppc code from OpenSSL, there are two paths that
increment IVs: the bulk (8 at a time) path, and the individual
path which is used when there are fewer than 8 AES blocks to
process.

In the bulk path, the IV is incremented with vadduqm: "Vector
Add Unsigned Quadword Modulo", which does 128-bit addition.

In the individual path, however, the IV is incremented with
vadduwm: "Vector Add Unsigned Word Modulo", which instead
does 4 independent 32-bit additions. Thus only the low 32-bit
word of the IV wraps to 0 (no carry propagates into the upper
words), throwing off the result.

Use vadduqm.

This was probably a typo originally, what with q and w being
adjacent. It is a pretty narrow edge case: I am really
impressed by the quality of the kernel self-tests!
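The wrap-around difference between the two instructions can be modelled in plain C (a host-side sketch with made-up helper names, not the vector asm):

```c
/* Host-side model of the two IV-increment behaviours.  The IV is four
 * 32-bit words, most-significant first.  inc_quadword() carries across
 * word boundaries like vadduqm; inc_wordwise() does four independent
 * 32-bit adds like vadduwm, so only the low word changes when adding 1. */
#include <stdint.h>

struct iv { uint32_t w[4]; };	/* w[0] = most significant */

static void inc_quadword(struct iv *iv)
{
	for (int i = 3; i >= 0; i--)
		if (++iv->w[i] != 0)	/* stop when there is no carry out */
			break;
}

static void inc_wordwise(struct iv *iv)
{
	iv->w[3] += 1;	/* no carry into w[2]..w[0] */
}

/* all-ones IV ending in ...FD, as in self-test vector 3 */
static struct iv almost_wrapped(void)
{
	return (struct iv){ { 0xffffffff, 0xffffffff,
			      0xffffffff, 0xfffffffd } };
}
```

Three quadword increments take the IV to all zeroes, while three word-wise increments leave the upper words stuck at all ones, which is the wrong-result failure the self-test caught.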

Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM")
Cc: sta...@vger.kernel.org
Signed-off-by: Daniel Axtens 

---

I'll pass this along internally to get it into OpenSSL as well.
---
 drivers/crypto/vmx/aesp8-ppc.pl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
index de78282b8f44..9c6b5c1d6a1a 100644
--- a/drivers/crypto/vmx/aesp8-ppc.pl
+++ b/drivers/crypto/vmx/aesp8-ppc.pl
@@ -1357,7 +1357,7 @@ Loop_ctr32_enc:
 	addi	$idx,$idx,16
 	bdnz	Loop_ctr32_enc
 
-	vadduwm	$ivec,$ivec,$one
+	vadduqm	$ivec,$ivec,$one
 	vmr	$dat,$inptail
 	lvx	$inptail,0,$inp
 	addi	$inp,$inp,16
-- 
2.19.1



Re: [PATCH] powerpc/pseries: Fix xive=off command line

2019-05-15 Thread Cédric Le Goater
On 5/15/19 12:05 PM, Greg Kurz wrote:
> On POWER9, if the hypervisor supports XIVE exploitation mode, the guest OS
> will unconditionally requests for the XIVE interrupt mode even if XIVE was
> deactivated with the kernel command line xive=off. Later on, when the spapr
> XIVE init code handles xive=off, it disables XIVE and tries to fall back on
> the legacy mode XICS.
> 
> This discrepancy causes a kernel panic because the hypervisor is configured
> to provide the XIVE interrupt mode to the guest:
> 
> [0.008837] kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
> [0.008877] Oops: Exception in kernel mode, sig: 5 [#1]
> [0.008908] LE SMP NR_CPUS=1024 NUMA pSeries
> [0.008939] Modules linked in:
> [0.008964] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
> 5.0.13-200.fc29.ppc64le #1
> [0.009018] NIP:  c1029ab8 LR: c1029aac CTR: 
> c18e
> [0.009065] REGS: c007f96d7900 TRAP: 0700   Tainted: GW
>   (5.0.13-200.fc29.ppc64le)
> [0.009119] MSR:  82029033   CR: 
> 28000222  XER: 2004
> [0.009168] CFAR: c01b1e28 IRQMASK: 0
> [0.009168] GPR00: c1029aac c007f96d7b90 c15e8600 
> 
> [0.009168] GPR04: 0001  0061 
> 646f6d61696e0d0a
> [0.009168] GPR08: 0007fd8f 0001 c14c44c0 
> c007f96d76cf
> [0.009168] GPR12:  c18e 0001 
> 
> [0.009168] GPR16:  0001 c007f96d7c08 
> c16903d0
> [0.009168] GPR20: c007fffe04e8 ffea c1620164 
> c161fe58
> [0.009168] GPR24: c0ea6c88 c11151a8 006000c0 
> c007f96d7c34
> [0.009168] GPR28:  c14b286c c1115180 
> c161dc70
> [0.009558] NIP [c1029ab8] xics_smp_probe+0x38/0x98
> [0.009590] LR [c1029aac] xics_smp_probe+0x2c/0x98
> [0.009622] Call Trace:
> [0.009639] [c007f96d7b90] [c1029aac] xics_smp_probe+0x2c/0x98 
> (unreliable)
> [0.009687] [c007f96d7bb0] [c1033404] 
> pSeries_smp_probe+0x40/0xa0
> [0.009734] [c007f96d7bd0] [c10212a4] 
> smp_prepare_cpus+0x62c/0x6ec
> [0.009782] [c007f96d7cf0] [c10141b8] 
> kernel_init_freeable+0x148/0x448
> [0.009829] [c007f96d7db0] [c0010ba4] kernel_init+0x2c/0x148
> [0.009870] [c007f96d7e20] [c000bdd4] 
> ret_from_kernel_thread+0x5c/0x68
> [0.009916] Instruction dump:
> [0.009940] 7c0802a6 6000 7c0802a6 3882 f8010010 f821ffe1 3c62001c 
> e863b9a0
> [0.009988] 4b1882d1 6000 7c690034 5529d97e <0b09> 3d22001c 
> e929b998 3ce2ff8f
> 
> Look for xive=off during prom_init and don't ask for XIVE in this case. One
> exception though: if the host only supports XIVE, we still want to boot so
> we ignore xive=off.
> 
> Similarly, have the spapr XIVE init code look at the interrupt mode
> negotiated during CAS, and ignore xive=off if the hypervisor only supports
> XIVE.
> 
> Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt 
> controller")
> Cc: sta...@vger.kernel.org # v4.20
> Reported-by: Pavithra R. Prakash 
> Signed-off-by: Greg Kurz 


Reviewed-by: Cédric Le Goater 

Thanks,

C.

> ---
> eac1e731b59e is a v4.16 commit actually but this patch only applies
> cleanly to v4.20 and newer. If needed I can send a backport for
> older versions.
> ---
>  arch/powerpc/kernel/prom_init.c  |   16 +++-
>  arch/powerpc/sysdev/xive/spapr.c |   52 
> +-
>  2 files changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 523bb99d7676..c8f7eb845927 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -172,6 +172,7 @@ static unsigned long __prombss prom_tce_alloc_end;
>  
>  #ifdef CONFIG_PPC_PSERIES
>  static bool __prombss prom_radix_disable;
> +static bool __prombss prom_xive_disable;
>  #endif
>  
>  struct platform_support {
> @@ -808,6 +809,12 @@ static void __init early_cmdline_parse(void)
>   }
>   if (prom_radix_disable)
>   prom_debug("Radix disabled from cmdline\n");
> +
> + opt = prom_strstr(prom_cmd_line, "xive=off");
> + if (opt) {
> + prom_xive_disable = true;
> + prom_debug("XIVE disabled from cmdline\n");
> + }
>  #endif /* CONFIG_PPC_PSERIES */
>  }
>  
> @@ -1216,10 +1223,17 @@ static void __init prom_parse_xive_model(u8 val,
>   switch (val) {
>   case OV5_FEAT(OV5_XIVE_EITHER): /* Either Available */
>   prom_debug("XIVE - either mode supported\n");
> - support->xive = true;
> + support->xive = !prom_xive_disable;
>   break;
>   case OV5_FEAT(OV5_XIVE_EXPLOIT): 

Re: Latest Git kernel: Section mismatch in reference from the variable start_here_multiplatform to the function .init.text:.early_setup()

2019-05-15 Thread Christophe Leroy

Hi,

Le 15/05/2019 à 12:09, Christian Zigotzky a écrit :

Hi All,

I got the following error messages with the latest Git kernel today:

GEN .version
   CHK include/generated/compile.h
   LD  vmlinux.o
   MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x302a): Section mismatch in reference from the 
variable start_here_multiplatform to the function .init.text:.early_setup()

The function start_here_multiplatform() references
the function __init .early_setup().
This is often because start_here_multiplatform lacks a __init
annotation or the annotation of .early_setup is wrong.

   MODINFO modules.builtin.modinfo
   KSYM    .tmp_kallsyms1.o
   KSYM    .tmp_kallsyms2.o
   LD  vmlinux
   SORTEX  vmlinux
   SYSMAP  System.map
   CHKHEAD vmlinux

What does it mean?


I proposed a patch for it at https://patchwork.ozlabs.org/patch/1097845/

Christophe



Please find attached the kernel config.

Thanks,
Christian



[PATCH] powerpc/pseries: Fix xive=off command line

2019-05-15 Thread Greg Kurz
On POWER9, if the hypervisor supports XIVE exploitation mode, the guest OS
will unconditionally requests for the XIVE interrupt mode even if XIVE was
deactivated with the kernel command line xive=off. Later on, when the spapr
XIVE init code handles xive=off, it disables XIVE and tries to fall back on
the legacy mode XICS.

This discrepancy causes a kernel panic because the hypervisor is configured
to provide the XIVE interrupt mode to the guest:

[0.008837] kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:135!
[0.008877] Oops: Exception in kernel mode, sig: 5 [#1]
[0.008908] LE SMP NR_CPUS=1024 NUMA pSeries
[0.008939] Modules linked in:
[0.008964] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 
5.0.13-200.fc29.ppc64le #1
[0.009018] NIP:  c1029ab8 LR: c1029aac CTR: c18e
[0.009065] REGS: c007f96d7900 TRAP: 0700   Tainted: GW  
(5.0.13-200.fc29.ppc64le)
[0.009119] MSR:  82029033   CR: 28000222  
XER: 2004
[0.009168] CFAR: c01b1e28 IRQMASK: 0
[0.009168] GPR00: c1029aac c007f96d7b90 c15e8600 

[0.009168] GPR04: 0001  0061 
646f6d61696e0d0a
[0.009168] GPR08: 0007fd8f 0001 c14c44c0 
c007f96d76cf
[0.009168] GPR12:  c18e 0001 

[0.009168] GPR16:  0001 c007f96d7c08 
c16903d0
[0.009168] GPR20: c007fffe04e8 ffea c1620164 
c161fe58
[0.009168] GPR24: c0ea6c88 c11151a8 006000c0 
c007f96d7c34
[0.009168] GPR28:  c14b286c c1115180 
c161dc70
[0.009558] NIP [c1029ab8] xics_smp_probe+0x38/0x98
[0.009590] LR [c1029aac] xics_smp_probe+0x2c/0x98
[0.009622] Call Trace:
[0.009639] [c007f96d7b90] [c1029aac] xics_smp_probe+0x2c/0x98 
(unreliable)
[0.009687] [c007f96d7bb0] [c1033404] pSeries_smp_probe+0x40/0xa0
[0.009734] [c007f96d7bd0] [c10212a4] 
smp_prepare_cpus+0x62c/0x6ec
[0.009782] [c007f96d7cf0] [c10141b8] 
kernel_init_freeable+0x148/0x448
[0.009829] [c007f96d7db0] [c0010ba4] kernel_init+0x2c/0x148
[0.009870] [c007f96d7e20] [c000bdd4] 
ret_from_kernel_thread+0x5c/0x68
[0.009916] Instruction dump:
[0.009940] 7c0802a6 6000 7c0802a6 3882 f8010010 f821ffe1 3c62001c 
e863b9a0
[0.009988] 4b1882d1 6000 7c690034 5529d97e <0b09> 3d22001c e929b998 
3ce2ff8f

Look for xive=off during prom_init and don't ask for XIVE in this case. One
exception though: if the host only supports XIVE, we still want to boot so
we ignore xive=off.

Similarly, have the spapr XIVE init code look at the interrupt mode
negotiated during CAS, and ignore xive=off if the hypervisor only supports
XIVE.

Fixes: eac1e731b59e ("powerpc/xive: guest exploitation of the XIVE interrupt 
controller")
Cc: sta...@vger.kernel.org # v4.20
Reported-by: Pavithra R. Prakash 
Signed-off-by: Greg Kurz 
---
eac1e731b59e is a v4.16 commit actually but this patch only applies
cleanly to v4.20 and newer. If needed I can send a backport for
older versions.
---
 arch/powerpc/kernel/prom_init.c  |   16 +++-
 arch/powerpc/sysdev/xive/spapr.c |   52 +-
 2 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 523bb99d7676..c8f7eb845927 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -172,6 +172,7 @@ static unsigned long __prombss prom_tce_alloc_end;
 
 #ifdef CONFIG_PPC_PSERIES
 static bool __prombss prom_radix_disable;
+static bool __prombss prom_xive_disable;
 #endif
 
 struct platform_support {
@@ -808,6 +809,12 @@ static void __init early_cmdline_parse(void)
}
if (prom_radix_disable)
prom_debug("Radix disabled from cmdline\n");
+
+   opt = prom_strstr(prom_cmd_line, "xive=off");
+   if (opt) {
+   prom_xive_disable = true;
+   prom_debug("XIVE disabled from cmdline\n");
+   }
 #endif /* CONFIG_PPC_PSERIES */
 }
 
@@ -1216,10 +1223,17 @@ static void __init prom_parse_xive_model(u8 val,
switch (val) {
case OV5_FEAT(OV5_XIVE_EITHER): /* Either Available */
prom_debug("XIVE - either mode supported\n");
-   support->xive = true;
+   support->xive = !prom_xive_disable;
break;
case OV5_FEAT(OV5_XIVE_EXPLOIT): /* Only Exploitation mode */
prom_debug("XIVE - exploitation mode supported\n");
+   if (prom_xive_disable) {
+   /*
+* If we __have__ to do XIVE, we're better off ignoring
+* the command 

[PATCH] powerpc/mm: Drop VM_BUG_ON in get_region_id

2019-05-15 Thread Aneesh Kumar K.V
We can call get_region_id() without validating the ea value, which means
that with a wrong ea value we hit the BUG as below.

 kernel BUG at arch/powerpc/include/asm/book3s/64/hash.h:129!
 Oops: Exception in kernel mode, sig: 5 [#1]
 LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
 CPU: 0 PID: 3937 Comm: access_tests Not tainted 5.1.0
 
 NIP [c007ba20] do_slb_fault+0x70/0x320
 LR [c000896c] data_access_slb_common+0x15c/0x1a0

Fix this by removing the VM_BUG_ON. All callers make sure the returned region id
is valid and error out otherwise.

Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 
0xc range")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h 
b/arch/powerpc/include/asm/book3s/64/hash.h
index 1d1183048cfd..5486087e64ea 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -122,11 +122,9 @@ static inline int get_region_id(unsigned long ea)
if (ea < H_KERN_VIRT_START)
return LINEAR_MAP_REGION_ID;
 
-   VM_BUG_ON(id != 0xc);
BUILD_BUG_ON(NON_LINEAR_REGION_ID(H_VMALLOC_START) != 2);
 
region_id = NON_LINEAR_REGION_ID(ea);
-   VM_BUG_ON(region_id > VMEMMAP_REGION_ID);
return region_id;
 }
 
-- 
2.21.0



Re: [PATCH] powerpc: Remove double free

2019-05-15 Thread Christophe Leroy

kobject_put() released index_dir->kobj

but who will release 'index' ?

Christophe

On 15/05/2019 at 11:07, Tobin C. Harding wrote:

kfree() after kobject_put().  Whoever wrote this was on crack.

Fixes: 7e8039795a80 ("powerpc/cacheinfo: Fix kobject memleak")
Signed-off-by: Tobin C. Harding 
---

FTR

git log --pretty=format:"%h%x09%an%x09%ad%x09%s" | grep 7e8039795a80
7e8039795a80	Tobin C. Harding	Tue Apr 30 11:09:23 2019 +1000	powerpc/cacheinfo: Fix kobject memleak

  arch/powerpc/kernel/cacheinfo.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index f2ed3ef4b129..862e2890bd3d 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -767,7 +767,6 @@ static void cacheinfo_create_index_dir(struct cache *cache, 
int index,
  cache_dir->kobj, "index%d", index);
if (rc) {
kobject_put(&index_dir->kobj);
-   kfree(index_dir);
return;
}
  



[PATCH] powerpc: Remove double free

2019-05-15 Thread Tobin C. Harding
kfree() after kobject_put().  Whoever wrote this was on crack.

Fixes: 7e8039795a80 ("powerpc/cacheinfo: Fix kobject memleak")
Signed-off-by: Tobin C. Harding 
---

FTR

git log --pretty=format:"%h%x09%an%x09%ad%x09%s" | grep 7e8039795a80
7e8039795a80	Tobin C. Harding	Tue Apr 30 11:09:23 2019 +1000	powerpc/cacheinfo: Fix kobject memleak

 arch/powerpc/kernel/cacheinfo.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index f2ed3ef4b129..862e2890bd3d 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -767,7 +767,6 @@ static void cacheinfo_create_index_dir(struct cache *cache, 
int index,
  cache_dir->kobj, "index%d", index);
if (rc) {
kobject_put(&index_dir->kobj);
-   kfree(index_dir);
return;
}
 
-- 
2.21.0



RE: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-15 Thread David Laight
From: Petr Mladek
> Sent: 15 May 2019 08:36
> On Tue 2019-05-14 14:37:51, Steven Rostedt wrote:
> >
> > [ Purple is a nice shade on the bike shed. ;-) ]
> >
> > On Tue, 14 May 2019 11:02:17 +0200
> > Geert Uytterhoeven  wrote:
> >
> > > On Tue, May 14, 2019 at 10:29 AM David Laight  
> > > wrote:
> > > > > And I like Steven's "(fault)" idea.
> > > > > How about this:
> > > > >
> > > > >   if ptr < PAGE_SIZE  -> "(null)"
> > > > >   if IS_ERR_VALUE(ptr)-> "(fault)"
> > > > >
> > > > >   -ss
> > > >
> > > > Or:
> > > > if (ptr < PAGE_SIZE)
> > > > return ptr ? "(null+)" : "(null)";
> >
> > Hmm, that is useful.
> >
> > > > if IS_ERR_VALUE(ptr)
> > > > return "(errno)"
> >
> > I still prefer "(fault)" as that is pretty much all I would expect from a
> > pointer dereference, even if it is just from bad parsing of, say,
> > a MAC address. "fault" is generic enough. "errno" will be confusing,
> > because that's normally a variable, not an output.
> >
> > >
> > > Do we care about the value? "(-E%u)"?
> >
> > That too could be confusing. What would (-E22) be considered by a user
> > doing an sprintf() on some string. I know that would confuse me, or I
> > would think that it was what the %pX displayed, and wonder why it
> > displayed it that way. Whereas "(fault)" is quite obvious for any %p
> > use case.
> 
> This discussion clearly shows that it is hard to make anyone happy.
> 
> I considered switching to "(fault)" because there seems to be more
> people in favor of this.
> 
> But "(einval)" is also used when an unsupported pointer
> modifier is passed. The idea is to show error codes that people
> are familiar with.
> 
> It might have been better to use the uppercase "(EFAULT)" and
> "(EINVAL)" to make it more obvious. But I wanted to follow
> the existing style with the lowercase "(null)".

Printing 'fault' when the code was (trying to) validate the
address was ok.
When the only check is for an -errno value it seems wrong as
most invalid addresses will actually fault (and panic).

The reason modern printf implementations generate "(null)" is that it is
far too easy for a diagnostic print to fail to test a pointer.
It also makes it easier when 'throwing in' printf while debugging
to add a single trace that will work regardless of whether a
call had succeeded or not.

With the Linux kernel putting errno values into pointers, it
seems likely that most invalid pointers in printf will actually
be error values.
Printing the value will be helpful during debugging - as a
trace can be put after a call and show the parameters and result.

David




Re: [PATCH stable 4.9] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Greg KH
On Wed, May 15, 2019 at 06:40:47AM +, Christophe Leroy wrote:
> [Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]
> 
> On powerpc32, patch_instruction() is called by apply_feature_fixups()
> which is called from early_init()
> 
> There is the following note in front of early_init():
>  * Note that the kernel may be running at an address which is different
>  * from the address that it was linked at, so we must use RELOC/PTRRELOC
>  * to access static data (including strings).  -- paulus
> 
> Therefore init_mem_is_free must be accessed with PTRRELOC()
> 
> Fixes: 1c38a84d4586 ("powerpc: Avoid code patching freed init sections")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
> Signed-off-by: Christophe Leroy 
> 
> ---
> Can't apply the upstream commit as such due to several other unrelated stuff
> like for instance STRICT_KERNEL_RWX which are missing.
> So instead, using same approach as for commit 
> 252eb55816a6f69ef9464cad303cdb3326cdc61d

Now queued up, thanks.

greg k-h


Re: [PATCH 2/3] arm64: dts: ls1028a: Add PCIe controller DT nodes

2019-05-15 Thread Arnd Bergmann
On Wed, May 15, 2019 at 9:36 AM Xiaowei Bao  wrote:
> Signed-off-by: Xiaowei Bao 
> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi |   52 
> 
>  1 files changed, 52 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> index b045812..50b579b 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> @@ -398,6 +398,58 @@
> status = "disabled";
> };
>
> +   pcie@340 {
> +   compatible = "fsl,ls1028a-pcie";
> +   reg = <0x00 0x0340 0x0 0x0010   /* controller 
> registers */
> +  0x80 0x 0x0 0x2000>; /* 
> configuration space */
> +   reg-names = "regs", "config";
> +   interrupts = , /* 
> PME interrupt */
> +; /* 
> aer interrupt */
> +   interrupt-names = "pme", "aer";
> +   #address-cells = <3>;
> +   #size-cells = <2>;
> +   device_type = "pci";
> +   dma-coherent;
> +   num-lanes = <4>;
> +   bus-range = <0x0 0xff>;
> +   ranges = <0x8100 0x0 0x 0x80 0x0001 
> 0x0 0x0001   /* downstream I/O */
> + 0x8200 0x0 0x4000 0x80 0x4000 
> 0x0 0x4000>; /* non-prefetchable memory */

Are you sure there is no support for 64-bit BARs or prefetchable memory?

Is this a hardware bug, or something that can be fixed in firmware?

   Arnd


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-15 Thread Petr Mladek
On Wed 2019-05-15 09:23:05, Geert Uytterhoeven wrote:
> Hi Steve,
> 
> On Tue, May 14, 2019 at 9:35 PM Steven Rostedt  wrote:
> > On Tue, 14 May 2019 21:13:06 +0200
> > Geert Uytterhoeven  wrote:
> > > > > Do we care about the value? "(-E%u)"?
> > > >
> > > > That too could be confusing. What would (-E22) be considered by a user
> > > > doing an sprintf() on some string. I know that would confuse me, or I
> > > > would think that it was what the %pX displayed, and wonder why it
> > > > displayed it that way. Whereas "(fault)" is quite obvious for any %p
> > > > use case.
> > >
> > > I would immediately understand there's a missing IS_ERR() check in a
> > > function that can return  -EINVAL, without having to add a new printk()
> > > to find out what kind of bogus value has been received, and without
> > > having to reboot, and trying to reproduce...
> >
> > I have to ask. Has there actually been a case that you used a %pX and
> > it faulted, and you had to go back to find what the value of the
> > failure was?
> 
> If it faulted, the bad pointer value is obvious from the backtrace.
> If the code avoids the fault by verifying the pointer and returning
> "(efault)" instead, the bad pointer value is lost.
> 
> Or am I missing something?

Should buggy printk() crash the system?

Another problem is that vsprintf() is called in printk() under
logbuf_lock. The messages are stored into printk_safe per-CPU
buffers, which allows nested messages to be seen. But there is still
a bigger risk of missing them than with a "normal" fault.

Finally, various variants of these checks were already used
in "random" printf formats. The only change is that we are
using them consistently everywhere[*] a pointer is accessed.

[*] Just the top level pointer is checked. Some pointer modifiers
are accessing ptr->ptr->val. The lower level pointers are not
checked to avoid too much complexity.

Best Regards,
Petr


[PATCH v3] powerpc/pseries: Fix cpu_hotplug_lock acquisition in resize_hpt()

2019-05-15 Thread Gautham R. Shenoy
From: "Gautham R. Shenoy" 

The calls to arch_add_memory()/arch_remove_memory() are always made
with the read-side cpu_hotplug_lock acquired via
memory_hotplug_begin().  On pSeries,
arch_add_memory()/arch_remove_memory() eventually call resize_hpt()
which in turn calls stop_machine() which acquires the read-side
cpu_hotplug_lock again, thereby resulting in the recursive acquisition
of this lock.

Lockdep complains as follows in these code-paths.

 swapper/0/1 is trying to acquire lock:
 (ptrval) (cpu_hotplug_lock.rw_sem){}, at: stop_machine+0x2c/0x60

but task is already holding lock:
(ptrval) (cpu_hotplug_lock.rw_sem){}, at: 
mem_hotplug_begin+0x20/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

CPU0

   lock(cpu_hotplug_lock.rw_sem);
   lock(cpu_hotplug_lock.rw_sem);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 3 locks held by swapper/0/1:
  #0: (ptrval) (&dev->mutex){}, at: __driver_attach+0x12c/0x1b0
  #1: (ptrval) (cpu_hotplug_lock.rw_sem){}, at: 
mem_hotplug_begin+0x20/0x50
  #2: (ptrval) (mem_hotplug_lock.rw_sem){}, at: 
percpu_down_write+0x54/0x1a0

stack backtrace:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc5-58373-gbc99402235f3-dirty 
#166
 Call Trace:
 [c000feb03150] [c0e32bd4] dump_stack+0xe8/0x164 (unreliable)
 [c000feb031a0] [c020d6c0] __lock_acquire+0x1110/0x1c70
 [c000feb03320] [c020f080] lock_acquire+0x240/0x290
 [c000feb033e0] [c017f554] cpus_read_lock+0x64/0xf0
 [c000feb03420] [c029ebac] stop_machine+0x2c/0x60
 [c000feb03460] [c00d7f7c] pseries_lpar_resize_hpt+0x19c/0x2c0
 [c000feb03500] [c00788d0] resize_hpt_for_hotplug+0x70/0xd0
 [c000feb03570] [c0e5d278] arch_add_memory+0x58/0xfc
 [c000feb03610] [c03553a8] devm_memremap_pages+0x5e8/0x8f0
 [c000feb036c0] [c09c2394] pmem_attach_disk+0x764/0x830
 [c000feb037d0] [c09a7c38] nvdimm_bus_probe+0x118/0x240
 [c000feb03860] [c0968500] really_probe+0x230/0x4b0
 [c000feb038f0] [c0968aec] driver_probe_device+0x16c/0x1e0
 [c000feb03970] [c0968ca8] __driver_attach+0x148/0x1b0
 [c000feb039f0] [c09650b0] bus_for_each_dev+0x90/0x130
 [c000feb03a50] [c0967dd4] driver_attach+0x34/0x50
 [c000feb03a70] [c0967068] bus_add_driver+0x1a8/0x360
 [c000feb03b00] [c096a498] driver_register+0x108/0x170
 [c000feb03b70] [c09a7400] __nd_driver_register+0xd0/0xf0
 [c000feb03bd0] [c128aa90] nd_pmem_driver_init+0x34/0x48
 [c000feb03bf0] [c0010a10] do_one_initcall+0x1e0/0x45c
 [c000feb03cd0] [c122462c] kernel_init_freeable+0x540/0x64c
 [c000feb03db0] [c001110c] kernel_init+0x2c/0x160
 [c000feb03e20] [c000bed4] ret_from_kernel_thread+0x5c/0x68

Fix this issue by
  1) Requiring all the calls to pseries_lpar_resize_hpt() be made
 with cpu_hotplug_lock held.

  2) In pseries_lpar_resize_hpt() invoke stop_machine_cpuslocked()
 as a consequence of 1)

  3) To satisfy 1), in hpt_order_set(), call mmu_hash_ops.resize_hpt()
 with cpu_hotplug_lock held.

Reported-by: Aneesh Kumar K.V 
Signed-off-by: Gautham R. Shenoy 
---
v2 -> v3 : Updated the comment for pseries_lpar_resize_hpt()
   Updated the commit-log with the full backtrace.
v1 -> v2 : Rebased against powerpc/next instead of linux/master

 arch/powerpc/mm/book3s64/hash_utils.c | 9 -
 arch/powerpc/platforms/pseries/lpar.c | 8 ++--
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/hash_utils.c 
b/arch/powerpc/mm/book3s64/hash_utils.c
index 919a861..d07fcafd 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1928,10 +1929,16 @@ static int hpt_order_get(void *data, u64 *val)
 
 static int hpt_order_set(void *data, u64 val)
 {
+   int ret;
+
if (!mmu_hash_ops.resize_hpt)
return -ENODEV;
 
-   return mmu_hash_ops.resize_hpt(val);
+   cpus_read_lock();
+   ret = mmu_hash_ops.resize_hpt(val);
+   cpus_read_unlock();
+
+   return ret;
 }
 
 DEFINE_DEBUGFS_ATTRIBUTE(fops_hpt_order, hpt_order_get, hpt_order_set, 
"%llu\n");
diff --git a/arch/powerpc/platforms/pseries/lpar.c 
b/arch/powerpc/platforms/pseries/lpar.c
index 1034ef1..557d592 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -859,7 +859,10 @@ static int pseries_lpar_resize_hpt_commit(void *data)
return 0;
 }
 
-/* Must be called in user context */
+/*
+ * Must be called in process context. The caller must hold the
+ * cpus_lock.
+ */
 static int pseries_lpar_resize_hpt(unsigned long shift)
 {
struct hpt_resize_state state = {
@@ -913,7 

[PATCH 3/3] PCI: layerscape: Add LS1028a support

2019-05-15 Thread Xiaowei Bao
Add support for the LS1028a PCIe controller.

Signed-off-by: Xiaowei Bao 
---
 drivers/pci/controller/dwc/pci-layerscape.c |9 +
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index 3a5fa26..8c556e1 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -236,6 +236,14 @@ static int ls_pcie_msi_host_init(struct pcie_port *pp)
.dw_pcie_ops = &dw_ls_pcie_ops,
 };
 
+static const struct ls_pcie_drvdata ls1028a_drvdata = {
+   .lut_offset = 0x8,
+   .ltssm_shift = 0,
+   .lut_dbg = 0x407fc,
+   .ops = &ls_pcie_host_ops,
+   .dw_pcie_ops = &dw_ls_pcie_ops,
+};
+
 static const struct ls_pcie_drvdata ls1046_drvdata = {
.lut_offset = 0x8,
.ltssm_shift = 24,
@@ -263,6 +271,7 @@ static int ls_pcie_msi_host_init(struct pcie_port *pp)
 static const struct of_device_id ls_pcie_of_match[] = {
{ .compatible = "fsl,ls1012a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1021a-pcie", .data = _drvdata },
+   { .compatible = "fsl,ls1028a-pcie", .data = &ls1028a_drvdata },
{ .compatible = "fsl,ls1043a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls1046a-pcie", .data = _drvdata },
{ .compatible = "fsl,ls2080a-pcie", .data = _drvdata },
-- 
1.7.1



[PATCH 2/3] arm64: dts: ls1028a: Add PCIe controller DT nodes

2019-05-15 Thread Xiaowei Bao
LS1028a implements 2 PCIe 3.0 controllers.

Signed-off-by: Xiaowei Bao 
---
 arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi |   52 
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
index b045812..50b579b 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
@@ -398,6 +398,58 @@
status = "disabled";
};
 
+   pcie@340 {
+   compatible = "fsl,ls1028a-pcie";
+   reg = <0x00 0x0340 0x0 0x0010   /* controller 
registers */
+  0x80 0x 0x0 0x2000>; /* 
configuration space */
+   reg-names = "regs", "config";
+   interrupts = , /* PME 
interrupt */
+; /* aer 
interrupt */
+   interrupt-names = "pme", "aer";
+   #address-cells = <3>;
+   #size-cells = <2>;
+   device_type = "pci";
+   dma-coherent;
+   num-lanes = <4>;
+   bus-range = <0x0 0xff>;
+   ranges = <0x8100 0x0 0x 0x80 0x0001 0x0 
0x0001   /* downstream I/O */
+ 0x8200 0x0 0x4000 0x80 0x4000 0x0 
0x4000>; /* non-prefetchable memory */
+   msi-parent = <>;
+   #interrupt-cells = <1>;
+   interrupt-map-mask = <0 0 0 7>;
+   interrupt-map = < 0 0 1  GIC_SPI 109 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 2  GIC_SPI 110 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 3  GIC_SPI 111 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 4  GIC_SPI 112 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
+   };
+
+   pcie@350 {
+   compatible = "fsl,ls1028a-pcie";
+   reg = <0x00 0x0350 0x0 0x0010   /* controller 
registers */
+  0x88 0x 0x0 0x2000>; /* 
configuration space */
+   reg-names = "regs", "config";
+   interrupts = ,
+;
+   interrupt-names = "pme", "aer";
+   #address-cells = <3>;
+   #size-cells = <2>;
+   device_type = "pci";
+   dma-coherent;
+   num-lanes = <4>;
+   bus-range = <0x0 0xff>;
+   ranges = <0x8100 0x0 0x 0x88 0x0001 0x0 
0x0001   /* downstream I/O */
+ 0x8200 0x0 0x4000 0x88 0x4000 0x0 
0x4000>; /* non-prefetchable memory */
+   msi-parent = <>;
+   #interrupt-cells = <1>;
+   interrupt-map-mask = <0 0 0 7>;
+   interrupt-map = < 0 0 1  GIC_SPI 114 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 2  GIC_SPI 115 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 3  GIC_SPI 116 
IRQ_TYPE_LEVEL_HIGH>,
+   < 0 0 4  GIC_SPI 117 
IRQ_TYPE_LEVEL_HIGH>;
+   status = "disabled";
+   };
+
pcie@1f000 { /* Integrated Endpoint Root Complex */
compatible = "pci-host-ecam-generic";
reg = <0x01 0xf000 0x0 0x10>;
-- 
1.7.1



[PATCH 1/3] dt-bindings: pci: layerscape-pci: add compatible strings "fsl, ls1028a-pcie"

2019-05-15 Thread Xiaowei Bao
Add the PCIe compatible string for LS1028A

Signed-off-by: Xiaowei Bao 
---
 .../devicetree/bindings/pci/layerscape-pci.txt |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt 
b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
index e20ceaa..99a386e 100644
--- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
+++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
@@ -21,6 +21,7 @@ Required properties:
 "fsl,ls1046a-pcie"
 "fsl,ls1043a-pcie"
 "fsl,ls1012a-pcie"
+"fsl,ls1028a-pcie"
   EP mode:
"fsl,ls1046a-pcie-ep", "fsl,ls-pcie-ep"
 - reg: base addresses and lengths of the PCIe controller register blocks.
-- 
1.7.1



Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-15 Thread Petr Mladek
On Tue 2019-05-14 14:37:51, Steven Rostedt wrote:
> 
> [ Purple is a nice shade on the bike shed. ;-) ]
> 
> On Tue, 14 May 2019 11:02:17 +0200
> Geert Uytterhoeven  wrote:
> 
> > On Tue, May 14, 2019 at 10:29 AM David Laight  
> > wrote:
> > > > And I like Steven's "(fault)" idea.
> > > > How about this:
> > > >
> > > >   if ptr < PAGE_SIZE  -> "(null)"
> > > >   if IS_ERR_VALUE(ptr)-> "(fault)"
> > > >
> > > >   -ss  
> > >
> > > Or:
> > > if (ptr < PAGE_SIZE)
> > > return ptr ? "(null+)" : "(null)";
> 
> Hmm, that is useful.
> 
> > > if IS_ERR_VALUE(ptr)
> > > return "(errno)"  
> 
> I still prefer "(fault)" as that is pretty much all I would expect from a
> pointer dereference, even if it is just from bad parsing of, say,
> a MAC address. "fault" is generic enough. "errno" will be confusing,
> because that's normally a variable, not an output.
> 
> > 
> > Do we care about the value? "(-E%u)"?
> 
> That too could be confusing. What would (-E22) be considered by a user
> doing an sprintf() on some string. I know that would confuse me, or I
> would think that it was what the %pX displayed, and wonder why it
> displayed it that way. Whereas "(fault)" is quite obvious for any %p
> use case.

This discussion clearly shows that it is hard to make anyone happy.

I considered switching to "(fault)" because there seems to be more
people in favor of this.

But "(einval)" is also used when an unsupported pointer
modifier is passed. The idea is to show error codes that people
are familiar with.

It might have been better to use the uppercase "(EFAULT)" and
"(EINVAL)" to make it more obvious. But I wanted to follow
the existing style with the lowercase "(null)".

As of now, I think that we should keep it as is unless there is
some wider agreement on a change.

Best Regards,
Petr


Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-15 Thread Geert Uytterhoeven
Hi Steve,

On Tue, May 14, 2019 at 9:35 PM Steven Rostedt  wrote:
> On Tue, 14 May 2019 21:13:06 +0200
> Geert Uytterhoeven  wrote:
> > > > Do we care about the value? "(-E%u)"?
> > >
> > > That too could be confusing. What would (-E22) be considered by a user
> > > doing an sprintf() on some string. I know that would confuse me, or I
> > > would think that it was what the %pX displayed, and wonder why it
> > > displayed it that way. Whereas "(fault)" is quite obvious for any %p
> > > use case.
> >
> > I would immediately understand there's a missing IS_ERR() check in a
> > function that can return  -EINVAL, without having to add a new printk()
> > to find out what kind of bogus value has been received, and without
> > having to reboot, and trying to reproduce...
>
> I have to ask. Has there actually been a case that you used a %pX and
> it faulted, and you had to go back to find what the value of the
> failure was?

If it faulted, the bad pointer value is obvious from the backtrace.
If the code avoids the fault by verifying the pointer and returning
"(efault)" instead, the bad pointer value is lost.

Or am I missing something?

Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH, RFC] byteorder: sanity check toolchain vs kernel endianess

2019-05-15 Thread Arnd Bergmann
On Mon, May 13, 2019 at 2:04 PM Christoph Hellwig  wrote:
>
> On Mon, May 13, 2019 at 01:50:19PM +0200, Dmitry Vyukov wrote:
> > > We did have some bugs in the past (~1-2 y/ago) but AFAIK they are all
> > > fixed now. These days I build most of my kernels with a bi-endian 64-bit
> > > toolchain, and switching endian without running `make clean` also works.
> >
> > For the record, yes, it turned out to be a problem in our code (a latent
> > bug). We actually used the host (x86) gcc to build as-if-ppc code that can
> > run on the host, so it defined neither LE nor BE macros. It just
> > happened to work in the past :)
>
> So Nick was right and these checks actually are useful..

Yes, definitely. I wonder if we should also bring back the word size check
from include/asm-generic/bitsperlong.h, which was disabled right
after I originally added that.

  Arnd


Re: [RFC PATCH] powerpc/mm: Implement STRICT_MODULE_RWX

2019-05-15 Thread Christophe Leroy




On 15/05/2019 at 08:42, Christoph Hellwig wrote:

+static int change_page_ro(pte_t *ptep, pgtable_t token, unsigned long addr, 
void *data)


There are a couple way too long lines like this in the patch.



powerpc arch accepts 90 chars per line, see arch/powerpc/tools/checkpatch.pl

Christophe


Re: [RFC PATCH] powerpc/mm: Implement STRICT_MODULE_RWX

2019-05-15 Thread Christoph Hellwig
> +static int change_page_ro(pte_t *ptep, pgtable_t token, unsigned long addr, 
> void *data)

There are a couple way too long lines like this in the patch.


[PATCH RESEND V6 3/3] ASoC: fsl_asrc: Unify the supported input and output rate

2019-05-15 Thread S.j. Wang
Unify the supported input and output rates, adding
12kHz/24kHz/128kHz to the supported list.

Signed-off-by: Shengjiu Wang 
Acked-by: Nicolin Chen 
---
 sound/soc/fsl/fsl_asrc.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index a8d6710f2541..cbbf6257f08a 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -27,13 +27,14 @@
dev_dbg(&asrc_priv->pdev->dev, "Pair %c: " fmt, 'A' + index, 
##__VA_ARGS__)
 
 /* Corresponding to process_option */
-static int supported_input_rate[] = {
-   5512, 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200,
-   96000, 176400, 192000,
+static unsigned int supported_asrc_rate[] = {
+   5512, 8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000,
+   64000, 88200, 96000, 128000, 176400, 192000,
 };
 
-static int supported_asrc_rate[] = {
-   8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200, 96000, 
176400, 192000,
+static struct snd_pcm_hw_constraint_list fsl_asrc_rate_constraints = {
+   .count = ARRAY_SIZE(supported_asrc_rate),
+   .list = supported_asrc_rate,
 };
 
 /**
@@ -293,11 +294,11 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair 
*pair)
ideal = config->inclk == INCLK_NONE;
 
/* Validate input and output sample rates */
-   for (in = 0; in < ARRAY_SIZE(supported_input_rate); in++)
-   if (inrate == supported_input_rate[in])
+   for (in = 0; in < ARRAY_SIZE(supported_asrc_rate); in++)
+   if (inrate == supported_asrc_rate[in])
break;
 
-   if (in == ARRAY_SIZE(supported_input_rate)) {
+   if (in == ARRAY_SIZE(supported_asrc_rate)) {
pair_err("unsupported input sample rate: %dHz\n", inrate);
return -EINVAL;
}
@@ -311,7 +312,7 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair *pair)
return -EINVAL;
}
 
-   if ((outrate >= 8000 && outrate <= 30000) &&
+   if ((outrate >= 5512 && outrate <= 30000) &&
(outrate > 24 * inrate || inrate > 8 * outrate)) {
pair_err("exceed supported ratio range [1/24, 8] for \
inrate/outrate: %d/%d\n", inrate, outrate);
@@ -486,7 +487,9 @@ static int fsl_asrc_dai_startup(struct snd_pcm_substream 
*substream,
snd_pcm_hw_constraint_step(substream->runtime, 0,
   SNDRV_PCM_HW_PARAM_CHANNELS, 2);
 
-   return 0;
+
+   return snd_pcm_hw_constraint_list(substream->runtime, 0,
+   SNDRV_PCM_HW_PARAM_RATE, &fsl_asrc_rate_constraints);
 }
 
 static int fsl_asrc_dai_hw_params(struct snd_pcm_substream *substream,
@@ -599,7 +602,6 @@ static int fsl_asrc_dai_probe(struct snd_soc_dai *dai)
return 0;
 }
 
-#define FSL_ASRC_RATES  SNDRV_PCM_RATE_8000_192000
 #define FSL_ASRC_FORMATS   (SNDRV_PCM_FMTBIT_S24_LE | \
 SNDRV_PCM_FMTBIT_S16_LE | \
 SNDRV_PCM_FMTBIT_S20_3LE)
@@ -610,14 +612,18 @@ static struct snd_soc_dai_driver fsl_asrc_dai = {
.stream_name = "ASRC-Playback",
.channels_min = 1,
.channels_max = 10,
-   .rates = FSL_ASRC_RATES,
+   .rate_min = 5512,
+   .rate_max = 192000,
+   .rates = SNDRV_PCM_RATE_KNOT,
.formats = FSL_ASRC_FORMATS,
},
.capture = {
.stream_name = "ASRC-Capture",
.channels_min = 1,
.channels_max = 10,
-   .rates = FSL_ASRC_RATES,
+   .rate_min = 5512,
+   .rate_max = 192000,
+   .rates = SNDRV_PCM_RATE_KNOT,
.formats = FSL_ASRC_FORMATS,
},
.ops = &fsl_asrc_dai_ops,
-- 
2.21.0



[PATCH RESEND V6 2/3] ASoC: fsl_asrc: replace the process_option table with function

2019-05-15 Thread S.j. Wang
When we want to support more sample rates, for example 12kHz/24kHz,
we need to update the process_option table; supporting additional
sample rates in the future would require updating the table yet
again, which is not flexible.

Add a function, fsl_asrc_sel_proc(), to replace the table; it derives
the pre-processing and post-processing options from the sample rates.

Signed-off-by: Shengjiu Wang 
Acked-by: Nicolin Chen 
---
 sound/soc/fsl/fsl_asrc.c | 71 +---
 1 file changed, 51 insertions(+), 20 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index ea035c12a325..a8d6710f2541 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -26,24 +26,6 @@
 #define pair_dbg(fmt, ...) \
dev_dbg(&asrc_priv->pdev->dev, "Pair %c: " fmt, 'A' + index, 
##__VA_ARGS__)
 
-/* Sample rates are aligned with that defined in pcm.h file */
-static const u8 process_option[][12][2] = {
-   /* 8kHz 11.025kHz 16kHz 22.05kHz 32kHz 44.1kHz 48kHz   64kHz   88.2kHz 
96kHz   176kHz  192kHz */
-   {{0, 1}, {0, 1}, {0, 1}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 
0}, {0, 0}, {0, 0}, {0, 0},},  /* 5512Hz */
-   {{0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 
0}, {0, 0}, {0, 0}, {0, 0},},  /* 8kHz */
-   {{0, 2}, {0, 1}, {0, 1}, {0, 1}, {0, 0}, {0, 0}, {0, 0}, {0, 0}, {0, 
0}, {0, 0}, {0, 0}, {0, 0},},  /* 11025Hz */
-   {{1, 2}, {0, 2}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 0}, {0, 
0}, {0, 0}, {0, 0}, {0, 0},},  /* 16kHz */
-   {{1, 2}, {1, 2}, {0, 2}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 0}, {0, 
0}, {0, 0}, {0, 0}, {0, 0},},  /* 22050Hz */
-   {{1, 2}, {2, 1}, {2, 1}, {0, 2}, {0, 1}, {0, 1}, {0, 1}, {0, 1}, {0, 
1}, {0, 0}, {0, 0}, {0, 0},},  /* 32kHz */
-   {{2, 2}, {2, 2}, {2, 1}, {2, 1}, {0, 2}, {0, 1}, {0, 1}, {0, 1}, {0, 
1}, {0, 1}, {0, 0}, {0, 0},},  /* 44.1kHz */
-   {{2, 2}, {2, 2}, {2, 1}, {2, 1}, {0, 2}, {0, 2}, {0, 1}, {0, 1}, {0, 
1}, {0, 1}, {0, 0}, {0, 0},},  /* 48kHz */
-   {{2, 2}, {2, 2}, {2, 2}, {2, 1}, {1, 2}, {0, 2}, {0, 2}, {0, 1}, {0, 
1}, {0, 1}, {0, 1}, {0, 0},},  /* 64kHz */
-   {{2, 2}, {2, 2}, {2, 2}, {2, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 1}, {1, 
1}, {1, 1}, {1, 1}, {1, 1},},  /* 88.2kHz */
-   {{2, 2}, {2, 2}, {2, 2}, {2, 2}, {1, 2}, {1, 2}, {1, 2}, {1, 1}, {1, 
1}, {1, 1}, {1, 1}, {1, 1},},  /* 96kHz */
-   {{2, 2}, {2, 2}, {2, 2}, {2, 2}, {2, 2}, {2, 2}, {2, 2}, {2, 1}, {2, 
1}, {2, 1}, {2, 1}, {2, 1},},  /* 176kHz */
-   {{2, 2}, {2, 2}, {2, 2}, {2, 2}, {2, 2}, {2, 2}, {2, 2}, {2, 1}, {2, 
1}, {2, 1}, {2, 1}, {2, 1},},  /* 192kHz */
-};
-
 /* Corresponding to process_option */
 static int supported_input_rate[] = {
5512, 8000, 11025, 16000, 22050, 32000, 44100, 48000, 64000, 88200,
@@ -79,6 +61,52 @@ static unsigned char output_clk_map_imx53[] = {
 
 static unsigned char *clk_map[2];
 
+/**
+ * Select the pre-processing and post-processing options
+ * Make sure to exclude following unsupported cases before
+ * calling this function:
+ * 1) inrate > 8.125 * outrate
+ * 2) inrate > 16.125 * outrate
+ *
+ * inrate: input sample rate
+ * outrate: output sample rate
+ * pre_proc: return value for pre-processing option
+ * post_proc: return value for post-processing option
+ */
+static void fsl_asrc_sel_proc(int inrate, int outrate,
+int *pre_proc, int *post_proc)
+{
+   bool post_proc_cond2;
+   bool post_proc_cond0;
+
+   /* select pre_proc between [0, 2] */
+   if (inrate * 8 > 33 * outrate)
+   *pre_proc = 2;
+   else if (inrate * 8 > 15 * outrate) {
+   if (inrate > 152000)
+   *pre_proc = 2;
+   else
+   *pre_proc = 1;
+   } else if (inrate < 76000)
+   *pre_proc = 0;
+   else if (inrate > 152000)
+   *pre_proc = 2;
+   else
+   *pre_proc = 1;
+
+   /* Condition for selection of post-processing */
+   post_proc_cond2 = (inrate * 15 > outrate * 16 && outrate < 56000) ||
+ (inrate > 56000 && outrate < 56000);
+   post_proc_cond0 = inrate * 23 < outrate * 8;
+
+   if (post_proc_cond2)
+   *post_proc = 2;
+   else if (post_proc_cond0)
+   *post_proc = 0;
+   else
+   *post_proc = 1;
+}
+
 /**
  * Request ASRC pair
  *
@@ -239,6 +267,7 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair *pair)
u32 inrate, outrate, indiv, outdiv;
u32 clk_index[2], div[2];
int in, out, channels;
+   int pre_proc, post_proc;
struct clk *clk;
bool ideal;
 
@@ -377,11 +406,13 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair *pair)
   ASRCTR_IDRi_MASK(index) | ASRCTR_USRi_MASK(index),
   ASRCTR_IDR(index) | ASRCTR_USR(index));
 
+   

[PATCH RESEND V6 1/3] ASoC: fsl_asrc: Fix the issue about unsupported rate

2019-05-15 Thread S.j. Wang
When the output sample rate is in [8kHz, 30kHz], the supported
ratio range is limited to [1/24, 8]. The driver checks the open
interval (8kHz, 30kHz) instead of the closed interval [8kHz, 30kHz].
This patch fixes that, along with a potential rounding issue caused
by the integer division used in the ratio check.

Fixes: fff6e03c7b65 ("ASoC: fsl_asrc: add support for 8-30kHz output sample rate")
Cc: 
Signed-off-by: Shengjiu Wang 
Acked-by: Nicolin Chen 
---
 sound/soc/fsl/fsl_asrc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 0b937924d2e4..ea035c12a325 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -282,8 +282,8 @@ static int fsl_asrc_config_pair(struct fsl_asrc_pair *pair)
return -EINVAL;
}
 
-   if ((outrate > 8000 && outrate < 30000) &&
-   (outrate/inrate > 24 || inrate/outrate > 8)) {
+   if ((outrate >= 8000 && outrate <= 30000) &&
+   (outrate > 24 * inrate || inrate > 8 * outrate)) {
pair_err("exceed supported ratio range [1/24, 8] for \
inrate/outrate: %d/%d\n", inrate, outrate);
return -EINVAL;
-- 
2.21.0



[PATCH RESEND V6 0/3] Support more sample rate in asrc

2019-05-15 Thread S.j. Wang
Support more sample rate in asrc

Shengjiu Wang (3):
  ASoC: fsl_asrc: Fix the issue about unsupported rate
  ASoC: fsl_asrc: replace the process_option table with function
  ASoC: fsl_asrc: Unify the supported input and output rate

Changes in RESEND V6
- change the Content-Transfer-Encoding to "quoted-printable", since
  "base64" can't be applied

Changes in v6
- add acked-by
- fixed minor issue according to comments in v5

Changes in v5
- fix the [1/24, 8]
- move fsl_asrc_sel_proc before setting

Changes in v4
- add patch to Fix the [8kHz, 30kHz] open set issue.

Changes in v3
- remove FSL_ASRC_RATES
- refine fsl_asrc_sel_proc according to comments

Changes in v2
- add more comments in code
- add commit "Unify the supported input and output rate"

 sound/soc/fsl/fsl_asrc.c | 105 ++-
 1 file changed, 71 insertions(+), 34 deletions(-)

-- 
2.21.0



Re: [PATCH] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Christophe Leroy

Oops, forgot to tell it's for 4.9. Resending with proper subject.

Le 15/05/2019 à 08:39, Christophe Leroy a écrit :

[Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]

On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()

There is the following note in front of early_init():
  * Note that the kernel may be running at an address which is different
  * from the address that it was linked at, so we must use RELOC/PTRRELOC
  * to access static data (including strings).  -- paulus

Therefore init_mem_is_free must be accessed with PTRRELOC()

Fixes: 1c38a84d4586 ("powerpc: Avoid code patching freed init sections")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
Signed-off-by: Christophe Leroy 

---
Can't apply the upstream commit as-is due to several unrelated missing
pieces, for instance STRICT_KERNEL_RWX.
So instead, using same approach as for commit 
252eb55816a6f69ef9464cad303cdb3326cdc61d
---
  arch/powerpc/lib/code-patching.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 14535ad4cdd1..c312955977ce 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -23,7 +23,7 @@ int patch_instruction(unsigned int *addr, unsigned int instr)
int err;
  
  	/* Make sure we aren't patching a freed init section */

-   if (init_mem_is_free && init_section_contains(addr, 4)) {
+   if (*PTRRELOC(&init_mem_is_free) && init_section_contains(addr, 4)) {
pr_debug("Skipping init section patching addr: 0x%px\n", addr);
return 0;
}



Re: [PATCH] powerpc/mm/book3s64: Implement STRICT_MODULE_RWX

2019-05-15 Thread Christoph Hellwig
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 2 of the License, or (at your
> + * option) any later version.

This license boilerplate should not be added together with an SPDX tag.

> +// we need this to have a single pointer to pass into apply_to_page_range()

Please use normal /* - */ style comments.


[PATCH] powerpc/lib: fix book3s/32 boot failure due to code patching

2019-05-15 Thread Christophe Leroy
[Backport of upstream commit b45ba4a51cde29b2939365ef0c07ad34c8321789]

On powerpc32, patch_instruction() is called by apply_feature_fixups()
which is called from early_init()

There is the following note in front of early_init():
 * Note that the kernel may be running at an address which is different
 * from the address that it was linked at, so we must use RELOC/PTRRELOC
 * to access static data (including strings).  -- paulus

Therefore init_mem_is_free must be accessed with PTRRELOC()

Fixes: 1c38a84d4586 ("powerpc: Avoid code patching freed init sections")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203597
Signed-off-by: Christophe Leroy 

---
Can't apply the upstream commit as-is due to several unrelated missing
pieces, for instance STRICT_KERNEL_RWX.
So instead, using same approach as for commit 
252eb55816a6f69ef9464cad303cdb3326cdc61d
---
 arch/powerpc/lib/code-patching.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 14535ad4cdd1..c312955977ce 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -23,7 +23,7 @@ int patch_instruction(unsigned int *addr, unsigned int instr)
int err;
 
/* Make sure we aren't patching a freed init section */
-   if (init_mem_is_free && init_section_contains(addr, 4)) {
+   if (*PTRRELOC(&init_mem_is_free) && init_section_contains(addr, 4)) {
pr_debug("Skipping init section patching addr: 0x%px\n", addr);
return 0;
}
-- 
2.13.3



Re: [PATCH] crypto: vmx - fix copy-paste error in CTR mode

2019-05-15 Thread Daniel Axtens
Herbert Xu  writes:

> On Wed, May 15, 2019 at 03:35:51AM +1000, Daniel Axtens wrote:
>>
>> By all means disable vmx ctr if I don't get an answer to you in a
>> timeframe you are comfortable with, but I am going to at least try to
>> have a look.
>
> I'm happy to give you guys more time.  How much time do you think
> you will need?
>
Give me till the end of the week: if I haven't solved it by then I will
probably have to give up and go on to other things anyway.

(FWIW, it seems to happen when encoding greater than 4 but less than 8
AES blocks - in particular with both 7 and 5 blocks encoded I can see it
go wrong from block 4 onwards. No idea why yet, and the asm is pretty
dense, but that's where I'm at at the moment.)

Regards,
Daniel

> Thanks,
> -- 
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [v4 PATCH 1/2] [PowerPC] Add simd.h implementation

2019-05-15 Thread Christophe Leroy




Le 15/05/2019 à 03:37, Shawn Landden a écrit :

Based off the x86 one.

WireGuard really wants to be able to do SIMD in interrupts,
so it can accelerate its in-bound path.

Signed-off-by: Shawn Landden 
---


Could you please, as usual, list here the changes provided by each version
to ease the review?


Thanks
Christophe


  arch/powerpc/include/asm/simd.h | 17 +
  arch/powerpc/kernel/process.c   | 30 ++
  2 files changed, 47 insertions(+)
  create mode 100644 arch/powerpc/include/asm/simd.h

diff --git a/arch/powerpc/include/asm/simd.h b/arch/powerpc/include/asm/simd.h
new file mode 100644
index 0..2fe26f258
--- /dev/null
+++ b/arch/powerpc/include/asm/simd.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+/*
+ * may_use_simd - whether it is allowable at this time to issue SIMD
+ *instructions or access the SIMD register file
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+#ifdef CONFIG_PPC_FPU
+extern bool may_use_simd(void);
+#else
+static inline bool may_use_simd(void)
+{
+   return false;
+}
+#endif
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index dd9e0d538..ef534831f 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -345,6 +345,36 @@ static int restore_altivec(struct task_struct *tsk)
}
return 0;
  }
+
+/*
+ * Were we in user mode when we were
+ * interrupted?
+ *
+ * Doing kernel_altivec/vsx_begin/end() is ok if we are running
+ * in an interrupt context from user mode - we'll just
+ * save the FPU state as required.
+ */
+static bool interrupted_user_mode(void)
+{
+   struct pt_regs *regs = get_irq_regs();
+
+   return regs && user_mode(regs);
+}
+
+/*
+ * Can we use FPU in kernel mode with the
+ * whole "kernel_fpu/altivec/vsx_begin/end()" sequence?
+ *
+ * It's always ok in process context (ie "not interrupt")
+ * but it is sometimes ok even from an irq.
+ */
+bool may_use_simd(void)
+{
+   return !in_interrupt() ||
+   interrupted_user_mode();
+}
+EXPORT_SYMBOL(may_use_simd);
+
  #else
  #define loadvec(thr) 0
  static inline int restore_altivec(struct task_struct *tsk) { return 0; }



Re: [PATCH] powerpc/mm/book3s64: Implement STRICT_MODULE_RWX

2019-05-15 Thread Christophe Leroy




Le 15/05/2019 à 03:30, Russell Currey a écrit :

Strict module RWX is just like strict kernel RWX, but for modules - so
loadable modules aren't marked both writable and executable at the same
time.  This is handled by the generic code in kernel/module.c, and
simply requires the architecture to implement the set_memory() set of
functions, declared with ARCH_HAS_SET_MEMORY.

The set_memory() family of functions are implemented for book3s64
MMUs (so Hash and Radix), however they could likely be adapted to work
for other platforms as well and made more generic.  I did it this way
since they're the platforms I have the most understanding of and ability
to test.


Based on this patch, I have drafted a generic implementation. Please 
comment and test. I'll test on my side on PPC32.


Christophe



There's nothing other than these functions required to turn
ARCH_HAS_STRICT_MODULE_RWX on, so turn that on too.

With STRICT_MODULE_RWX enabled, there are as many W+X pages at runtime
as there are with CONFIG_MODULES=n (none), so in my testing it works
well on both Hash and Radix.

There's a TODO in the code for also applying the page permission changes
to the backing pages in the linear mapping: this is pretty simple for
Radix and (seemingly) a lot harder for Hash, so I've left it for now
since there's still a notable security benefit for the patch as-is.

Technically can be enabled without STRICT_KERNEL_RWX, but I don't think
that gets you a whole lot, so I think we should leave it off by default
until we can get STRICT_KERNEL_RWX to the point where it's enabled by
default.

Signed-off-by: Russell Currey 
---
  arch/powerpc/Kconfig  |   2 +
  arch/powerpc/include/asm/set_memory.h |  12 +++
  arch/powerpc/mm/book3s64/Makefile |   2 +-
  arch/powerpc/mm/book3s64/pageattr.c   | 106 ++
  4 files changed, 121 insertions(+), 1 deletion(-)
  create mode 100644 arch/powerpc/include/asm/set_memory.h
  create mode 100644 arch/powerpc/mm/book3s64/pageattr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d7996cfaceca..9e1bfa81bc5a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,7 +131,9 @@ config PPC
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
	select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE && PPC64
+   select ARCH_HAS_SET_MEMORY  if PPC_BOOK3S_64
	select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION)
+   select ARCH_HAS_STRICT_MODULE_RWX   if PPC_BOOK3S_64
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE  if PPC64
select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/powerpc/include/asm/set_memory.h b/arch/powerpc/include/asm/set_memory.h
new file mode 100644
index ..5323a8b06f98
--- /dev/null
+++ b/arch/powerpc/include/asm/set_memory.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef _ASM_POWERPC_SET_MEMORY_H
+#define _ASM_POWERPC_SET_MEMORY_H
+
+#ifdef CONFIG_PPC_BOOK3S_64
+int set_memory_ro(unsigned long addr, int numpages);
+int set_memory_rw(unsigned long addr, int numpages);
+int set_memory_nx(unsigned long addr, int numpages);
+int set_memory_x(unsigned long addr, int numpages);
+#endif
+
+#endif
diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
index 974b4fc19f4f..09c5afadf235 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -5,7 +5,7 @@ ccflags-y   := $(NO_MINIMAL_TOC)
  CFLAGS_REMOVE_slb.o = $(CC_FLAGS_FTRACE)
  
  obj-y+= hash_pgtable.o hash_utils.o slb.o \

-  mmu_context.o pgtable.o hash_tlb.o
+  mmu_context.o pgtable.o hash_tlb.o pageattr.o
  obj-$(CONFIG_PPC_NATIVE)  += hash_native.o
  obj-$(CONFIG_PPC_RADIX_MMU)   += radix_pgtable.o radix_tlb.o
  obj-$(CONFIG_PPC_4K_PAGES)+= hash_4k.o
diff --git a/arch/powerpc/mm/book3s64/pageattr.c b/arch/powerpc/mm/book3s64/pageattr.c
new file mode 100644
index ..d6afa89fb407
--- /dev/null
+++ b/arch/powerpc/mm/book3s64/pageattr.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/*
+ * Page attribute and set_memory routines for Radix and Hash MMUs
+ *
+ * Derived from the arm64 implementation.
+ *
+ * Author: Russell Currey 
+ *
+ * Copyright 2019, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.


Above text should be removed as it is redundant with the 
SPDX-Licence-Identifier



+ */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+// we need this to have a single pointer to pass into apply_to_page_range()
+struct 

Re: [PATCH] vsprintf: Do not break early boot with probing addresses

2019-05-15 Thread Sergey Senozhatsky
On (05/14/19 21:13), Geert Uytterhoeven wrote:
> I would immediately understand there's a missing IS_ERR() check in a
> function that can return  -EINVAL, without having to add a new printk()
> to find out what kind of bogus value has been received, and without
> having to reboot, and trying to reproduce...

But chances are that missing IS_ERR() will crash the kernel sooner
or later (in general case), if not in sprintf() then somewhere else.

-ss


[RFC PATCH] powerpc/mm: Implement STRICT_MODULE_RWX

2019-05-15 Thread Christophe Leroy
Strict module RWX is just like strict kernel RWX, but for modules - so
loadable modules aren't marked both writable and executable at the same
time.  This is handled by the generic code in kernel/module.c, and
simply requires the architecture to implement the set_memory() set of
functions, declared with ARCH_HAS_SET_MEMORY.

There's nothing other than these functions required to turn
ARCH_HAS_STRICT_MODULE_RWX on, so turn that on too.

With STRICT_MODULE_RWX enabled, there are as many W+X pages at runtime
as there are with CONFIG_MODULES=n (none), so in Russell's testing it works
well on both Hash and Radix book3s64.

There's a TODO in the code for also applying the page permission changes
to the backing pages in the linear mapping: this is pretty simple for
Radix and (seemingly) a lot harder for Hash, so I've left it for now
since there's still a notable security benefit for the patch as-is.

Technically can be enabled without STRICT_KERNEL_RWX, but
that doesn't get you a whole lot, so we should leave it off by default
until we can get STRICT_KERNEL_RWX to the point where it's enabled by
default.

Signed-off-by: Russell Currey 
Signed-off-by: Christophe Leroy 
---
 Generic implementation based on Russell's patch ("powerpc/mm/book3s64: 
Implement STRICT_MODULE_RWX")
 Untested

 arch/powerpc/Kconfig  |  2 +
 arch/powerpc/include/asm/set_memory.h | 32 +
 arch/powerpc/mm/Makefile  |  2 +-
 arch/powerpc/mm/pageattr.c| 85 +++
 4 files changed, 120 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/set_memory.h
 create mode 100644 arch/powerpc/mm/pageattr.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d7996cfaceca..1f1423e3d818 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -131,7 +131,9 @@ config PPC
select ARCH_HAS_PTE_SPECIAL
select ARCH_HAS_MEMBARRIER_CALLBACKS
	select ARCH_HAS_SCALED_CPUTIME  if VIRT_CPU_ACCOUNTING_NATIVE && PPC64
+   select ARCH_HAS_SET_MEMORY
	select ARCH_HAS_STRICT_KERNEL_RWX   if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION)
+   select ARCH_HAS_STRICT_MODULE_RWX   if PPC_BOOK3S_64 || PPC32
select ARCH_HAS_TICK_BROADCAST  if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAS_UACCESS_FLUSHCACHE  if PPC64
select ARCH_HAS_UBSAN_SANITIZE_ALL
diff --git a/arch/powerpc/include/asm/set_memory.h b/arch/powerpc/include/asm/set_memory.h
new file mode 100644
index ..4b9683f3b3dd
--- /dev/null
+++ b/arch/powerpc/include/asm/set_memory.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+#ifndef _ASM_POWERPC_SET_MEMORY_H
+#define _ASM_POWERPC_SET_MEMORY_H
+
+#define SET_MEMORY_RO  1
+#define SET_MEMORY_RW  2
+#define SET_MEMORY_NX  3
+#define SET_MEMORY_X   4
+
+int change_memory(unsigned long addr, int numpages, int action);
+
+static inline int set_memory_ro(unsigned long addr, int numpages)
+{
+   return change_memory(addr, numpages, SET_MEMORY_RO);
+}
+
+static inline int set_memory_rw(unsigned long addr, int numpages)
+{
+   return change_memory(addr, numpages, SET_MEMORY_RW);
+}
+
+static inline int set_memory_nx(unsigned long addr, int numpages)
+{
+   return change_memory(addr, numpages, SET_MEMORY_NX);
+}
+
+static inline int set_memory_x(unsigned long addr, int numpages)
+{
+   return change_memory(addr, numpages, SET_MEMORY_X);
+}
+
+#endif
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 0f499db315d6..b683d1c311b3 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -7,7 +7,7 @@ ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
 
 obj-y  := fault.o mem.o pgtable.o mmap.o \
   init_$(BITS).o pgtable_$(BITS).o \
-  pgtable-frag.o \
+  pgtable-frag.o pageattr.o \
   init-common.o mmu_context.o drmem.o
 obj-$(CONFIG_PPC_MMU_NOHASH)   += nohash/
 obj-$(CONFIG_PPC_BOOK3S_32)+= book3s32/
diff --git a/arch/powerpc/mm/pageattr.c b/arch/powerpc/mm/pageattr.c
new file mode 100644
index ..3e8f2c203a00
--- /dev/null
+++ b/arch/powerpc/mm/pageattr.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+/*
+ * Page attribute and set_memory routines
+ *
+ * Derived from the arm64 implementation.
+ *
+ * Author: Russell Currey 
+ *
+ * Copyright 2019, IBM Corporation.
+ *
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+static int change_page_ro(pte_t *ptep, pgtable_t token, unsigned long addr, void *data)
+{
+   set_pte_at(&init_mm, addr, ptep, pte_wrprotect(READ_ONCE(*ptep)));
+   return 0;
+}
+
+static int change_page_rw(pte_t *ptep, pgtable_t token, unsigned long addr, void *data)
+{
+   set_pte_at(&init_mm, addr, ptep, pte_mkwrite(READ_ONCE(*ptep)));
+   return 0;
+}
+
+static int 

Re: [PATCH] powerpc/security: Fix build break

2019-05-15 Thread Greg Kroah-Hartman
On Wed, May 15, 2019 at 07:18:30AM +0200, Greg Kroah-Hartman wrote:
> On Wed, May 15, 2019 at 02:22:06PM +0930, Joel Stanley wrote:
> > This fixes a build break introduced in with the recent round of CPU
> > bug patches.
> > 
> >   arch/powerpc/kernel/security.c: In function ‘setup_barrier_nospec’:
> >   arch/powerpc/kernel/security.c:59:21: error: implicit declaration of
> >   function ‘cpu_mitigations_off’ [-Werror=implicit-function-declaration]
> > if (!no_nospec && !cpu_mitigations_off())
> >^~~
> > 
> > Fixes: 782e69efb3df ("powerpc/speculation: Support 'mitigations=' cmdline 
> > option")
> > Signed-off-by: Joel Stanley 
> > ---
> > This should be applied to the 4.14 and 4.19 trees. There is no issue
> > with 5.1. The commit message contains a fixes line for the commit in
> > Linus tree.
> > ---
> >  arch/powerpc/kernel/security.c | 1 +
> >  1 file changed, 1 insertion(+)
> 
> Isn't this just commit 42e2acde1237 ("powerpc/64s: Include cpu header")?

Which I have now queued up.