Re: PowerPC arch_ptrace() writes beyond thread_struct/task_struct

2019-06-05 Thread Christophe Leroy




Le 05/06/2019 à 23:45, Radu Rendec a écrit :

Hi Everyone,

I'm seeing some weird memory corruption that I have been able to isolate
to arch_ptrace() [arch/powerpc/kernel/ptrace.c] and PTRACE_POKEUSR. I am
on PowerPC 32 (MPC8378), kernel 4.9.179.

It's not very easy for me to test on the latest kernel, but I guess
little has changed since 4.9 in either the architecture specific ptrace
code or PowerPC register data structures.

What happens is that gdb calls ptrace(PTRACE_POKEUSER) with addr=0x158.
This goes down to arch_ptrace() [arch/powerpc/kernel/ptrace.c], inside
`case PTRACE_POKEUSR`, on the branch that does this:

 memcpy(&child->thread.TS_FPR(fpidx), &data,
 sizeof(long));

where:
 index = addr >> 2 = 0x56 = 86
 fpidx = index - PT_FPR0 = 86 - 48 = 38


In struct thread_fp_state, the fpr field is u64, so I guess we should
have the following on PPC32:


fpidx = (index - PT_FPR0) >> 1;

Christophe


 &child->thread.TS_FPR(fpidx) = (void *)child + 1296

 offsetof(struct task_struct, thread) = 960
 sizeof(struct thread_struct) = 336
 sizeof(struct task_struct) = 1296

In other words, the memcpy() call writes just beyond thread_struct
(which is also beyond task_struct, for that matter).

This should never get past the bounds checks for `index`, so perhaps
there is a mismatch between ptrace macros and the actual register data
structures layout.

I will continue to investigate, but I'm not familiar with the PowerPC
registers so it will take a while before I make sense of all the data
structures and macros. Hopefully this rings a bell to someone who is
already familiar with those and could figure out quickly what the
problem is.

Best regards,
Radu Rendec



Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59

2019-06-05 Thread Shawn Anastasio

On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:

This is an attempt to allow DMA masks between 32..59 which are not large
enough to use either a PHB3 bypass mode or a sketchy bypass. Depending
on the max order, up to 40 bits is usually available.


This is based on v5.2-rc2.

Please comment. Thanks.


I have tested this patch set with an AMD GPU that's limited to <64bit
DMA (I believe it's 40 or 42 bit). It successfully allows the card to
operate without falling back to 32-bit DMA mode as it does without
the patches.

Relevant kernel log message:
```
[0.311211] pci 0033:01 : [PE# 00] Enabling 64-bit DMA bypass
```

Tested-by: Shawn Anastasio 


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-05 Thread Larry Finger

On 6/5/19 5:50 PM, Aaro Koskinen wrote:

Hi,

When upgrading from v5.0 -> v5.1 on G4 PowerBook, I noticed WLAN does
not work anymore:

[   42.004303] b43legacy-phy0: Loading firmware version 0x127, patch level 14 
(2005-04-18 02:36:27)
[   42.184837] b43legacy-phy0 debug: Chip initialized
[   42.184873] b43legacy-phy0 ERROR: The machine/kernel does not support the 
required 30-bit DMA mask

The same happens with the current mainline.

Bisected to:

commit 65a21b71f948406201e4f62e41f06513350ca390
Author: Christoph Hellwig 
Date:   Wed Feb 13 08:01:26 2019 +0100

powerpc/dma: remove dma_nommu_dma_supported

This function is largely identical to the generic version used
everywhere else.  Replace it with the generic version.

Signed-off-by: Christoph Hellwig 
Tested-by: Christian Zigotzky 
Signed-off-by: Michael Ellerman 


Aaro,

First of all, you have my sympathy for the laborious bisection on a PowerBook 
G4. I have done several myself. Thank you.


I confirm your results.

The ppc code has a maximum DMA size of 31 bits, thus a 32-bit request will fail.
Why the 30-bit fallback fails in b43legacy while it works in b43 is a mystery.


Although dma_nommu_dma_supported() may be "largely identical" to 
dma_direct_supported(), they obviously differ. Routine dma_nommu_dma_supported() 
returns 1 for 32-bit systems, but I do not know what dma_direct_supported() returns.


I am trying to find a patch.

Larry


Re: [RFC V2] mm: Generalize notify_page_fault()

2019-06-05 Thread Anshuman Khandual



On 06/05/2019 04:53 PM, Matthew Wilcox wrote:
> On Wed, Jun 05, 2019 at 09:19:22PM +1000, Michael Ellerman wrote:
>> Anshuman Khandual  writes:
>>> Similar notify_page_fault() definitions are being used by architectures
>>> duplicating much of the same code. This attempts to unify them into a
>>> single implementation, generalize it and then move it to a common place.
>>> kprobes_built_in() can detect CONFIG_KPROBES, hence notify_page_fault()
>>> need not be wrapped again within CONFIG_KPROBES. Trap number argument can
>>> now contain up to an 'unsigned int', accommodating all possible platforms.
>> ...
>>
>> You've changed several of the architectures from something like above,
>> where it disables preemption around the call into the below:
>>
>>
>> Which skips everything if we're preemptible. Is that an equivalent
>> change? If so can you please explain why in more detail.
> 
> See the discussion in v1 of this patch, which you were cc'd on.
> 
> I agree the description here completely fails to mention why the change.
> It should mention commit a980c0ef9f6d8c.

I will update the commit message to include an explanation for this new
preempt behavior in the generic definition.


Re: [RFC V2] mm: Generalize notify_page_fault()

2019-06-05 Thread Anshuman Khandual
On 06/05/2019 04:49 PM, Michael Ellerman wrote:
> Anshuman Khandual  writes:
>> Similar notify_page_fault() definitions are being used by architectures
>> duplicating much of the same code. This attempts to unify them into a
>> single implementation, generalize it and then move it to a common place.
>> kprobes_built_in() can detect CONFIG_KPROBES, hence notify_page_fault()
>> need not be wrapped again within CONFIG_KPROBES. Trap number argument can
>> now contain up to an 'unsigned int', accommodating all possible platforms.
> ...
>> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
>> index 58f69fa..1bc3b18 100644
>> --- a/arch/arm/mm/fault.c
>> +++ b/arch/arm/mm/fault.c
>> @@ -30,28 +30,6 @@
>>  
>>  #ifdef CONFIG_MMU
>>  
>> -#ifdef CONFIG_KPROBES
>> -static inline int notify_page_fault(struct pt_regs *regs, unsigned int fsr)
>> -{
>> -int ret = 0;
>> -
>> -if (!user_mode(regs)) {
>> -/* kprobe_running() needs smp_processor_id() */
>> -preempt_disable();
>> -if (kprobe_running() && kprobe_fault_handler(regs, fsr))
>> -ret = 1;
>> -preempt_enable();
>> -}
>> -
>> -return ret;
>> -}
>> -#else
> 
> You've changed several of the architectures from something like above,
> where it disables preemption around the call into the below:
> 
>> +int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap)
>> +{
>> +int ret = 0;
>> +
>> +/*
>> + * To be potentially processing a kprobe fault and to be allowed
>> + * to call kprobe_running(), we have to be non-preemptible.
>> + */
>> +if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
>> +if (kprobe_running() && kprobe_fault_handler(regs, trap))
>> +ret = 1;
>> +}
>> +return ret;
>> +}
> 
> Which skips everything if we're preemptible. Is that an equivalent

Right.

> change? If so can you please explain why in more detail.

It is probably not an equivalent change. The following explanation is
extracted from RFC V1 discussions
(https://patchwork.kernel.org/patch/10968273/). I will explain the
rationale for this behavior change in the commit message next time around.


a980c0ef9f6d ("x86/kprobes: Refactor kprobes_fault() like 
kprobe_exceptions_notify()")
b506a9d08bae ("x86: code clarification patch to Kprobes arch code")

In particular the latter one (b506a9d08bae). It explains how the invoking
context itself should be non-preemptible for the kprobes processing context,
irrespective of whether kprobe_running() or smp_processor_id() is safe or
not. Hence it does not make much sense to continue when the original
invoking context is preemptible; instead, just bail out earlier. This seems
to make more sense than a preempt disable/enable pair. If there are no
concerns about this change from other platforms, I will change the
preemption behavior in the proposed generic function next time around.


Do you see any concern with changing the preempt behavior in the x86 way?

> 
> Also why not have it return bool?

Just that all architectures (except powerpc) had 'int' as the return type,
but we can change that to 'bool'.


Re: crash after NX error

2019-06-05 Thread Stewart Smith
Michael Ellerman  writes:
> Stewart Smith  writes:
>> On my two socket POWER9 system (powernv) with 842 zwap set up, I
>> recently got a crash with the Ubuntu kernel (I haven't tried with
>> upstream, and this is the first time the system has died like this, so
>> I'm not sure how repeatable it is).
>>
>> [2.891463] zswap: loaded using pool 842-nx/zbud
>> ...
>> [15626.124646] nx_compress_powernv: ERROR: CSB still not valid after 500 
>> us, giving up : 00 00 00 00 
>> [16868.932913] Unable to handle kernel paging request for data at address 
>> 0x6655f67da816cdb8
>> [16868.933726] Faulting instruction address: 0xc0391600
>>
>>
>> cpu 0x68: Vector: 380 (Data Access Out of Range) at [c01c9d98b9a0]
>> pc: c0391600: kmem_cache_alloc+0x2e0/0x340
>> lr: c03915ec: kmem_cache_alloc+0x2cc/0x340
>> sp: c01c9d98bc20
>>msr: 9280b033
>>dar: 6655f67da816cdb8
>>   current = 0xc01ad43cb400
>>   paca= 0xcfac7800   softe: 0  irq_happened: 0x01
>> pid   = 8319, comm = make
>> Linux version 4.15.0-50-generic (buildd@bos02-ppc64el-006) (gcc version 
>> 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #54-Ubuntu SMP Mon May 6 18:55:18 UTC 2019 
>> (Ubuntu 4.15.0-50.54-generic 4.15.18)
>>
>> 68:mon> t
>> [c01c9d98bc20] c03914d4 kmem_cache_alloc+0x1b4/0x340 (unreliable)
>> [c01c9d98bc80] c03b1e14 __khugepaged_enter+0x54/0x220
>> [c01c9d98bcc0] c010f0ec copy_process.isra.5.part.6+0xebc/0x1a10
>> [c01c9d98bda0] c010fe4c _do_fork+0xec/0x510
>> [c01c9d98be30] c000b584 ppc_clone+0x8/0xc
>> --- Exception: c00 (System Call) at 7afe9daf87f4
>> SP (7fffca606880) is in userspace
>>
>> So, it looks like there could be a problem in the error path, plausibly
>> fixed by this patch:
>>
>> commit 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
>> Author: Haren Myneni 
>> Date:   Wed Jun 13 00:32:40 2018 -0700
>>
>> crypto/nx: Initialize 842 high and normal RxFIFO control registers
>> 
>> NX increments readOffset by FIFO size in receive FIFO control register
>> when CRB is read. But the index in RxFIFO has to match with the
>> corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
>> may be processing incorrect CRBs and can cause CRB timeout.
>> 
>> VAS FIFO offset is 0 when the receive window is opened during
>> initialization. When the module is reloaded or in kexec boot, readOffset
>> in FIFO control register may not match with VAS entry. This patch adds
>> nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
>> control register for both high and normal FIFOs.
>> 
>> Signed-off-by: Haren Myneni 
>> [mpe: Fixup uninitialized variable warning]
>> Signed-off-by: Michael Ellerman 
>>
>> $ git describe --contains 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
>> v4.19-rc1~24^2~50
>>
>>
>> Which was never backported to any stable release, so probably needs to
>> be for v4.14 through v4.18.
>
> Yeah the P9 NX support went in in:
>   b0d6c9bab5e4 ("crypto/nx: Add P9 NX support for 842 compression engine")
>
> Which was: v4.14-rc1~119^2~21, so first released in v4.14.
>
>
> I'm actually less interested in that and more interested in the
> subsequent crash. The time stamps are miles apart though, did we just
> leave some corrupted memory after the NX failed and then hit it later?
> Or did we not correctly signal to the upper level APIs that the request
> failed.
>
> I think we need to do some testing with errors injected into the
> wait_for_csb() path, to ensure that failures there are not causing
> corruption in zswap. Haren, have you done any testing of error
> injection?

So, things died pretty heavily overnight (requiring e2fsck) with a *lot*
of those wait_for_csb() errors in the log.

It certainly *looks* like there's corruption around, as one of the CI
jobs that failed around that time got "internal compiler error" which is
usually a good sign that things have gone poorly somewhere.

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [RFC V2] mm: Generalize notify_page_fault()

2019-06-05 Thread Anshuman Khandual



On 06/05/2019 03:23 AM, Matthew Wilcox wrote:
> On Tue, Jun 04, 2019 at 12:04:06PM +0530, Anshuman Khandual wrote:
>> +++ b/arch/x86/mm/fault.c
>> @@ -46,23 +46,6 @@ kmmio_fault(struct pt_regs *regs, unsigned long addr)
>>  return 0;
>>  }
>>  
>> -static nokprobe_inline int kprobes_fault(struct pt_regs *regs)
>> -{
> ...
>> -}
> 
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 0e8834a..c5a8dcf 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1778,6 +1778,7 @@ static inline int pte_devmap(pte_t pte)
>>  }
>>  #endif
>>  
>> +int notify_page_fault(struct pt_regs *regs, unsigned int trap);
> 
> Why is it now out-of-line?  

Did not get it. AFAICS it is the same as in the last version and does not
cross the 80-character limit on that line.

> 
>> +++ b/mm/memory.c
>> +int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap)
>> +{
>> +int ret = 0;
>> +
>> +/*
>> + * To be potentially processing a kprobe fault and to be allowed
>> + * to call kprobe_running(), we have to be non-preemptible.
>> + */
>> +if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
>> +if (kprobe_running() && kprobe_fault_handler(regs, trap))
>> +ret = 1;
>> +}
>> +return ret;
>> +}
>> +
> 
> I would argue this should be in kprobes.h as a static nokprobe_inline.

We can do that. Though it will be standalone (not inside an #ifdef) as it
already takes care of CONFIG_KPROBES via kprobes_built_in(). Will change
it, and in that case the above declaration in mm.h would not be required.


Re: [BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-05 Thread Benjamin Herrenschmidt
On Thu, 2019-06-06 at 01:50 +0300, Aaro Koskinen wrote:
> Hi,
> 
> When upgrading from v5.0 -> v5.1 on G4 PowerBook, I noticed WLAN does
> not work anymore:
> 
> [   42.004303] b43legacy-phy0: Loading firmware version 0x127, patch level 14 
> (2005-04-18 02:36:27)
> [   42.184837] b43legacy-phy0 debug: Chip initialized
> [   42.184873] b43legacy-phy0 ERROR: The machine/kernel does not support the 
> required 30-bit DMA mask
> 
> The same happens with the current mainline.

How much RAM do you have ?

Ben.

> 
> Bisected to:
> 
>   commit 65a21b71f948406201e4f62e41f06513350ca390
>   Author: Christoph Hellwig 
>   Date:   Wed Feb 13 08:01:26 2019 +0100
> 
>   powerpc/dma: remove dma_nommu_dma_supported
> 
>   This function is largely identical to the generic version used
>   everywhere else.  Replace it with the generic version.
> 
>   Signed-off-by: Christoph Hellwig 
>   Tested-by: Christian Zigotzky 
>   Signed-off-by: Michael Ellerman 
> 
> A.



Re: [PATCH] ASoC: fsl_esai: fix the channel swap issue after xrun

2019-06-05 Thread Nicolin Chen
Hello Shengjiu,

On Wed, Jun 05, 2019 at 10:29:37AM +, S.j. Wang wrote:
> > > ETDR is not volatile,  if we mark it is volatile, is it correct?
> > 
> > Well, you have a point -- it might not be ideally true, but it sounds like a
> > correct fix to me according to this comments.
> > 
> > We can wait for Mark's comments or just send a patch to the mail list for
> > review.
> 
> I tested this patch; we don't need to reset the FIFO, and regcache_sync() didn't
> write the ETDR even though the ETDR is not volatile. This fault may be caused by

The fsl_esai driver uses a FLAT type cache, so regcache_sync() goes
through regcache_default_sync(), which bypasses the cache sync at the
regcache_reg_needs_sync() check when the cached register value matches
its default value: in the case of ETDR, which has a default value of
0x0, it just "continues" without doing the _regmap_write() when the
cached value equals 0x0.

> legacy: in the beginning we added this patch in an internal branch; something
> may have caused this issue back then, but it cannot be reproduced now.

The "legacy" case might happen to have two mismatched ETDR values
between the cached value and default 0x0. And I am worried it may
appear once again someday.

So I feel we still need to change ETDR to volatile type. And for
your question "ETDR is not volatile,  if we mark it is volatile,
is it correct?", I double checked the definition of volatile_reg,
and it says:
 * @volatile_reg: Optional callback returning true if the register
 *value can't be cached. If this field is NULL but

So it seems correct to me then, as "volatile" here should also be read
as "non-cacheable".

Thanks
Nicolin


[BISECTED REGRESSION] b43legacy broken on G4 PowerBook

2019-06-05 Thread Aaro Koskinen
Hi,

When upgrading from v5.0 -> v5.1 on G4 PowerBook, I noticed WLAN does
not work anymore:

[   42.004303] b43legacy-phy0: Loading firmware version 0x127, patch level 14 
(2005-04-18 02:36:27)
[   42.184837] b43legacy-phy0 debug: Chip initialized
[   42.184873] b43legacy-phy0 ERROR: The machine/kernel does not support the 
required 30-bit DMA mask

The same happens with the current mainline.

Bisected to:

commit 65a21b71f948406201e4f62e41f06513350ca390
Author: Christoph Hellwig 
Date:   Wed Feb 13 08:01:26 2019 +0100

powerpc/dma: remove dma_nommu_dma_supported

This function is largely identical to the generic version used
everywhere else.  Replace it with the generic version.

Signed-off-by: Christoph Hellwig 
Tested-by: Christian Zigotzky 
Signed-off-by: Michael Ellerman 

A.


Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()

2019-06-05 Thread David Hildenbrand
On 05.06.19 23:22, Wei Yang wrote:
> On Wed, Jun 05, 2019 at 12:58:46PM +0200, David Hildenbrand wrote:
>> On 05.06.19 10:58, David Hildenbrand wrote:
> /*
>  * For now, we have a linear search to go find the appropriate
>  * memory_block corresponding to a particular phys_index. If
> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block 
> **memory, int block_id,
>   unsigned long start_pfn;
>   int ret = 0;
>
> + mem = find_memory_block_by_id(block_id, NULL);
> + if (mem) {
> + put_device(&mem->dev);
> + return -EEXIST;
> + }

 find_memory_block_by_id() is not that close to the main idea in this patch.
 Would it be better to split this part?
>>>
>>> I played with that but didn't like the temporary results (e.g. having to
>>> export find_memory_block_by_id()). I'll stick to this for now.
>>>

>   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>   if (!mem)
>   return -ENOMEM;
> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>   return 0;
> }
>
> +static void unregister_memory(struct memory_block *memory)
> +{
> + if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
> + return;
> +
> + /* drop the ref. we got via find_memory_block() */
> + put_device(&memory->dev);
> + device_unregister(&memory->dev);
> +}
> +
> /*
> - * need an interface for the VM to add new memory regions,
> - * but without onlining it.
> + * Create memory block devices for the given memory area. Start and size
> + * have to be aligned to memory block granularity. Memory block devices
> + * will be initialized as offline.
>  */
> -int hotplug_memory_register(int nid, struct mem_section *section)
> +int create_memory_block_devices(unsigned long start, unsigned long size)
> {
> - int block_id = base_memory_block_id(__section_nr(section));
> - int ret = 0;
> + const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
> + int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>   struct memory_block *mem;
> + unsigned long block_id;
> + int ret = 0;
>
> - mutex_lock(&mem_sysfs_mutex);
> + if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
> +  !IS_ALIGNED(size, memory_block_size_bytes(
> + return -EINVAL;
>
> - mem = find_memory_block(section);
> - if (mem) {
> - mem->section_count++;
> - put_device(&mem->dev);
> - } else {
> + mutex_lock(&mem_sysfs_mutex);
> + for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>   ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>   if (ret)
> - goto out;
> - mem->section_count++;
> + break;
> + mem->section_count = sections_per_block;
> + }
> + if (ret) {
> + end_block_id = block_id;
> + for (block_id = start_block_id; block_id != end_block_id;
> +  block_id++) {
> + mem = find_memory_block_by_id(block_id, NULL);
> + mem->section_count = 0;
> + unregister_memory(mem);
> + }
>   }

 Would it be better to do this in reverse order?

 And unregister_memory() would free mem, so it is still necessary to set
 section_count to 0?
>>>
>>> 1. I kept the existing behavior (setting it to 0) for now. I am planning
>>> to eventually remove the section count completely (it could be
>>> beneficial to detect removing of partially populated memory blocks).
>>
>> Correction: We already use it to block offlining of partially populated
>> memory blocks \o/
> 
> Would you mind letting me know where we leverage this?

Sure:

drivers/base/memory.c:memory_subsys_offline()

if (mem->section_count != sections_per_block)
return -EINVAL;

I would have expected such checks in the offline_pages() function instead.

-- 

Thanks,

David / dhildenb


PowerPC arch_ptrace() writes beyond thread_struct/task_struct

2019-06-05 Thread Radu Rendec
Hi Everyone,

I'm seeing some weird memory corruption that I have been able to isolate
to arch_ptrace() [arch/powerpc/kernel/ptrace.c] and PTRACE_POKEUSR. I am
on PowerPC 32 (MPC8378), kernel 4.9.179.

It's not very easy for me to test on the latest kernel, but I guess
little has changed since 4.9 in either the architecture specific ptrace
code or PowerPC register data structures.

What happens is that gdb calls ptrace(PTRACE_POKEUSER) with addr=0x158.
This goes down to arch_ptrace() [arch/powerpc/kernel/ptrace.c], inside
`case PTRACE_POKEUSR`, on the branch that does this:

memcpy(&child->thread.TS_FPR(fpidx), &data,
sizeof(long));

where:
index = addr >> 2 = 0x56 = 86
fpidx = index - PT_FPR0 = 86 - 48 = 38
&child->thread.TS_FPR(fpidx) = (void *)child + 1296

offsetof(struct task_struct, thread) = 960
sizeof(struct thread_struct) = 336
sizeof(struct task_struct) = 1296

In other words, the memcpy() call writes just beyond thread_struct
(which is also beyond task_struct, for that matter).

This should never get past the bounds checks for `index`, so perhaps
there is a mismatch between ptrace macros and the actual register data
structures layout.

I will continue to investigate, but I'm not familiar with the PowerPC
registers so it will take a while before I make sense of all the data
structures and macros. Hopefully this rings a bell to someone who is
already familiar with those and could figure out quickly what the
problem is.

Best regards,
Radu Rendec


Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()

2019-06-05 Thread Wei Yang
On Wed, Jun 05, 2019 at 12:58:46PM +0200, David Hildenbrand wrote:
>On 05.06.19 10:58, David Hildenbrand wrote:
 /*
  * For now, we have a linear search to go find the appropriate
  * memory_block corresponding to a particular phys_index. If
 @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block 
 **memory, int block_id,
unsigned long start_pfn;
int ret = 0;

 +  mem = find_memory_block_by_id(block_id, NULL);
 +  if (mem) {
 +  put_device(&mem->dev);
 +  return -EEXIST;
 +  }
>>>
>>> find_memory_block_by_id() is not that close to the main idea in this patch.
>>> Would it be better to split this part?
>> 
>> I played with that but didn't like the temporary results (e.g. having to
>> export find_memory_block_by_id()). I'll stick to this for now.
>> 
>>>
mem = kzalloc(sizeof(*mem), GFP_KERNEL);
if (!mem)
return -ENOMEM;
 @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
return 0;
 }

 +static void unregister_memory(struct memory_block *memory)
 +{
 +  if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
 +  return;
 +
 +  /* drop the ref. we got via find_memory_block() */
 +  put_device(&memory->dev);
 +  device_unregister(&memory->dev);
 +}
 +
 /*
 - * need an interface for the VM to add new memory regions,
 - * but without onlining it.
 + * Create memory block devices for the given memory area. Start and size
 + * have to be aligned to memory block granularity. Memory block devices
 + * will be initialized as offline.
  */
 -int hotplug_memory_register(int nid, struct mem_section *section)
 +int create_memory_block_devices(unsigned long start, unsigned long size)
 {
 -  int block_id = base_memory_block_id(__section_nr(section));
 -  int ret = 0;
 +  const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
 +  int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
struct memory_block *mem;
 +  unsigned long block_id;
 +  int ret = 0;

 -  mutex_lock(&mem_sysfs_mutex);
 +  if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
 +   !IS_ALIGNED(size, memory_block_size_bytes(
 +  return -EINVAL;

 -  mem = find_memory_block(section);
 -  if (mem) {
 -  mem->section_count++;
 -  put_device(&mem->dev);
 -  } else {
 +  mutex_lock(&mem_sysfs_mutex);
 +  for (block_id = start_block_id; block_id != end_block_id; block_id++) {
ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
if (ret)
 -  goto out;
 -  mem->section_count++;
 +  break;
 +  mem->section_count = sections_per_block;
 +  }
 +  if (ret) {
 +  end_block_id = block_id;
 +  for (block_id = start_block_id; block_id != end_block_id;
 +   block_id++) {
 +  mem = find_memory_block_by_id(block_id, NULL);
 +  mem->section_count = 0;
 +  unregister_memory(mem);
 +  }
}
>>>
>>> Would it be better to do this in reverse order?
>>>
>>> And unregister_memory() would free mem, so it is still necessary to set
>>> section_count to 0?
>> 
>> 1. I kept the existing behavior (setting it to 0) for now. I am planning
>> to eventually remove the section count completely (it could be
>> beneficial to detect removing of partially populated memory blocks).
>
>Correction: We already use it to block offlining of partially populated
>memory blocks \o/

Would you mind letting me know where we leverage this?

>
>> 
>> 2. Reverse order: We would have to start with "block_id - 1", I don't
>> like that better.
>> 
>> Thanks for having a look!
>> 
>
>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me


Re: [PATCH v3 11/11] mm/memory_hotplug: Remove "zone" parameter from sparse_remove_one_section

2019-06-05 Thread Wei Yang
On Mon, May 27, 2019 at 01:11:52PM +0200, David Hildenbrand wrote:
>The parameter is unused, so let's drop it. Memory removal paths should
>never care about zones. This is the job of memory offlining and will
>require more refactorings.
>
>Reviewed-by: Dan Williams 
>Signed-off-by: David Hildenbrand 

Reviewed-by: Wei Yang 

>---
> include/linux/memory_hotplug.h | 2 +-
> mm/memory_hotplug.c| 2 +-
> mm/sparse.c| 4 ++--
> 3 files changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 2f1f87e13baa..1a4257c5f74c 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -346,7 +346,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, 
>unsigned long start_pfn,
> extern bool is_memblock_offlined(struct memory_block *mem);
> extern int sparse_add_one_section(int nid, unsigned long start_pfn,
> struct vmem_altmap *altmap);
>-extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
>*ms,
>+extern void sparse_remove_one_section(struct mem_section *ms,
>   unsigned long map_offset, struct vmem_altmap *altmap);
> extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map,
> unsigned long pnum);
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 82136c5b4c5f..e48ec7b9dee2 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -524,7 +524,7 @@ static void __remove_section(struct zone *zone, struct 
>mem_section *ms,
>   start_pfn = section_nr_to_pfn((unsigned long)scn_nr);
>   __remove_zone(zone, start_pfn);
> 
>-  sparse_remove_one_section(zone, ms, map_offset, altmap);
>+  sparse_remove_one_section(ms, map_offset, altmap);
> }
> 
> /**
>diff --git a/mm/sparse.c b/mm/sparse.c
>index d1d5e05f5b8d..1552c855d62a 100644
>--- a/mm/sparse.c
>+++ b/mm/sparse.c
>@@ -800,8 +800,8 @@ static void free_section_usemap(struct page *memmap, 
>unsigned long *usemap,
>   free_map_bootmem(memmap);
> }
> 
>-void sparse_remove_one_section(struct zone *zone, struct mem_section *ms,
>-  unsigned long map_offset, struct vmem_altmap *altmap)
>+void sparse_remove_one_section(struct mem_section *ms, unsigned long 
>map_offset,
>+ struct vmem_altmap *altmap)
> {
>   struct page *memmap = NULL;
>   unsigned long *usemap = NULL;
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


Re: [PATCH v3 10/11] mm/memory_hotplug: Make unregister_memory_block_under_nodes() never fail

2019-06-05 Thread Wei Yang
On Mon, May 27, 2019 at 01:11:51PM +0200, David Hildenbrand wrote:
>We really don't want anything during memory hotunplug to fail.
>We always pass a valid memory block device, that check can go. Avoid
>allocating memory and eventually failing. As we are always called under
>lock, we can use a static piece of memory. This avoids having to put
>the structure onto the stack, having to guess about the stack size
>of callers.
>
>Patch inspired by a patch from Oscar Salvador.
>
>In the future, there might be no need to iterate over nodes at all.
>mem->nid should tell us exactly what to remove. Memory block devices
>with mixed nodes (added during boot) should be properly fenced off and never
>removed.
>
>Cc: Greg Kroah-Hartman 
>Cc: "Rafael J. Wysocki" 
>Cc: Alex Deucher 
>Cc: "David S. Miller" 
>Cc: Mark Brown 
>Cc: Chris Wilson 
>Cc: David Hildenbrand 
>Cc: Oscar Salvador 
>Cc: Andrew Morton 
>Cc: Jonathan Cameron 
>Signed-off-by: David Hildenbrand 

Reviewed-by: Wei Yang 

>---
> drivers/base/node.c  | 18 +-
> include/linux/node.h |  5 ++---
> 2 files changed, 7 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/base/node.c b/drivers/base/node.c
>index 04fdfa99b8bc..9be88fd05147 100644
>--- a/drivers/base/node.c
>+++ b/drivers/base/node.c
>@@ -803,20 +803,14 @@ int register_mem_sect_under_node(struct memory_block 
>*mem_blk, void *arg)
> 
> /*
>  * Unregister memory block device under all nodes that it spans.
>+ * Has to be called with mem_sysfs_mutex held (due to unlinked_nodes).
>  */
>-int unregister_memory_block_under_nodes(struct memory_block *mem_blk)
>+void unregister_memory_block_under_nodes(struct memory_block *mem_blk)
> {
>-  NODEMASK_ALLOC(nodemask_t, unlinked_nodes, GFP_KERNEL);
>   unsigned long pfn, sect_start_pfn, sect_end_pfn;
>+  static nodemask_t unlinked_nodes;
> 
>-  if (!mem_blk) {
>-  NODEMASK_FREE(unlinked_nodes);
>-  return -EFAULT;
>-  }
>-  if (!unlinked_nodes)
>-  return -ENOMEM;
>-  nodes_clear(*unlinked_nodes);
>-
>+  nodes_clear(unlinked_nodes);
>   sect_start_pfn = section_nr_to_pfn(mem_blk->start_section_nr);
>   sect_end_pfn = section_nr_to_pfn(mem_blk->end_section_nr);
>   for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
>@@ -827,15 +821,13 @@ int unregister_memory_block_under_nodes(struct 
>memory_block *mem_blk)
>   continue;
>   if (!node_online(nid))
>   continue;
>-  if (node_test_and_set(nid, *unlinked_nodes))
>+  if (node_test_and_set(nid, unlinked_nodes))
>   continue;
>   sysfs_remove_link(&node_devices[nid]->dev.kobj,
>kobject_name(&mem_blk->dev.kobj));
>   sysfs_remove_link(&mem_blk->dev.kobj,
>kobject_name(&node_devices[nid]->dev.kobj));
>   }
>-  NODEMASK_FREE(unlinked_nodes);
>-  return 0;
> }
> 
> int link_mem_sections(int nid, unsigned long start_pfn, unsigned long end_pfn)
>diff --git a/include/linux/node.h b/include/linux/node.h
>index 02a29e71b175..548c226966a2 100644
>--- a/include/linux/node.h
>+++ b/include/linux/node.h
>@@ -139,7 +139,7 @@ extern int register_cpu_under_node(unsigned int cpu, 
>unsigned int nid);
> extern int unregister_cpu_under_node(unsigned int cpu, unsigned int nid);
> extern int register_mem_sect_under_node(struct memory_block *mem_blk,
>   void *arg);
>-extern int unregister_memory_block_under_nodes(struct memory_block *mem_blk);
>+extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk);
> 
> extern int register_memory_node_under_compute_node(unsigned int mem_nid,
>  unsigned int cpu_nid,
>@@ -175,9 +175,8 @@ static inline int register_mem_sect_under_node(struct 
>memory_block *mem_blk,
> {
>   return 0;
> }
>-static inline int unregister_memory_block_under_nodes(struct memory_block 
>*mem_blk)
>+static inline void unregister_memory_block_under_nodes(struct memory_block 
>*mem_blk)
> {
>-  return 0;
> }
> 
> static inline void register_hugetlbfs_with_node(node_registration_func_t reg,
>-- 
>2.20.1

-- 
Wei Yang
Help you, Help me


[PATCH v2] powerpc/setup_64: fix -Wempty-body warnings

2019-06-05 Thread Qian Cai
At the beginning of setup_64.c, it has:

  #ifdef DEBUG
  #define DBG(fmt...) udbg_printf(fmt)
  #else
  #define DBG(fmt...)
  #endif

where DBG() could be compiled away and generate warnings:

arch/powerpc/kernel/setup_64.c: In function 'initialize_cache_info':
arch/powerpc/kernel/setup_64.c:579:49: warning: suggest braces around
empty body in an 'if' statement [-Wempty-body]
DBG("Argh, can't find dcache properties !\n");
 ^
arch/powerpc/kernel/setup_64.c:582:49: warning: suggest braces around
empty body in an 'if' statement [-Wempty-body]
DBG("Argh, can't find icache properties !\n");

Suggested-by: Tyrel Datwyler 
Signed-off-by: Qian Cai 
---

v2: fix it by using a no-op do-while loop.

 arch/powerpc/kernel/setup_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 44b4c432a273..bed4ae8d338c 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -71,7 +71,7 @@
 #ifdef DEBUG
 #define DBG(fmt...) udbg_printf(fmt)
 #else
-#define DBG(fmt...)
+#define DBG(fmt...) do { } while (0)
 #endif
 
 int spinning_secondaries;
-- 
1.8.3.1



[PATCH] powerpc/eeh_cache: fix a W=1 kernel-doc warning

2019-06-05 Thread Qian Cai
The opening comment mark "/**" is reserved for kernel-doc comments, so
it will generate a warning with "make W=1".

arch/powerpc/kernel/eeh_cache.c:37: warning: cannot understand function
prototype: 'struct pci_io_addr_range

Since this is not a kernel-doc comment for the struct below, but rather an
overview of this source file, eeh_cache.c, use the free-form comment
kernel-doc syntax instead.

Signed-off-by: Qian Cai 
---
 arch/powerpc/kernel/eeh_cache.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 320472373122..05ffd32b3416 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -18,6 +18,8 @@
 
 
 /**
+ * DOC: Overview
+ *
  * The pci address cache subsystem.  This subsystem places
  * PCI device address resources into a red-black tree, sorted
  * according to the address range, so that given only an i/o
@@ -34,6 +36,7 @@
  * than any hash algo I could think of for this problem, even
  * with the penalty of slow pointer chases for d-cache misses).
  */
+
 struct pci_io_addr_range {
struct rb_node rb_node;
resource_size_t addr_lo;
-- 
1.8.3.1



Re: [PATCH] powerpc/setup_64: fix -Wempty-body warnings

2019-06-05 Thread Tyrel Datwyler
On 06/05/2019 01:17 PM, Qian Cai wrote:
> At the beginning of setup_64.c, it has,
> 
>   #ifdef DEBUG
>   #define DBG(fmt...) udbg_printf(fmt)
>   #else
>   #define DBG(fmt...)
>   #endif

A simpler solution is just to define the debug macro in the else clause as such:

#define DBG(fmt...) do { } while (0)

-Tyrel

> 
> where DBG() could be compiled away, and generate warnings,
> 
> arch/powerpc/kernel/setup_64.c: In function 'initialize_cache_info':
> arch/powerpc/kernel/setup_64.c:579:49: warning: suggest braces around
> empty body in an 'if' statement [-Wempty-body]
> DBG("Argh, can't find dcache properties !\n");
>  ^
> arch/powerpc/kernel/setup_64.c:582:49: warning: suggest braces around
> empty body in an 'if' statement [-Wempty-body]
> DBG("Argh, can't find icache properties !\n");
> 
> Signed-off-by: Qian Cai 
> ---
>  arch/powerpc/kernel/setup_64.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 44b4c432a273..23758834324f 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -575,12 +575,13 @@ void __init initialize_cache_info(void)
>* d-cache and i-cache sizes... -Peter
>*/
>   if (cpu) {
> - if (!parse_cache_info(cpu, false, &ppc64_caches.l1d))
> + /* Add an extra brace to silence -Wempty-body warnings. */
> + if (!parse_cache_info(cpu, false, &ppc64_caches.l1d)) {
>   DBG("Argh, can't find dcache properties !\n");
> -
> - if (!parse_cache_info(cpu, true, &ppc64_caches.l1i))
> + }
> + if (!parse_cache_info(cpu, true, &ppc64_caches.l1i)) {
>   DBG("Argh, can't find icache properties !\n");
> -
> + }
>   /*
>* Try to find the L2 and L3 if any. Assume they are
>* unified and use the D-side properties.
> 



[PATCH] powerpc/setup_64: fix -Wempty-body warnings

2019-06-05 Thread Qian Cai
At the beginning of setup_64.c, it has:

  #ifdef DEBUG
  #define DBG(fmt...) udbg_printf(fmt)
  #else
  #define DBG(fmt...)
  #endif

where DBG() could be compiled away and generate warnings:

arch/powerpc/kernel/setup_64.c: In function 'initialize_cache_info':
arch/powerpc/kernel/setup_64.c:579:49: warning: suggest braces around
empty body in an 'if' statement [-Wempty-body]
DBG("Argh, can't find dcache properties !\n");
 ^
arch/powerpc/kernel/setup_64.c:582:49: warning: suggest braces around
empty body in an 'if' statement [-Wempty-body]
DBG("Argh, can't find icache properties !\n");

Signed-off-by: Qian Cai 
---
 arch/powerpc/kernel/setup_64.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 44b4c432a273..23758834324f 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -575,12 +575,13 @@ void __init initialize_cache_info(void)
 * d-cache and i-cache sizes... -Peter
 */
if (cpu) {
-   if (!parse_cache_info(cpu, false, &ppc64_caches.l1d))
+   /* Add an extra brace to silence -Wempty-body warnings. */
+   if (!parse_cache_info(cpu, false, &ppc64_caches.l1d)) {
DBG("Argh, can't find dcache properties !\n");
-
-   if (!parse_cache_info(cpu, true, &ppc64_caches.l1i))
+   }
+   if (!parse_cache_info(cpu, true, &ppc64_caches.l1i)) {
DBG("Argh, can't find icache properties !\n");
-
+   }
/*
 * Try to find the L2 and L3 if any. Assume they are
 * unified and use the D-side properties.
-- 
1.8.3.1



Re: [PATCH v3] powerpc: fix kexec failure on book3s/32

2019-06-05 Thread Aaro Koskinen
Hi,

On Mon, Jun 03, 2019 at 08:20:28AM +, Christophe Leroy wrote:
> In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
> Therefore, allthough __mapin_ram_chunk() was already mapping kernel
> text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
> memory was executable. Part of the memory (first 512kbytes) was
> mapped with BATs instead of page table, but it was also entirely
> mapped as executable.
> 
> In commit 385e89d5b20f ("powerpc/mm: add exec protection on
> powerpc 603"), we started adding exec protection to some 6xx, namely
> the 603, for pages mapped via pagetables.
> 
> Then, in commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for
> STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
> memory, so that really only the kernel text could be executed.
> 
> The problem here is that kexec is based on copying some code into
> upper part of memory then executing it from there in order to install
> a fresh new kernel at its definitive location.
> 
> However, the code is position independant and first part of it is
> just there to deactivate the MMU and jump to the second part. So it
> is possible to run this first part inplace instead of running the
> copy. Once the MMU is off, there is no protection anymore and the
> second part of the code will just run as before.
> 
> Reported-by: Aaro Koskinen 
> Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Christophe Leroy 
> ---
>  Aaro, can you test this patch ? Thanks.

Tested-by: Aaro Koskinen 

A.

>  arch/powerpc/include/asm/kexec.h   | 3 +++
>  arch/powerpc/kernel/machine_kexec_32.c | 4 +++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/kexec.h 
> b/arch/powerpc/include/asm/kexec.h
> index 4a585cba1787..c68476818753 100644
> --- a/arch/powerpc/include/asm/kexec.h
> +++ b/arch/powerpc/include/asm/kexec.h
> @@ -94,6 +94,9 @@ static inline bool kdump_in_progress(void)
>   return crashing_cpu >= 0;
>  }
>  
> +void relocate_new_kernel(unsigned long indirection_page, unsigned long 
> reboot_code_buffer,
> +  unsigned long start_address) __noreturn;
> +
>  #ifdef CONFIG_KEXEC_FILE
>  extern const struct kexec_file_ops kexec_elf64_ops;
>  
> diff --git a/arch/powerpc/kernel/machine_kexec_32.c 
> b/arch/powerpc/kernel/machine_kexec_32.c
> index affe5dcce7f4..2b160d68db49 100644
> --- a/arch/powerpc/kernel/machine_kexec_32.c
> +++ b/arch/powerpc/kernel/machine_kexec_32.c
> @@ -30,7 +30,6 @@ typedef void (*relocate_new_kernel_t)(
>   */
>  void default_machine_kexec(struct kimage *image)
>  {
> - extern const unsigned char relocate_new_kernel[];
>   extern const unsigned int relocate_new_kernel_size;
>   unsigned long page_list;
>   unsigned long reboot_code_buffer, reboot_code_buffer_phys;
> @@ -58,6 +57,9 @@ void default_machine_kexec(struct kimage *image)
>   reboot_code_buffer + KEXEC_CONTROL_PAGE_SIZE);
>   printk(KERN_INFO "Bye!\n");
>  
> + if (!IS_ENABLED(CONFIG_FSL_BOOKE) && !IS_ENABLED(CONFIG_44x))
> + relocate_new_kernel(page_list, reboot_code_buffer_phys, 
> image->start);
> +
>   /* now call it */
>   rnk = (relocate_new_kernel_t) reboot_code_buffer;
>   (*rnk)(page_list, reboot_code_buffer_phys, image->start);
> -- 
> 2.13.3
> 


[PATCH v2 2/2] powerpc/powernv: Add new opal message type

2019-06-05 Thread Vasant Hegde
We have the OPAL_MSG_PRD message type to pass PRD-related messages from OPAL
to `opal-prd`. It can handle messages up to 64 bytes. We have a requirement
to send more than 64 bytes of data from OPAL to `opal-prd`. Let's add a new
message type (OPAL_MSG_PRD2) to pass bigger payloads.

Cc: Jeremy Kerr 
Signed-off-by: Vasant Hegde 
---
 arch/powerpc/include/asm/opal-api.h   | 1 +
 arch/powerpc/platforms/powernv/opal-prd.c | 9 -
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 09a8553833d1..428b5ef7db7b 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -453,6 +453,7 @@ enum opal_msg_type {
OPAL_MSG_DPO= 5,
OPAL_MSG_PRD= 6,
OPAL_MSG_OCC= 7,
+   OPAL_MSG_PRD2   = 8,
OPAL_MSG_TYPE_MAX,
 };
 
diff --git a/arch/powerpc/platforms/powernv/opal-prd.c 
b/arch/powerpc/platforms/powernv/opal-prd.c
index e072bf157d62..50a735d77192 100644
--- a/arch/powerpc/platforms/powernv/opal-prd.c
+++ b/arch/powerpc/platforms/powernv/opal-prd.c
@@ -342,7 +342,7 @@ static int opal_prd_msg_notifier(struct notifier_block *nb,
int msg_size, item_size;
unsigned long flags;
 
-   if (msg_type != OPAL_MSG_PRD)
+   if (msg_type != OPAL_MSG_PRD && msg_type != OPAL_MSG_PRD2)
return 0;
 
/* Calculate total size of the message and item we need to store. The
@@ -393,6 +393,13 @@ static int opal_prd_probe(struct platform_device *pdev)
return rc;
}
 
+   rc = opal_message_notifier_register(OPAL_MSG_PRD2, &opal_prd_event_nb);
+   if (rc) {
+   pr_err("%s: Couldn't register event notifier (%d)\n",
+  __func__, OPAL_MSG_PRD2);
+   return rc;
+   }
+
rc = misc_register(&opal_prd_dev);
if (rc) {
pr_err("failed to register miscdev\n");
-- 
2.14.3



[PATCH v2 1/2] powerpc/powernv: Enhance opal message read interface

2019-06-05 Thread Vasant Hegde
Use the "opal-msg-size" device tree property to allocate memory for "opal_msg".

Cc: Mahesh Salgaonkar 
Cc: Jeremy Kerr 
Signed-off-by: Vasant Hegde 
---
 arch/powerpc/platforms/powernv/opal.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c 
b/arch/powerpc/platforms/powernv/opal.c
index 98c5d94b17fb..e6ea32f3b3c8 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -58,6 +58,8 @@ static DEFINE_SPINLOCK(opal_write_lock);
 static struct atomic_notifier_head opal_msg_notifier_head[OPAL_MSG_TYPE_MAX];
 static uint32_t opal_heartbeat;
 static struct task_struct *kopald_tsk;
+static struct opal_msg *opal_msg;
+static uint64_t opal_msg_size;
 
 void opal_configure_cores(void)
 {
@@ -264,14 +266,9 @@ static void opal_message_do_notify(uint32_t msg_type, void 
*msg)
 static void opal_handle_message(void)
 {
s64 ret;
-   /*
-* TODO: pre-allocate a message buffer depending on opal-msg-size
-* value in /proc/device-tree.
-*/
-   static struct opal_msg msg;
u32 type;
 
-   ret = opal_get_msg(__pa(&msg), sizeof(msg));
+   ret = opal_get_msg(__pa(opal_msg), opal_msg_size);
/* No opal message pending. */
if (ret == OPAL_RESOURCE)
return;
@@ -283,14 +280,14 @@ static void opal_handle_message(void)
return;
}
 
-   type = be32_to_cpu(msg.msg_type);
+   type = be32_to_cpu(opal_msg->msg_type);
 
/* Sanity check */
if (type >= OPAL_MSG_TYPE_MAX) {
pr_warn_once("%s: Unknown message type: %u\n", __func__, type);
return;
}
-   opal_message_do_notify(type, (void *)&msg);
+   opal_message_do_notify(type, (void *)opal_msg);
 }
 
 static irqreturn_t opal_message_notify(int irq, void *data)
@@ -299,9 +296,22 @@ static irqreturn_t opal_message_notify(int irq, void *data)
return IRQ_HANDLED;
 }
 
-static int __init opal_message_init(void)
+static int __init opal_message_init(struct device_node *opal_node)
 {
int ret, i, irq;
+   const __be32 *val;
+
+   val = of_get_property(opal_node, "opal-msg-size", NULL);
+   if (val)
+   opal_msg_size = be32_to_cpup(val);
+
+   /* If opal-msg-size property is not available then use default size */
+   if (!opal_msg_size)
+   opal_msg_size = sizeof(struct opal_msg);
+
+   opal_msg = kmalloc(opal_msg_size, GFP_KERNEL);
+   if (!opal_msg)
+   return -ENOMEM;
 
for (i = 0; i < OPAL_MSG_TYPE_MAX; i++)
ATOMIC_INIT_NOTIFIER_HEAD(&opal_msg_notifier_head[i]);
@@ -903,7 +913,7 @@ static int __init opal_init(void)
}
 
/* Initialise OPAL messaging system */
-   opal_message_init();
+   opal_message_init(opal_node);
 
/* Initialise OPAL asynchronous completion interface */
opal_async_comp_init();
-- 
2.14.3



[PATCH] powerpc/32: Add .data..LASAN* sections explicitly

2019-06-05 Thread Mathieu Malaterre
When both `CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y` and `CONFIG_KASAN=y`
are set, the link step typically produces numerous warnings about orphan
sections:

  powerpc-linux-gnu-ld: warning: orphan section `.data..LASAN0' from 
`net/core/filter.o' being placed in section `.data..LASAN0'
  powerpc-linux-gnu-ld: warning: orphan section `.data..LASANLOC1' from 
`net/core/filter.o' being placed in section `.data..LASANLOC1'

This commit removes those warnings produced at W=1.

Reported-by: Christophe Leroy 
Signed-off-by: Mathieu Malaterre 
---
 arch/powerpc/kernel/vmlinux.lds.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index 060a1acd7c6d..c74f4cb6ec3a 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -307,6 +307,9 @@ SECTIONS
 #ifdef CONFIG_PPC32
.data : AT(ADDR(.data) - LOAD_OFFSET) {
DATA_DATA
+#ifdef CONFIG_KASAN
+   *(.data..LASAN*)
+#endif
 #ifdef CONFIG_UBSAN
*(.data..Lubsan_data*)
*(.data..Lubsan_type*)
-- 
2.20.1



Re: [PATCH] powerpc/32s: fix booting with CONFIG_PPC_EARLY_DEBUG_BOOTX

2019-06-05 Thread Mathieu Malaterre
On Mon, Jun 3, 2019 at 3:00 PM Christophe Leroy  wrote:
>
> When booting through OF, setup_disp_bat() does nothing because
> disp_BAT are not set. By change, it used to work because BOOTX
> buffer is mapped 1:1 at address 0x8100 by the bootloader, and
> btext_setup_display() sets virt addr same as phys addr.
>
> But since commit 215b823707ce ("powerpc/32s: set up an early static
> hash table for KASAN."), a temporary page table overrides the
> bootloader mapping.
>
> This 0x8100 is also problematic with the newly implemented
> Kernel Userspace Access Protection (KUAP) because it is within user
> address space.
>
> This patch fixes those issues by properly setting disp_BAT through
> a call to btext_prepare_BAT(), allowing setup_disp_bat() to
> properly setup BAT3 for early bootx screen buffer access.
>
> Reported-by: Mathieu Malaterre 
> Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for 
> KASAN.")
> Signed-off-by: Christophe Leroy 

The patch below does fix the symptoms I reported. Tested with CONFIG_KASAN=n :

Tested-by: Mathieu Malaterre 

Thanks !

> ---
>  arch/powerpc/include/asm/btext.h   | 4 
>  arch/powerpc/kernel/prom_init.c| 1 +
>  arch/powerpc/kernel/prom_init_check.sh | 2 +-
>  3 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/btext.h 
> b/arch/powerpc/include/asm/btext.h
> index 3ffad030393c..461b0f193864 100644
> --- a/arch/powerpc/include/asm/btext.h
> +++ b/arch/powerpc/include/asm/btext.h
> @@ -13,7 +13,11 @@ extern void btext_update_display(unsigned long phys, int 
> width, int height,
>  int depth, int pitch);
>  extern void btext_setup_display(int width, int height, int depth, int pitch,
> unsigned long address);
> +#ifdef CONFIG_PPC32
>  extern void btext_prepare_BAT(void);
> +#else
> +static inline void btext_prepare_BAT(void) { }
> +#endif
>  extern void btext_map(void);
>  extern void btext_unmap(void);
>
> diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
> index 3555cad7bdde..ed446b7ea164 100644
> --- a/arch/powerpc/kernel/prom_init.c
> +++ b/arch/powerpc/kernel/prom_init.c
> @@ -2336,6 +2336,7 @@ static void __init prom_check_displays(void)
> prom_printf("W=%d H=%d LB=%d addr=0x%x\n",
> width, height, pitch, addr);
> btext_setup_display(width, height, 8, pitch, addr);
> +   btext_prepare_BAT();
> }
>  #endif /* CONFIG_PPC_EARLY_DEBUG_BOOTX */
> }
> diff --git a/arch/powerpc/kernel/prom_init_check.sh 
> b/arch/powerpc/kernel/prom_init_check.sh
> index 518d416971c1..160bef0d553d 100644
> --- a/arch/powerpc/kernel/prom_init_check.sh
> +++ b/arch/powerpc/kernel/prom_init_check.sh
> @@ -24,7 +24,7 @@ fi
>  WHITELIST="add_reloc_offset __bss_start __bss_stop copy_and_flush
>  _end enter_prom $MEM_FUNCS reloc_offset __secondary_hold
>  __secondary_hold_acknowledge __secondary_hold_spinloop __start
> -logo_linux_clut224
> +logo_linux_clut224 btext_prepare_BAT
>  reloc_got2 kernstart_addr memstart_addr linux_banner _stext
>  __prom_init_toc_start __prom_init_toc_end btext_setup_display TOC."
>
> --
> 2.13.3
>


Re: [RFC V2] mm: Generalize notify_page_fault()

2019-06-05 Thread Matthew Wilcox
On Wed, Jun 05, 2019 at 09:19:22PM +1000, Michael Ellerman wrote:
> Anshuman Khandual  writes:
> > Similar notify_page_fault() definitions are being used by architectures
> > duplicating much of the same code. This attempts to unify them into a
> > single implementation, generalize it and then move it to a common place.
> > kprobes_built_in() can detect CONFIG_KPROBES, hence notify_page_fault()
> > need not be wrapped again within CONFIG_KPROBES. Trap number argument can
> > now contain upto an 'unsigned int' accommodating all possible platforms.
> ...
> 
> You've changed several of the architectures from something like above,
> where it disables preemption around the call into the below:
> 
> 
> Which skips everything if we're preemptible. Is that an equivalent
> change? If so can you please explain why in more detail.

See the discussion in v1 of this patch, which you were cc'd on.

I agree the description here completely fails to mention the reason for
the change. It should mention commit a980c0ef9f6d8c.

> Also why not have it return bool?
> 
> cheers
> 


Re: [RFC V2] mm: Generalize notify_page_fault()

2019-06-05 Thread Michael Ellerman
Anshuman Khandual  writes:
> Similar notify_page_fault() definitions are being used by architectures
> duplicating much of the same code. This attempts to unify them into a
> single implementation, generalize it and then move it to a common place.
> kprobes_built_in() can detect CONFIG_KPROBES, hence notify_page_fault()
> need not be wrapped again within CONFIG_KPROBES. Trap number argument can
> now contain upto an 'unsigned int' accommodating all possible platforms.
...
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 58f69fa..1bc3b18 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -30,28 +30,6 @@
>  
>  #ifdef CONFIG_MMU
>  
> -#ifdef CONFIG_KPROBES
> -static inline int notify_page_fault(struct pt_regs *regs, unsigned int fsr)
> -{
> - int ret = 0;
> -
> - if (!user_mode(regs)) {
> - /* kprobe_running() needs smp_processor_id() */
> - preempt_disable();
> - if (kprobe_running() && kprobe_fault_handler(regs, fsr))
> - ret = 1;
> - preempt_enable();
> - }
> -
> - return ret;
> -}
> -#else

You've changed several of the architectures from something like above,
where it disables preemption around the call into the below:

> +int __kprobes notify_page_fault(struct pt_regs *regs, unsigned int trap)
> +{
> + int ret = 0;
> +
> + /*
> +  * To be potentially processing a kprobe fault and to be allowed
> +  * to call kprobe_running(), we have to be non-preemptible.
> +  */
> + if (kprobes_built_in() && !preemptible() && !user_mode(regs)) {
> + if (kprobe_running() && kprobe_fault_handler(regs, trap))
> + ret = 1;
> + }
> + return ret;
> +}

Which skips everything if we're preemptible. Is that an equivalent
change? If so can you please explain why in more detail.

Also why not have it return bool?

cheers


[PATCH] ocxl: Update for AFU descriptor template version 1.1

2019-06-05 Thread Frederic Barrat
From: Alastair D'Silva 

The OpenCAPI discovery and configuration specification has been
updated and introduces version 1.1 of the AFU descriptor template,
with new fields to better define the memory layout of an OpenCAPI
adapter.

The ocxl driver doesn't do much yet to support LPC memory, but as we
start seeing (non-LPC) AFU images using the new template, this patch
updates the config space parsing code to avoid spitting out a warning.

Signed-off-by: Alastair D'Silva 
Signed-off-by: Frederic Barrat 
---

mpe: this patch is originally from Alastair. I added some minor
tweaking, I hope he won't mind. He has better things to do right now,
and I'd like to see this hit the next merge window.


 drivers/misc/ocxl/config.c | 181 +
 include/misc/ocxl.h|   5 +-
 2 files changed, 165 insertions(+), 21 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index 5e65acb8e134..c8e19bfb5ef9 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -20,11 +20,14 @@
 #define OCXL_DVSEC_TEMPL_MMIO_GLOBAL_SZ  0x28
 #define OCXL_DVSEC_TEMPL_MMIO_PP 0x30
 #define OCXL_DVSEC_TEMPL_MMIO_PP_SZ  0x38
-#define OCXL_DVSEC_TEMPL_MEM_SZ  0x3C
-#define OCXL_DVSEC_TEMPL_WWID0x40
+#define OCXL_DVSEC_TEMPL_ALL_MEM_SZ  0x3C
+#define OCXL_DVSEC_TEMPL_LPC_MEM_START   0x40
+#define OCXL_DVSEC_TEMPL_WWID0x48
+#define OCXL_DVSEC_TEMPL_LPC_MEM_SZ  0x58
 
 #define OCXL_MAX_AFU_PER_FUNCTION 64
-#define OCXL_TEMPL_LEN0x58
+#define OCXL_TEMPL_LEN_1_00x58
+#define OCXL_TEMPL_LEN_1_10x60
 #define OCXL_TEMPL_NAME_LEN   24
 #define OCXL_CFG_TIMEOUT 3
 
@@ -269,34 +272,72 @@ static int read_afu_info(struct pci_dev *dev, struct 
ocxl_fn_config *fn,
return 0;
 }
 
+/**
+ * Read the template version from the AFU
+ * dev: the device for the AFU
+ * fn: the AFU offsets
+ * len: outputs the template length
+ * version: outputs the major<<8,minor version
+ *
+ * Returns 0 on success, negative on failure
+ */
+static int read_template_version(struct pci_dev *dev, struct ocxl_fn_config 
*fn,
+   u16 *len, u16 *version)
+{
+   u32 val32;
+   u8 major, minor;
+   int rc;
+
+   rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_VERSION, &val32);
+   if (rc)
+   return rc;
+
+   *len = EXTRACT_BITS(val32, 16, 31);
+   major = EXTRACT_BITS(val32, 8, 15);
+   minor = EXTRACT_BITS(val32, 0, 7);
+   *version = (major << 8) + minor;
+   return 0;
+}
+
 int ocxl_config_check_afu_index(struct pci_dev *dev,
struct ocxl_fn_config *fn, int afu_idx)
 {
-   u32 val;
-   int rc, templ_major, templ_minor, len;
+   int rc;
+   u16 templ_version;
+   u16 len, expected_len;
 
pci_write_config_byte(dev,
fn->dvsec_afu_info_pos + OCXL_DVSEC_AFU_INFO_AFU_IDX,
afu_idx);
-   rc = read_afu_info(dev, fn, OCXL_DVSEC_TEMPL_VERSION, &val);
+
+   rc = read_template_version(dev, fn, &len, &templ_version);
if (rc)
return rc;
 
-   /* AFU index map can have holes */
-   if (!val)
+   /* AFU index map can have holes, in which case we read all 0's */
+   if (!templ_version && !len)
return 0;
 
-   templ_major = EXTRACT_BITS(val, 8, 15);
-   templ_minor = EXTRACT_BITS(val, 0, 7);
dev_dbg(&dev->dev, "AFU descriptor template version %d.%d\n",
-   templ_major, templ_minor);
-
-   len = EXTRACT_BITS(val, 16, 31);
-   if (len != OCXL_TEMPL_LEN) {
-   dev_warn(&dev->dev,
-   "Unexpected template length in AFU information (%#x)\n",
-   len);
+   templ_version >> 8, templ_version & 0xFF);
+
+   switch (templ_version) {
+   case 0x0005: // v0.5 was used prior to the spec approval
+   case 0x0100:
+   expected_len = OCXL_TEMPL_LEN_1_0;
+   break;
+   case 0x0101:
+   expected_len = OCXL_TEMPL_LEN_1_1;
+   break;
+   default:
+   dev_warn(&dev->dev, "Unknown AFU template version %#x\n",
+   templ_version);
+   expected_len = len;
}
+   if (len != expected_len)
+   dev_warn(&dev->dev,
+   "Unexpected template length %#x in AFU information, expected %#x for version %#x\n",
+   len, expected_len, templ_version);
return 1;
 }
 
@@ -434,6 +475,102 @@ static int validate_afu(struct pci_dev *dev, struct 
ocxl_afu_config *afu)
return 0;
 }
 
+/**
+ * Populate AFU metadata regarding LPC memory
+ * dev: the device for the AFU
+ * fn: the AFU offsets
+ * afu: the AFU struct to populate the LPC metadata into
+ *
+ * Returns 0 on success, negative on failure
+ */
+static int read_afu_lpc_memory_info(struct pci_dev *dev,
+   struct 

Re: [PATCH v3] scsi: ibmvscsi: Don't use rc uninitialized in ibmvscsi_do_work

2019-06-05 Thread Michael Ellerman
"Martin K. Petersen"  writes:
> Nathan,
>
>> clang warns:
>>
>> drivers/scsi/ibmvscsi/ibmvscsi.c:2126:7: warning: variable 'rc' is used
>> uninitialized whenever switch case is taken [-Wsometimes-uninitialized]
>> case IBMVSCSI_HOST_ACTION_NONE:
>>  ^
>
> Applied to 5.3/scsi-queue, thanks!

Thanks all.

cheers


Re: crash after NX error

2019-06-05 Thread Michael Ellerman
Stewart Smith  writes:
> On my two socket POWER9 system (powernv) with 842 zwap set up, I
> recently got a crash with the Ubuntu kernel (I haven't tried with
> upstream, and this is the first time the system has died like this, so
> I'm not sure how repeatable it is).
>
> [2.891463] zswap: loaded using pool 842-nx/zbud
> ...
> [15626.124646] nx_compress_powernv: ERROR: CSB still not valid after 500 
> us, giving up : 00 00 00 00 
> [16868.932913] Unable to handle kernel paging request for data at address 
> 0x6655f67da816cdb8
> [16868.933726] Faulting instruction address: 0xc0391600
>
>
> cpu 0x68: Vector: 380 (Data Access Out of Range) at [c01c9d98b9a0]
> pc: c0391600: kmem_cache_alloc+0x2e0/0x340
> lr: c03915ec: kmem_cache_alloc+0x2cc/0x340
> sp: c01c9d98bc20
>msr: 9280b033
>dar: 6655f67da816cdb8
>   current = 0xc01ad43cb400
>   paca= 0xcfac7800   softe: 0irq_happened: 0x01
> pid   = 8319, comm = make
> Linux version 4.15.0-50-generic (buildd@bos02-ppc64el-006) (gcc version 7.3.0 
> (Ubuntu 7.3.0-16ubuntu3)) #54-Ubuntu SMP Mon May 6 18:55:18 UTC 2019 (Ubuntu 
> 4.15.0-50.54-generic 4.15.18)
>
> 68:mon> t
> [c01c9d98bc20] c03914d4 kmem_cache_alloc+0x1b4/0x340 (unreliable)
> [c01c9d98bc80] c03b1e14 __khugepaged_enter+0x54/0x220
> [c01c9d98bcc0] c010f0ec copy_process.isra.5.part.6+0xebc/0x1a10
> [c01c9d98bda0] c010fe4c _do_fork+0xec/0x510
> [c01c9d98be30] c000b584 ppc_clone+0x8/0xc
> --- Exception: c00 (System Call) at 7afe9daf87f4
> SP (7fffca606880) is in userspace
>
> So, it looks like there could be a problem in the error path, plausibly
> fixed by this patch:
>
> commit 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
> Author: Haren Myneni 
> Date:   Wed Jun 13 00:32:40 2018 -0700
>
> crypto/nx: Initialize 842 high and normal RxFIFO control registers
> 
> NX increments readOffset by FIFO size in receive FIFO control register
> when CRB is read. But the index in RxFIFO has to match with the
> corresponding entry in FIFO maintained by VAS in kernel. Otherwise NX
> may be processing incorrect CRBs and can cause CRB timeout.
> 
> VAS FIFO offset is 0 when the receive window is opened during
> initialization. When the module is reloaded or in kexec boot, readOffset
> in FIFO control register may not match with VAS entry. This patch adds
> nx_coproc_init OPAL call to reset readOffset and queued entries in FIFO
> control register for both high and normal FIFOs.
> 
> Signed-off-by: Haren Myneni 
> [mpe: Fixup uninitialized variable warning]
> Signed-off-by: Michael Ellerman 
>
> $ git describe --contains 656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5
> v4.19-rc1~24^2~50
>
>
> Which was never backported to any stable release, so probably needs to
> be for v4.14 through v4.18.

Yeah the P9 NX support went in in:
  b0d6c9bab5e4 ("crypto/nx: Add P9 NX support for 842 compression engine")

Which was: v4.14-rc1~119^2~21, so first released in v4.14.


I'm actually less interested in that and more interested in the
subsequent crash. The time stamps are miles apart though; did we just
leave some corrupted memory after the NX failed and then hit it later?
Or did we not correctly signal to the upper level APIs that the request
failed?

I think we need to do some testing with errors injected into the
wait_for_csb() path, to ensure that failures there are not causing
corruption in zswap. Haren, have you done any testing of error injection?

cheers


Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()

2019-06-05 Thread David Hildenbrand
On 05.06.19 10:58, David Hildenbrand wrote:
>>> /*
>>>  * For now, we have a linear search to go find the appropriate
>>>  * memory_block corresponding to a particular phys_index. If
>>> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block 
>>> **memory, int block_id,
>>> unsigned long start_pfn;
>>> int ret = 0;
>>>
>>> +   mem = find_memory_block_by_id(block_id, NULL);
>>> +   if (mem) {
>>> +   put_device(&mem->dev);
>>> +   return -EEXIST;
>>> +   }
>>
>> find_memory_block_by_id() is not that close to the main idea in this patch.
>> Would it be better to split this part?
> 
> I played with that but didn't like the temporary results (e.g. having to
> export find_memory_block_by_id()). I'll stick to this for now.
> 
>>
>>> mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>> if (!mem)
>>> return -ENOMEM;
>>> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>>> return 0;
>>> }
>>>
>>> +static void unregister_memory(struct memory_block *memory)
>>> +{
>>> +   if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>>> +   return;
>>> +
>>> +   /* drop the ref. we got via find_memory_block() */
>>> +   put_device(&memory->dev);
>>> +   device_unregister(&memory->dev);
>>> +}
>>> +
>>> /*
>>> - * need an interface for the VM to add new memory regions,
>>> - * but without onlining it.
>>> + * Create memory block devices for the given memory area. Start and size
>>> + * have to be aligned to memory block granularity. Memory block devices
>>> + * will be initialized as offline.
>>>  */
>>> -int hotplug_memory_register(int nid, struct mem_section *section)
>>> +int create_memory_block_devices(unsigned long start, unsigned long size)
>>> {
>>> -   int block_id = base_memory_block_id(__section_nr(section));
>>> -   int ret = 0;
>>> +   const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>>> +   int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>>> struct memory_block *mem;
>>> +   unsigned long block_id;
>>> +   int ret = 0;
>>>
>>> -   mutex_lock(&mem_sysfs_mutex);
>>> +   if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>>> +!IS_ALIGNED(size, memory_block_size_bytes())))
>>> +   return -EINVAL;
>>>
>>> -   mem = find_memory_block(section);
>>> -   if (mem) {
>>> -   mem->section_count++;
>>> -   put_device(&mem->dev);
>>> -   } else {
>>> +   mutex_lock(&mem_sysfs_mutex);
>>> +   for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>>> ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>>> if (ret)
>>> -   goto out;
>>> -   mem->section_count++;
>>> +   break;
>>> +   mem->section_count = sections_per_block;
>>> +   }
>>> +   if (ret) {
>>> +   end_block_id = block_id;
>>> +   for (block_id = start_block_id; block_id != end_block_id;
>>> +block_id++) {
>>> +   mem = find_memory_block_by_id(block_id, NULL);
>>> +   mem->section_count = 0;
>>> +   unregister_memory(mem);
>>> +   }
>>> }
>>
>> Would it be better to do this in reverse order?
>>
>> And unregister_memory() would free mem, so it is still necessary to set
>> section_count to 0?
> 
> 1. I kept the existing behavior (setting it to 0) for now. I am planning
> to eventually remove the section count completely (it could be
> beneficial to detect removing of partially populated memory blocks).

Correction: We already use it to block offlining of partially populated
memory blocks \o/

> 
> 2. Reverse order: We would have to start with "block_id - 1", I don't
> like that better.
> 
> Thanks for having a look!
> 


-- 

Thanks,

David / dhildenb


Re: [PATCH] powerpc/nvdimm: Add support for multibyte read/write for metadata

2019-06-05 Thread Michael Ellerman
"Aneesh Kumar K.V"  writes:
> Oliver  writes:
>> On Sun, Jun 2, 2019 at 2:44 PM Aneesh Kumar K.V
>>  wrote:
...
>>> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
>>> b/arch/powerpc/platforms/pseries/papr_scm.c
>>> index 0176ce66673f..e33cebb8ee6c 100644
>>> --- a/arch/powerpc/platforms/pseries/papr_scm.c
>>> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
>>> @@ -97,42 +97,102 @@ static int drc_pmem_unbind(struct papr_scm_priv *p)
>>>  }
>>>
>>>  static int papr_scm_meta_get(struct papr_scm_priv *p,
>>> -   struct nd_cmd_get_config_data_hdr *hdr)
>>> +struct nd_cmd_get_config_data_hdr *hdr)
>>>  {
>>> unsigned long data[PLPAR_HCALL_BUFSIZE];
>>> +   unsigned long offset, data_offset;
>>> +   int len, read;
>>> int64_t ret;
>>>
>>> -   if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
>>> +   if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
>>> return -EINVAL;
>>>
>>> -   ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
>>> -   hdr->in_offset, 1);
>>> -
>>> -   if (ret == H_PARAMETER) /* bad DRC index */
>>> -   return -ENODEV;
>>> -   if (ret)
>>> -   return -EINVAL; /* other invalid parameter */
>>> -
>>> -   hdr->out_buf[0] = data[0] & 0xff;
>>> -
>>> +   for (len = hdr->in_length; len; len -= read) {
>>> +
>>> +   data_offset = hdr->in_length - len;
>>> +   offset = hdr->in_offset + data_offset;
>>> +
>>> +   if (len >= 8)
>>> +   read = 8;
>>> +   else if (len >= 4)
>>> +   read = 4;
>>> +   else if ( len >= 2)
>>> +   read = 2;
>>> +   else
>>> +   read = 1;
>>> +
>>> +   ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
>>> + offset, read);
>>> +
>>> +   if (ret == H_PARAMETER) /* bad DRC index */
>>> +   return -ENODEV;
>>> +   if (ret)
>>> +   return -EINVAL; /* other invalid parameter */
>>> +
>>> +   switch (read) {
>>> +   case 8:
>>> +   *(uint64_t *)(hdr->out_buf + data_offset) = 
>>> be64_to_cpu(data[0]);
>>> +   break;
>>> +   case 4:
>>> +   *(uint32_t *)(hdr->out_buf + data_offset) = 
>>> be32_to_cpu(data[0] & 0xffffffff);
>>> +   break;
...
>>
>> I assume you got the qemu bits sorted out with Shiva? Looks good otherwise.
>
> That is correct. I also tested with different xfer values (1, 2, 4, 8)
> on both Qemu and PowerVM.

With a big endian kernel?

cheers


Re: [PATCH] ASoC: fsl_esai: fix the channel swap issue after xrun

2019-06-05 Thread S.j. Wang
Hi
> > > > > Sounds like a bug to me...should fix it first by marking the
> > > > > data registers as volatile.
> > > > >
> > > > The ETDR is a writable register, it is not volatile. Even if we
> > > > change it to volatile, I don't think we can avoid this issue:
> > > > for regcache_sync, just writing this register is the correct
> > > > behavior.
> > >
> > > Is that so? Quoting the comments of regcache_sync():
> > > "* regcache_sync - Sync the register cache with the hardware.
> > >  *
> > >  * @map: map to configure.
> > >  *
> > >  * Any registers that should not be synced should be marked as
> > >  * volatile."
> > >
> > > If regcache_sync() does sync volatile registers too as you said, I
> > > don't mind having this FIFO reset WAR for now, though I think this
> > > mismatch between the comments and the actual behavior then should
> get people's attention.
> > >
> > > Thank you
> >
> > ETDR is not volatile; if we mark it as volatile, is that correct?
> 
> Well, you have a point -- it might not be ideally true, but it sounds like a
> correct fix to me according to these comments.
> 
> We can wait for Mark's comments or just send a patch to the mailing list
> for review.
> 
> Thank you

I tested this patch: we don't need to reset the FIFO, and regcache_sync didn't
write the ETDR even though the ETDR is not volatile. This fault may be a
legacy issue; when we first added this patch on an internal branch, something
may have caused it, but it can't be reproduced now.

So I will remove the reset of the FIFO.

Best regards
Wang Shengjiu  





Re: [PATCH] Documentation/stackprotector: powerpc supports stack protector

2019-06-05 Thread Bhupesh Sharma
Hi Jonathan,

On Fri, May 31, 2019 at 8:44 PM Michael Ellerman  wrote:
>
> Jonathan Corbet  writes:
> > On Thu, 30 May 2019 18:37:46 +0530
> > Bhupesh Sharma  wrote:
> >
> >> > This should probably go via the documentation tree?
> >> >
> >> > Acked-by: Michael Ellerman 
> >>
> >> Thanks for the review Michael.
> >> I am ok with this going through the documentation tree as well.
> >
> > Works for me too, but I don't seem to find the actual patch anywhere I
> > look.  Can you send me a copy?
>
> You can get it from lore:
>
>   
> https://lore.kernel.org/linuxppc-dev/1559212177-7072-1-git-send-email-bhsha...@redhat.com/raw
>
> Or patchwork (automatically adds my ack):
>
>   https://patchwork.ozlabs.org/patch/1107706/mbox/
>
> Or Bhupesh can send it to you :)

Please let me know if I should send out the patch again, this time
Cc'ing you and the doc-list.

Thanks,
Bhupesh


Re: [PATCH v3 09/11] mm/memory_hotplug: Remove memory block devices before arch_remove_memory()

2019-06-05 Thread David Hildenbrand
On 05.06.19 00:07, Wei Yang wrote:
> On Mon, May 27, 2019 at 01:11:50PM +0200, David Hildenbrand wrote:
>> Let's factor out removing of memory block devices, which is only
>> necessary for memory added via add_memory() and friends that created
>> memory block devices. Remove the devices before calling
>> arch_remove_memory().
>>
>> This finishes factoring out memory block device handling from
>> arch_add_memory() and arch_remove_memory().
>>
>> Cc: Greg Kroah-Hartman 
>> Cc: "Rafael J. Wysocki" 
>> Cc: David Hildenbrand 
>> Cc: "mike.tra...@hpe.com" 
>> Cc: Andrew Morton 
>> Cc: Andrew Banman 
>> Cc: Ingo Molnar 
>> Cc: Alex Deucher 
>> Cc: "David S. Miller" 
>> Cc: Mark Brown 
>> Cc: Chris Wilson 
>> Cc: Oscar Salvador 
>> Cc: Jonathan Cameron 
>> Cc: Michal Hocko 
>> Cc: Pavel Tatashin 
>> Cc: Arun KS 
>> Cc: Mathieu Malaterre 
>> Reviewed-by: Dan Williams 
>> Signed-off-by: David Hildenbrand 
>> ---
>> drivers/base/memory.c  | 37 ++---
>> drivers/base/node.c| 11 ++-
>> include/linux/memory.h |  2 +-
>> include/linux/node.h   |  6 ++
>> mm/memory_hotplug.c|  5 +++--
>> 5 files changed, 30 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 5a0370f0c506..f28efb0bf5c7 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -763,32 +763,31 @@ int create_memory_block_devices(unsigned long start, 
>> unsigned long size)
>>  return ret;
>> }
>>
>> -void unregister_memory_section(struct mem_section *section)
>> +/*
>> + * Remove memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * have to be offline.
>> + */
>> +void remove_memory_block_devices(unsigned long start, unsigned long size)
>> {
>> +const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>> +const int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>>  struct memory_block *mem;
>> +int block_id;
>>
>> -if (WARN_ON_ONCE(!present_section(section)))
>> +if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>> + !IS_ALIGNED(size, memory_block_size_bytes())))
>>  return;
>>
>>  mutex_lock(&mem_sysfs_mutex);
>> -
>> -/*
>> - * Some users of the memory hotplug do not want/need memblock to
>> - * track all sections. Skip over those.
>> - */
>> -mem = find_memory_block(section);
>> -if (!mem)
>> -goto out_unlock;
>> -
>> -unregister_mem_sect_under_nodes(mem, __section_nr(section));
>> -
>> -mem->section_count--;
>> -if (mem->section_count == 0)
>> +for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>> +mem = find_memory_block_by_id(block_id, NULL);
>> +if (WARN_ON_ONCE(!mem))
>> +continue;
>> +mem->section_count = 0;
> 
> Is this step necessary?

It's what the previous code does, it might not be - I'll leave it like
that for now. As mentioned in another reply, I might remove the
section_count completely, eventually.

-- 

Thanks,

David / dhildenb


Re: [PATCH v3 07/11] mm/memory_hotplug: Create memory block devices after arch_add_memory()

2019-06-05 Thread David Hildenbrand
>> /*
>>  * For now, we have a linear search to go find the appropriate
>>  * memory_block corresponding to a particular phys_index. If
>> @@ -658,6 +670,11 @@ static int init_memory_block(struct memory_block 
>> **memory, int block_id,
>>  unsigned long start_pfn;
>>  int ret = 0;
>>
>> +mem = find_memory_block_by_id(block_id, NULL);
>> +if (mem) {
>> +put_device(&mem->dev);
>> +return -EEXIST;
>> +}
> 
> find_memory_block_by_id() is not that close to the main idea in this patch.
> Would it be better to split this part?

I played with that but didn't like the temporary results (e.g. having to
export find_memory_block_by_id()). I'll stick to this for now.

> 
>>  mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>  if (!mem)
>>  return -ENOMEM;
>> @@ -699,44 +716,53 @@ static int add_memory_block(int base_section_nr)
>>  return 0;
>> }
>>
>> +static void unregister_memory(struct memory_block *memory)
>> +{
>> +if (WARN_ON_ONCE(memory->dev.bus != &memory_subsys))
>> +return;
>> +
>> +/* drop the ref. we got via find_memory_block() */
>> +put_device(&memory->dev);
>> +device_unregister(&memory->dev);
>> +}
>> +
>> /*
>> - * need an interface for the VM to add new memory regions,
>> - * but without onlining it.
>> + * Create memory block devices for the given memory area. Start and size
>> + * have to be aligned to memory block granularity. Memory block devices
>> + * will be initialized as offline.
>>  */
>> -int hotplug_memory_register(int nid, struct mem_section *section)
>> +int create_memory_block_devices(unsigned long start, unsigned long size)
>> {
>> -int block_id = base_memory_block_id(__section_nr(section));
>> -int ret = 0;
>> +const int start_block_id = pfn_to_block_id(PFN_DOWN(start));
>> +int end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
>>  struct memory_block *mem;
>> +unsigned long block_id;
>> +int ret = 0;
>>
>> -mutex_lock(&mem_sysfs_mutex);
>> +if (WARN_ON_ONCE(!IS_ALIGNED(start, memory_block_size_bytes()) ||
>> + !IS_ALIGNED(size, memory_block_size_bytes())))
>> +return -EINVAL;
>>
>> -mem = find_memory_block(section);
>> -if (mem) {
>> -mem->section_count++;
>> -put_device(&mem->dev);
>> -} else {
>> +mutex_lock(&mem_sysfs_mutex);
>> +for (block_id = start_block_id; block_id != end_block_id; block_id++) {
>>  ret = init_memory_block(&mem, block_id, MEM_OFFLINE);
>>  if (ret)
>> -goto out;
>> -mem->section_count++;
>> +break;
>> +mem->section_count = sections_per_block;
>> +}
>> +if (ret) {
>> +end_block_id = block_id;
>> +for (block_id = start_block_id; block_id != end_block_id;
>> + block_id++) {
>> +mem = find_memory_block_by_id(block_id, NULL);
>> +mem->section_count = 0;
>> +unregister_memory(mem);
>> +}
>>  }
> 
> Would it be better to do this in reverse order?
> 
> And unregister_memory() would free mem, so it is still necessary to set
> section_count to 0?

1. I kept the existing behavior (setting it to 0) for now. I am planning
to eventually remove the section count completely (it could be
beneficial to detect removing of partially populated memory blocks).

2. Reverse order: We would have to start with "block_id - 1", I don't
like that better.

Thanks for having a look!

-- 

Thanks,

David / dhildenb


Re: [PATCH kernel] powerpc/pci/of: Fix OF flags parsing for 64bit BARs

2019-06-05 Thread Shawn Anastasio

On 6/4/19 10:38 PM, Alexey Kardashevskiy wrote:

When the firmware does PCI BAR resource allocation, it passes the assigned
addresses and flags (prefetch/64bit/...) via the "reg" property of
a PCI device device tree node so the kernel does not need to do
resource allocation.

The flags are stored in resource::flags - the lower byte stores
PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc.
Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated,
such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc.
When parsing the "reg" property, we copy the prefetch flag but skip
PCI_BASE_ADDRESS_MEM_TYPE_64, which leaves the flags out of sync.

The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions:
1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch()
or by passing "/chosen/linux,pci-probe-only");
2. we request resource alignment (by passing pci=resource_alignment=
via the kernel cmd line to request PAGE_SIZE alignment or defining
ppc_md.pcibios_default_alignment which returns anything but 0). Note that
the alignment requests are ignored if PCI_PROBE_ONLY is enabled.

With 1) and 2), the generic PCI code in the kernel unconditionally
decides to:
- reassign the BARs in pci_specified_resource_alignment() (works fine)
- write new BARs to the device - this fails for 64bit BARs as the generic
code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits
of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping
in the hypervisor.

This fixes the issue by copying the flag. This is useful if we want to
enforce certain BAR alignment per platform as handling subpage sized BARs
is proven to cause problems with hotplug (SLOF already aligns BARs to 64k).

Signed-off-by: Alexey Kardashevskiy 
---

This code is there for ages (from 200x) hence no "Fixes:".

Ideally I want to enforce /chosen/linux,pci-probe-only in QEMU as
at the moment:
- pci=resource_alignment= alone does not do anything;
- /chosen/linux,pci-probe-only alone does not cause the kernel to
reassign resources;
- pci=resource_alignment= with /chosen/linux,pci-probe-only is broken
anyway.

---
  arch/powerpc/kernel/pci_of_scan.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/kernel/pci_of_scan.c 
b/arch/powerpc/kernel/pci_of_scan.c
index 24191ea2d9a7..64ad92016b63 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -45,6 +45,8 @@ unsigned int pci_parse_of_flags(u32 addr0, int bridge)
	if (addr0 & 0x02000000) {
flags = IORESOURCE_MEM | PCI_BASE_ADDRESS_SPACE_MEMORY;
flags |= (addr0 >> 22) & PCI_BASE_ADDRESS_MEM_TYPE_64;
+   if (flags & PCI_BASE_ADDRESS_MEM_TYPE_64)
+   flags |= IORESOURCE_MEM_64;
flags |= (addr0 >> 28) & PCI_BASE_ADDRESS_MEM_TYPE_1M;
	if (addr0 & 0x40000000)
flags |= IORESOURCE_PREFETCH



I have confirmed that this fixes the case with PCI_PROBE_ONLY
disabled and a ppc_md.pcibios_default_alignment implementation that
returns PAGE_SIZE.

Reviewed-by: Shawn Anastasio 


Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem

2019-06-05 Thread Greg KH
On Tue, Jun 04, 2019 at 01:05:45PM -0700, Matthew Garrett wrote:
> On Tue, Jun 4, 2019 at 1:01 PM Nayna  wrote:
> > It seems efivars were first implemented in sysfs and then later
> > separated out as efivarfs.
> > Refer - Documentation/filesystems/efivarfs.txt.
> >
> > So, the reason wasn't that sysfs should not be used for exposing
> > firmware variables,
> > but for the size limitations which seems to come from UEFI Specification.
> >
> > Is this limitation valid for the new requirement of secure variables ?
> 
> I don't think the size restriction is an issue now, but there's a lot
> of complex semantics around variable deletion and immutability that
> need to be represented somehow.

Ah, yeah, that's the reason it would not work in sysfs, forgot all about
that, thanks.

greg k-h


Re: [PATCH] powerpc/nvdimm: Add support for multibyte read/write for metadata

2019-06-05 Thread Alexey Kardashevskiy



On 02/06/2019 14:43, Aneesh Kumar K.V wrote:
> The SCM_READ/WRITE_METADATA hcalls support multibyte read/write. This patch
> updates the metadata read/write to use 1, 2, 4 or 8 byte read/write as
> described in the PAPR document.
> 
> The READ/WRITE_METADATA hcalls support 1, 2, 4, or 8 byte read/write.
> For other values the hcall returns H_P3.
> 
> The hypervisor stores the metadata contents in big-endian format, and in
> order to enable read/write at different granularities, we need to switch
> the contents to big-endian before calling the hcall.
> 
> Based on a patch from Oliver O'Halloran 
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 104 +-
>  1 file changed, 82 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index 0176ce66673f..e33cebb8ee6c 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -97,42 +97,102 @@ static int drc_pmem_unbind(struct papr_scm_priv *p)
>  }
>  
>  static int papr_scm_meta_get(struct papr_scm_priv *p,
> - struct nd_cmd_get_config_data_hdr *hdr)
> +  struct nd_cmd_get_config_data_hdr *hdr)
>  {
>   unsigned long data[PLPAR_HCALL_BUFSIZE];
> + unsigned long offset, data_offset;
> + int len, read;
>   int64_t ret;
>  
> - if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
> + if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
>   return -EINVAL;
>  
> - ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
> - hdr->in_offset, 1);
> -
> - if (ret == H_PARAMETER) /* bad DRC index */
> - return -ENODEV;
> - if (ret)
> - return -EINVAL; /* other invalid parameter */
> -
> - hdr->out_buf[0] = data[0] & 0xff;
> -
> + for (len = hdr->in_length; len; len -= read) {
> +
> + data_offset = hdr->in_length - len;
> + offset = hdr->in_offset + data_offset;
> +
> + if (len >= 8)
> + read = 8;
> + else if (len >= 4)
> + read = 4;
> + else if ( len >= 2)

Do not need a space before "len".


> + read = 2;
> + else
> + read = 1;
> +
> + ret = plpar_hcall(H_SCM_READ_METADATA, data, p->drc_index,
> +   offset, read);
> +
> + if (ret == H_PARAMETER) /* bad DRC index */
> + return -ENODEV;
> + if (ret)
> + return -EINVAL; /* other invalid parameter */
> +
> + switch (read) {
> + case 8:
> + *(uint64_t *)(hdr->out_buf + data_offset) = 
> be64_to_cpu(data[0]);
> + break;
> + case 4:
> + *(uint32_t *)(hdr->out_buf + data_offset) = 
> be32_to_cpu(data[0] & 0xffffffff);
> + break;
> +
> + case 2:
> + *(uint16_t *)(hdr->out_buf + data_offset) = 
> be16_to_cpu(data[0] & 0xffff);
> + break;
> +
> + case 1:
> + *(uint32_t *)(hdr->out_buf + data_offset) = (data[0] & 
> 0xff);


Memory corruption, should be uint8_t*.


> + break;
> + }
> + }
>   return 0;
>  }
>  
>  static int papr_scm_meta_set(struct papr_scm_priv *p,
> - struct nd_cmd_set_config_hdr *hdr)
> +  struct nd_cmd_set_config_hdr *hdr)
>  {
> + unsigned long offset, data_offset;
> + int len, wrote;
> + unsigned long data;
> + __be64 data_be;
>   int64_t ret;
>  
> - if (hdr->in_offset >= p->metadata_size || hdr->in_length != 1)
> + if ((hdr->in_offset + hdr->in_length) >= p->metadata_size)
>   return -EINVAL;
>  
> - ret = plpar_hcall_norets(H_SCM_WRITE_METADATA,
> - p->drc_index, hdr->in_offset, hdr->in_buf[0], 1);
> -
> - if (ret == H_PARAMETER) /* bad DRC index */
> - return -ENODEV;
> - if (ret)
> - return -EINVAL; /* other invalid parameter */
> + for (len = hdr->in_length; len; len -= wrote) {
> +
> + data_offset = hdr->in_length - len;
> + offset = hdr->in_offset + data_offset;
> +
> + if (len >= 8) {
> + data = *(uint64_t *)(hdr->in_buf + data_offset);
> + data_be = cpu_to_be64(data);
> + wrote = 8;
> + } else if (len >= 4) {
> + data = *(uint32_t *)(hdr->in_buf + data_offset);
> + data &= 0xffffffff;


Why do you need &0xffffffff here and below (&0xffff, &0xff)? uint32_t is an
unsigned type so the sign bit won't be extended.


> + data_be = cpu_to_be32(data);
> + wrote = 

Re: [PATCH v5] powerpc/64s: support nospectre_v2 cmdline option

2019-06-05 Thread Andrew Donnellan

On 24/5/19 12:46 pm, Christopher M. Riedl wrote:

Add support for disabling the kernel implemented spectre v2 mitigation
(count cache flush on context switch) via the nospectre_v2 and
mitigations=off cmdline options.

Suggested-by: Michael Ellerman 
Signed-off-by: Christopher M. Riedl 
Reviewed-by: Andrew Donnellan 


snowpatch is whinging about this breaking the build for some reason... 
https://patchwork.ozlabs.org/patch/1104583/




---
v4->v5:
Fix checkpatch complaint
  arch/powerpc/kernel/security.c | 19 ---
  1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index e1c9cf079503..7cfcb294b11c 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -28,7 +28,7 @@ static enum count_cache_flush_type count_cache_flush_type = 
COUNT_CACHE_FLUSH_NONE
  bool barrier_nospec_enabled;
  static bool no_nospec;
  static bool btb_flush_enabled;
-#ifdef CONFIG_PPC_FSL_BOOK3E
+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3S_64)
  static bool no_spectrev2;
  #endif
  
@@ -114,7 +114,7 @@ static __init int security_feature_debugfs_init(void)

  device_initcall(security_feature_debugfs_init);
  #endif /* CONFIG_DEBUG_FS */
  
-#ifdef CONFIG_PPC_FSL_BOOK3E

+#if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_BOOK3S_64)
  static int __init handle_nospectre_v2(char *p)
  {
no_spectrev2 = true;
@@ -122,6 +122,9 @@ static int __init handle_nospectre_v2(char *p)
return 0;
  }
  early_param("nospectre_v2", handle_nospectre_v2);
+#endif /* CONFIG_PPC_FSL_BOOK3E || CONFIG_PPC_BOOK3S_64 */
+
+#ifdef CONFIG_PPC_FSL_BOOK3E
  void setup_spectre_v2(void)
  {
if (no_spectrev2 || cpu_mitigations_off())
@@ -399,7 +402,17 @@ static void toggle_count_cache_flush(bool enable)
  
  void setup_count_cache_flush(void)

  {
-   toggle_count_cache_flush(true);
+   bool enable = true;
+
+   if (no_spectrev2 || cpu_mitigations_off()) {
+   if (security_ftr_enabled(SEC_FTR_BCCTRL_SERIALISED) ||
+   security_ftr_enabled(SEC_FTR_COUNT_CACHE_DISABLED))
+   pr_warn("Spectre v2 mitigations not under software control, 
can't disable\n");
+
+   enable = false;
+   }
+
+   toggle_count_cache_flush(enable);
  }
  
  #ifdef CONFIG_DEBUG_FS




--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH v2 06/22] docs: mark orphan documents as such

2019-06-05 Thread Andrew Donnellan

On 5/6/19 12:17 am, Mauro Carvalho Chehab wrote:

Sphinx doesn't like orphan documents:

 Documentation/accelerators/ocxl.rst: WARNING: document isn't included in 
any toctree
 Documentation/arm/stm32/overview.rst: WARNING: document isn't included in 
any toctree
 Documentation/arm/stm32/stm32f429-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/arm/stm32/stm32f746-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/arm/stm32/stm32f769-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/arm/stm32/stm32h743-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/arm/stm32/stm32mp157-overview.rst: WARNING: document isn't 
included in any toctree
 Documentation/gpu/msm-crash-dump.rst: WARNING: document isn't included in 
any toctree
 Documentation/interconnect/interconnect.rst: WARNING: document isn't 
included in any toctree
 Documentation/laptops/lg-laptop.rst: WARNING: document isn't included in 
any toctree
 Documentation/powerpc/isa-versions.rst: WARNING: document isn't included 
in any toctree
 Documentation/virtual/kvm/amd-memory-encryption.rst: WARNING: document 
isn't included in any toctree
 Documentation/virtual/kvm/vcpu-requests.rst: WARNING: document isn't 
included in any toctree

So, while they aren't in any toctree, add :orphan: to them, in order
to silence this warning.

Signed-off-by: Mauro Carvalho Chehab 


ocxl:

Acked-by: Andrew Donnellan 

We should find somewhere to put it...


---
  Documentation/accelerators/ocxl.rst | 2 ++
  Documentation/arm/stm32/overview.rst| 2 ++
  Documentation/arm/stm32/stm32f429-overview.rst  | 2 ++
  Documentation/arm/stm32/stm32f746-overview.rst  | 2 ++
  Documentation/arm/stm32/stm32f769-overview.rst  | 2 ++
  Documentation/arm/stm32/stm32h743-overview.rst  | 2 ++
  Documentation/arm/stm32/stm32mp157-overview.rst | 2 ++
  Documentation/gpu/msm-crash-dump.rst| 2 ++
  Documentation/interconnect/interconnect.rst | 2 ++
  Documentation/laptops/lg-laptop.rst | 2 ++
  Documentation/powerpc/isa-versions.rst  | 2 ++
  11 files changed, 22 insertions(+)

diff --git a/Documentation/accelerators/ocxl.rst 
b/Documentation/accelerators/ocxl.rst
index 14cefc020e2d..b1cea19a90f5 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  
  OpenCAPI (Open Coherent Accelerator Processor Interface)
  ========================================================
diff --git a/Documentation/arm/stm32/overview.rst 
b/Documentation/arm/stm32/overview.rst
index 85cfc8410798..f7e734153860 100644
--- a/Documentation/arm/stm32/overview.rst
+++ b/Documentation/arm/stm32/overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  
  STM32 ARM Linux Overview
  ========================
diff --git a/Documentation/arm/stm32/stm32f429-overview.rst 
b/Documentation/arm/stm32/stm32f429-overview.rst
index 18feda97f483..65bbb1c3b423 100644
--- a/Documentation/arm/stm32/stm32f429-overview.rst
+++ b/Documentation/arm/stm32/stm32f429-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32F429 Overview
  ==================
  
diff --git a/Documentation/arm/stm32/stm32f746-overview.rst b/Documentation/arm/stm32/stm32f746-overview.rst

index b5f4b6ce7656..42d593085015 100644
--- a/Documentation/arm/stm32/stm32f746-overview.rst
+++ b/Documentation/arm/stm32/stm32f746-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32F746 Overview
  ==================
  
diff --git a/Documentation/arm/stm32/stm32f769-overview.rst b/Documentation/arm/stm32/stm32f769-overview.rst

index 228656ced2fe..f6adac862b17 100644
--- a/Documentation/arm/stm32/stm32f769-overview.rst
+++ b/Documentation/arm/stm32/stm32f769-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32F769 Overview
  ==================
  
diff --git a/Documentation/arm/stm32/stm32h743-overview.rst b/Documentation/arm/stm32/stm32h743-overview.rst

index 3458dc00095d..c525835e7473 100644
--- a/Documentation/arm/stm32/stm32h743-overview.rst
+++ b/Documentation/arm/stm32/stm32h743-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32H743 Overview
  ==================
  
diff --git a/Documentation/arm/stm32/stm32mp157-overview.rst b/Documentation/arm/stm32/stm32mp157-overview.rst

index 62e176d47ca7..2c52cd020601 100644
--- a/Documentation/arm/stm32/stm32mp157-overview.rst
+++ b/Documentation/arm/stm32/stm32mp157-overview.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  STM32MP157 Overview
  ===================
  
diff --git a/Documentation/gpu/msm-crash-dump.rst b/Documentation/gpu/msm-crash-dump.rst

index 757cd257e0d8..240ef200f76c 100644
--- a/Documentation/gpu/msm-crash-dump.rst
+++ b/Documentation/gpu/msm-crash-dump.rst
@@ -1,3 +1,5 @@
+:orphan:
+
  =====================
  MSM Crash Dump Format
  =====================
diff --git 

Re: [PATCH] ocxl: do not use C++ style comments in uapi header

2019-06-05 Thread Andrew Donnellan

On 4/6/19 10:12 pm, Masahiro Yamada wrote:

On Tue, Jun 4, 2019 at 8:54 PM Frederic Barrat  wrote:




Le 04/06/2019 à 13:16, Masahiro Yamada a écrit :

Linux kernel tolerates C++ style comments these days. Actually, the
SPDX License tags for .c files start with //.

On the other hand, uapi headers are written in more strict C, where
the C++ comment style is forbidden.

Signed-off-by: Masahiro Yamada 
---


Thanks!
Acked-by: Frederic Barrat 



Please hold on this patch until
we get consensus about the C++ comment style.

Discussion just started here:
https://lore.kernel.org/patchwork/patch/1083801/


If you choose to proceed with this patch:

Acked-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [WIP RFC PATCH 0/6] Generic Firmware Variable Filesystem

2019-06-05 Thread Greg KH
On Tue, Jun 04, 2019 at 04:33:14PM -0400, Nayna wrote:
> 
> 
> On 06/03/2019 03:29 AM, Greg KH wrote:
> > On Mon, Jun 03, 2019 at 04:04:32PM +1000, Daniel Axtens wrote:
> > > Hi Nayna,
> > > 
> > > > > As PowerNV moves towards secure boot, we need a place to put secure
> > > > > variables. One option that has been canvassed is to make our secure
> > > > > variables look like EFI variables. This is an early sketch of another
> > > > > approach where we create a generic firmware variable file system,
> > > > > fwvarfs, and an OPAL Secure Variable backend for it.
> > > > Is there a need of new filesystem ? I am wondering why can't these be
> > > > exposed via sysfs / securityfs ?
> > > > Probably, something like... /sys/firmware/secureboot or
> > > > /sys/kernel/security/secureboot/  ?
> > > I suppose we could put secure variables in sysfs, but I'm not sure
> > > that's what sysfs was intended for. I understand sysfs as "a
> > > filesystem-based view of kernel objects" (from
> > > Documentation/filesystems/configfs/configfs.txt), and I don't think a
> > > secure variable is really a kernel object in the same way most other
> > > things in sysfs are... but I'm open to being convinced.
> > What makes them more "secure" than anything else that is in sysfs today?
> > I didn't see anything in this patchset that provided "additional
> > security", did I miss it?
> > 
> > > securityfs seems to be reserved for LSMs, I don't think we can put
> > > things there.
> > Yeah, I wouldn't mess with that.
> 
> Thanks Greg for clarifying!! I am curious, the TPM exposes the BIOS
> event log to userspace via securityfs. Is there a reason for not
> exposing these security variables to userspace via securityfs as well?

securityfs is for LSMs to use.  If the TPM drivers also use it, well,
that's between those authors and the securityfs developers.

BIOS/firmware variables are a much different thing than a TPM log.

thanks,

greg k-h