Re: [kernel,v2,1/2] powerpc/iommu: Stop using @current in mm_iommu_xxx

2016-07-22 Thread Nicholas Piggin
On Wed, 20 Jul 2016 14:34:30 +1000
Alexey Kardashevskiy  wrote:



>  static long tce_iommu_register_pages(struct tce_container *container,
> @@ -128,10 +129,17 @@ static long tce_iommu_register_pages(struct tce_container *container,
> 			((vaddr + size) < vaddr))
> 		return -EINVAL;
>  
> - ret = mm_iommu_get(vaddr, entries, &mem);
> + if (!container->mm) {
> + if (!current->mm)
> + return -ESRCH; /* process exited */

This shouldn't happen if we're a userspace process.

> +
> + atomic_inc(&current->mm->mm_count);
> + container->mm = current->mm;
> + }
> +
> + ret = mm_iommu_get(container->mm, vaddr, entries, &mem);

Is it possible for processes (different mm) to be using the same
container? 


> @@ -354,6 +362,8 @@ static void tce_iommu_release(void *iommu_data)
>   tce_iommu_free_table(tbl);
>   }
>  
> + if (container->mm)
> + mmdrop(container->mm);
>   tce_iommu_disable(container);
>   mutex_destroy(&container->lock);

I'm wondering why keep the mm around at all. There is a bit of
locked_vm accounting there (which may not do exactly the right
thing with the current task's rlimit if the mm does not belong
to current anyway).

The interesting cases are only the ones where a thread does
something with container->mm when current->mm != container->mm
(either a different process or a kernel thread). In what
situations does that happen?

Thanks,
Nick
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] powerpc/64: implement a slice mask cache

2016-07-22 Thread Balbir Singh
On Fri, Jul 22, 2016 at 10:57:28PM +1000, Nicholas Piggin wrote:
> Calculating the slice mask can become a significant overhead for
> get_unmapped_area. The mask is relatively small and does not change
> frequently, so we can cache it in the mm context.
> 
> This saves about 30% kernel time on a 4K user address allocation
> in a microbenchmark.
> 
> Comments on the approach taken? I think there is the option for fixed
> allocations to avoid some of the slice calculation entirely, but first
> I think it will be good to have a general speedup that covers all
> mmaps.
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Anton Blanchard 
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h |  8 +++
>  arch/powerpc/mm/slice.c  | 39 ++--
>  2 files changed, 45 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 5854263..0d15af4 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -71,6 +71,14 @@ typedef struct {
>  #ifdef CONFIG_PPC_MM_SLICES
>   u64 low_slices_psize;   /* SLB page size encodings */
>   unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
> + struct slice_mask mask_4k;
> +# ifdef CONFIG_PPC_64K_PAGES
> + struct slice_mask mask_64k;
> +# endif
> +# ifdef CONFIG_HUGETLB_PAGE
> + struct slice_mask mask_16m;
> + struct slice_mask mask_16g;
> +# endif

Should we cache these in mmu_psize_defs? I am not 100% sure we
want to overload that structure, but it provides a convenient
way of saying mmu_psize_defs[psize].mask instead of all
the if checks.

>  #else
>   u16 sllp;   /* SLB page size encoding */
>  #endif
> diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
> index 2b27458..559ea5f 100644
> --- a/arch/powerpc/mm/slice.c
> +++ b/arch/powerpc/mm/slice.c
> @@ -147,7 +147,7 @@ static struct slice_mask slice_mask_for_free(struct mm_struct *mm)
>   return ret;
>  }
>  
> -static struct slice_mask slice_mask_for_size(struct mm_struct *mm, int psize)
> +static struct slice_mask calc_slice_mask_for_size(struct mm_struct *mm, int psize)
>  {
>   unsigned char *hpsizes;
>   int index, mask_index;
> @@ -171,6 +171,36 @@ static struct slice_mask slice_mask_for_size(struct mm_struct *mm, int psize)
>   return ret;
>  }
>  
> +static void recalc_slice_mask_cache(struct mm_struct *mm)
> +{
> + mm->context.mask_4k = calc_slice_mask_for_size(mm, MMU_PAGE_4K);
> +#ifdef CONFIG_PPC_64K_PAGES
> + mm->context.mask_64k = calc_slice_mask_for_size(mm, MMU_PAGE_64K);
> +#endif
> +# ifdef CONFIG_HUGETLB_PAGE
> + /* Radix does not come here */
> + mm->context.mask_16m = calc_slice_mask_for_size(mm, MMU_PAGE_16M);
> + mm->context.mask_16g = calc_slice_mask_for_size(mm, MMU_PAGE_16G);
> +# endif
> +}

Should the function above be called under slice_convert_lock?

> +
> +static struct slice_mask slice_mask_for_size(struct mm_struct *mm, int psize)
> +{
> + if (psize == MMU_PAGE_4K)
> + return mm->context.mask_4k;
> +#ifdef CONFIG_PPC_64K_PAGES
> + if (psize == MMU_PAGE_64K)
> + return mm->context.mask_64k;
> +#endif
> +# ifdef CONFIG_HUGETLB_PAGE
> + if (psize == MMU_PAGE_16M)
> + return mm->context.mask_16m;
> + if (psize == MMU_PAGE_16G)
> + return mm->context.mask_16g;
> +# endif
> + BUG();
> +}
> +
>  static int slice_check_fit(struct slice_mask mask, struct slice_mask available)
>  {
>   return (mask.low_slices & available.low_slices) == mask.low_slices &&
> @@ -233,6 +263,8 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
>  
>   spin_unlock_irqrestore(&slice_convert_lock, flags);
>  
> + recalc_slice_mask_cache(mm);
> +
>   copro_flush_all_slbs(mm);
>  }
>  
> @@ -625,7 +657,7 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
>   goto bail;
>  
>   mm->context.user_psize = psize;
> - wmb();
> + wmb(); /* Why? */
>  
>   lpsizes = mm->context.low_slices_psize;
>   for (i = 0; i < SLICE_NUM_LOW; i++)
> @@ -652,6 +684,9 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
> mm->context.low_slices_psize,
> mm->context.high_slices_psize);
>  
> + spin_unlock_irqrestore(&slice_convert_lock, flags);
> + recalc_slice_mask_cache(mm);
> + return;
>   bail:
>   spin_unlock_irqrestore(&slice_convert_lock, flags);
>  }
> -- 
> 2.8.1
> 

Re: [PATCH v4 00/12] mm: Hardened usercopy

2016-07-22 Thread Laura Abbott

On 07/20/2016 01:26 PM, Kees Cook wrote:

Hi,

[This is now in my kspp -next tree, though I'd really love to add some
additional explicit Tested-bys, Reviewed-bys, or Acked-bys. If you've
looked through any part of this or have done any testing, please consider
sending an email with your "*-by:" line. :)]

This is a start of the mainline port of PAX_USERCOPY[1]. After writing
tests (now in lkdtm in -next) for Casey's earlier port[2], I kept tweaking
things further and further until I ended up with a whole new patch series.
To that end, I took Rik, Laura, and other people's feedback along with
additional changes and clean-ups.

Based on my understanding, PAX_USERCOPY was designed to catch a
few classes of flaws (mainly bad bounds checking) around the use of
copy_to_user()/copy_from_user(). These changes don't touch get_user() and
put_user(), since these operate on constant sized lengths, and tend to be
much less vulnerable. There are effectively three distinct protections in
the whole series, each of which I've given a separate CONFIG, though this
patch set is only the first of the three intended protections. (Generally
speaking, PAX_USERCOPY covers what I'm calling CONFIG_HARDENED_USERCOPY
(this) and CONFIG_HARDENED_USERCOPY_WHITELIST (future), and
PAX_USERCOPY_SLABS covers CONFIG_HARDENED_USERCOPY_SPLIT_KMALLOC
(future).)

This series, which adds CONFIG_HARDENED_USERCOPY, checks that objects
being copied to/from userspace meet certain criteria:
- if address is a heap object, the size must not exceed the object's
  allocated size. (This will catch all kinds of heap overflow flaws.)
- if address range is in the current process stack, it must be within
  a valid stack frame (if such checking is possible) or at least entirely
  within the current process's stack. (This could catch large lengths that
  would have extended beyond the current process stack, or overflows if
  their length extends back into the original stack.)
- if the address range is part of kernel data, rodata, or bss, allow it.
- if address range is page-allocated, that it doesn't span multiple
  allocations (excepting Reserved and CMA pages).
- if address is within the kernel text, reject it.
- everything else is accepted

The patches in the series are:
- Support for examination of CMA page types:
1- mm: Add is_migrate_cma_page
- Support for arch-specific stack frame checking (which will likely be
  replaced in the future by Josh's more comprehensive unwinder):
2- mm: Implement stack frame object validation
- The core copy_to/from_user() checks, without the slab object checks:
3- mm: Hardened usercopy
- Per-arch enablement of the protection:
4- x86/uaccess: Enable hardened usercopy
5- ARM: uaccess: Enable hardened usercopy
6- arm64/uaccess: Enable hardened usercopy
7- ia64/uaccess: Enable hardened usercopy
8- powerpc/uaccess: Enable hardened usercopy
9- sparc/uaccess: Enable hardened usercopy
10- s390/uaccess: Enable hardened usercopy
- The heap allocator implementation of object size checking:
11- mm: SLAB hardened usercopy support
12- mm: SLUB hardened usercopy support

Some notes:

- This is expected to apply on top of -next which contains fixes for the
  position of _etext on both arm and arm64, though it has some conflicts
  with KASAN that should be trivial to fix up. Also in -next are the
  tests for this protection (in lkdtm), prefixed with USERCOPY_.

- I couldn't detect a measurable performance change with these features
  enabled. Kernel build times were unchanged, hackbench was unchanged,
  etc. I think we could flip this to "on by default" at some point, but
  for now, I'm leaving it off until I can get some more definitive
  measurements. I would love if someone with greater familiarity with
  perf could give this a spin and report results.

- The SLOB support extracted from grsecurity seems entirely broken. I
  have no idea what's going on there, I spent my time testing SLAB and
  SLUB. Having someone else look at SLOB would be nice, but this series
  doesn't depend on it.

Additional features that would be nice, but aren't blocking this series:

- Needs more architecture support for stack frame checking (only x86 now,
  but it seems Josh will have a good solution for this soon).


Thanks!

-Kees

[1] https://grsecurity.net/download.php "grsecurity - test kernel patch"
[2] http://www.openwall.com/lists/kernel-hardening/2016/05/19/5

v4:
- handle CMA pages, labbott
- update stack checker comments, labbott
- check for vmalloc addresses, labbott
- deal with KASAN in -next changing arm64 copy*user calls
- check for linear mappings at runtime instead of via CONFIG

v3:
- switch to using BUG for better Oops integration
- when checking page allocations, check each for Reserved
- use enums for the stack check return for readability

v2:
- added s390 support
- handle slub red zone
- disallow writes to rodata area
- stack frame walker now C

Re: [PATCH V5 5/5] powerpc/kvm/stats: Implement existing and add new halt polling vcpu stats

2016-07-22 Thread David Matlack via Linuxppc-dev
On Thu, Jul 21, 2016 at 8:41 PM, Suraj Jitindar Singh
 wrote:
> vcpu stats are used to collect information about a vcpu which can be viewed
> in the debugfs. For example halt_attempted_poll and halt_successful_poll
> are used to keep track of the number of times the vcpu attempts to and
> successfully polls. These stats are currently not used on powerpc.
>
> Implement incrementation of the halt_attempted_poll and
> halt_successful_poll vcpu stats for powerpc. Since these stats are summed
> over all the vcpus for all running guests it doesn't matter which vcpu
> they are attributed to, thus we choose the current runner vcpu of the
> vcore.
>
> Also add new vcpu stats: halt_poll_success_ns, halt_poll_fail_ns and
> halt_wait_ns to be used to accumulate the total time spent polling
> successfully, polling unsuccessfully and waiting respectively, and
> halt_successful_wait to accumulate the number of times the vcpu waits.
> Given that halt_poll_success_ns, halt_poll_fail_ns and halt_wait_ns are
> expressed in nanoseconds it is necessary to represent these as 64-bit
> quantities, otherwise they would overflow after only about 4 seconds.
>
> Given that the total time spent either polling or waiting will be known and
> the number of times that each was done, it will be possible to determine
> the average poll and wait times. This will give the ability to tune the kvm
> module parameters based on the calculated average wait and poll times.
>
> Signed-off-by: Suraj Jitindar Singh 
> Reviewed-by: David Matlack 
>
> ---
> Change Log:
>
> V3 -> V4:
> - Instead of accounting just wait and poll time, separate these
>   into successful_poll_time, failed_poll_time and wait_time.
> V4 -> V5:
> - Add single_task_running() check to polling loop

I was expecting to see this in PATCH 3/5 with the halt-polling
implementation. But otherwise, looks good, and the net effect is the
same.

> ---
>  arch/powerpc/include/asm/kvm_host.h |  4 
>  arch/powerpc/kvm/book3s.c   |  4 
>  arch/powerpc/kvm/book3s_hv.c| 38 +++--
>  3 files changed, 40 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index f6304c5..f15ffc0 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -114,8 +114,12 @@ struct kvm_vcpu_stat {
> u64 emulated_inst_exits;
> u64 dec_exits;
> u64 ext_intr_exits;
> +   u64 halt_poll_success_ns;
> +   u64 halt_poll_fail_ns;
> +   u64 halt_wait_ns;
> u64 halt_successful_poll;
> u64 halt_attempted_poll;
> +   u64 halt_successful_wait;
> u64 halt_poll_invalid;
> u64 halt_wakeup;
> u64 dbell_exits;
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 47018fc..71eb8f3 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -52,8 +52,12 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
> { "dec", VCPU_STAT(dec_exits) },
> { "ext_intr",VCPU_STAT(ext_intr_exits) },
> { "queue_intr",  VCPU_STAT(queue_intr) },
> +   { "halt_poll_success_ns",   VCPU_STAT(halt_poll_success_ns) },
> +   { "halt_poll_fail_ns",  VCPU_STAT(halt_poll_fail_ns) },
> +   { "halt_wait_ns",   VCPU_STAT(halt_wait_ns) },
> { "halt_successful_poll", VCPU_STAT(halt_successful_poll), },
> { "halt_attempted_poll", VCPU_STAT(halt_attempted_poll), },
> +   { "halt_successful_wait",   VCPU_STAT(halt_successful_wait) },
> { "halt_poll_invalid", VCPU_STAT(halt_poll_invalid) },
> { "halt_wakeup", VCPU_STAT(halt_wakeup) },
> { "pf_storage",  VCPU_STAT(pf_storage) },
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index a9de1d4..b1d9e88 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -2679,15 +2679,16 @@ static int kvmppc_vcore_check_block(struct kvmppc_vcore *vc)
>   */
>  static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
>  {
> +   ktime_t cur, start_poll, start_wait;
> int do_sleep = 1;
> -   ktime_t cur, start;
> u64 block_ns;
> DECLARE_SWAITQUEUE(wait);
>
> /* Poll for pending exceptions and ceded state */
> -   cur = start = ktime_get();
> +   cur = start_poll = ktime_get();
> if (vc->halt_poll_ns) {
> -   ktime_t stop = ktime_add_ns(start, vc->halt_poll_ns);
> +   ktime_t stop = ktime_add_ns(start_poll, vc->halt_poll_ns);
> +   ++vc->runner->stat.halt_attempted_poll;
>
> vc->vcore_state = VCORE_POLLING;
> spin_unlock(&vc->lock);
> @@ -2698,13 +2699,15 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
> break;
> }
> cur = ktime_get();

Re: [RFC 0/3] extend kexec_file_load system call

2016-07-22 Thread Thiago Jung Bauermann
On Friday, 22 July 2016 at 12:54:28, Michael Ellerman wrote:
> Thiago Jung Bauermann  writes:
> > So even if not ideal, the solution above is desirable for powerpc. We
> > would like to preserve the ability of allowing userspace to pass
> > parameters to the OS via the DTB, even if secure boot is enabled.
> > 
> > I would like to turn the above into a proposal:
> > 
> > Extend the syscall as shown in this RFC from Takahiro AKASHI, but
> > instead of accepting a complete DTB from userspace, the syscall accepts
> > a DTB containing only a /chosen node. If the DTB contains any other
> > node, the syscall fails with EINVAL. If the DTB contains any subnode in
> > /chosen, or if there's a compatible or device_type property in /chosen,
> > the syscall fails with EINVAL as well.
> > 
> > The kernel can then add the properties in /chosen to the device tree
> > that it will pass to the next kernel.
> > 
> > What do you think?
> 
> I think we will inevitably have someone who wants to pass something
> other than a child of /chosen.
> 
> At that point we would be faced with adding yet another syscall, or at
> best a new flag.
> 
> I think we'd be better allowing userspace to pass a DTB, and having an
> explicit whitelist (in the kernel) of which nodes & properties are
> allowed in that DTB.

Sounds good to me.

> For starters it would only contain /chosen/stdout-path (for example).
> But we would be able to add new nodes & properties in future.

If we allow things outside of chosen, we can keep the offb.c hook in 
Petitboot and whitelist the framebuffer properties it adds to the vga node.

> The downside is userspace would have no way of detecting the content of
> the white list, other than trial and error. But in practice I'm not sure
> that would be a big problem.

For our use case in OpenPower I don't think it would be a problem, since the 
userspace and the kernel are developed together.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH 0/2] cxlflash: Regression patch and updating Maintainers list

2016-07-22 Thread Martin K. Petersen
> "Uma" == Uma Krishnan  writes:

Uma> First patch in this set fixes a regression that was caused by the
Uma> Commit 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL
Uma> Flash cards"), which is currently staged for 4.8 in next/master.
Uma> Second patch updates the Maintainers list for cxlflash driver.

Uma> This series is intended for 4.8 and is bisectable. These patches
Uma> are cut against next/master that contains the original commit.

Applied to 4.8/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: [PATCH v3] of: fix memory leak related to safe_name()

2016-07-22 Thread Rob Herring
On Wed, Jul 20, 2016 at 1:03 AM, Mathieu Malaterre
 wrote:
> On Fri, Jun 24, 2016 at 10:38 PM, Rob Herring  wrote:
>> On Fri, Jun 17, 2016 at 2:51 AM, Mathieu Malaterre
>>  wrote:
>>> v3 tested here multiple times ! memleak is now gone.
>>>
>>> Tested-by: Mathieu Malaterre 
>>>
>>> Thanks
>>>
>>> On Thu, Jun 16, 2016 at 7:51 PM, Frank Rowand  
>>> wrote:
 From: Frank Rowand 

 Fix a memory leak resulting from memory allocation in safe_name().
 This patch fixes all call sites of safe_name().
>>
>> Applied, thanks.
>>
>> Rob
>
> Could this patch be considered for stable ?

Yes, I tagged it for stable. I didn't send it for 4.7, so it won't go
in until after 4.8-rc1 though.

Rob

Re: [PATCH v3 02/11] mm: Hardened usercopy

2016-07-22 Thread Josh Poimboeuf
On Thu, Jul 21, 2016 at 11:34:25AM -0700, Kees Cook wrote:
> On Wed, Jul 20, 2016 at 11:52 PM, Michael Ellerman  
> wrote:
> > Kees Cook  writes:
> >
> >> diff --git a/mm/usercopy.c b/mm/usercopy.c
> >> new file mode 100644
> >> index ..e4bf4e7ccdf6
> >> --- /dev/null
> >> +++ b/mm/usercopy.c
> >> @@ -0,0 +1,234 @@
> > ...
> >> +
> >> +/*
> >> + * Checks if a given pointer and length is contained by the current
> >> + * stack frame (if possible).
> >> + *
> >> + *   0: not at all on the stack
> >> + *   1: fully within a valid stack frame
> >> + *   2: fully on the stack (when can't do frame-checking)
> >> + *   -1: error condition (invalid stack position or bad stack frame)
> >> + */
> >> +static noinline int check_stack_object(const void *obj, unsigned long len)
> >> +{
> >> + const void * const stack = task_stack_page(current);
> >> + const void * const stackend = stack + THREAD_SIZE;
> >
> > That allows access to the entire stack, including the struct thread_info,
> > is that what we want - it seems dangerous? Or did I miss a check
> > somewhere else?
> 
> That seems like a nice improvement to make, yeah.
> 
> > We have end_of_stack() which computes the end of the stack taking
> > thread_info into account (end being the opposite of your end above).
> 
> Amusingly, the object_is_on_stack() check in sched.h doesn't take
> thread_info into account either. :P Regardless, I think using
> end_of_stack() may not be best. To tighten the check, I think we could
> add this after checking that the object is on the stack:
> 
> #ifdef CONFIG_STACK_GROWSUP
> stackend -= sizeof(struct thread_info);
> #else
> stack += sizeof(struct thread_info);
> #endif
> 
> e.g. then if the pointer was in the thread_info, the second test would
> fail, triggering the protection.

FWIW, this won't work right on x86 after Andy's
CONFIG_THREAD_INFO_IN_TASK patches get merged.

-- 
Josh

[PATCH] powerpc/eeh: trivial fix to non-conventional PCI address output on EEH log

2016-07-22 Thread Guilherme G. Piccoli
This is a very minor/trivial fix for the output of PCI addresses in EEH logs.
The PCI address in the "OF node" field currently uses ":" as the separator
before the function, but the usual separator is ".". This patch changes the
separator to a dot, so the PCI address is printed in the usual format.

No functional changes were introduced.

Signed-off-by: Guilherme G. Piccoli 
---
 arch/powerpc/kernel/eeh.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index c9bc78e..7429556 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -168,10 +168,10 @@ static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
int n = 0, l = 0;
char buffer[128];
 
-   n += scnprintf(buf+n, len-n, "%04x:%02x:%02x:%01x\n",
+   n += scnprintf(buf+n, len-n, "%04x:%02x:%02x.%01x\n",
   edev->phb->global_number, pdn->busno,
   PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
-   pr_warn("EEH: of node=%04x:%02x:%02x:%01x\n",
+   pr_warn("EEH: of node=%04x:%02x:%02x.%01x\n",
edev->phb->global_number, pdn->busno,
PCI_SLOT(pdn->devfn), PCI_FUNC(pdn->devfn));
 
-- 
2.1.0


Re: [PATCH] cxl: fix sparse warnings

2016-07-22 Thread Matthew R. Ochs
> On Jul 22, 2016, at 4:01 AM, Andrew Donnellan  
> wrote:
> 
> Make native_irq_wait() static and use NULL rather than 0 to initialise
> phb->cfg_data in cxl_pci_vphb_add() to remove sparse warnings.
> 
> Signed-off-by: Andrew Donnellan 

Reviewed-by: Matthew R. Ochs 


Re: [PATCH 2/2] MAINTAINERS: Update cxlflash maintainers

2016-07-22 Thread Matthew R. Ochs
> On Jul 21, 2016, at 3:44 PM, Uma Krishnan  wrote:
> 
> Adding myself as a cxlflash maintainer.
> 
> Signed-off-by: Uma Krishnan 

Acked-by: Matthew R. Ochs 


Re: [PATCH 1/2] cxlflash: Verify problem state area is mapped before notifying shutdown

2016-07-22 Thread Matthew R. Ochs
> On Jul 21, 2016, at 3:44 PM, Uma Krishnan  wrote:
> 
> If an EEH or some other hard error occurs while the
> adapter instance was being initialized, on the subsequent
> shutdown of the device, the system could crash with:
> 
> [c00f1da03b60] c05eccfc pci_device_shutdown+0x6c/0x100
> [c00f1da03ba0] c06d67d4 device_shutdown+0x1b4/0x2c0
> [c00f1da03c40] c00ea30c kernel_restart_prepare+0x5c/0x80
> [c00f1da03c70] c00ea48c kernel_restart+0x2c/0xc0
> [c00f1da03ce0] c00ea970 SyS_reboot+0x1c0/0x2d0
> [c00f1da03e30] c0009204 system_call+0x38/0xb4
> 
> This crash is due to the AFU not being mapped when the shutdown
> notification routine is called and is a regression that was inserted
> recently with Commit 704c4b0ddc03 ("cxlflash: Shutdown notify support
> for CXL Flash cards").
> 
> As a fix, shutdown notification should only occur when the AFU is mapped.
> 
> Fixes: 704c4b0ddc03 ("cxlflash: Shutdown notify support for CXL Flash cards")
> Signed-off-by: Uma Krishnan 

Acked-by: Matthew R. Ochs 


Re: [for-4.8, 1/2] powerpc/mm: Switch user slb fault handling to translation enabled

2016-07-22 Thread Benjamin Herrenschmidt
On Fri, 2016-07-22 at 22:37 +1000, Nicholas Piggin wrote:
> > We also handle fault with proper stack initialized. This enables us to
> > call out to C in fault handling routines. We don't do this for kernel
> > mapping, because of the possibility of taking recursive fault if
> > kernel stack is not yet mapped by an slb entry.
> > 
> > This enables us to handle Power9 slb fault better. We will add bolted
> > entries for the entire kernel mapping in segment table and user slb
> > entries we take fault and insert on demand. With translation on, we
> > should be able to access segment table from fault handler.
> 
> What does this cost on P8? Is that a problem? Might need to do
> feature bits.

Also what is the need ?

The segment table is only for Nest MMU clients; we should probably
handle it separately.

Cheers,
Ben.


[PATCH] powerpc/64: implement a slice mask cache

2016-07-22 Thread Nicholas Piggin
Calculating the slice mask can become a significant overhead for
get_unmapped_area. The mask is relatively small and does not change
frequently, so we can cache it in the mm context.

This saves about 30% kernel time on a 4K user address allocation
in a microbenchmark.

Comments on the approach taken? I think there is the option for fixed
allocations to avoid some of the slice calculation entirely, but first
I think it will be good to have a general speedup that covers all
mmaps.

Cc: Benjamin Herrenschmidt 
Cc: Anton Blanchard 
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  8 +++
 arch/powerpc/mm/slice.c  | 39 ++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 5854263..0d15af4 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -71,6 +71,14 @@ typedef struct {
 #ifdef CONFIG_PPC_MM_SLICES
u64 low_slices_psize;   /* SLB page size encodings */
unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
+   struct slice_mask mask_4k;
+# ifdef CONFIG_PPC_64K_PAGES
+   struct slice_mask mask_64k;
+# endif
+# ifdef CONFIG_HUGETLB_PAGE
+   struct slice_mask mask_16m;
+   struct slice_mask mask_16g;
+# endif
 #else
u16 sllp;   /* SLB page size encoding */
 #endif
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 2b27458..559ea5f 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -147,7 +147,7 @@ static struct slice_mask slice_mask_for_free(struct mm_struct *mm)
return ret;
 }
 
-static struct slice_mask slice_mask_for_size(struct mm_struct *mm, int psize)
+static struct slice_mask calc_slice_mask_for_size(struct mm_struct *mm, int psize)
 {
unsigned char *hpsizes;
int index, mask_index;
@@ -171,6 +171,36 @@ static struct slice_mask slice_mask_for_size(struct mm_struct *mm, int psize)
return ret;
 }
 
+static void recalc_slice_mask_cache(struct mm_struct *mm)
+{
+   mm->context.mask_4k = calc_slice_mask_for_size(mm, MMU_PAGE_4K);
+#ifdef CONFIG_PPC_64K_PAGES
+   mm->context.mask_64k = calc_slice_mask_for_size(mm, MMU_PAGE_64K);
+#endif
+# ifdef CONFIG_HUGETLB_PAGE
+   /* Radix does not come here */
+   mm->context.mask_16m = calc_slice_mask_for_size(mm, MMU_PAGE_16M);
+   mm->context.mask_16g = calc_slice_mask_for_size(mm, MMU_PAGE_16G);
+# endif
+}
+
+static struct slice_mask slice_mask_for_size(struct mm_struct *mm, int psize)
+{
+   if (psize == MMU_PAGE_4K)
+   return mm->context.mask_4k;
+#ifdef CONFIG_PPC_64K_PAGES
+   if (psize == MMU_PAGE_64K)
+   return mm->context.mask_64k;
+#endif
+# ifdef CONFIG_HUGETLB_PAGE
+   if (psize == MMU_PAGE_16M)
+   return mm->context.mask_16m;
+   if (psize == MMU_PAGE_16G)
+   return mm->context.mask_16g;
+# endif
+   BUG();
+}
+
 static int slice_check_fit(struct slice_mask mask, struct slice_mask available)
 {
return (mask.low_slices & available.low_slices) == mask.low_slices &&
@@ -233,6 +263,8 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 
spin_unlock_irqrestore(&slice_convert_lock, flags);
 
+   recalc_slice_mask_cache(mm);
+
copro_flush_all_slbs(mm);
 }
 
@@ -625,7 +657,7 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
goto bail;
 
mm->context.user_psize = psize;
-   wmb();
+   wmb(); /* Why? */
 
lpsizes = mm->context.low_slices_psize;
for (i = 0; i < SLICE_NUM_LOW; i++)
@@ -652,6 +684,9 @@ void slice_set_user_psize(struct mm_struct *mm, unsigned int psize)
  mm->context.low_slices_psize,
  mm->context.high_slices_psize);
 
+   spin_unlock_irqrestore(&slice_convert_lock, flags);
+   recalc_slice_mask_cache(mm);
+   return;
  bail:
spin_unlock_irqrestore(&slice_convert_lock, flags);
 }
-- 
2.8.1


Re: [for-4.8, 1/2] powerpc/mm: Switch user slb fault handling to translation enabled

2016-07-22 Thread Nicholas Piggin
On Wed, 13 Jul 2016 15:10:49 +0530
"Aneesh Kumar K.V"  wrote:

> We also handle fault with proper stack initialized. This enables us to
> call out to C in fault handling routines. We don't do this for kernel
> mapping, because of the possibility of taking recursive fault if
> kernel stack is not yet mapped by an slb entry.
> 
> This enables us to handle Power9 slb fault better. We will add bolted
> entries for the entire kernel mapping in segment table and user slb
> entries we take fault and insert on demand. With translation on, we
> should be able to access segment table from fault handler.

What does this cost on P8? Is that a problem? Might need to do
feature bits.

> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 55 
>  arch/powerpc/mm/slb.c| 11 
>  2 files changed, 61 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 2747e901fb99..2132bf55573c 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -794,7 +794,7 @@ data_access_slb_relon_pSeries:
> 	mfspr	r3,SPRN_DAR
>   mfspr   r12,SPRN_SRR1
>  #ifndef CONFIG_RELOCATABLE
> - b   slb_miss_realmode
> + b   handle_slb_miss_relon
>  #else
>   /*
>* We can't just use a direct branch to slb_miss_realmode
> @@ -803,7 +803,7 @@ data_access_slb_relon_pSeries:
>*/
>   mfctr   r11
>   ld  r10,PACAKBASE(r13)
> - LOAD_HANDLER(r10, slb_miss_realmode)
> + LOAD_HANDLER(r10, handle_slb_miss_relon)
>   mtctr   r10
>   bctr
>  #endif
> @@ -819,11 +819,11 @@ instruction_access_slb_relon_pSeries:
> 	mfspr	r3,SPRN_SRR0	/* SRR0 is faulting address */
> 	mfspr	r12,SPRN_SRR1
>  #ifndef CONFIG_RELOCATABLE
> - b   slb_miss_realmode
> + b   handle_slb_miss_relon
>  #else
>   mfctr   r11
>   ld  r10,PACAKBASE(r13)
> - LOAD_HANDLER(r10, slb_miss_realmode)
> + LOAD_HANDLER(r10, handle_slb_miss_relon)
>   mtctr   r10
>   bctr
>  #endif
> @@ -961,7 +961,23 @@ h_data_storage_common:
>   bl  unknown_exception
>   b   ret_from_except
>  
> +/* r3 point to DAR */
>   .align  7
> + .globl slb_miss_user
> +slb_miss_user:
> + std r3,PACA_EXSLB+EX_DAR(r13)
> + /* Restore r3 as expected by PROLOG_COMMON below */
> + ld  r3,PACA_EXSLB+EX_R3(r13)
> + EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
> + RECONCILE_IRQ_STATE(r10, r11)
> + ld  r4,PACA_EXSLB+EX_DAR(r13)
> + li  r5,0x380
> + std r4,_DAR(r1)
> + addi    r3,r1,STACK_FRAME_OVERHEAD
> + bl  handle_slb_miss
> + b   ret_from_except_lite
> +
> +.align   7
>   .globl instruction_access_common
>  instruction_access_common:
>   EXCEPTION_PROLOG_COMMON(0x400, PACA_EXGEN)
> @@ -1379,11 +1395,17 @@ unrecover_mce:
>   * We assume we aren't going to take any exceptions during this
>   * procedure. */
>  slb_miss_realmode:
> - mflr    r10
>  #ifdef CONFIG_RELOCATABLE
>   mtctr   r11
>  #endif
> + /*
> +  * Handle user slb miss with translation enabled
> +  */
> + cmpdi   r3,0
> + bge 3f
>  
> +slb_miss_kernel:
> + mflr    r10
>   stw r9,PACA_EXSLB+EX_CCR(r13)   /* save CR in exc. frame */
>   std r10,PACA_EXSLB+EX_LR(r13)   /* save LR */
> @@ -1429,6 +1451,29 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   mtspr   SPRN_SRR1,r10
>   rfid
>   b   .
> +3:
> + /*
> +  * Enable IR/DR and handle the fault
> +  */
> + EXCEPTION_PROLOG_PSERIES_1(slb_miss_user, EXC_STD)
> + /*
> +  * handler with relocation on
> +  */
> +handle_slb_miss_relon:
> +#ifdef CONFIG_RELOCATABLE
> + mtctr   r11
> +#endif

This is turning into a bit of spaghetti. I think it can be
improved though.

I have a patch that I think can save a few instructions in the
relocatable case for SLB miss. I think that may give enough space
inline to branch to the correct handler. I should submit it soon.


> + /*
> +  * Handle user slb miss with stack initialized.
> +  */
> + cmpdi   r3,0
> + bge 4f
> + /*
> +  * go back to slb_miss_realmode
> +  */

Are these comments adding much? I think if the names of some of the
labels were improved, you might find they're not necessary. Some
existing labels like slb_miss_realmode don't help too much either. That
handler runs in real and virtual mode, and now the realmode exception
can go elsewhere too.


> + b   slb_miss_kernel

blt     slb_miss_kernel?


> +4:
> + EXCEPTION_RELON_PROLOG_PSERIES_1(slb_miss_user, EXC_STD)


[PATCH] cxl: fix sparse warnings

2016-07-22 Thread Andrew Donnellan
Make native_irq_wait() static and use NULL rather than 0 to initialise
phb->cfg_data in cxl_pci_vphb_add() to remove sparse warnings.

Signed-off-by: Andrew Donnellan 
---
 drivers/misc/cxl/native.c | 2 +-
 drivers/misc/cxl/vphb.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/cxl/native.c b/drivers/misc/cxl/native.c
index 3bcdaee..e606fdc 100644
--- a/drivers/misc/cxl/native.c
+++ b/drivers/misc/cxl/native.c
@@ -924,7 +924,7 @@ static irqreturn_t native_irq_multiplexed(int irq, void *data)
return fail_psl_irq(afu, &irq_info);
 }
 
-void native_irq_wait(struct cxl_context *ctx)
+static void native_irq_wait(struct cxl_context *ctx)
 {
u64 dsisr;
int timeout = 1000;
diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
index dee8def..7ada5f1 100644
--- a/drivers/misc/cxl/vphb.c
+++ b/drivers/misc/cxl/vphb.c
@@ -221,7 +221,7 @@ int cxl_pci_vphb_add(struct cxl_afu *afu)
/* Setup the PHB using arch provided callback */
phb->ops = &cxl_pcie_pci_ops;
phb->cfg_addr = NULL;
-   phb->cfg_data = 0;
+   phb->cfg_data = NULL;
phb->private_data = afu;
phb->controller_ops = cxl_pci_controller_ops;
 
-- 
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited


Re: [PATCH 05/14] powerpc/pseries: 4GB exception handler offsets

2016-07-22 Thread Nicholas Piggin
On Thu, 21 Jul 2016 14:34:10 +
David Laight  wrote:

> From: Nicholas Piggin
> > Sent: 21 July 2016 07:44  
> ...
> > @@ -739,7 +739,8 @@ kvmppc_skip_Hinterrupt:
> >   * Ensure that any handlers that get invoked from the exception prologs
> >   * above are below the first 64KB (0x10000) of the kernel image because
> >   * the prologs assemble the addresses of these handlers using the
> > - * LOAD_HANDLER macro, which uses an ori instruction.
> > + * LOAD_HANDLER_4G macro, which uses an ori instruction. Care must also
> > + * be taken because relative branches can only address 32K in each
> > + * direction. */  
> 
> That comment now looks wrong.

You're right, I'll correct it.

Thanks,
Nick

[RFC] powerpc/64: syscall ABI

2016-07-22 Thread Nicholas Piggin
This adds some documentation for the 64-bit syscall ABI, because
we don't seem to have a canonical document anywhere. I have been
mostly looking at 64S, so comments about 32-bit and embedded would
be welcome too.

I have just documented existing practice. The only small issue I
have come across is glibc not getting the clobbers quite matching
the kernel (e.g., xer, and some vsyscalls actually trash cr1 too,
whereas glibc is only clobbering cr0). In practice the glibc
assembly for the calls tends to be located in wrapper functions,
so that's why it hasn't caused any problems. I'll submit patches
for glibc and manpages etc after getting this document discussed
and merged.

Cc: Alan Modra 
---
 Documentation/powerpc/syscall-abi.txt | 103 ++
 1 file changed, 103 insertions(+)
 create mode 100644 Documentation/powerpc/syscall-abi.txt

diff --git a/Documentation/powerpc/syscall-abi.txt b/Documentation/powerpc/syscall-abi.txt
new file mode 100644
index 000..65a9f62
--- /dev/null
+++ b/Documentation/powerpc/syscall-abi.txt
@@ -0,0 +1,103 @@
+Power Architecture 64-bit Linux system call ABI
+
+===
+syscall
+===
+syscall calling sequence[*] matches the Power Architecture 64-bit ELF ABI
+specification C function calling sequence, including register preservation
+rules, with the following differences.
+
+[*] Some syscalls (typically low-level management functions) may have different
+calling sequences (e.g., rt_sigreturn).
+
+Parameters and return value
+---
+The system call number is specified in r0.
+
+There is a maximum of 6 integer parameters to a syscall, passed in r3-r8.
+
+Both a return value and a return error code are returned. cr0.SO is the return
+error code, and r3 is the return value or error code. When cr0.SO is clear, the
+syscall succeeded and r3 is the return value. When cr0.SO is set, the syscall
+failed and r3 is the error code that generally corresponds to errno.
+
+Stack
+-
+System calls do not modify the caller's stack frame. For example, the caller's
+stack frame LR and CR save fields are not used.
+
+Register preservation rules
+---
+Register preservation rules match the ELF ABI calling sequence with the
+following differences:
+
+r0: Volatile.   (System call number.)
+r3: Volatile.   (Parameter 1, and return value.)
+r4-r8:  Volatile.   (Parameters 2-6.)
+cr0:        Volatile    (cr0.SO is the return error condition)
+cr1, cr5-7: Nonvolatile.
+lr: Nonvolatile.
+
+All floating point and vector data registers as well as control and status
+registers are nonvolatile.
+
+Invocation
+--
+The syscall is performed with the sc instruction, and returns with execution
+continuing at the instruction following the sc instruction.
+
+Transactional Memory
+
+Syscall behavior can change if the processor is in transactional or suspended
+transaction state, and the syscall can affect the behavior of the transaction.
+
+If the processor is in suspended state when a syscall is made, the syscall will
+be performed as normal, and will return as normal. The syscall will be
+performed in suspended state, so its side effects will be persistent according
+to the usual transactional memory semantics. A syscall may or may not result in
+the transaction being doomed by hardware.
+
+If the processor is in transactional state when a syscall is made, then the
+behavior depends on the presence of PPC_FEATURE2_HTM_NOSC in the AT_HWCAP2 ELF
+auxiliary vector.
+
+- If present, which is the case for newer kernels, then the syscall will
+  not be performed and the transaction will be doomed by the kernel with the
+  failure code TM_CAUSE_SYSCALL | TM_CAUSE_PERSISTENT in the TEXASR SPR.
+
+- If not present (older kernels), then the kernel will suspend the
+  transactional state and the syscall will proceed as in the case of a
+  suspended state syscall, and will resume the transactional state before
+  returning to the caller. This case is not well defined or supported, so
+  this behavior should not be relied upon.
+
+===
+vsyscall
+===
+vsyscall calling sequence matches the syscall calling sequence, with the
+following differences. Some vsyscalls may have different calling sequences.
+
+Parameters and return value
+---
+r0 is not used as an input. The vsyscall is selected by its address.
+
+Stack
+-
+The vsyscall may or may not use the caller's stack frame save areas.
+
+Register preservation rules
+---
+r0: Volatile.
+cr1, cr5-7: Volatile.
+lr: Volatile.
+
+Invocation
+--
+The vsyscall is performed with a branch-with-link instruction to the
+vsyscall function address.
+
+Transactional Memory
+
+vsyscalls will run in the same transactional state as the caller.

Re: [PATCH] powernv: Use __printf in pe_level_printk

2016-07-22 Thread kbuild test robot
Hi,

[auto build test WARNING on v4.7-rc7]
[also build test WARNING on next-20160721]
[cannot apply to powerpc/next]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Joe-Perches/powernv-Use-__printf-in-pe_level_printk/20160715-171449
config: powerpc-allmodconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/platforms/powernv/pci-ioda.c:49:0:
   arch/powerpc/platforms/powernv/pci-ioda.c: In function 
'pnv_ioda_deconfigure_pe':
>> arch/powerpc/platforms/powernv/pci-ioda.c:784:15: warning: format '%ld' 
>> expects argument of type 'long int', but argument 4 has type 'int64_t {aka 
>> long long int}' [-Wformat=]
  pe_warn(pe, "OPAL error %ld remove self from PELTV\n", rc);
  ^
   arch/powerpc/platforms/powernv/pci.h:222:36: note: in definition of macro 
'pe_warn'
 pe_level_printk(pe, KERN_WARNING, fmt, ##__VA_ARGS__)
   ^
   arch/powerpc/platforms/powernv/pci-ioda.c:788:14: warning: format '%ld' 
expects argument of type 'long int', but argument 4 has type 'int64_t {aka long 
long int}' [-Wformat=]
  pe_err(pe, "OPAL error %ld trying to setup PELT table\n", rc);
 ^
   arch/powerpc/platforms/powernv/pci.h:220:32: note: in definition of macro 
'pe_err'
 pe_level_printk(pe, KERN_ERR, fmt, ##__VA_ARGS__)
   ^
   arch/powerpc/platforms/powernv/pci-ioda.c: In function 
'pnv_ioda_setup_bus_PE':
>> arch/powerpc/platforms/powernv/pci-ioda.c:1067:15: warning: format '%d' 
>> expects argument of type 'int', but argument 4 has type 'resource_size_t 
>> {aka long long unsigned int}' [-Wformat=]
  pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
  ^
   arch/powerpc/platforms/powernv/pci.h:224:33: note: in definition of macro 
'pe_info'
 pe_level_printk(pe, KERN_INFO, fmt, ##__VA_ARGS__)
^
   arch/powerpc/platforms/powernv/pci-ioda.c:1067:15: warning: format '%d' 
expects argument of type 'int', but argument 5 has type 'resource_size_t {aka 
long long unsigned int}' [-Wformat=]
  pe_info(pe, "Secondary bus %d..%d associated with PE#%d\n",
  ^
   arch/powerpc/platforms/powernv/pci.h:224:33: note: in definition of macro 
'pe_info'
 pe_level_printk(pe, KERN_INFO, fmt, ##__VA_ARGS__)
^
   arch/powerpc/platforms/powernv/pci-ioda.c:1070:15: warning: format '%d' 
expects argument of type 'int', but argument 4 has type 'resource_size_t {aka 
long long unsigned int}' [-Wformat=]
  pe_info(pe, "Secondary bus %d associated with PE#%d\n",
  ^
   arch/powerpc/platforms/powernv/pci.h:224:33: note: in definition of macro 
'pe_info'
 pe_level_printk(pe, KERN_INFO, fmt, ##__VA_ARGS__)
^
   arch/powerpc/platforms/powernv/pci-ioda.c: In function 
'pnv_pci_ioda2_release_dma_pe':
   arch/powerpc/platforms/powernv/pci-ioda.c:1358:15: warning: format '%ld' 
expects argument of type 'long int', but argument 4 has type 'int64_t {aka long 
long int}' [-Wformat=]
  pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
  ^
   arch/powerpc/platforms/powernv/pci.h:222:36: note: in definition of macro 
'pe_warn'
 pe_level_printk(pe, KERN_WARNING, fmt, ##__VA_ARGS__)
   ^
   arch/powerpc/platforms/powernv/pci-ioda.c: In function 
'pnv_pci_ioda1_setup_dma_pe':
   arch/powerpc/platforms/powernv/pci-ioda.c:2100:15: warning: format '%ld' 
expects argument of type 'long int', but argument 4 has type 'int64_t {aka long 
long int}' [-Wformat=]
   pe_err(pe, " Failed to configure 32-bit TCE table,"
  ^
   arch/powerpc/platforms/powernv/pci.h:220:32: note: in definition of macro 
'pe_err'
 pe_level_printk(pe, KERN_ERR, fmt, ##__VA_ARGS__)
   ^
   arch/powerpc/platforms/powernv/pci-ioda.c: In function 
'pnv_pci_ioda2_set_window':
>> arch/powerpc/platforms/powernv/pci-ioda.c:2160:14: warning: format '%x' 
>> expects argument of type 'unsigned int', but argument 7 has type 'long 
>> unsigned int' [-Wformat=]
 pe_info(pe, "Setting up window#%d %llx..%llx pg=%x\n", num,
 ^
   arch/powerpc/platforms/powernv/pci.h:224:33: note: in definition of macro 
'pe_info'
 pe_level_printk(pe, KERN_INFO, fmt, ##__VA_ARGS__)
^
   arch/powerpc/platforms/powernv/pci-ioda.c:2176:14: warning: format '%ld' 
expects argument of type 'long int', b

[PATCH] powerpc/tm: do not use r13 for tabort_syscall

2016-07-22 Thread Nicholas Piggin
tabort_syscall runs with RI=1, so a nested recoverable machine
check will load the paca into r13 and overwrite what we loaded
it with, because exceptions returning to privileged mode do not
restore r13.

This has survived testing with sc instruction inside transaction
(bare sc, not glibc syscall because glibc can tabort before sc).
Verified the transaction fails with TM_CAUSE_SYSCALL.

Signed-off-by: Nick Piggin 
Cc: Michael Neuling 
Cc: Sam Bobroff 
Cc: Michael Ellerman 

---

 arch/powerpc/kernel/entry_64.S | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 73e461a..387dee3 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -368,13 +368,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 tabort_syscall:
/* Firstly we need to enable TM in the kernel */
mfmsr   r10
-   li  r13, 1
-   rldimi  r10, r13, MSR_TM_LG, 63-MSR_TM_LG
-   mtmsrd  r10, 0
+   li  r9,1
+   rldimi  r10,r9,MSR_TM_LG,63-MSR_TM_LG
+   mtmsrd  r10,0
 
/* tabort, this dooms the transaction, nothing else */
-   li  r13, (TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
-   TABORT(R13)
+   li  r9,(TM_CAUSE_SYSCALL|TM_CAUSE_PERSISTENT)
+   TABORT(R9)
 
/*
 * Return directly to userspace. We have corrupted user register state,
@@ -382,11 +382,11 @@ tabort_syscall:
 * resume after the tbegin of the aborted transaction with the
 * checkpointed register state.
 */
-   li  r13, MSR_RI
-   andcr10, r10, r13
-   mtmsrd  r10, 1
-   mtspr   SPRN_SRR0, r11
-   mtspr   SPRN_SRR1, r12
+   li  r9,MSR_RI
+   andcr10,r10,r9
+   mtmsrd  r10,1
+   mtspr   SPRN_SRR0,r11
+   mtspr   SPRN_SRR1,r12
 
rfid
b   .   /* prevent speculative execution */
-- 
2.8.1


[PATCH] powerpc/powernv: document cxl dependency on special case in pnv_eeh_reset()

2016-07-22 Thread Andrew Donnellan
pnv_eeh_reset() has special handling for PEs whose primary bus is the root
bus or the bus immediately underneath the root port.

The cxl bi-modal card support added in b0b5e5918ad1 ("cxl: Add
cxl_check_and_switch_mode() API to switch bi-modal cards") relies on this
behaviour when hot-resetting the CAPI adapter following a mode switch.
Document this in pnv_eeh_reset() so we don't accidentally break it.

Suggested-by: Gavin Shan 
Signed-off-by: Andrew Donnellan 

---

Gavin requested that I add this comment a few weeks ago but I didn't get
around to including it in b0b5e5918ad1. Sorry Gavin!
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 86544ea..3af854e 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -1094,6 +1094,13 @@ static int pnv_eeh_reset(struct eeh_pe *pe, int option)
if (pe->type & EEH_PE_VF)
return pnv_eeh_reset_vf_pe(pe, option);
 
+   /*
+* If dealing with the root bus (or the bus underneath the
+* root port), we reset the bus underneath the root port.
+*
+* The cxl driver depends on this behaviour for bi-modal card
+* switching.
+*/
if (pci_is_root_bus(bus) ||
pci_is_root_bus(bus->parent))
return pnv_eeh_root_reset(hose, option);
-- 
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited
