Re: [PATCH 09/25] powerpc: ability to create execute-disabled pkeys

2017-10-23 Thread Aneesh Kumar K.V
Ram Pai  writes:

> powerpc has hardware support to disable execute on a pkey.
> This patch enables the ability to create execute-disabled
> keys.

Can you summarize here how this works?  Access to IAMR is
privileged so how will keys framework work with IAMR? 

-aneesh
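
For illustration only (not from the posted patches): a minimal userspace sketch of how an execute-disabled key would be requested, assuming the PKEY_DISABLE_EXECUTE flag proposed in this series (the value 0x4 below is an assumption) and the pkey_alloc()/pkey_mprotect()/pkey_free() wrappers. Userspace only asks for the restriction; programming the privileged IAMR is left to the kernel.

/*
 * Illustrative sketch only.  PKEY_DISABLE_EXECUTE and its value are
 * assumptions based on this series; the wrappers are assumed to be
 * available from libc or via syscall(2).
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef PKEY_DISABLE_EXECUTE
#define PKEY_DISABLE_EXECUTE 0x4	/* assumed value */
#endif

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, page, PROT_READ | PROT_WRITE | PROT_EXEC,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int pkey;

	if (p == MAP_FAILED)
		return 1;

	pkey = pkey_alloc(0, PKEY_DISABLE_EXECUTE);
	if (pkey < 0)
		return 1;

	/* Tag the mapping with the execute-disabled key. */
	if (pkey_mprotect(p, page, PROT_READ | PROT_WRITE | PROT_EXEC, pkey))
		return 1;

	printf("key %d applied; instruction fetch from %p should now fault\n",
	       pkey, p);
	pkey_free(pkey);
	return 0;
}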



Re: [PATCH 4/7] powerpc: Free up four 64K PTE bits in 64K backed HPTE pages

2017-10-23 Thread Aneesh Kumar K.V



On 10/24/2017 12:52 AM, Ram Pai wrote:

On Thu, Sep 14, 2017 at 06:13:57PM +1000, Benjamin Herrenschmidt wrote:

On Fri, 2017-09-08 at 15:44 -0700, Ram Pai wrote:

The second part of the PTE will hold
(H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
NOTE: None of the bits in the secondary PTE were used
by a 64k-HPTE backed PTE.


Have you measured the performance impact of this ? The second part of
the PTE being in a different cache line there could be one...


Hmm.. I missed responding to this comment.

I did a preliminary measurement running mmap bench in the selftests.
I ran it multiple times; almost always the numbers were either equal to
or better than without the patch series.


mmap bench doesn't take any faults. It is just mmap/munmap in a loop.

-aneesh



RE: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC

2017-10-23 Thread Qiang Zhao
Hi all,

Could anybody review this patchset and take action on it? Thank you!

Best Regards
Qiang Zhao

> -Original Message-
> From: Zhao Qiang [mailto:qiang.z...@nxp.com]
> Sent: Monday, August 07, 2017 11:07 AM
> To: t...@linutronix.de
> Cc: o...@buserror.net; Xiaobo Xie ; linux-
> ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Qiang Zhao
> 
> Subject: [PATCH v10 0/4] this patchset is to remove PPCisms for QEIC
> 
> QEIC is supported on more than just powerpc boards, so remove PPCisms.
> 
> changelog:
>   Changes for v8:
>   - use IRQCHIP_DECLARE() instead of subsys_initcall in qeic driver
>   - remove include/soc/fsl/qe/qe_ic.h
>   Changes for v9:
>   - rebase
>   - fix the compile issue when applying the second patch; in fact, there was
> no compile issue
> when applying all the patches of this patchset
>   Changes for v10:
>   - simplify codes, remove duplicated codes
> 
> Zhao Qiang (4):
>   irqchip/qeic: move qeic driver from drivers/soc/fsl/qe
>   Changes for v2:
>   - modify the subject and commit msg
>   Changes for v3:
>   - merge .h file to .c, rename it with irq-qeic.c
>   Changes for v4:
>   - modify comments
>   Changes for v5:
>   - disable rename detection
>   Changes for v6:
>   - rebase
>   Changes for v7:
>   - na
> 
>   irqchip/qeic: merge qeic init code from platforms to a common function
>   Changes for v2:
>   - modify subject and commit msg
>   - add check for qeic by type
>   Changes for v3:
>   - na
>   Changes for v4:
>   - na
>   Changes for v5:
>   - na
>   Changes for v6:
>   - rebase
>   Changes for v7:
>   - na
>   Changes for v8:
>   - use IRQCHIP_DECLARE() instead of subsys_initcall
> 
>   irqchip/qeic: merge qeic_of_init into qe_ic_init
>   Changes for v2:
>   - modify subject and commit msg
>   - return 0 and add put node when return in qe_ic_init
>   Changes for v3:
>   - na
>   Changes for v4:
>   - na
>   Changes for v5:
>   - na
>   Changes for v6:
>   - rebase
>   Changes for v7:
>   - na
> 
>   irqchip/qeic: remove PPCisms for QEIC
>   Changes for v6:
>   - new added
>   Changes for v7:
>   - fix warning
>   Changes for v8:
>   - remove include/soc/fsl/qe/qe_ic.h
> 
> Zhao Qiang (4):
>   irqchip/qeic: move qeic driver from drivers/soc/fsl/qe
>   irqchip/qeic: merge qeic init code from platforms to a common function
>   irqchip/qeic: merge qeic_of_init into qe_ic_init
>   irqchip/qeic: remove PPCisms for QEIC
> 
>  MAINTAINERS|   6 +
>  arch/powerpc/platforms/83xx/km83xx.c   |   1 -
>  arch/powerpc/platforms/83xx/misc.c |  16 -
>  arch/powerpc/platforms/83xx/mpc832x_mds.c  |   1 -
>  arch/powerpc/platforms/83xx/mpc832x_rdb.c  |   1 -
>  arch/powerpc/platforms/83xx/mpc836x_mds.c  |   1 -
>  arch/powerpc/platforms/83xx/mpc836x_rdk.c  |   1 -
>  arch/powerpc/platforms/85xx/corenet_generic.c  |  10 -
>  arch/powerpc/platforms/85xx/mpc85xx_mds.c  |  15 -
>  arch/powerpc/platforms/85xx/mpc85xx_rdb.c  |  17 -
>  arch/powerpc/platforms/85xx/twr_p102x.c|  15 -
>  drivers/irqchip/Makefile   |   1 +
>  drivers/{soc/fsl/qe/qe_ic.c => irqchip/irq-qeic.c} | 358 
> -
>  drivers/soc/fsl/qe/Makefile|   2 +-
>  drivers/soc/fsl/qe/qe_ic.h | 103 --
>  include/soc/fsl/qe/qe_ic.h | 139 
>  16 files changed, 218 insertions(+), 469 deletions(-)  rename
> drivers/{soc/fsl/qe/qe_ic.c => irqchip/irq-qeic.c} (58%)  delete mode 100644
> drivers/soc/fsl/qe/qe_ic.h  delete mode 100644 include/soc/fsl/qe/qe_ic.h
> 
> --
> 2.1.0.27.g96db324



Re: [PATCH 4/7] powerpc: Free up four 64K PTE bits in 64K backed HPTE pages

2017-10-23 Thread Ram Pai
On Mon, Oct 23, 2017 at 02:22:44PM +0530, Aneesh Kumar K.V wrote:
> Benjamin Herrenschmidt  writes:
> 
> > On Fri, 2017-09-08 at 15:44 -0700, Ram Pai wrote:
> >> The second part of the PTE will hold
> >> (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
> >> NOTE: None of the bits in the secondary PTE were used
> >> by a 64k-HPTE backed PTE.
> >
> > Have you measured the performance impact of this ? The second part of
> > the PTE being in a different cache line there could be one...
> >
> 
> I am also looking at a patch series removing the slot tracking
> completely. With randomize address turned off and no swap in guest/host
> and making sure we touched most of guest ram, I don't find much impact
> in performance when we don't track the slot at all. I will post the
> patch series with numbers in a day or two. But my test was
> 
> while (5000) {
>   mmap(128M)
>   touch every page of 2048 pages
>   munmap()
> }
> 
> It could also be the best case in my run because I might have always
> found the hash PTE slot in the primary. In one measurement with swap on
> and address randomization enabled, I did find a 50% impact. But then I
> was not able to recreate that again. So it could be something I did wrong
> in the test setup.
> 
> Ram,
> 
> Will you be able to get a test run with the above loop?

Yes. Results with the patch look good; better than without the patch.


/----------------------------------------------\
| Iteration | secs w/ patch  | secs w/o patch  |
|-----------+----------------+-----------------|
| 1         | 45.572621      | 49.046994       |
| 2         | 46.049545      | 49.378756       |
| 3         | 46.103657      | 49.223591       |
| 4         | 46.298903      | 48.991245       |
| 5         | 46.353202      | 48.988033       |
| 6         | 45.440878      | 49.175846       |
| 7         | 46.860373      | 49.008395       |
| 8         | 46.221390      | 49.236964       |
| 9         | 45.794993      | 49.171927       |
| 10        | 46.569491      | 48.995628       |
|-----------+----------------+-----------------|
| average   | 46.1265053     | 49.1217379      |
\----------------------------------------------/


The code is as follows:


diff --git a/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c 
b/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
index 8d084a2..ef2ad87 100644
--- a/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
+++ b/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
@@ -10,14 +10,14 @@
 
 #include "utils.h"
 
-#define ITERATIONS 500
+#define ITERATIONS 5000
 
 #define MEMSIZE (128 * 1024 * 1024)
 
 int test_mmap(void)
 {
struct timespec ts_start, ts_end;
-   unsigned long i = ITERATIONS;
+   unsigned long i = ITERATIONS, j;
 
 	clock_gettime(CLOCK_MONOTONIC, &ts_start);
 
@@ -25,6 +25,10 @@ int test_mmap(void)
char *c = mmap(NULL, MEMSIZE, PROT_READ|PROT_WRITE,
   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
FAIL_IF(c == MAP_FAILED);
+
+   for (j=0; j < (MEMSIZE >> 16); j++)
+   c[j<<16] = 0xf;
+
munmap(c, MEMSIZE);
}
 



Re: [PATCH v2 2/3] powerpc: use NMI IPI for smp_send_stop

2017-10-23 Thread kbuild test robot
Hi Nicholas,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.14-rc6 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/avoid-secondary-hold-spinloop-when-possible/20171023-173012
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-iss476-smp_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/smp.o: In function `smp_send_stop':
>> arch/powerpc/kernel/smp.c:563: undefined reference to `smp_send_nmi_ipi'

vim +563 arch/powerpc/kernel/smp.c

   560  
   561  void smp_send_stop(void)
   562  {
 > 563  smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, stop_this_cpu, 100);
   564  }
   565  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH 02/25] powerpc: define an additional vma bit for protection keys.

2017-10-23 Thread Ram Pai
On Thu, Sep 14, 2017 at 06:11:32PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2017-09-14 at 14:38 +1000, Balbir Singh wrote:
> > On Fri,  8 Sep 2017 15:44:50 -0700
> > Ram Pai  wrote:
> > 
> > > powerpc needs an additional vma bit to support 32 keys.
> > > Till the additional vma bit lands in include/linux/mm.h
> > > we have to define  it  in powerpc specific header file.
> > > This is  needed to get pkeys working on power.
> > > 
> > > Signed-off-by: Ram Pai 
> > > ---
> > 
> > "This" being an arch specific hack for the additional bit?
> 
> Arch VMA bits ? really ? I'd rather we limit ourselves to 16 keys first
> then push for adding the extra bit to the generic code.

(hmm.. this mail did not land in my mailbox :( Sorry for the delay. Just
saw it in the mailing list.)

I think this is a good idea. I can code it in such a way that we
support 16 or 32 keys depending on the availability of the vma bit.

No more hunks in the code.


RP
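
For illustration only, a compile-time sketch of the 16-vs-32 key choice discussed above, assuming the decision keys off the presence of VM_HIGH_ARCH_BIT_4; PKEYS_TOTAL is a hypothetical name:

/* Illustrative sketch, not from the posted series. */
#ifdef VM_HIGH_ARCH_BIT_4
#define PKEYS_TOTAL	32	/* fifth VMA bit available: keys 0..31 */
#else
#define PKEYS_TOTAL	16	/* only four VMA bits: keys 0..15 */
#endif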



Re: [PATCH v2 2/3] powerpc: use NMI IPI for smp_send_stop

2017-10-23 Thread kbuild test robot
Hi Nicholas,

[auto build test ERROR on powerpc/next]
[also build test ERROR on v4.14-rc6 next-20171018]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Nicholas-Piggin/avoid-secondary-hold-spinloop-when-possible/20171023-173012
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-ge_imp3a_defconfig (attached as .config)
compiler: powerpc-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/smp.o: In function `smp_send_stop':
>> smp.c:(.text+0x774): undefined reference to `smp_send_nmi_ipi'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH 1/2] mm/mmu_notifier: avoid double notification when it is useless v2

2017-10-23 Thread Jerome Glisse
On Sat, Oct 21, 2017 at 11:47:03AM -0400, Jerome Glisse wrote:
> On Sat, Oct 21, 2017 at 04:54:40PM +1100, Balbir Singh wrote:
> > On Thu, 2017-10-19 at 12:58 -0400, Jerome Glisse wrote:
> > > On Thu, Oct 19, 2017 at 09:53:11PM +1100, Balbir Singh wrote:
> > > > On Thu, Oct 19, 2017 at 2:28 PM, Jerome Glisse  
> > > > wrote:
> > > > > On Thu, Oct 19, 2017 at 02:04:26PM +1100, Balbir Singh wrote:
> > > > > > On Mon, 16 Oct 2017 23:10:02 -0400
> > > > > > jgli...@redhat.com wrote:
> > > > > > > From: Jérôme Glisse 

[...]

> > > > > > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > > > > > > index c037d3d34950..ff5bc647b51d 100644
> > > > > > > --- a/mm/huge_memory.c
> > > > > > > +++ b/mm/huge_memory.c
> > > > > > > @@ -1186,8 +1186,15 @@ static int 
> > > > > > > do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, pmd_t orig_pmd,
> > > > > > > goto out_free_pages;
> > > > > > > VM_BUG_ON_PAGE(!PageHead(page), page);
> > > > > > > 
> > > > > > > +   /*
> > > > > > > +* Leave pmd empty until pte is filled note we must notify 
> > > > > > > here as
> > > > > > > +* concurrent CPU thread might write to new page before the 
> > > > > > > call to
> > > > > > > +* mmu_notifier_invalidate_range_end() happens which can lead 
> > > > > > > to a
> > > > > > > +* device seeing memory write in different order than CPU.
> > > > > > > +*
> > > > > > > +* See Documentation/vm/mmu_notifier.txt
> > > > > > > +*/
> > > > > > > pmdp_huge_clear_flush_notify(vma, haddr, vmf->pmd);
> > > > > > > -   /* leave pmd empty until pte is filled */
> > > > > > > 
> > > > > > > pgtable = pgtable_trans_huge_withdraw(vma->vm_mm, vmf->pmd);
> > > > > > > pmd_populate(vma->vm_mm, &_pmd, pgtable);
> > > > > > > @@ -2026,8 +2033,15 @@ static void 
> > > > > > > __split_huge_zero_page_pmd(struct vm_area_struct *vma,
> > > > > > > pmd_t _pmd;
> > > > > > > int i;
> > > > > > > 
> > > > > > > -   /* leave pmd empty until pte is filled */
> > > > > > > -   pmdp_huge_clear_flush_notify(vma, haddr, pmd);
> > > > > > > +   /*
> > > > > > > +* Leave pmd empty until pte is filled note that it is fine 
> > > > > > > to delay
> > > > > > > +* notification until mmu_notifier_invalidate_range_end() as 
> > > > > > > we are
> > > > > > > +* replacing a zero pmd write protected page with a zero pte 
> > > > > > > write
> > > > > > > +* protected page.
> > > > > > > +*
> > > > > > > +* See Documentation/vm/mmu_notifier.txt
> > > > > > > +*/
> > > > > > > +   pmdp_huge_clear_flush(vma, haddr, pmd);
> > > > > > 
> > > > > > Shouldn't the secondary TLB know if the page size changed?
> > > > > 
> > > > > It should not matter, we are talking virtual to physical on behalf
> > > > > of a device against a process address space. So the hardware should
> > > > > not care about the page size.
> > > > > 
> > > > 
> > > > Does that not indicate how much the device can access? Could it try
> > > > to access more than what is mapped?
> > > 
> > > Assuming the device has a huge TLB and 2MB huge pages with 4K small pages:
> > > you are going from 1 TLB entry covering a 2MB zero page to 512 TLB entries
> > > each covering 4K. Both cases are read only and both cases are pointing
> > > to the same data (ie zero).
> > > 
> > > It is fine to delay the TLB invalidate on the device to the call of
> > > mmu_notifier_invalidate_range_end(). The device will keep using the
> > > huge TLB for a little longer but both CPU and device are looking at
> > > same data.
> > > 
> > > Now if there is a racing thread that replaces one of the 512 zero pages
> > > after the split but before mmu_notifier_invalidate_range_end() that
> > > code path would call mmu_notifier_invalidate_range() before changing
> > > the pte to point to something else. Which should shoot down the device
> > > TLB (it would be a serious device bug if this did not work).
> > 
> > OK.. This seems reasonable, but I'd really like to see if it can be
> > tested
> 
> Well, it is hard to test; there are many factors. First, each device might react differently.
> Devices that only store TLB entries at 4k granularity are fine. Clever devices that
> can store TLB entries for 4k, 2M, ... can ignore an invalidation that is smaller
> than their TLB entry, ie getting a 4K invalidation would not invalidate a
> 2MB TLB entry in the device. I consider this buggy. I will go look at
> the PCIE ATS specification one more time and see if there is any wording
> related to that. I might bring up a question to the PCIE standard body if not.

So inside PCIE ATS there is the definition of "minimum translation or
invalidate size", which says 4096 bytes. So my understanding is that
hardware must support 4K invalidation in all cases and thus we should
be safe from the possible hazard above.

But nonetheless I will repost without the optimization for huge pages,
to be more conservative, as we want to be correct before we care
about the last bit of optimization.

Cheers,
Jérôme


Re: lpar issue for ZONE_DEVICE p2pmem in 4.14-rc

2017-10-23 Thread Stephen Bates

> [3.537780] lpar: Attempting to resize HPT to shift 21
> [3.539251] Unable to resize hash page table to target order 21: -1
> [3.541079] Unable to create mapping for hot added memory 
> 0xc0002100..0xc00021000400: -2

> For #1 above please check if your qemu supports H_RESIZE_HPT_* hcalls? 

Balbir do you have any suggestions as to how to test for this support? Note I 
am running this on my x86_64 host so there is no virtualization hardware in my 
QEMU. My qemu is very recent (QEMU emulator version 2.10.50 
(v2.10.0-1026-gd8f932c-dirty)).

> For create mapping failures, the rc is -ENOENT. Can you help debug this 
> further? We could do hcall tracing or enable debugging.

Sure I can help debug. My original email also had all you needed to recreate 
this issue so that’s an option too?

Stephen




Re: [PATCH 4/7] powerpc: Free up four 64K PTE bits in 64K backed HPTE pages

2017-10-23 Thread Ram Pai
On Thu, Sep 14, 2017 at 06:13:57PM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2017-09-08 at 15:44 -0700, Ram Pai wrote:
> > The second part of the PTE will hold
> > (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
> > NOTE: None of the bits in the secondary PTE were used
> > by a 64k-HPTE backed PTE.
> 
> Have you measured the performance impact of this ? The second part of
> the PTE being in a different cache line there could be one...

Hmm.. I missed responding to this comment.

I did a preliminary measurement running mmap bench in the selftests.
I ran it multiple times; almost always the numbers were either equal to
or better than without the patch series.

RP



Re: [PATCH 06/25] powerpc: cleaup AMR,iAMR when a key is allocated or freed

2017-10-23 Thread Ram Pai
On Mon, Oct 23, 2017 at 03:13:45PM +0530, Aneesh Kumar K.V wrote:
> Balbir Singh  writes:
> 
> > On Fri,  8 Sep 2017 15:44:54 -0700
> > Ram Pai  wrote:
> >
> >> cleanup the bits corresponding to a key in the AMR, and IAMR
> >> register, when the key is newly allocated/activated or is freed.
> >> We don't want some residual bits to cause the hardware to enforce
> >> unintended behavior when the key is activated or freed.
> >> 
> >> Signed-off-by: Ram Pai 
> >> ---
> >>  arch/powerpc/include/asm/pkeys.h |   12 
> >>  1 files changed, 12 insertions(+), 0 deletions(-)
> >> 
> >> diff --git a/arch/powerpc/include/asm/pkeys.h 
> >> b/arch/powerpc/include/asm/pkeys.h
> >> index 5a83ed7..53bf13b 100644
> >> --- a/arch/powerpc/include/asm/pkeys.h
> >> +++ b/arch/powerpc/include/asm/pkeys.h
> >> @@ -54,6 +54,8 @@ static inline bool mm_pkey_is_allocated(struct mm_struct 
> >> *mm, int pkey)
> >>mm_set_pkey_is_allocated(mm, pkey));
> >>  }
> >>  
> >> +extern void __arch_activate_pkey(int pkey);
> >> +extern void __arch_deactivate_pkey(int pkey);
> >>  /*
> >>   * Returns a positive, 5-bit key on success, or -1 on failure.
> >>   */
> >> @@ -80,6 +82,12 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
> >>  
> >>ret = ffz((u32)mm_pkey_allocation_map(mm));
> >>mm_set_pkey_allocated(mm, ret);
> >> +
> >> +  /*
> >> +   * enable the key in the hardware
> >> +   */
> >> +  if (ret > 0)
> >> +  __arch_activate_pkey(ret);
> >>return ret;
> >>  }
> >>  
> >> @@ -91,6 +99,10 @@ static inline int mm_pkey_free(struct mm_struct *mm, 
> >> int pkey)
> >>if (!mm_pkey_is_allocated(mm, pkey))
> >>return -EINVAL;
> >>  
> >> +  /*
> >> +   * Disable the key in the hardware
> >> +   */
> >> +  __arch_deactivate_pkey(pkey);
> >>mm_set_pkey_free(mm, pkey);
> >>  
> >>return 0;
> >
> > I think some of these patches can be merged, too much fine granularity
> > is hurting my ability to see the larger function/implementation.
> 
> 
> Completely agree

Me agree too :) Had to fine-grain it to satisfy comments
received during the initial versions.  I had about 10-12 patches in total.

Thanks, will merge a couple of these patches.

RP



Re: [PATCH 06/25] powerpc: cleaup AMR,iAMR when a key is allocated or freed

2017-10-23 Thread Ram Pai
On Mon, Oct 23, 2017 at 03:13:33PM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > cleanup the bits corresponding to a key in the AMR, and IAMR
> > register, when the key is newly allocated/activated or is freed.
> > We don't want some residual bits to cause the hardware to enforce
> > unintended behavior when the key is activated or freed.
> >
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/pkeys.h |   12 
> >  1 files changed, 12 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/pkeys.h 
> > b/arch/powerpc/include/asm/pkeys.h
> > index 5a83ed7..53bf13b 100644
> > --- a/arch/powerpc/include/asm/pkeys.h
> > +++ b/arch/powerpc/include/asm/pkeys.h
> > @@ -54,6 +54,8 @@ static inline bool mm_pkey_is_allocated(struct mm_struct 
> > *mm, int pkey)
> > mm_set_pkey_is_allocated(mm, pkey));
> >  }
> >
> > +extern void __arch_activate_pkey(int pkey);
> > +extern void __arch_deactivate_pkey(int pkey);
> >  /*
> >   * Returns a positive, 5-bit key on success, or -1 on failure.
> >   */
> > @@ -80,6 +82,12 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
> >
> > ret = ffz((u32)mm_pkey_allocation_map(mm));
> > mm_set_pkey_allocated(mm, ret);
> > +
> > +   /*
> > +* enable the key in the hardware
> > +*/
> > +   if (ret > 0)
> > +   __arch_activate_pkey(ret);
> > return ret;
> >  }
> 
> We are already arch specific because we are defining them in
> arch/powerpc/include/asm/, then why __arch_activate_pkey() ? 

Almost all the memory-key internal functions are named with two
leading underbars.  So this just *loosely* follows a convention within that
file. The corresponding functions without the two underbars are the ones
exposed to the arch-neutral code.

RP



Re: [PATCH 03/25] powerpc: track allocation status of all pkeys

2017-10-23 Thread Ram Pai
On Mon, Oct 23, 2017 at 03:11:28PM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > Total 32 keys are available on power7 and above. However
> > pkey 0,1 are reserved. So effectively we  have  30 pkeys.
> 
> When you say reserved, reserved by whom? Is that part of ISA or PAPR ?
> Also do you expect that to change. If not why all these indirection?
> Can we have the mask as a #define for 4K and 64K page size
> config?

The reserved keys cannot be determined
statically. They depend on the platform and the key info exported to us
by the device tree.  Hence they cannot be macro'd.

One of the subsequent patches reads the device tree and sets pkeys_total
appropriately, FYI.

BTW: key 0 is reserved by the ISA.  Key 1 is reserved, but it's nebulous:
there is a programming note in the ISA to avoid key 1, and testing
and experimentation with key 1 led to unpredictable behavior on
PowerVM. Key 31 may or may not be used by the hypervisor; the device
tree has to be consulted to determine that.

Bottom line: the mask can be determined only at runtime.

RP
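
For illustration only: a small sketch of how the runtime facts above could be folded into a reserved-key mask once the device tree has been consulted. The helper name and its inputs are hypothetical; the real series derives pkeys_total and the hypervisor usage from the device tree.

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only; name and arguments are hypothetical.
 * Key 0 is reserved by the ISA, key 1 is avoided per the ISA
 * programming note, and key 31 is reserved only when the device
 * tree indicates the hypervisor uses it.
 */
static uint32_t pkey_reserved_mask(int pkeys_total, bool hv_uses_key31)
{
	uint32_t mask = (1u << 0) | (1u << 1);

	if (hv_uses_key31 && pkeys_total > 31)
		mask |= 1u << 31;

	return mask;
}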



Re: [PATCH 02/25] powerpc: define an additional vma bit for protection keys.

2017-10-23 Thread Ram Pai
On Mon, Oct 23, 2017 at 02:58:55PM +0530, Aneesh Kumar K.V wrote:
> "Aneesh Kumar K.V"  writes:
> 
> > Ram Pai  writes:
> >
> >> powerpc needs an additional vma bit to support 32 keys.
> >> Till the additional vma bit lands in include/linux/mm.h
> >> we have to define  it  in powerpc specific header file.
> >> This is  needed to get pkeys working on power.
> >>
> >> Signed-off-by: Ram Pai 
> >> ---
> >>  arch/powerpc/include/asm/pkeys.h |   18 ++
> >>  1 files changed, 18 insertions(+), 0 deletions(-)
> >>
> >> diff --git a/arch/powerpc/include/asm/pkeys.h 
> >> b/arch/powerpc/include/asm/pkeys.h
> >> index c02305a..44e01a2 100644
> >> --- a/arch/powerpc/include/asm/pkeys.h
> >> +++ b/arch/powerpc/include/asm/pkeys.h
> >> @@ -3,6 +3,24 @@
> >>
> >>  extern bool pkey_inited;
> >>  extern bool pkey_execute_disable_support;
> >> +
> >> +/*
> >> + * powerpc needs an additional vma bit to support 32 keys.
> >> + * Till the additional vma bit lands in include/linux/mm.h
> >> + * we have to carry the hunk below. This is  needed to get
> >> + * pkeys working on power. -- Ram
> >> + */
> >> +#ifndef VM_HIGH_ARCH_BIT_4
> >> +#define VM_HIGH_ARCH_BIT_4	36
> >> +#define VM_HIGH_ARCH_4	BIT(VM_HIGH_ARCH_BIT_4)
> >> +#define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
> >> +#define VM_PKEY_BIT0  VM_HIGH_ARCH_0
> >> +#define VM_PKEY_BIT1  VM_HIGH_ARCH_1
> >> +#define VM_PKEY_BIT2  VM_HIGH_ARCH_2
> >> +#define VM_PKEY_BIT3  VM_HIGH_ARCH_3
> >> +#define VM_PKEY_BIT4  VM_HIGH_ARCH_4
> >> +#endif
> >> +
> >>  #define ARCH_VM_PKEY_FLAGS 0
> >
> > Do we want them in pkeys.h ? Even if they are arch specific for the
> > existing ones we have them in include/linux/mm.h. IIUC, vmflags details
> > are always in mm.h? This will be the first exception to that?
> 
> 
> Also can we move that 
> 
> #if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
> # define VM_PKEY_SHIFTVM_HIGH_ARCH_BIT_0
> # define VM_PKEY_BIT0 VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
> # define VM_PKEY_BIT1 VM_HIGH_ARCH_1
> # define VM_PKEY_BIT2 VM_HIGH_ARCH_2
> # define VM_PKEY_BIT3 VM_HIGH_ARCH_3
> #endif
> 
> to
> 
> #if defined (CONFIG_ARCH_HAS_PKEYS)
> # define VM_PKEY_SHIFTVM_HIGH_ARCH_BIT_0
> # define VM_PKEY_BIT0 VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
> # define VM_PKEY_BIT1 VM_HIGH_ARCH_1
> # define VM_PKEY_BIT2 VM_HIGH_ARCH_2
> # define VM_PKEY_BIT3 VM_HIGH_ARCH_3
> #endif
> 
> 
> And then later update the generic to handle PKEY_BIT4?


Yes. The above changes have been implemented in a patch sent to the mm
mailing list as well as to lkml.

https://lkml.org/lkml/2017/9/15/504

RP



Re: [PATCH 02/25] powerpc: define an additional vma bit for protection keys.

2017-10-23 Thread Ram Pai
On Mon, Oct 23, 2017 at 02:55:51PM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > powerpc needs an additional vma bit to support 32 keys.
> > Till the additional vma bit lands in include/linux/mm.h
> > we have to define  it  in powerpc specific header file.
> > This is  needed to get pkeys working on power.
> >
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/pkeys.h |   18 ++
> >  1 files changed, 18 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/pkeys.h 
> > b/arch/powerpc/include/asm/pkeys.h
> > index c02305a..44e01a2 100644
> > --- a/arch/powerpc/include/asm/pkeys.h
> > +++ b/arch/powerpc/include/asm/pkeys.h
> > @@ -3,6 +3,24 @@
> >
> >  extern bool pkey_inited;
> >  extern bool pkey_execute_disable_support;
> > +
> > +/*
> > + * powerpc needs an additional vma bit to support 32 keys.
> > + * Till the additional vma bit lands in include/linux/mm.h
> > + * we have to carry the hunk below. This is  needed to get
> > + * pkeys working on power. -- Ram
> > + */
> > +#ifndef VM_HIGH_ARCH_BIT_4
> > +#define VM_HIGH_ARCH_BIT_4 36
> > +#define VM_HIGH_ARCH_4 BIT(VM_HIGH_ARCH_BIT_4)
> > +#define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
> > +#define VM_PKEY_BIT0   VM_HIGH_ARCH_0
> > +#define VM_PKEY_BIT1   VM_HIGH_ARCH_1
> > +#define VM_PKEY_BIT2   VM_HIGH_ARCH_2
> > +#define VM_PKEY_BIT3   VM_HIGH_ARCH_3
> > +#define VM_PKEY_BIT4   VM_HIGH_ARCH_4
> > +#endif
> > +
> >  #define ARCH_VM_PKEY_FLAGS 0
> 
> Do we want them in pkeys.h ? Even if they are arch specific for the
> existing ones we have them in include/linux/mm.h. IIUC, vmflags details
> are always in mm.h? This will be the first exception to that?

I am trying to get the real fix into include/linux/mm.h.
If that happens sooner, then this hunk is not needed.
Till then this is an exception, but it is the **ONLY** exception.

I think your point is to have this hunk in include/linux/mm.h?

If yes, then it would be easier to push the real fix into
include/linux/mm.h instead of pushing this hunk in there.

RP



[PATCH 4/4] powerpc/kprobes: refactor kprobe_lookup_name for safer string operations

2017-10-23 Thread Naveen N. Rao
Use safer string manipulation functions when dealing with a
user-provided string in kprobe_lookup_name().

Reported-by: David Laight 
Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes.c | 47 ++-
 1 file changed, 20 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index a14c61855705..bbb4795660f8 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -53,7 +53,7 @@ bool arch_within_kprobe_blacklist(unsigned long addr)
 
 kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset)
 {
-   kprobe_opcode_t *addr;
+   kprobe_opcode_t *addr = NULL;
 
 #ifdef PPC64_ELF_ABI_v2
/* PPC64 ABIv2 needs local entry point */
@@ -85,36 +85,29 @@ kprobe_opcode_t *kprobe_lookup_name(const char *name, 
unsigned int offset)
 * Also handle <module:symbol> format.
 */
char dot_name[MODULE_NAME_LEN + 1 + KSYM_NAME_LEN];
-   const char *modsym;
bool dot_appended = false;
-   if ((modsym = strchr(name, ':')) != NULL) {
-   modsym++;
-   if (*modsym != '\0' && *modsym != '.') {
-   /* Convert to <module:.symbol> */
-   strncpy(dot_name, name, modsym - name);
-   dot_name[modsym - name] = '.';
-   dot_name[modsym - name + 1] = '\0';
-   strncat(dot_name, modsym,
-   sizeof(dot_name) - (modsym - name) - 2);
-   dot_appended = true;
-   } else {
-   dot_name[0] = '\0';
-   strncat(dot_name, name, sizeof(dot_name) - 1);
-   }
-   } else if (name[0] != '.') {
-   dot_name[0] = '.';
-   dot_name[1] = '\0';
-   strncat(dot_name, name, KSYM_NAME_LEN - 2);
+   const char *c;
+   ssize_t ret = 0;
+   int len = 0;
+
+   if ((c = strnchr(name, MODULE_NAME_LEN, ':')) != NULL) {
+   c++;
+   len = c - name;
+   memcpy(dot_name, name, len);
+   } else
+   c = name;
+
+   if (*c != '\0' && *c != '.') {
+   dot_name[len++] = '.';
dot_appended = true;
-   } else {
-   dot_name[0] = '\0';
-   strncat(dot_name, name, KSYM_NAME_LEN - 1);
}
-   addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
-   if (!addr && dot_appended) {
-   /* Let's try the original non-dot symbol lookup */
+   ret = strscpy(dot_name + len, c, KSYM_NAME_LEN);
+   if (ret > 0)
+   addr = (kprobe_opcode_t *)kallsyms_lookup_name(dot_name);
+
+   /* Fallback to the original non-dot symbol lookup */
+   if (!addr && dot_appended)
addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
-   }
 #else
addr = (kprobe_opcode_t *)kallsyms_lookup_name(name);
 #endif
-- 
2.14.2



[PATCH 3/4] powerpc/kprobes: Blacklist emulate_update_regs() from kprobes

2017-10-23 Thread Naveen N. Rao
Commit 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't
modify *regs") introduced emulate_update_regs() to perform part of what
emulate_step() was doing earlier. However, this function was not added
to the kprobes blacklist. Add it so as to prevent it from being probed.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/lib/sstep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 8c3955e183d4..70274b7b4773 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -2717,6 +2717,7 @@ void emulate_update_regs(struct pt_regs *regs, struct 
instruction_op *op)
}
regs->nip = next_pc;
 }
+NOKPROBE_SYMBOL(emulate_update_regs);
 
 /*
  * Emulate a previously-analysed load or store instruction.
-- 
2.14.2



[PATCH 2/4] powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace

2017-10-23 Thread Naveen N. Rao
Per Documentation/kprobes.txt, we don't necessarily need to disable
interrupts before invoking the kprobe handlers. Masami submitted
similar changes for x86 via commit a19b2e3d783964 ("kprobes/x86: Remove
IRQ disabling from ftrace-based/optimized kprobes"). Do the same for
powerpc.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/kprobes-ftrace.c | 10 ++
 arch/powerpc/kernel/optprobes.c  | 10 --
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/kernel/kprobes-ftrace.c 
b/arch/powerpc/kernel/kprobes-ftrace.c
index 4b1f34f685b1..7a1f99f1b47f 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -75,11 +75,7 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
 {
struct kprobe *p;
struct kprobe_ctlblk *kcb;
-   unsigned long flags;
 
-   /* Disable irq for emulating a breakpoint and avoiding preempt */
-   local_irq_save(flags);
-   hard_irq_disable();
preempt_disable();
 
p = get_kprobe((kprobe_opcode_t *)nip);
@@ -105,16 +101,14 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned 
long parent_nip,
else {
/*
 * If pre_handler returns !0, it sets regs->nip and
-* resets current kprobe. In this case, we still need
-* to restore irq, but not preemption.
+* resets current kprobe. In this case, we should not
+* re-enable preemption.
 */
-   local_irq_restore(flags);
return;
}
}
 end:
preempt_enable_no_resched();
-   local_irq_restore(flags);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
 
diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 60ba7f1370a8..8237884ca389 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -115,14 +115,10 @@ static unsigned long can_optimize(struct kprobe *p)
 static void optimized_callback(struct optimized_kprobe *op,
   struct pt_regs *regs)
 {
-   unsigned long flags;
-
/* This is possible if op is under delayed unoptimizing */
	if (kprobe_disabled(&op->kp))
return;
 
-   local_irq_save(flags);
-   hard_irq_disable();
preempt_disable();
 
if (kprobe_running()) {
@@ -135,13 +131,7 @@ static void optimized_callback(struct optimized_kprobe *op,
__this_cpu_write(current_kprobe, NULL);
}
 
-   /*
-* No need for an explicit __hard_irq_enable() here.
-* local_irq_restore() will re-enable interrupts,
-* if they were hard disabled.
-*/
preempt_enable_no_resched();
-   local_irq_restore(flags);
 }
 NOKPROBE_SYMBOL(optimized_callback);
 
-- 
2.14.2



[PATCH 1/4] powerpc/kprobes: Disable preemption before invoking probe handler for optprobes

2017-10-23 Thread Naveen N. Rao
Per Documentation/kprobes.txt, probe handlers need to be invoked with
preemption disabled. Update optimized_callback() to do so. Also move
the get_kprobe_ctlblk() invocation to after the preemption disable, since it
accesses per-cpu data.

This was not an issue so far since optprobes wasn't selected if
CONFIG_PREEMPT was enabled. Commit a30b85df7d599f ("kprobes: Use
synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y") changes
this.

Signed-off-by: Naveen N. Rao 
---
 arch/powerpc/kernel/optprobes.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c
index 91e037ab20a1..60ba7f1370a8 100644
--- a/arch/powerpc/kernel/optprobes.c
+++ b/arch/powerpc/kernel/optprobes.c
@@ -115,7 +115,6 @@ static unsigned long can_optimize(struct kprobe *p)
 static void optimized_callback(struct optimized_kprobe *op,
   struct pt_regs *regs)
 {
-   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
unsigned long flags;
 
/* This is possible if op is under delayed unoptimizing */
@@ -124,13 +123,14 @@ static void optimized_callback(struct optimized_kprobe 
*op,
 
local_irq_save(flags);
hard_irq_disable();
+   preempt_disable();
 
if (kprobe_running()) {
		kprobes_inc_nmissed_count(&op->kp);
} else {
		__this_cpu_write(current_kprobe, &op->kp);
regs->nip = (unsigned long)op->kp.addr;
-   kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+   get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE;
		opt_pre_handler(&op->kp, regs);
__this_cpu_write(current_kprobe, NULL);
}
@@ -140,6 +140,7 @@ static void optimized_callback(struct optimized_kprobe *op,
 * local_irq_restore() will re-enable interrupts,
 * if they were hard disabled.
 */
+   preempt_enable_no_resched();
local_irq_restore(flags);
 }
 NOKPROBE_SYMBOL(optimized_callback);
-- 
2.14.2



Re: [PATCH 3/7] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-10-23 Thread Ram Pai
On Mon, Oct 23, 2017 at 02:17:39PM +0530, Aneesh Kumar K.V wrote:
> Michael Ellerman  writes:
> 
> > Ram Pai  writes:
> >
> >> diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
> >> index 1a68cb1..c6c5559 100644
> >> --- a/arch/powerpc/mm/hash64_64k.c
> >> +++ b/arch/powerpc/mm/hash64_64k.c
> >> @@ -126,18 +113,13 @@ int __hash_page_4K(unsigned long ea, unsigned long 
> >> access, unsigned long vsid,
> >>if (__rpte_sub_valid(rpte, subpg_index)) {
> >>int ret;
> >>  
> >> -  hash = hpt_hash(vpn, shift, ssize);
> >> -  hidx = __rpte_to_hidx(rpte, subpg_index);
> >> -  if (hidx & _PTEIDX_SECONDARY)
> >> -  hash = ~hash;
> >> -  slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
> >> -  slot += hidx & _PTEIDX_GROUP_IX;
> >> +  gslot = pte_get_hash_gslot(vpn, shift, ssize, rpte,
> >> +  subpg_index);
> >> +  ret = mmu_hash_ops.hpte_updatepp(gslot, rflags, vpn,
> >> +  MMU_PAGE_4K, MMU_PAGE_4K, ssize, flags);
> >
> > This was formatted correctly before:
> >   
> >> -  ret = mmu_hash_ops.hpte_updatepp(slot, rflags, vpn,
> >> -   MMU_PAGE_4K, MMU_PAGE_4K,
> >> -   ssize, flags);
> >>/*
> >> -   *if we failed because typically the HPTE wasn't really here
> >> +   * if we failed because typically the HPTE wasn't really here
> >
> > If you're fixing it up please make it "If ...".
> >
> >> * we try an insertion.
> >> */
> >>if (ret == -1)
> >> @@ -148,6 +130,15 @@ int __hash_page_4K(unsigned long ea, unsigned long 
> >> access, unsigned long vsid,
> >>}
> >>  
> >>  htab_insert_hpte:
> >> +
> >> +  /*
> >> +   * initialize all hidx entries to invalid value,
> >> +   * the first time the PTE is about to allocate
> >> +   * a 4K hpte
> >> +   */
> >
> > Should be:
> > /*
> >  * Initialize all hidx entries to invalid value, the first time
> >  * the PTE is about to allocate a 4K HPTE.
> >  */
> >
> >> +  if (!(old_pte & H_PAGE_COMBO))
> >> +  rpte.hidx = ~0x0UL;
> >> +
> >
> > Paul had the idea that if we biased the slot number by 1, we could make
> > the "invalid" value be == 0.
> >
> > That would avoid needing to that above, and also mean the value is
> > correctly invalid from the get-go, which would be good IMO.
> >
> > I think now that you've added the slot accessors it would be pretty easy
> > to do.
> 
> That would imply we lose one slot in the primary group, which means we
> will do extra work in some cases because our primary now has only 7
> slots. And in the case of pseries, the hypervisor will always return the
> least available slot, which implies we will do extra hcalls in case of an
> hpte insert to an empty group?


No, that is not the idea.  The idea is that slot 'F' in the secondary
will continue to be an invalid slot, but will be represented as
offset-by-one in the PTE.  In other words, 0 will be represented as 1,
1 as 2, and n as (n+1)%32.

The idea seems feasible.  It has the advantage that 0 in the PTE
means an invalid slot. But it can be confusing to the casual code
reader. We will need to put in a big, prominent comment to explain that.

RP
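
For illustration only (not code from the series): the offset-by-one encoding described above, assuming a 4-bit per-(sub)page hidx field. A stored value of 0 means "no valid slot", a real slot n is stored as n+1, and the top secondary slot is the one sacrificed.

#include <assert.h>

/* Illustrative sketch only; the helper names are hypothetical. */
#define HIDX_INVALID	0x0u

static unsigned int hidx_encode(unsigned int slot)
{
	return (slot + 1) & 0xf;	/* store slot n as n + 1, mod 16 */
}

static unsigned int hidx_decode(unsigned int stored)
{
	assert(stored != HIDX_INVALID);	/* caller must check validity first */
	return (stored - 1) & 0xf;
}

int main(void)
{
	unsigned int slot;

	/* Slots 0..14 round-trip; slot 0xF collides with the invalid marker. */
	for (slot = 0; slot < 0xf; slot++)
		assert(hidx_decode(hidx_encode(slot)) == slot);
	assert(hidx_encode(0xf) == HIDX_INVALID);
	return 0;
}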



[PATCH V3 2/2] pseries/initnodes: Ensure nodes initialized for hotplug

2017-10-23 Thread Michael Bringmann
To: linuxppc-dev@lists.ozlabs.org
To: linux-ker...@vger.kernel.org
Cc: Michael Ellerman 
Cc: Michael Bringmann 
Cc: John Allen 
Cc: Nathan Fontenot 
Subject: [PATCH V3 2/2] pseries/initnodes: Ensure nodes initialized for hotplug

pseries/nodes: On pseries systems which allow 'hot-add' of CPU,
it may occur that the new resources are to be inserted into nodes
that were not used for memory resources at bootup.  Many different
configurations of PowerPC resources may need to be supported depending
upon the environment.  This patch fixes some problems encountered at
runtime with configurations that support memory-less nodes, or that
hot-add resources during system execution after boot.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/mm/numa.c |   35 +--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f885ab7..37f697dc1 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
nid = of_node_to_nid_single(cpu);
 
 out_present:
-   if (nid < 0 || !node_online(nid))
+   if (nid < 0 || !node_possible(nid))
nid = first_online_node;
 
map_cpu_to_node(lcpu, nid);
@@ -1311,6 +1311,35 @@ static long vphn_get_associativity(unsigned long cpu,
return rc;
 }
 
+static int verify_node_preparation(int nid)
+{
+   /*
+* Need to allocate/initialize NODE_DATA from a node with
+* memory (see memblock_alloc_try_nid).  Code executed after
+* boot (like local_memory_node) often does not know enough
+* to recover fully for memoryless nodes. 
+*/
+   if (NODE_DATA(nid) == NULL) {
+   int ret;
+   printk(KERN_INFO "%s:%d node %d (%d)\n", __FUNCTION__, 
__LINE__, nid, first_online_node);
+   setup_node_data(nid, 0, 0);
+   printk(KERN_INFO "%s:%d node %d (%d)\n", __FUNCTION__, 
__LINE__, nid, first_online_node);
+   ret = try_online_node(nid);
+   printk(KERN_INFO "%s:%d node %d (%d) %d %p\n", __FUNCTION__, 
__LINE__, nid, first_online_node, ret, NODE_DATA(nid));
+
+   /*
+* Node structures successfully initialized, but
+* we still need the node to be offline.
+*/
+   node_set_offline(nid);
+   }
+
+   if (NODE_DATA(nid)->node_spanned_pages == 0)
+   return first_online_node;
+
+   return nid;
+}
+
 /*
  * Update the CPU maps and sysfs entries for a single CPU when its NUMA
  * characteristics change. This function doesn't perform any locking and is
@@ -1419,9 +1448,11 @@ int numa_update_cpu_topology(bool cpus_locked)
/* Use associativity from first thread for all siblings */
vphn_get_associativity(cpu, associativity);
new_nid = associativity_to_nid(associativity);
-   if (new_nid < 0 || !node_online(new_nid))
+   if (new_nid < 0 || !node_possible(new_nid))
new_nid = first_online_node;
 
+   new_nid = verify_node_preparation(new_nid);
+
if (new_nid == numa_cpu_lookup_table[cpu]) {
			cpumask_andnot(&cpu_associativity_changes_mask,
					&cpu_associativity_changes_mask,



[PATCH V3 1/2] pseries/nodes: Ensure enough nodes avail for operations

2017-10-23 Thread Michael Bringmann
pseries/nodes: On pseries systems which allow 'hot-add' of CPU or
memory resources, it may occur that the new resources are to be
inserted into nodes that were not used for these resources at bootup.
In the kernel, any node that is used must be defined and initialized.
This patch ensures that sufficient nodes are defined to support
configuration requirements after boot, as well as at boot.

This patch extracts the value of the lowest domain level (number
of allocable resources) from the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.  This new setting will
override the instruction,

nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the property is not present at boot, no operation will be performed
to define or enable additional nodes.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/mm/numa.c |   39 ---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ec098b3..f885ab7 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,36 @@ static void __init setup_node_data(int nid, u64 start_pfn, 
u64 end_pfn)
NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init find_possible_nodes(void)
+{
+   struct device_node *rtas;
+   u32 numnodes, i;
+
+   if (min_common_depth <= 0)
+   return;
+
+   rtas = of_find_node_by_path("/rtas");
+   if (!rtas)
+   return;
+
+   if (of_property_read_u32_index(rtas,
+   "ibm,max-associativity-domains",
+   min_common_depth, &numnodes))
+   goto out;
+
+   pr_info("numa: Nodes = %d (mcd = %d)\n", numnodes,
+   min_common_depth);
+
+   for (i = 0; i < numnodes; i++) {
+   if (!node_possible(i))
+   node_set(i, node_possible_map);
+   }
+
+out:
+   if (rtas)
+   of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
int nid, cpu;
@@ -905,12 +935,15 @@ void __init initmem_init(void)
memblock_dump_all();
 
/*
-* Reduce the possible NUMA nodes to the online NUMA nodes,
-* since we do not support node hotplug. This ensures that  we
-* lower the maximum NUMA node ID to what is actually present.
+* Modify the set of possible NUMA nodes to reflect information
+* available about the set of online nodes, and the set of nodes
+* that we expect to make use of for this platform's affinity
+* calculations.
 */
nodes_and(node_possible_map, node_possible_map, node_online_map);
 
+   find_possible_nodes();
+
for_each_online_node(nid) {
unsigned long start_pfn, end_pfn;
 



[PATCH V3 0/2] pseries/nodes: Fix issues with memoryless nodes

2017-10-23 Thread Michael Bringmann
pseries/nodes: Ensure enough nodes avail for operations

pseries/initnodes: Ensure nodes initialized for hotplug

Signed-off-by: Michael Bringmann 

Michael Bringmann (2):
  pseries/nodes: Ensure enough nodes avail for operations
  pseries/initnodes: Ensure nodes initialized for hotplug



Re: [PATCH 3/4] char/tpm: Improve a size determination in nine functions

2017-10-23 Thread Dan Carpenter
On Thu, Oct 19, 2017 at 04:58:23PM +, alexander.stef...@infineon.com wrote:
> > On Tue, Oct 17, 2017 at 11:50:05AM +, alexander.stef...@infineon.com
> > wrote:
> > > > > Replace the specification of data structures by pointer dereferences
> > > > > as the parameter for the operator "sizeof" to make the corresponding
> > > > > size
> > > > > determination a bit safer according to the Linux coding style
> > > > > convention.
> > > >
> > > >
> > > > This patch does one style in favor of the other.
> > >
> > > I actually prefer that style, so I'd welcome this change :)
> > >
> > > > At the end it's Jarkko's call, though I would NAK this as I think some
> > > > one already told this to you for some other similar patch(es).
> > > >
> > > >
> > > > I even would suggest to stop doing this noisy stuff, which keeps people
> > > > busy for nothing.
> > >
> > > Cleaning up old code is also worth something, even if does not change
> > > one bit in the assembly output in the end...
> > >
> > > Alexander
> > 
> > It is quite an insignificant clean up that does more harm than it gives
> > benefit, as any new change adds debt to backporting.
> > 
> > Anyway, this has been a useful patch set for me in the sense that I have
> > clearer picture now on discarding/accepting commits.
> 
> Indeed. I have now a better understanding for why some code looks as ugly as 
> it does.
> 

These patches aren't a part of a sensible attempt to clean up the code.

The first two patches are easy to review and have a clear benefit so
that's fine any time.

Patch 3 updates the code to a new style guideline but it doesn't really
improve readability that much.  Those sorts of patches are hard to
review because you have to verify that the object code doesn't change.
Plus it messes up git blame.  It's good for new code and staging, but
for old code, it's probably only worth it if there are other patches
which make it worth the price.  You're basically applying it to make the
patch sender happy.

With patch 4, the tpm_ibmvtpm_probe() function was buggy and it's still
buggy after the patch is applied.  It's a waste of time re-arranging the
code if you aren't going to fix the bugs.

regards,
dan carpenter


Re: [PATCH 1/4] powerpc/tm: Add commandline option to disable hardware transactional memory

2017-10-23 Thread Breno Leitao
On Sat, Oct 21, 2017 at 11:58:47AM +1100, Michael Neuling wrote:
> On Fri, 2017-10-20 at 09:45 -0200, Breno Leitao wrote:
> > Mikey, Cyril,
> > 
> > On Thu, Oct 12, 2017 at 09:17:16PM +1100, Michael Ellerman wrote:
> > > From: Cyril Bur 
> > > 
> > > Currently the kernel relies on firmware to inform it whether or not the
> > > CPU supports HTM and as long as the kernel was built with
> > > CONFIG_PPC_TRANSACTIONAL_MEM=y then it will allow userspace to make
> > > use of the facility.
> > > 
> > > There may be situations where it would be advantageous for the kernel
> > > to not allow userspace to use HTM, currently the only way to achieve
> > > this is to recompile the kernel with CONFIG_PPC_TRANSACTIONAL_MEM=n.
> > > 
> > > This patch adds a simple commandline option so that HTM can be
> > > disabled at boot time.
> > > 
> > > Signed-off-by: Cyril Bur 
> > > [mpe: Simplify to a bool, move to prom.c, put doco in the right place.
> > >  Always disable, regardless of initial state, to avoid user confusion.]
> > > Signed-off-by: Michael Ellerman 
> > > ---
> > >  Documentation/admin-guide/kernel-parameters.txt |  4 
> > >  arch/powerpc/kernel/prom.c  | 31
> > > +
> > >  2 files changed, 35 insertions(+)
> > > 
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt
> > > b/Documentation/admin-guide/kernel-parameters.txt
> > > index 05496622b4ef..ef03e6e16bdb 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -3185,6 +3185,10 @@
> > >   allowed (eg
> > > kernel_enable_fpu()/kernel_disable_fpu()).
> > >   There is some performance impact when enabling
> > > this.
> > >  
> > > + ppc_tm= [PPC]
> > > + Format: {"off"}
> > > + Disable Hardware Transactional Memory
> > > +
> > >   print-fatal-signals=
> > >   [KNL] debug: print fatal signals
> > >  
> > > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > > index f83056297441..d9bd6555f980 100644
> > > --- a/arch/powerpc/kernel/prom.c
> > > +++ b/arch/powerpc/kernel/prom.c
> > > @@ -658,6 +658,35 @@ static void __init early_reserve_mem(void)
> > >  #endif
> > >  }
> > >  
> > > +#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
> > > +static bool tm_disabled __initdata;
> > 
> > I think the name 'tm_disabled' might cause more confusion on the TM
> > code. Mainly because we already have tm_enable() and tm_enabled()
> > functions which are related to the MSR register and TM bit, and, with
> > your new variable, tm_enabled() and tm_disabled are not going to be
> > exclusionary. Neither will tm_enable() be able to toggle the tm_disabled
> > value.
> 
> Got a proposal for better names?

That is the hardest part, but I thought about something like:

 * tm_disabled_on_boot
 * tm_off
 * tm_explicit_disabled
 * tm_feature_disabled
 * tm_enablement_disabled

I think these names, although a bit longer, might avoid the confusion
with tm_enable/tm_enabled nomenclature.



Re: [PATCH 03/25] powerpc: track allocation status of all pkeys

2017-10-23 Thread Aneesh Kumar K.V
Ram Pai  writes:

> Total 32 keys are available on power7 and above. However
> pkey 0,1 are reserved. So effectively we  have  30 pkeys.

When you say reserved, reserved by whom? Is that part of ISA or PAPR ?
Also do you expect that to change. If not why all these indirection?
Can we have the mask as a #define for 4K and 64K page size
config?

>
> On 4K kernels, we do not have 5 bits in the PTE to
> represent all the keys; we only have 3 bits. Two of those
> keys are reserved; pkey 0 and pkey 1. So effectively we
> have 6 pkeys.
>
> This patch keeps track of reserved keys, allocated  keys
> and keys that are currently free.
>
> Also it  adds  skeletal  functions  and macros, that the
> architecture-independent code expects to be available.


-aneesh



Re: [PATCH 02/25] powerpc: define an additional vma bit for protection keys.

2017-10-23 Thread Aneesh Kumar K.V
Ram Pai  writes:

> powerpc needs an additional vma bit to support 32 keys.
> Till the additional vma bit lands in include/linux/mm.h
> we have to define  it  in powerpc specific header file.
> This is  needed to get pkeys working on power.
>
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/pkeys.h |   18 ++
>  1 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index c02305a..44e01a2 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -3,6 +3,24 @@
>
>  extern bool pkey_inited;
>  extern bool pkey_execute_disable_support;
> +
> +/*
> + * powerpc needs an additional vma bit to support 32 keys.
> + * Till the additional vma bit lands in include/linux/mm.h
> + * we have to carry the hunk below. This is  needed to get
> + * pkeys working on power. -- Ram
> + */
> +#ifndef VM_HIGH_ARCH_BIT_4
> +#define VM_HIGH_ARCH_BIT_4   36
> +#define VM_HIGH_ARCH_4   BIT(VM_HIGH_ARCH_BIT_4)
> +#define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
> +#define VM_PKEY_BIT0 VM_HIGH_ARCH_0
> +#define VM_PKEY_BIT1 VM_HIGH_ARCH_1
> +#define VM_PKEY_BIT2 VM_HIGH_ARCH_2
> +#define VM_PKEY_BIT3 VM_HIGH_ARCH_3
> +#define VM_PKEY_BIT4 VM_HIGH_ARCH_4
> +#endif
> +
>  #define ARCH_VM_PKEY_FLAGS 0

Do we want them in pkeys.h ? Even if they are arch specific for the
existing ones we have them in include/linux/mm.h. IIUC, vmflags details
are always in mm.h? This will be the first exception to that?


>
>  static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
> -- 
> 1.7.1



Re: [PATCH 3/7] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-10-23 Thread Aneesh Kumar K.V
Michael Ellerman  writes:

> Ram Pai  writes:
>
>> diff --git a/arch/powerpc/mm/hash64_64k.c b/arch/powerpc/mm/hash64_64k.c
>> index 1a68cb1..c6c5559 100644
>> --- a/arch/powerpc/mm/hash64_64k.c
>> +++ b/arch/powerpc/mm/hash64_64k.c
>> @@ -126,18 +113,13 @@ int __hash_page_4K(unsigned long ea, unsigned long 
>> access, unsigned long vsid,
>>  if (__rpte_sub_valid(rpte, subpg_index)) {
>>  int ret;
>>  
>> -hash = hpt_hash(vpn, shift, ssize);
>> -hidx = __rpte_to_hidx(rpte, subpg_index);
>> -if (hidx & _PTEIDX_SECONDARY)
>> -hash = ~hash;
>> -slot = (hash & htab_hash_mask) * HPTES_PER_GROUP;
>> -slot += hidx & _PTEIDX_GROUP_IX;
>> +gslot = pte_get_hash_gslot(vpn, shift, ssize, rpte,
>> +subpg_index);
>> +ret = mmu_hash_ops.hpte_updatepp(gslot, rflags, vpn,
>> +MMU_PAGE_4K, MMU_PAGE_4K, ssize, flags);
>
> This was formatted correctly before:
>   
>> -ret = mmu_hash_ops.hpte_updatepp(slot, rflags, vpn,
>> - MMU_PAGE_4K, MMU_PAGE_4K,
>> - ssize, flags);
>>  /*
>> - *if we failed because typically the HPTE wasn't really here
>> + * if we failed because typically the HPTE wasn't really here
>
> If you're fixing it up please make it "If ...".
>
>>   * we try an insertion.
>>   */
>>  if (ret == -1)
>> @@ -148,6 +130,15 @@ int __hash_page_4K(unsigned long ea, unsigned long 
>> access, unsigned long vsid,
>>  }
>>  
>>  htab_insert_hpte:
>> +
>> +/*
>> + * initialize all hidx entries to invalid value,
>> + * the first time the PTE is about to allocate
>> + * a 4K hpte
>> + */
>
> Should be:
>   /*
>* Initialize all hidx entries to invalid value, the first time
>  * the PTE is about to allocate a 4K HPTE.
>*/
>
>> +if (!(old_pte & H_PAGE_COMBO))
>> +rpte.hidx = ~0x0UL;
>> +
>
> Paul had the idea that if we biased the slot number by 1, we could make
> the "invalid" value be == 0.
>
> That would avoid needing to do that above, and also mean the value is
> correctly invalid from the get-go, which would be good IMO.
>
> I think now that you've added the slot accessors it would be pretty easy
> to do.

That would imply we lose one slot in the primary group, which means we
will do extra work in some cases because our primary now has only 7
slots. And in the case of pseries, the hypervisor will always return the
least available slot, which implies we will do extra hcalls in case of an
hpte insert to an empty group?

-aneesh



Re: [PATCH 06/25] powerpc: cleaup AMR, iAMR when a key is allocated or freed

2017-10-23 Thread Aneesh Kumar K.V
Balbir Singh  writes:

> On Fri,  8 Sep 2017 15:44:54 -0700
> Ram Pai  wrote:
>
>> cleanup the bits corresponding to a key in the AMR, and IAMR
>> register, when the key is newly allocated/activated or is freed.
>> We don't want some residual bits to cause the hardware to enforce
>> unintended behavior when the key is activated or freed.
>> 
>> Signed-off-by: Ram Pai 
>> ---
>>  arch/powerpc/include/asm/pkeys.h |   12 
>>  1 files changed, 12 insertions(+), 0 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/pkeys.h 
>> b/arch/powerpc/include/asm/pkeys.h
>> index 5a83ed7..53bf13b 100644
>> --- a/arch/powerpc/include/asm/pkeys.h
>> +++ b/arch/powerpc/include/asm/pkeys.h
>> @@ -54,6 +54,8 @@ static inline bool mm_pkey_is_allocated(struct mm_struct 
>> *mm, int pkey)
>>  mm_set_pkey_is_allocated(mm, pkey));
>>  }
>>  
>> +extern void __arch_activate_pkey(int pkey);
>> +extern void __arch_deactivate_pkey(int pkey);
>>  /*
>>   * Returns a positive, 5-bit key on success, or -1 on failure.
>>   */
>> @@ -80,6 +82,12 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
>>  
>>  ret = ffz((u32)mm_pkey_allocation_map(mm));
>>  mm_set_pkey_allocated(mm, ret);
>> +
>> +/*
>> + * enable the key in the hardware
>> + */
>> +if (ret > 0)
>> +__arch_activate_pkey(ret);
>>  return ret;
>>  }
>>  
>> @@ -91,6 +99,10 @@ static inline int mm_pkey_free(struct mm_struct *mm, int 
>> pkey)
>>  if (!mm_pkey_is_allocated(mm, pkey))
>>  return -EINVAL;
>>  
>> +/*
>> + * Disable the key in the hardware
>> + */
>> +__arch_deactivate_pkey(pkey);
>>  mm_set_pkey_free(mm, pkey);
>>  
>>  return 0;
>
> I think some of these patches can be merged, too much fine granularity
> is hurting my ability to see the larger function/implementation.


Completely agree

-aneesh



Re: [PATCH 06/25] powerpc: cleaup AMR, iAMR when a key is allocated or freed

2017-10-23 Thread Aneesh Kumar K.V
Ram Pai  writes:

> cleanup the bits corresponding to a key in the AMR, and IAMR
> register, when the key is newly allocated/activated or is freed.
> We dont want some residual bits cause the hardware enforce
> unintended behavior when the key is activated or freed.
>
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/pkeys.h |   12 
>  1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index 5a83ed7..53bf13b 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -54,6 +54,8 @@ static inline bool mm_pkey_is_allocated(struct mm_struct 
> *mm, int pkey)
>   mm_set_pkey_is_allocated(mm, pkey));
>  }
>
> +extern void __arch_activate_pkey(int pkey);
> +extern void __arch_deactivate_pkey(int pkey);
>  /*
>   * Returns a positive, 5-bit key on success, or -1 on failure.
>   */
> @@ -80,6 +82,12 @@ static inline int mm_pkey_alloc(struct mm_struct *mm)
>
>   ret = ffz((u32)mm_pkey_allocation_map(mm));
>   mm_set_pkey_allocated(mm, ret);
> +
> + /*
> +  * enable the key in the hardware
> +  */
> + if (ret > 0)
> + __arch_activate_pkey(ret);
>   return ret;
>  }

We are already arch-specific because we are defining them in
arch/powerpc/include/asm/, so why the __arch_ prefix on __arch_activate_pkey()?

>
> @@ -91,6 +99,10 @@ static inline int mm_pkey_free(struct mm_struct *mm, int 
> pkey)
>   if (!mm_pkey_is_allocated(mm, pkey))
>   return -EINVAL;
>
> + /*
> +  * Disable the key in the hardware
> +  */
> + __arch_deactivate_pkey(pkey);
>   mm_set_pkey_free(mm, pkey);
>
>   return 0;
> -- 
> 1.7.1
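For context, the "clean up this key's AMR/IAMR bits on alloc/free"
behaviour described above presumably amounts to something like the sketch
below. SPRN_AMR/SPRN_IAMR are the real SPR defines, but the bit layout and
the helper body are assumptions for illustration, not the code from this
series:

/*
 * Hedged sketch only, kernel context assumed (mfspr/mtspr and the SPR
 * numbers come from <asm/reg.h>).  Clear this key's bits in AMR
 * (access/write disable) and IAMR (execute disable) so a freshly
 * allocated key imposes no restrictions.  The 2-bits-per-key layout is
 * assumed for illustration.
 */
static inline void __arch_activate_pkey(int pkey)
{
	int shift = 64 - (pkey + 1) * 2;	/* 2 register bits per key, key 0 topmost */
	u64 amr  = mfspr(SPRN_AMR);
	u64 iamr = mfspr(SPRN_IAMR);

	amr  &= ~(0x3UL << shift);		/* clear access/write-disable bits */
	iamr &= ~(0x3UL << shift);		/* clear execute-disable bit(s) */

	mtspr(SPRN_AMR, amr);
	mtspr(SPRN_IAMR, iamr);
}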



Re: [PATCH 02/25] powerpc: define an additional vma bit for protection keys.

2017-10-23 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:

> Ram Pai  writes:
>
>> powerpc needs an additional vma bit to support 32 keys.
>> Till the additional vma bit lands in include/linux/mm.h
>> we have to define  it  in powerpc specific header file.
>> This is  needed to get pkeys working on power.
>>
>> Signed-off-by: Ram Pai 
>> ---
>>  arch/powerpc/include/asm/pkeys.h |   18 ++
>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/pkeys.h 
>> b/arch/powerpc/include/asm/pkeys.h
>> index c02305a..44e01a2 100644
>> --- a/arch/powerpc/include/asm/pkeys.h
>> +++ b/arch/powerpc/include/asm/pkeys.h
>> @@ -3,6 +3,24 @@
>>
>>  extern bool pkey_inited;
>>  extern bool pkey_execute_disable_support;
>> +
>> +/*
>> + * powerpc needs an additional vma bit to support 32 keys.
>> + * Till the additional vma bit lands in include/linux/mm.h
>> + * we have to carry the hunk below. This is  needed to get
>> + * pkeys working on power. -- Ram
>> + */
>> +#ifndef VM_HIGH_ARCH_BIT_4
>> +#define VM_HIGH_ARCH_BIT_4  36
>> +#define VM_HIGH_ARCH_4  BIT(VM_HIGH_ARCH_BIT_4)
>> +#define VM_PKEY_SHIFT VM_HIGH_ARCH_BIT_0
>> +#define VM_PKEY_BIT0VM_HIGH_ARCH_0
>> +#define VM_PKEY_BIT1VM_HIGH_ARCH_1
>> +#define VM_PKEY_BIT2VM_HIGH_ARCH_2
>> +#define VM_PKEY_BIT3VM_HIGH_ARCH_3
>> +#define VM_PKEY_BIT4VM_HIGH_ARCH_4
>> +#endif
>> +
>>  #define ARCH_VM_PKEY_FLAGS 0
>
> Do we want them in pkeys.h ? Even if they are arch specific for the
> existing ones we have them in include/linux/mm.h. IIUC, vmflags details
> are always in mm.h? This will be the first exception to that?


Also can we move that 

#if defined (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)
# define VM_PKEY_SHIFT  VM_HIGH_ARCH_BIT_0
# define VM_PKEY_BIT0   VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
# define VM_PKEY_BIT1   VM_HIGH_ARCH_1
# define VM_PKEY_BIT2   VM_HIGH_ARCH_2
# define VM_PKEY_BIT3   VM_HIGH_ARCH_3
#endif

to

#if defined (CONFIG_ARCH_HAS_PKEYS)
# define VM_PKEY_SHIFT  VM_HIGH_ARCH_BIT_0
# define VM_PKEY_BIT0   VM_HIGH_ARCH_0  /* A protection key is a 4-bit value */
# define VM_PKEY_BIT1   VM_HIGH_ARCH_1
# define VM_PKEY_BIT2   VM_HIGH_ARCH_2
# define VM_PKEY_BIT3   VM_HIGH_ARCH_3
#endif


And then later update the generic to handle PKEY_BIT4?
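A hedged sketch of where that could end up once the generic header knows
about the fifth bit; the CONFIG_PPC conditional is an assumption for
illustration, not an agreed-on hunk:

#if defined(CONFIG_ARCH_HAS_PKEYS)
# define VM_PKEY_SHIFT	VM_HIGH_ARCH_BIT_0
# define VM_PKEY_BIT0	VM_HIGH_ARCH_0	/* A protection key is a 4-bit value */
# define VM_PKEY_BIT1	VM_HIGH_ARCH_1
# define VM_PKEY_BIT2	VM_HIGH_ARCH_2
# define VM_PKEY_BIT3	VM_HIGH_ARCH_3
# if defined(CONFIG_PPC)		/* powerpc has 32 keys, so needs a 5th bit */
#  define VM_PKEY_BIT4	VM_HIGH_ARCH_4
# else
#  define VM_PKEY_BIT4	0
# endif
#endif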

-aneesh



Re: [PATCH 4/7] powerpc: Free up four 64K PTE bits in 64K backed HPTE pages

2017-10-23 Thread Aneesh Kumar K.V
Benjamin Herrenschmidt  writes:

> On Fri, 2017-09-08 at 15:44 -0700, Ram Pai wrote:
>> The second part of the PTE will hold
>> (H_PAGE_F_SECOND|H_PAGE_F_GIX) at bit 60,61,62,63.
>> NOTE: None of the bits in the secondary PTE were not used
>> by 64k-HPTE backed PTE.
>
> Have you measured the performance impact of this ? The second part of
> the PTE being in a different cache line there could be one...
>

I am also looking at a patch series removing the slot tracking
completely. With address randomization turned off, no swap in guest/host,
and making sure we touched most of the guest RAM, I don't find much impact
on performance when we don't track the slot at all. I will post the
patch series with numbers in a day or two. But my test was

while (5000) {
  mmap(128M)
  touch every page of 2048 pages
  munmap()
}

It could also be the best case in my run because I might have always
found the hash pte slot in the primary. In one measurement with swap on
and address randomization enabled, I did find a 50% impact. But then I
was not able to recreate that again, so it could be something I did wrong
in the test setup.
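For reference, the loop above is roughly the following userspace program
(a minimal sketch, not the actual selftest; the 64K page-size assumption
comes from the 128M / 2048-pages numbers above):

#include <stddef.h>
#include <sys/mman.h>

#define MAP_SIZE	(128UL << 20)	/* 128M, as in the loop above */
#define STEP		(64UL << 10)	/* 64K pages => 2048 touches per mapping */

int main(void)
{
	for (int i = 0; i < 5000; i++) {
		char *p = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;
		for (unsigned long off = 0; off < MAP_SIZE; off += STEP)
			p[off] = 1;	/* fault in every page */
		munmap(p, MAP_SIZE);
	}
	return 0;
}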

Ram,

Will you be able to get a test run with the above loop?

-aneesh



Re: [PATCH 1/4] powerpc/tm: Add commandline option to disable hardware transactional memory

2017-10-23 Thread Michael Neuling
On Mon, 2017-10-23 at 09:01 +, David Laight wrote:
> From: Michael Neuling
> > Sent: 21 October 2017 02:00
> > To: David Laight; 'Breno Leitao'; Michael Ellerman
> > Cc: stew...@linux.vnet.ibm.com; linuxppc-...@ozlabs.org; cyril...@gmail.com
> > Subject: Re: [PATCH 1/4] powerpc/tm: Add commandline option to disable
> > hardware transactional memory
> > 
> > On Fri, 2017-10-20 at 12:58 +, David Laight wrote:
> > > > > This patch adds a simple commandline option so that HTM can be
> > > > > disabled at boot time.
> > > 
> > > ISTM that being able to disable it after boot would be more useful.
> > > (ie in a startup script)
> > 
> > I agree bug unfortunately that's impossible.
> > 
> > If a process is already running in tm suspend, there is no way to stop it
> > other
> > than killing the process.  At that point you may as well kexec with a new
> > cmdline option
> 
> Isn't that unlikely when the first rc scripts are being run?
> Setting an early rc script is generally easier than adding a command line
> parameter.

Unlikely or not, it's a case we'd have to handle, and handling it would add
significant complexity compared to what's being proposed, for (IMHO) little gain.

Also, this doesn't exclude that option later if we decided we could do it.

> I don't know about ppc, but grub on x86 makes it painfully almost
> impossible to have two boot entries that have different command line
> options.

Ok.

Mikey


RE: [PATCH 1/4] powerpc/tm: Add commandline option to disable hardware transactional memory

2017-10-23 Thread David Laight
From: Michael Neuling
> Sent: 21 October 2017 02:00
> To: David Laight; 'Breno Leitao'; Michael Ellerman
> Cc: stew...@linux.vnet.ibm.com; linuxppc-...@ozlabs.org; cyril...@gmail.com
> Subject: Re: [PATCH 1/4] powerpc/tm: Add commandline option to disable 
> hardware transactional memory
> 
> On Fri, 2017-10-20 at 12:58 +, David Laight wrote:
> > > > This patch adds a simple commandline option so that HTM can be
> > > > disabled at boot time.
> >
> > ISTM that being able to disable it after boot would be more useful.
> > (ie in a startup script)
> 
> I agree bug unfortunately that's impossible.
> 
> If a process is already running in tm suspend, there is no way to stop it 
> other
> than killing the process.  At that point you may as well kexec with a new
> cmdline option

Isn't that unlikely when the first rc scripts are being run?
Setting an early rc script is generally easier than adding a command line
parameter.

I don't know about ppc, but grub on x86 makes it painfully almost
impossible to have two boot entries that have different command line
options.

David



Re: [PATCHv3 6/7] symbol lookup: use new kernel and module dereference functions

2017-10-23 Thread Sergey Senozhatsky
On (10/20/17 15:08), Petr Mladek wrote:
> On Thu 2017-10-19 15:42:35, Sergey Senozhatsky wrote:
> > Sorry for the delay and thanks for taking a look.
> > 
> > I'll try to re-spin the patch set by the end of this week/early next
> > week.
> > 
> > 
> > On (10/04/17 13:53), Petr Mladek wrote:
> > [..]
> > > Note that kallsyms_lookup() and module_address_lookup() is used
> > > in many other situations.
> > 
> > we dereference only things that can be dereferenced.
> > so calling it on already dereferenced address, or address
> > that does need to be dereferenced is OK.
> 
> My concern is that it changes the behavior. It will suddenly return
> another information for addresses that were not dereference before.

OK. I'd be really-really surprised to find out that anyone did
kallsyms_lookup()/module_address_lookup() on func descriptors,
but I understand your concerns. I'll try to keep everything
within vsprintf().
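For anyone following along, "dereferencing" here refers to the ppc64 ELFv1
function-descriptor indirection. A hedged sketch of the idea (the struct
layout below is illustrative; the kernel's real helper is
dereference_function_descriptor()):

/*
 * On ppc64 ELFv1 a "function pointer" points at a descriptor, not at
 * code.  Symbol lookup wants the code address, so resolving a descriptor
 * without dereferencing it first lands in the .opd section instead of
 * the function itself.
 */
struct func_desc {
	unsigned long addr;	/* entry point of the function's text */
	unsigned long toc;	/* TOC base (r2) the callee expects */
	unsigned long env;	/* environment pointer, unused by C */
};

static unsigned long deref_func_ptr(void *fn)
{
	return ((struct func_desc *)fn)->addr;	/* the address kallsyms should see */
}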

-ss


Re: [PATCH v2] kernel/module_64.c: Add REL24 relocation support of livepatch symbols

2017-10-23 Thread Kamalesh Babulal

On Saturday 21 October 2017 06:29 AM, Balbir Singh wrote:

On Fri, 2017-10-20 at 14:07 +0200, Torsten Duwe wrote:

On Wed, Oct 18, 2017 at 11:47:35AM +0530, Kamalesh Babulal wrote:


Consider a trivial patch, supplied to kpatch tool for generating a
livepatch module:

--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -132,7 +132,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
seq_printf(m, "VmallocTotal:   %8lu kB\n",
   (unsigned long)VMALLOC_TOTAL >> 10);
show_val_kb(m, "VmallocUsed:", 0ul);
-   show_val_kb(m, "VmallocChunk:   ", 0ul);
+   show_val_kb(m, "VMALLOCCHUNK:   ", 0ul);



Am I assuming correctly that "kpatch tool" simply recompiles all code the
way it would get compiled in a regular kernel build?


kpatch is open source and is available on github. This patch is specific
to the way kpatch works


My understanding is
that live patching modules need to be carefully prepared, which involves
source code reorganisation and recompilation. In that process, you can
easily declare show_val_kb() extern, and get the suitable instruction sequence
for the call.


Yes, we agree. For the current versions of kpatch, which involve a process of
applying the patch and building the kernel without and with the patch and doing
an elf diff (programatically), we cannot deviate from that process as it's
architecture independent. This patch solves arch specific issues related
to that process.


Yes, that's the essence of the kpatch tool for building a livepatchable
kernel module: it does an elf diff of the kernel built with and without
the patch applied. show_val_kb() is a simple example; consider more
complex patches, if they had to be prepared manually as suggested. That
defeats the whole purpose of the kpatch tool, which programmatically
prepares a livepatch module with close to no manual preparation
required. It's an architecture limitation, which is addressed in this
patch.

This patch is the outcome of a long discussion at kpatch:
https://github.com/dynup/kpatch/pull/650

--
cheers,
Kamalesh.



[PATCH kernel] powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn

2017-10-23 Thread Alexey Kardashevskiy
The pcidev value stored in pci_dn is only used for NPU/NPU2
initialization. We can easily drop the cached pointer and
use an ancient helper - pci_get_domain_bus_and_slot() instead in order
to reduce complexity.
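For reference, a hedged usage sketch of that helper (not part of the
patch): pci_get_domain_bus_and_slot() returns the device with an elevated
refcount, so a caller that only borrows it drops the reference again:

/* Hedged sketch only; the pdn fields mirror those used in get_pci_dev(). */
static void example_lookup(struct pci_dn *pdn)
{
	struct pci_dev *pdev;

	pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
					   pdn->busno, pdn->devfn);
	if (!pdev)
		return;

	/* ... use pdev ... */

	pci_dev_put(pdev);	/* drop the reference taken by the lookup */
}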

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/include/asm/pci-bridge.h | 2 --
 arch/powerpc/platforms/powernv/npu-dma.c  | 5 -
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 0b8aa1fe2d5f..752718a2949d 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -197,8 +197,6 @@ struct pci_dn {
struct  iommu_table_group *table_group; /* for phb's or bridges */
 
int pci_ext_config_space;   /* for pci devices */
-
-   struct  pci_dev *pcidev;/* back-pointer to the pci device */
 #ifdef CONFIG_EEH
struct eeh_dev *edev;   /* eeh device */
 #endif
diff --git a/arch/powerpc/platforms/powernv/npu-dma.c 
b/arch/powerpc/platforms/powernv/npu-dma.c
index 2cb6cbea4b3b..b5844a950852 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -39,7 +39,10 @@
  */
 static struct pci_dev *get_pci_dev(struct device_node *dn)
 {
-   return PCI_DN(dn)->pcidev;
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   return pci_get_domain_bus_and_slot(pci_domain_nr(pdn->phb->bus),
+  pdn->busno, pdn->devfn);
 }
 
 /* Given a NPU device get the associated PCI device. */
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index bc60faa9f6c2..269f119e4b3c 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1047,7 +1047,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct 
pci_dev *dev)
 * At some point we want to remove the PDN completely anyways
 */
pci_dev_get(dev);
-   pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
pe->flags = PNV_IODA_PE_DEV;
pe->pdev = dev;
@@ -1094,7 +1093,6 @@ static void pnv_ioda_setup_same_PE(struct pci_bus *bus, 
struct pnv_ioda_pe *pe)
continue;
 
pe->device_count++;
-   pdn->pcidev = dev;
pdn->pe_number = pe->pe_number;
if ((pe->flags & PNV_IODA_PE_BUS_ALL) && dev->subordinate)
pnv_ioda_setup_same_PE(dev->subordinate, pe);
@@ -1209,7 +1207,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct 
pci_dev *npu_pdev)
pci_dev_get(npu_pdev);
npu_pdn = pci_get_pdn(npu_pdev);
rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
-   npu_pdn->pcidev = npu_pdev;
npu_pdn->pe_number = pe_num;
phb->ioda.pe_rmap[rid] = pe->pe_number;
 
-- 
2.11.0



[PATCH v2 3/3] powerpc/powernv: Avoid waiting for secondary hold spinloop with OPAL

2017-10-23 Thread Nicholas Piggin
OPAL boot does not insert secondaries at 0x60 to wait at the secondary
hold spinloop. Instead they are started later, and inserted at
generic_secondary_smp_init(), which is after the secondary hold
spinloop.

Avoid waiting on this spinloop when booting with OPAL firmware. This
wait always times out in that case.

This saves 100ms of boot time on powernv, and tens of seconds of real time
when booting on the simulator in SMP.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/head_64.S  | 16 +++-
 arch/powerpc/kernel/setup_64.c | 10 +-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c9e760ec7530..0deef350004f 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -55,12 +55,18 @@
  *
  *  For pSeries or server processors:
  *   1. The MMU is off & open firmware is running in real mode.
- *   2. The kernel is entered at __start
+ *   2. The primary CPU enters at __start.
+ *   3. If the RTAS supports "query-cpu-stopped-state", then secondary
+ *  CPUs will enter as directed by "start-cpu" RTAS call, which is
+ *  generic_secondary_smp_init, with PIR in r3.
+ *   4. Else the secondary CPUs will enter at secondary_hold (0x60) as
+ *  directed by the "start-cpu" RTAS call, with PIR in r3.
  * -or- For OPAL entry:
- *   1. The MMU is off, processor in HV mode, primary CPU enters at 0
- *  with device-tree in gpr3. We also get OPAL base in r8 and
- * entry in r9 for debugging purposes
- *   2. Secondary processors enter at 0x60 with PIR in gpr3
+ *   1. The MMU is off, processor in HV mode.
+ *   2. The primary CPU enters at 0 with device-tree in r3, OPAL base
+ *  in r8, and entry in r9 for debugging purposes.
+ *   3. Secondary CPUs enter as directed by OPAL_START_CPU call, which
+ *  is at generic_secondary_smp_init, with PIR in r3.
  *
  *  For Book3E processors:
  *   1. The MMU is on running in AS0 in a state defined in ePAPR
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3f2453858f60..ecaab70e4b78 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -363,8 +363,16 @@ void early_setup_secondary(void)
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
 static bool use_spinloop(void)
 {
-   if (!IS_ENABLED(CONFIG_PPC_BOOK3E))
+   if (IS_ENABLED(CONFIG_PPC_BOOK3S)) {
+   /*
+* See comments in head_64.S -- not all platforms insert
+* secondaries at __secondary_hold and wait at the spin
+* loop.
+*/
+   if (firmware_has_feature(FW_FEATURE_OPAL))
+   return false;
return true;
+   }
 
/*
 * When book3e boots from kexec, the ePAPR spin table does
-- 
2.13.3



[PATCH v2 2/3] powerpc: use NMI IPI for smp_send_stop

2017-10-23 Thread Nicholas Piggin
Use the NMI IPI rather than smp_call_function for smp_send_stop.
Have stopped CPUs hard disable interrupts rather than just soft
disable.

This function is used in crash/panic/shutdown paths to bring other
CPUs down as quickly and reliably as possible, and to minimize their
potential to cause trouble.

Avoiding the Linux smp_call_function infrastructure and (if supported)
using true NMI IPIs makes this more robust.

Also use spin loop primitives in the stop callback, mainly to help the
processing speed of the active thread in the simulator.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/smp.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e0a4c1f82e25..ce891030d925 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -547,19 +547,20 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct 
pt_regs *))
 }
 #endif
 
-static void stop_this_cpu(void *dummy)
+static void __noreturn stop_this_cpu(struct pt_regs *regs)
 {
/* Remove this CPU */
set_cpu_online(smp_processor_id(), false);
 
-   local_irq_disable();
+   hard_irq_disable();
+   spin_begin();
while (1)
-   ;
+   spin_cpu_relax();
 }
 
 void smp_send_stop(void)
 {
-   smp_call_function(stop_this_cpu, NULL, 0);
+   smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, stop_this_cpu, 100);
 }
 
 struct thread_info *current_set[NR_CPUS];
-- 
2.13.3



[PATCH v2 1/3] powerpc/powernv: Always stop secondaries before reboot/shutdown

2017-10-23 Thread Nicholas Piggin
Currently powernv reboot and shutdown requests just leave secondaries
to do their own things. This is undesirable because they can trigger
any number of watchdogs while waiting for reboot, but also we don't
know what else they might be doing, or they might be stuck somewhere
causing trouble.

The opal scheduled flash update code already ran into watchdog problems
due to flashing taking a long time, but it's possible for regular
reboots to trigger problems too (this is with watchdog_thresh set to 1,
but I have seen it with watchdog_thresh at the default value once too):

  reboot: Restarting system
  [  360.038896709,5] OPAL: Reboot request...
  Watchdog CPU:0 Hard LOCKUP
  Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
  Watchdog CPU:16 Hard LOCKUP
  watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]

So remove the special case for flash update, and unconditionally do
smp_send_stop before rebooting.

Return the CPUs to Linux stop loops rather than OPAL. The reason for
this is that the path to firmware is longer, and the CPUs may have
been interrupted from firmware, which may cause problems when re-entering
it. It's better to put them into a simple spin loop to maximize the
chance of a successful reboot.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/opal.h |  2 +-
 arch/powerpc/platforms/powernv/opal-flash.c | 28 +---
 arch/powerpc/platforms/powernv/setup.c  | 15 +--
 3 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 04c32b08ffa1..ce58f4139ff5 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -317,7 +317,7 @@ struct rtc_time;
 extern unsigned long opal_get_boot_time(void);
 extern void opal_nvram_init(void);
 extern void opal_flash_update_init(void);
-extern void opal_flash_term_callback(void);
+extern void opal_flash_update_print_message(void);
 extern int opal_elog_init(void);
 extern void opal_platform_dump_init(void);
 extern void opal_sys_param_init(void);
diff --git a/arch/powerpc/platforms/powernv/opal-flash.c 
b/arch/powerpc/platforms/powernv/opal-flash.c
index 2fa3ac80cb4e..632871d78576 100644
--- a/arch/powerpc/platforms/powernv/opal-flash.c
+++ b/arch/powerpc/platforms/powernv/opal-flash.c
@@ -303,26 +303,9 @@ static int opal_flash_update(int op)
return rc;
 }
 
-/* Return CPUs to OPAL before starting FW update */
-static void flash_return_cpu(void *info)
-{
-   int cpu = smp_processor_id();
-
-   if (!cpu_online(cpu))
-   return;
-
-   /* Disable IRQ */
-   hard_irq_disable();
-
-   /* Return the CPU to OPAL */
-   opal_return_cpu();
-}
-
 /* This gets called just before system reboots */
-void opal_flash_term_callback(void)
+void opal_flash_update_print_message(void)
 {
-   struct cpumask mask;
-
if (update_flash_data.status != FLASH_IMG_READY)
return;
 
@@ -333,15 +316,6 @@ void opal_flash_term_callback(void)
 
/* Small delay to help getting the above message out */
msleep(500);
-
-   /* Return secondary CPUs to firmware */
-   cpumask_copy(&mask, cpu_online_mask);
-   cpumask_clear_cpu(smp_processor_id(), &mask);
-   if (!cpumask_empty(&mask))
-   smp_call_function_many(&mask,
-  flash_return_cpu, NULL, false);
-   /* Hard disable interrupts */
-   hard_irq_disable();
 }
 
 /*
diff --git a/arch/powerpc/platforms/powernv/setup.c 
b/arch/powerpc/platforms/powernv/setup.c
index b23d8beb0f1e..ff9422776a81 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -112,17 +112,12 @@ static void pnv_prepare_going_down(void)
 */
opal_event_shutdown();
 
-   /* Soft disable interrupts */
-   local_irq_disable();
+   /* Print flash update message if one is scheduled. */
+   opal_flash_update_print_message();
 
-   /*
-* Return secondary CPUs to firwmare if a flash update
-* is pending otherwise we will get all sort of error
-* messages about CPU being stuck etc.. This will also
-* have the side effect of hard disabling interrupts so
-* past this point, the kernel is effectively dead.
-*/
-   opal_flash_term_callback();
+   smp_send_stop();
+
+   hard_irq_disable();
 }
 
 static void  __noreturn pnv_restart(char *cmd)
-- 
2.13.3



[PATCH v2 0/3] avoid secondary hold spinloop when possible

2017-10-23 Thread Nicholas Piggin
I've made this series only avoid the secondary spinloop on powernv.
pSeries is a little bit more complicated: in some kexec cases the
secondaries will be at 0x60. I haven't had time to get all that
sorted out for this merge window, so OPAL-only this time.

Nicholas Piggin (3):
  powerpc/powernv: Always stop secondaries before reboot/shutdown
  powerpc: use NMI IPI for smp_send_stop
  powerpc/powernv: Avoid waiting for secondary hold spinloop with OPAL

 arch/powerpc/include/asm/opal.h |  2 +-
 arch/powerpc/kernel/head_64.S   | 16 +++-
 arch/powerpc/kernel/setup_64.c  | 10 +-
 arch/powerpc/kernel/smp.c   |  9 +
 arch/powerpc/platforms/powernv/opal-flash.c | 28 +---
 arch/powerpc/platforms/powernv/setup.c  | 15 +--
 6 files changed, 32 insertions(+), 48 deletions(-)

-- 
2.13.3



[PATCH 3/3] powerpc/powernv: Use FIXUP_ENDIAN_HV in OPAL return

2017-10-23 Thread Nicholas Piggin
Close the recoverability gap for OPAL calls by using FIXUP_ENDIAN_HV
in the return path.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/opal-wrappers.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S 
b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 37cd170201a2..6f4b00a2ac46 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -94,7 +94,7 @@ opal_return:
 * bytes (always BE) since MSR:LE will end up fixed up as a side
 * effect of the rfid.
 */
-   FIXUP_ENDIAN
+   FIXUP_ENDIAN_HV
ld  r2,PACATOC(r13);
lwz r4,8(r1);
ld  r5,PPC_LR_STKOFF(r1);
@@ -120,7 +120,7 @@ opal_real_call:
hrfid
 
 opal_return_realmode:
-   FIXUP_ENDIAN
+   FIXUP_ENDIAN_HV
ld  r2,PACATOC(r13);
lwz r11,8(r1);
ld  r12,PPC_LR_STKOFF(r1)
-- 
2.13.3



[PATCH 2/3] powerpc/book3s: Add an HV variant of FIXUP_ENDIAN that is recoverable

2017-10-23 Thread Nicholas Piggin
Add an HV variant of FIXUP_ENDIAN which uses HSRR[01] and does not
clear MSR[RI], which improves recoverability.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc_asm.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index d6b56aebe602..ae94b3626b6c 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -774,6 +774,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #ifdef CONFIG_PPC_BOOK3E
 #define FIXUP_ENDIAN
 #else
+/*
+ * This version may be used in HV or non-HV context.
+ * MSR[EE] must be disabled.
+ */
 #define FIXUP_ENDIAN  \
tdi   0,0,0x48;   /* Reverse endian of b . + 8  */ \
b 191f;   /* Skip trampoline if endian is good  */ \
@@ -789,6 +793,24 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
.long 0x244c; /* rfid   */ \
 191:
 
+/*
+ * This version may only be used with MSR[HV]=1
+ * - Does not clear MSR[RI], so more robust.
+ * - Slightly smaller and faster.
+ */
+#define FIXUP_ENDIAN_HV   \
+   tdi   0,0,0x48;   /* Reverse endian of b . + 8  */ \
+   b 191f;   /* Skip trampoline if endian is good  */ \
+   .long 0xa600607d; /* mfmsr r11  */ \
+   .long 0x01006b69; /* xori r11,r11,1 */ \
+   .long 0x05009f42; /* bcl 20,31,$+4  */ \
+   .long 0xa602487d; /* mflr r10   */ \
+   .long 0x14004a39; /* addi r10,r10,20*/ \
+   .long 0xa64b5a7d; /* mthsrr0 r10*/ \
+   .long 0xa64b7b7d; /* mthsrr1 r11*/ \
+   .long 0x2402004c; /* hrfid  */ \
+191:
+
 #endif /* !CONFIG_PPC_BOOK3E */
 
 #endif /*  __ASSEMBLY__ */
-- 
2.13.3



[PATCH 1/3] powerpc/book3s: use label for FIXUP_ENDIAN macro branch

2017-10-23 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/ppc_asm.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h 
b/arch/powerpc/include/asm/ppc_asm.h
index 36f3e41c9fbe..d6b56aebe602 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -776,7 +776,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
 #else
 #define FIXUP_ENDIAN  \
tdi   0,0,0x48;   /* Reverse endian of b . + 8  */ \
-   b $+44;   /* Skip trampoline if endian is good  */ \
+   b 191f;   /* Skip trampoline if endian is good  */ \
.long 0xa600607d; /* mfmsr r11  */ \
.long 0x01006b69; /* xori r11,r11,1 */ \
.long 0x4039; /* li r10,0   */ \
@@ -786,7 +786,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_601)
.long 0x14004a39; /* addi r10,r10,20*/ \
.long 0xa6035a7d; /* mtsrr0 r10 */ \
.long 0xa6037b7d; /* mtsrr1 r11 */ \
-   .long 0x244c  /* rfid   */
+   .long 0x244c; /* rfid   */ \
+191:
 
 #endif /* !CONFIG_PPC_BOOK3E */
 
-- 
2.13.3



[PATCH 0/3] improve recoverability of OPAL calls

2017-10-23 Thread Nicholas Piggin
Here are a few patches to improve recoverability for FIXUP_ENDIAN
on powernv. We should try to minimise SRR[01] (and MSR[RI]=0) usage
as much as possible. Whether that's by using HSRR or mtmsrd, it
usually results in faster and smaller code too.

There are a few other places we can improve, but I've had these
patches around for a while.

Thanks,
Nick


Nicholas Piggin (3):
  powerpc/book3s: use label for FIXUP_ENDIAN macro branch
  powerpc/book3s: Add an HV variant of FIXUP_ENDIAN that is recoverable
  powerpc/powernv: Use FIXUP_ENDIAN_HV in OPAL return

 arch/powerpc/include/asm/ppc_asm.h | 27 --
 arch/powerpc/platforms/powernv/opal-wrappers.S |  4 ++--
 2 files changed, 27 insertions(+), 4 deletions(-)

-- 
2.13.3