[RFC PATCH v2] powerpc/64s: Move idle code to powernv C code

2018-07-20 Thread Nicholas Piggin
Reimplement Book3S idle code to C, in the powernv platform code.
Assembly stubs are used to save and restore the stack frame and
non-volatile GPRs before going to idle, but these are small and
mostly agnostic to microarchitecture implementation details.

The optimisation whereby EC=ESL=0 idle modes need not save GPRs or
execute mtmsrd L=0 is restored, because it's simple to do.

Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
but saves and restores them all explicitly. This can easily be
extended to tracking the set of system-wide SPRs that do not have
to be saved each time.

Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
way this stuff will cope with non-trivial new CPU implementation
details, firmware changes, etc., without becoming unmaintainable.

Since RFC v1:
- Now tested and working with POWER9 hash and radix.
- KVM support added. This took a bit of work to untangle and might
  still have some issues, but POWER9 seems to work including hash on
  radix with dependent threads mode.
- This snowballed a bit because of KVM and other details making it
  not feasible to leave POWER7/8 code alone. That's only half done
  at the moment.
- So far this trades about 800 lines of asm for 500 of C. With POWER7/8
  support done it might be another hundred or so lines of C.

Would appreciate any feedback on the approach, in particular the
significantly different (and hopefully cleaner) KVM approach.

Thanks,
Nick

---
 include/asm/book3s/64/mmu-hash.h |    1
 include/asm/cpuidle.h            |   17
 include/asm/paca.h               |   40 -
 include/asm/processor.h          |    8
 include/asm/reg.h                |    7
 kernel/asm-offsets.c             |   12
 kernel/dt_cpu_ftrs.c             |   21
 kernel/exceptions-64s.S          |   17
 kernel/idle_book3s.S             |  996 +++
 kernel/setup-common.c            |    4
 kvm/book3s_hv_rmhandlers.S       |   94 ++-
 mm/slb.c                         |    7
 platforms/powernv/idle.c         |  713 +++
 platforms/powernv/subcore.c      |    2
 xmon/xmon.c                      |   26 -
 15 files changed, 817 insertions(+), 1148 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 50ed64fba4ae..c626319a962d 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -486,6 +486,7 @@ static inline void hpte_init_pseries(void) { }
 extern void hpte_init_native(void);
 
 extern void slb_initialize(void);
+extern void __slb_flush_and_rebolt(void);
 extern void slb_flush_and_rebolt(void);
 
 extern void slb_vmalloc_update(void);
diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index e210a83eb196..edf33dc0d098 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -28,6 +28,7 @@
  * yet woken from the winkle state.
  */
 #define PNV_CORE_IDLE_LOCK_BIT 0x1000
+#define NR_PNV_CORE_IDLE_LOCK_BIT  28
 
 #define PNV_CORE_IDLE_WINKLE_COUNT 0x0001
 #define PNV_CORE_IDLE_WINKLE_COUNT_ALL_BIT 0x0008
@@ -68,22 +69,6 @@
 #define ERR_DEEP_STATE_ESL_MISMATCH    -2
 
 #ifndef __ASSEMBLY__
-/* Additional SPRs that need to be saved/restored during stop */
-struct stop_sprs {
-   u64 pid;
-   u64 ldbar;
-   u64 fscr;
-   u64 hfscr;
-   u64 mmcr1;
-   u64 mmcr2;
-   u64 mmcra;
-};
-
-extern u32 pnv_fastsleep_workaround_at_entry[];
-extern u32 pnv_fastsleep_workaround_at_exit[];
-
-extern u64 pnv_first_deep_stop_state;
-
 unsigned long pnv_cpu_offline(unsigned int cpu);
 int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags);
 static inline void report_invalid_psscr_val(u64 psscr_val, int err)
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4e9cede5a7e7..27f0e1d6a462 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -168,7 +168,6 @@ struct paca_struct {
u8 irq_happened;/* irq happened while soft-disabled */
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending;/* IRQ_WORK interrupt while soft-disable */
-   u8 nap_state_lost;  /* NV GPR values lost in power7_idle */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
u8 pmcregs_in_use;  /* pseries puts this in lppaca */
 #endif
@@ -178,23 +177,30 @@ struct paca_struct {
 #endif
 
 #ifdef CONFIG_PPC_POWERNV
-   /* Per-core mask tracking idle threads and a lock bit-[L][] */
-   u32 *core_idle_state_ptr;
-   u8 thread_idle_state;   /* PNV_THREAD_RUNNING/NAP/SLEEP */
-   /* Mask to indicate thread id in core */
-   u8 thread_mask;
-   /* Mask to denote subcore sibling threads */
-   u8 subcore_sibling_mask;
-   /* Flag 

[GIT PULL] Please pull powerpc/linux.git powerpc-4.18-4 tag

2018-07-20 Thread Michael Ellerman

Hi Linus,

Please pull some more powerpc fixes for 4.18:

The following changes since commit 021c91791a5e7e85c567452f1be3e4c2c6cb6063:

  Linux 4.18-rc3 (2018-07-01 16:04:53 -0700)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.18-4

for you to fetch changes up to b03897cf318dfc47de33a7ecbc7655584266f034:

  powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle) (2018-07-18 20:40:17 +1000)

- --
powerpc fixes for 4.18 #4

Two regression fixes, one for xmon disassembly formatting and the other to fix
the E500 build.

Two commits to fix a potential security issue in the VFIO code under obscure
circumstances.

And finally a fix to the Power9 idle code to restore SPRG3, which is user
visible and used for sched_getcpu().

Thanks to:
  Alexey Kardashevskiy, David Gibson, Gautham R. Shenoy, James Clarke.

- --
Alexey Kardashevskiy (2):
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page

Gautham R. Shenoy (1):
  powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)

James Clarke (1):
  powerpc/Makefile: Assemble with -me500 when building for E500

Michael Ellerman (1):
  powerpc/xmon: Fix disassembly since printf changes


 arch/powerpc/Makefile  |  1 +
 arch/powerpc/include/asm/mmu_context.h |  4 ++--
 arch/powerpc/kernel/idle_book3s.S  |  2 ++
 arch/powerpc/kvm/book3s_64_vio.c   |  2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c|  6 --
 arch/powerpc/mm/mmu_context_iommu.c| 37 --
 arch/powerpc/xmon/xmon.c   |  4 ++--
 drivers/vfio/vfio_iommu_spapr_tce.c| 10 -
 8 files changed, 52 insertions(+), 14 deletions(-)


Re: [PATCH v4 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-07-20 Thread Mike Kravetz
On 07/20/2018 11:37 AM, Alex Ghiti wrote:
> Does anyone have any suggestions about these patches?

I only took a quick look.  From the hugetlb perspective, I like the
idea of moving routines to a common file.  If any of the arch owners
(or anyone else) agree, I can do a review of the series.
-- 
Mike Kravetz

> On 07/09/2018 02:16 PM, Michal Hocko wrote:
>> [CC hugetlb guys - http://lkml.kernel.org/r/20180705110716.3919-1-a...@ghiti.fr]
>>
>> On Thu 05-07-18 11:07:05, Alexandre Ghiti wrote:
>>> In order to reduce copy/paste of functions across architectures and then
>>> make riscv hugetlb port (and future ports) simpler and smaller, this
>>> patchset intends to factorize the numerous hugetlb primitives that are
>>> defined across all the architectures.
>>>
>>> Except for prepare_hugepage_range, this patchset moves the versions that
>>> are just pass-through to standard pte primitives into
>>> asm-generic/hugetlb.h by using the same #ifdef semantic that can be
>>> found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
>>>
>>> s390 architecture has not been tackled in this series since it does not
>>> use asm-generic/hugetlb.h at all.
>>> powerpc could be factorized a bit more (cf huge_ptep_set_wrprotect).
>>>
>>> This patchset has been compiled on x86 only.
>>>
>>> Changelog:
>>>
>>> v4:
>>>    Fix powerpc build error due to misplacing of #include
>>>    outside of #ifdef CONFIG_HUGETLB_PAGE, as
>>>    pointed out by Christophe Leroy.
>>>
>>> v1, v2, v3:
>>>Same version, just problems with email provider and misuse of
>>>--batch-size option of git send-email
>>>
>>> Alexandre Ghiti (11):
>>>hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
>>>hugetlb: Introduce generic version of hugetlb_free_pgd_range
>>>hugetlb: Introduce generic version of set_huge_pte_at
>>>hugetlb: Introduce generic version of huge_ptep_get_and_clear
>>>hugetlb: Introduce generic version of huge_ptep_clear_flush
>>>hugetlb: Introduce generic version of huge_pte_none
>>>hugetlb: Introduce generic version of huge_pte_wrprotect
>>>hugetlb: Introduce generic version of prepare_hugepage_range
>>>hugetlb: Introduce generic version of huge_ptep_set_wrprotect
>>>hugetlb: Introduce generic version of huge_ptep_set_access_flags
>>>hugetlb: Introduce generic version of huge_ptep_get
>>>
>>>   arch/arm/include/asm/hugetlb-3level.h| 32 +-
>>>   arch/arm/include/asm/hugetlb.h   | 33 +--
>>>   arch/arm64/include/asm/hugetlb.h | 39 +++-
>>>   arch/ia64/include/asm/hugetlb.h  | 47 ++-
>>>   arch/mips/include/asm/hugetlb.h  | 40 +++--
>>>   arch/parisc/include/asm/hugetlb.h| 33 +++
>>>   arch/powerpc/include/asm/book3s/32/pgtable.h |  2 +
>>>   arch/powerpc/include/asm/book3s/64/pgtable.h |  1 +
>>>   arch/powerpc/include/asm/hugetlb.h   | 43 ++
>>>   arch/powerpc/include/asm/nohash/32/pgtable.h |  2 +
>>>   arch/powerpc/include/asm/nohash/64/pgtable.h |  1 +
>>>   arch/sh/include/asm/hugetlb.h| 54 ++---
>>>   arch/sparc/include/asm/hugetlb.h | 40 +++--
>>>   arch/x86/include/asm/hugetlb.h   | 72 +--
>>>   include/asm-generic/hugetlb.h| 88 +++-
>>>   15 files changed, 143 insertions(+), 384 deletions(-)
>>>
>>> -- 
>>> 2.16.2
> 


Re: [PATCH v4 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-07-20 Thread Alex Ghiti

Does anyone have any suggestions about these patches?

On 07/09/2018 02:16 PM, Michal Hocko wrote:

[CC hugetlb guys - http://lkml.kernel.org/r/20180705110716.3919-1-a...@ghiti.fr]

On Thu 05-07-18 11:07:05, Alexandre Ghiti wrote:

In order to reduce copy/paste of functions across architectures and then
make riscv hugetlb port (and future ports) simpler and smaller, this
patchset intends to factorize the numerous hugetlb primitives that are
defined across all the architectures.

Except for prepare_hugepage_range, this patchset moves the versions that
are just pass-through to standard pte primitives into
asm-generic/hugetlb.h by using the same #ifdef semantic that can be
found in asm-generic/pgtable.h, i.e. __HAVE_ARCH_***.
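
As a rough illustration of the __HAVE_ARCH_* convention described above, here
is a minimal userspace sketch. The simplified pte_t and the function bodies
are assumptions made for illustration, and the guard macro names merely follow
the pattern named in the cover letter; this is not a copy of the actual header.

```c
/* Simplified stand-in for the kernel's pte_t (assumption, for illustration). */
typedef struct { unsigned long val; } pte_t;

/*
 * An architecture with special needs would define the guard macro and
 * supply its own version before this point, e.g.:
 *
 *   #define __HAVE_ARCH_HUGE_PTEP_GET
 *   static inline pte_t huge_ptep_get(pte_t *ptep) { ... }
 */

#ifndef __HAVE_ARCH_HUGE_PTEP_GET
/* Generic fallback: a plain read of the entry. */
static inline pte_t huge_ptep_get(pte_t *ptep)
{
	return *ptep;
}
#endif

#ifndef __HAVE_ARCH_HUGE_PTE_NONE
/*
 * Generic fallback: pass-through to the ordinary "is this entry empty"
 * check, modelled here as val == 0.
 */
static inline int huge_pte_none(pte_t pte)
{
	return pte.val == 0;
}
#endif
```

An architecture that defines neither guard gets both generic versions; one
that defines only __HAVE_ARCH_HUGE_PTEP_GET keeps the generic huge_pte_none().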

s390 architecture has not been tackled in this series since it does not
use asm-generic/hugetlb.h at all.
powerpc could be factorized a bit more (cf huge_ptep_set_wrprotect).

This patchset has been compiled on x86 only.

Changelog:

v4:
   Fix powerpc build error due to misplacing of #include
   outside of #ifdef CONFIG_HUGETLB_PAGE, as
   pointed out by Christophe Leroy.

v1, v2, v3:
   Same version, just problems with email provider and misuse of
   --batch-size option of git send-email

Alexandre Ghiti (11):
   hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
   hugetlb: Introduce generic version of hugetlb_free_pgd_range
   hugetlb: Introduce generic version of set_huge_pte_at
   hugetlb: Introduce generic version of huge_ptep_get_and_clear
   hugetlb: Introduce generic version of huge_ptep_clear_flush
   hugetlb: Introduce generic version of huge_pte_none
   hugetlb: Introduce generic version of huge_pte_wrprotect
   hugetlb: Introduce generic version of prepare_hugepage_range
   hugetlb: Introduce generic version of huge_ptep_set_wrprotect
   hugetlb: Introduce generic version of huge_ptep_set_access_flags
   hugetlb: Introduce generic version of huge_ptep_get

  arch/arm/include/asm/hugetlb-3level.h| 32 +-
  arch/arm/include/asm/hugetlb.h   | 33 +--
  arch/arm64/include/asm/hugetlb.h | 39 +++-
  arch/ia64/include/asm/hugetlb.h  | 47 ++-
  arch/mips/include/asm/hugetlb.h  | 40 +++--
  arch/parisc/include/asm/hugetlb.h| 33 +++
  arch/powerpc/include/asm/book3s/32/pgtable.h |  2 +
  arch/powerpc/include/asm/book3s/64/pgtable.h |  1 +
  arch/powerpc/include/asm/hugetlb.h   | 43 ++
  arch/powerpc/include/asm/nohash/32/pgtable.h |  2 +
  arch/powerpc/include/asm/nohash/64/pgtable.h |  1 +
  arch/sh/include/asm/hugetlb.h| 54 ++---
  arch/sparc/include/asm/hugetlb.h | 40 +++--
  arch/x86/include/asm/hugetlb.h   | 72 +--
  include/asm-generic/hugetlb.h| 88 +++-
  15 files changed, 143 insertions(+), 384 deletions(-)

--
2.16.2




Re: CONFIG_ANDROID_BINDER_IPC=y (Re: powerpc: 32BIT vs. 64BIT (PPC32 vs. PPC64))

2018-07-20 Thread Randy Dunlap
On 07/13/2018 04:24 PM, Randy Dunlap wrote:
> On 07/13/2018 04:41 AM, Mathieu Malaterre wrote:
>> Randy,
>>
>> On Mon, Jul 9, 2018 at 2:00 PM Mathieu Malaterre wrote:
>>>
>>> On Sun, Jul 8, 2018 at 1:53 PM Michael Ellerman wrote:
>>>>
>>>> Randy Dunlap writes:
>>>>> Hi,
>>>>>
>>>>> Is there a good way (or a shortcut) to do something like:
>>>>
>>>> The best I know of is:
>>>>
>>>>> $ make ARCH=powerpc O=PPC32 [other_options] allmodconfig
>>>>>   to get a PPC32/32BIT allmodconfig
>>>>
>>>> $ echo CONFIG_PPC64=n > allmod.config
>>>> $ KCONFIG_ALLCONFIG=1 make allmodconfig
>>>> $ grep PPC32 .config
>>>> CONFIG_PPC32=y
>>>>
>>>> Which is still a bit clunky.
>>>>
>>>>
>>>> I looked at this a while back and the problem we have is that the 32-bit
>>>> kernel is not a single thing. There are multiple 32-bit platforms which
>>>> are mutually exclusive.
>>>>
>>>> eg, from menuconfig:
>>>>
>>>>  - 512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx
>>>>  - Freescale 85xx
>>>>  - Freescale 8xx
>>>>  - AMCC 40x
>>>>  - AMCC 44x, 46x or 47x
>>>>  - Freescale e200
>>>
>>> Most Linux distros seem to have dropped support for ppc32. So I'd suggest
>>> picking the Debian powerpc default config (but I agree that I am a little
>>> biased here).
>>
>> I tried an allmodconfig as suggested by Michael (above). But I get a build error:
>>
>>   MODPOST vmlinux.o
>> drivers/android/binder.o: In function `binder_thread_write':
>> binder.c:(.text+0xc750): undefined reference to `__get_user_bad'
>> binder.c:(.text+0xc76c): undefined reference to `__get_user_bad'
>> binder.c:(.text+0xc790): undefined reference to `__get_user_bad'
>> binder.c:(.text+0xc7d4): undefined reference to `__get_user_bad'
>> binder.c:(.text+0xc7f4): undefined reference to `__get_user_bad'
>>
>>
>> So for now I need to do: CONFIG_ANDROID_BINDER_IPC=n
>>
>> How did you get past this build failure?
> 
> Hi,
> 
> I am not seeing an error on that driver build.
> 
> I am using gcc 8.1.0 from kernel.org:
> https://mirrors.edge.kernel.org/pub/tools/crosstool/
> 
> and building on x86_64.

Hi Mathieu,

I do see this build error (slightly different undefined reference though)
when I do an arch/microblaze/ cross-build:

drivers/android/binder.o: In function `binder_thread_write':
drivers/android/.tmp_gl_binder.o:(.text+0xcba8): undefined reference to `__user_bad'
drivers/android/.tmp_gl_binder.o:(.text+0xcbd4): undefined reference to `__user_bad'
drivers/android/.tmp_gl_binder.o:(.text+0xcfbc): undefined reference to `__user_bad'
drivers/android/.tmp_gl_binder.o:(.text+0xd648): undefined reference to `__user_bad'
drivers/android/.tmp_gl_binder.o:(.text+0xdbc0): undefined reference to `__user_bad'


-- 
~Randy


Re: [PATCH v2 2/2] powerpc/pseries: Wait for completion of hotplug events during PRRN handling

2018-07-20 Thread Nathan Fontenot

On 07/17/2018 02:40 PM, John Allen wrote:

While handling PRRN events, the time to handle the actual hotplug events
dwarfs the time it takes to perform the device tree updates and queue the
hotplug events. In the case that PRRN events are being queued continuously,
hotplug events have been observed to be queued faster than the kernel can
actually handle them. This patch avoids the problem by waiting for a
hotplug request to complete before queueing more hotplug events.

Signed-off-by: John Allen 


Reviewed-by: Nathan Fontenot 


---
  arch/powerpc/platforms/pseries/mobility.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index 8a8033a249c7..49930848fa78 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -242,6 +242,7 @@ static int add_dt_node(__be32 parent_phandle, __be32 drc_index)
  static void prrn_update_node(__be32 phandle)
  {
struct pseries_hp_errorlog *hp_elog;
+   struct completion hotplug_done;
struct device_node *dn;

/*
@@ -263,7 +264,9 @@ static void prrn_update_node(__be32 phandle)
hp_elog->id_type = PSERIES_HP_ELOG_ID_DRC_INDEX;
hp_elog->_drc_u.drc_index = phandle;

-   queue_hotplug_event(hp_elog, NULL, NULL);
+   init_completion(&hotplug_done);
+   queue_hotplug_event(hp_elog, &hotplug_done, NULL);
+   wait_for_completion(&hotplug_done);

kfree(hp_elog);
  }
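
To illustrate the backlog problem the commit message describes, here is a
small tick-based model in plain C. It is not kernel code: max_backlog() and
its parameters are hypothetical, and the blocking wait_for_completion() call
is modelled simply as "do not queue the next event until the previous one is
done".

```c
#include <stddef.h>

/*
 * Tick-based model (hypothetical, not kernel code). One event is offered
 * per tick and the worker needs `handle_ticks` ticks per event. If `wait`
 * is nonzero, the producer only queues once the previous event completed.
 * Returns the largest queue depth observed.
 */
static size_t max_backlog(size_t events, size_t handle_ticks, int wait)
{
	size_t queued = 0;	/* events handed to the worker so far */
	size_t backlog = 0;	/* queued but not yet completed */
	size_t progress = 0;	/* ticks spent on the current event */
	size_t max = 0;

	if (handle_ticks == 0)
		handle_ticks = 1;	/* every event costs at least one tick */

	while (queued < events || backlog > 0) {
		/* Producer: queue a new event unless it has to wait. */
		if (queued < events && (!wait || backlog == 0)) {
			queued++;
			backlog++;
			if (backlog > max)
				max = backlog;
		}
		/* Worker: advance the event at the head of the queue. */
		if (backlog > 0 && ++progress == handle_ticks) {
			backlog--;
			progress = 0;
		}
	}
	return max;
}
```

With ten events and a handler five times slower than the producer, the
unthrottled queue grows to nine entries in this model, while the waiting
variant never holds more than one.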





Re: [PATCH v2 1/2] powerpc/pseries: Avoid blocking rtas polling handling multiple PRRN events

2018-07-20 Thread Nathan Fontenot

On 07/17/2018 02:40 PM, John Allen wrote:

When a PRRN event is being handled and another PRRN event comes in, the
second event will block rtas polling waiting on the first to complete,
preventing any further rtas events from being handled. This is
especially problematic when PRRN events are being queued continuously,
in which case rtas polling is blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from
running while a PRRN event is being handled, allowing rtas polling to
continue normally.

Signed-off-by: John Allen 


Reviewed-by: Nathan Fontenot 


---
v2:
   -Unlock prrn_lock when PRRN operations are complete, not after handler is
scheduled.
   -Remove call to flush_work, the previous broken method of serializing
PRRN events.
---
  arch/powerpc/kernel/rtasd.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..845fc5aec178 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -35,6 +35,8 @@

  static DEFINE_SPINLOCK(rtasd_log_lock);

+static DEFINE_MUTEX(prrn_lock);
+
  static DECLARE_WAIT_QUEUE_HEAD(rtas_log_wait);

  static char *rtas_log_buf;
@@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
 */
pseries_devicetree_update(-prrn_update_scope);
numa_update_cpu_topology(false);
+   mutex_unlock(&prrn_lock);
  }

  static DECLARE_WORK(prrn_work, prrn_work_fn);

  static void prrn_schedule_update(u32 scope)
  {
-   flush_work(&prrn_work);
-   prrn_update_scope = scope;
-   schedule_work(&prrn_work);
+   if (mutex_trylock(&prrn_lock)) {
+   prrn_update_scope = scope;
+   schedule_work(&prrn_work);
+   }
  }

  static void handle_rtas_event(const struct rtas_error_log *log)
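
The trylock gating in the patch above can be sketched in userspace as
follows. A C11 atomic_flag stands in for the kernel mutex, and the _sketch
names and counters are hypothetical. The point is only that a second event
arriving while one is in flight is dropped instead of blocking the caller.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set while an update is in flight (stand-in for prrn_lock). */
static atomic_flag prrn_busy = ATOMIC_FLAG_INIT;
static unsigned int pending_scope;	/* scope of the in-flight update */
static int handled_count;		/* updates that ran to completion */
static int dropped_count;		/* events ignored while busy */

/* Returns true if the update was scheduled, false if one is in flight. */
static bool prrn_schedule_update_sketch(unsigned int scope)
{
	/* test_and_set returns the previous value: true means already busy. */
	if (atomic_flag_test_and_set(&prrn_busy)) {
		dropped_count++;
		return false;
	}
	pending_scope = scope;
	return true;
}

/* Models the work function: do the update, then release the gate. */
static void prrn_work_fn_sketch(void)
{
	handled_count++;
	atomic_flag_clear(&prrn_busy);
}
```

The caller never sleeps in this scheme, which is what lets polling continue;
the cost is that events arriving during an update are not replayed later.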





Re: [PATCH v4 4/6] powerpc/fsl: Enable cpu vulnerabilities reporting for NXP PPC BOOK3E

2018-07-20 Thread Diana Madalina Craciun
On 7/19/2018 3:05 PM, Michael Ellerman wrote:
> LEROY Christophe  writes:
>> Diana Madalina Craciun wrote:
>>> On 7/17/2018 7:47 PM, LEROY Christophe wrote:
>>>> Diana Craciun wrote:
>>>>> The NXP PPC Book3E platforms are not vulnerable to meltdown and
>>>>> Spectre v4, so make them PPC_BOOK3S_64 specific.
>>>>>
>>>>> Signed-off-by: Diana Craciun
>>>>> ---
>>>>> History:
>>>>>
>>>>> v2-->v3
>>>>> - used the existing functions for spectre v1/v2
>>>>>
>>>>>  arch/powerpc/Kconfig   | 7 ++-
>>>>>  arch/powerpc/kernel/security.c | 2 ++
>>>>>  2 files changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>>>> index 9f2b75f..116c953 100644
>>>>> --- a/arch/powerpc/Kconfig
>>>>> +++ b/arch/powerpc/Kconfig
>>>>> @@ -165,7 +165,7 @@ config PPC
>>>>>   select GENERIC_CLOCKEVENTS_BROADCAST if SMP
>>>>>   select GENERIC_CMOS_UPDATE
>>>>>   select GENERIC_CPU_AUTOPROBE
>>>>> - select GENERIC_CPU_VULNERABILITIES  if PPC_BOOK3S_64
>>>>> + select GENERIC_CPU_VULNERABILITIES  if PPC_NOSPEC
>>>> I don't understand.  You say this patch is to make something specific
>>>> to book3s64 specific, and you are creating a new config param that
>>>> makes things less specific
>>>>
>>>> Christophe
>>> In order to enable the vulnerabilities reporting on NXP socs I need to
>>> enable them for PPC_FSL_BOOK3E. So they will be enabled for both
>>> PPC_FSL_BOOK3E and PPC_BOOK3S_64. This is the reason for adding the
>>> Kconfig. However this will enable: spectre v1/v2 and meltdown. NXP socs
>>> are not vulnerable to meltdown, so I made the meltdown reporting
>>> PPC_BOOK3S_64 specific. I guess I can have the PPC_NOSPEC definition in
>>> a separate patch to be more clear.
>> Yes you can. Or keep it as a single patch and add the details you gave  
>> me in the patch description.
> Yeah I think the patch is fine, but the change log is a bit short on detail.
>
> If you just send me a new change log I can fold it in.
>
> cheers
>
Thanks! This is the new change log:

"The Spectre/Meltdown vulnerabilities will be enabled for both
PPC_FSL_BOOK3E and PPC_BOOK3S_64. In order to avoid a complicated ifdef
we add a new Kconfig (PPC_NOSPEC) to select the common code between
BOOK3S_64 and FSL_BOOK3E. However, the NXP platforms are not vulnerable
to Meltdown, so make the Meltdown vulnerability reporting PPC_BOOK3S_64
specific."

Regards,

Diana



TP Link WDR4900 (was NXP p1010se device trees only correct for P1010E/P1014E, not P1010/P1014 SoCs).

2018-07-20 Thread Tim Small

On 09/07/18 23:21, Scott Wood wrote:


Thanks for your email.  The device in question ships an old uboot (a
vendor fork of U-Boot 2010.12-svn15934).


This was added by commit 6b70ffb9d1b2e, committed in July 2008... maybe
there's a problem with the old U-Boot finding the crypto node on this
particular chip?


Hi Scott,

Thanks for your response and the pointers...  I don't know my way around 
this code at all (I just got here when I hit a user space bug when 
trying to use 802.1AE), so I'm in the dark a bit here.


The device doesn't have mainline support, there is the original vendor 
patched uboot and kernel (linux 2.6.35 based) released here:


https://www.tp-link.com/us/download/TL-WDR4900.html#GPL-Code

The vendor U-Boot identifies the board as a P1014RDB in its serial
output, so I'd guess that TP-Link just made minimal changes from the
NXP reference design.


OpenWRT supports the device with its own reasonably simple set of
kernel patches (it's currently using 4.9.x on this platform) at:


https://git.openwrt.org/openwrt/openwrt.git

in:

target/linux/mpc85xx/files/arch/powerpc/platforms/85xx

It produces a cuImage for this board, but I don't know why that decision
was made; maybe it's just what the vendor kernel did, or maybe it's
difficult to differentiate a TL-WDR4900 from a P1010RDB reference board.
I couldn't find the P1010RDB BSP to see what TP-Link has done.


I could potentially work on getting those upstream if you think that 
would be worthwhile, but I don't really have a good feel for how good or 
hacky the code that's in OpenWRT is at the moment.  I'd probably need 
help getting it up and running.


I presume it'd mean switching from a cuImage to a uImage if practical?

Cheers,

Tim.

--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309


Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-20 Thread Michael S. Tsirkin
On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, first being the direct GPA and the other being the DMA
> API based translations.
> 
> First patch attempts to create a direct GPA mapping based DMA operations
> structure called 'virtio_direct_dma_ops' with exact same implementation
> of the direct GPA path which virtio core currently has but just wrapped in
> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> existing semantics. The second patch does exactly that inside the function
> virtio_finalize_features(). The third patch removes the default direct GPA
> path from virtio core forcing it to use DMA API callbacks for all devices.
> Now with that change, every device must have a DMA operations structure
> associated with it. The fourth patch adds an additional hook which gives
> the platform an opportunity to do yet another override if required. This
> platform hook can be used on POWER Ultravisor based protected guests to
> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> in the above mentioned thread how host is allowed to access only parts of
> the guest GPA range) bounce buffering into the shared memory for all I/O
> scatter gather buffers to be consumed on the host side.
> 
> Please go through these patches and review whether this approach broadly
> makes sense. I will appreciate suggestions, inputs, comments regarding
> the patches or the approach in general. Thank you.

I like how patches 1-3 look. Could you test performance
with/without to see whether the extra indirection through
use of DMA ops causes a measurable slow-down?

> Anshuman Khandual (4):
>   virtio: Define virtio_direct_dma_ops structure
>   virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
>   virtio: Force virtio core to use DMA API callbacks for all virtio devices
>   virtio: Add platform specific DMA API translation for virito devices
> 
>  arch/powerpc/include/asm/dma-mapping.h |  6 +++
>  arch/powerpc/platforms/pseries/iommu.c |  6 +++
>  drivers/virtio/virtio.c| 72 ++
>  drivers/virtio/virtio_pci_common.h |  3 ++
>  drivers/virtio/virtio_ring.c   | 65 +-
>  5 files changed, 89 insertions(+), 63 deletions(-)
> 
> -- 
> 2.9.3


Re: [RFC 4/4] virtio: Add platform specific DMA API translation for virito devices

2018-07-20 Thread Michael S. Tsirkin
On Fri, Jul 20, 2018 at 09:29:41AM +0530, Anshuman Khandual wrote:
>Subject: Re: [RFC 4/4] virtio: Add platform specific DMA API translation for
> virito devices

s/virito/virtio/

> This adds a hook which a platform can define in order to allow it to
> override virtio device's DMA OPS irrespective of whether it has the
> flag VIRTIO_F_IOMMU_PLATFORM set or not. We want to use this to do
> bounce-buffering of data on the new secure pSeries platform, currently
> under development, where a KVM host cannot access all of the memory
> space of a secure KVM guest.  The host can only access the pages which
> the guest has explicitly requested to be shared with the host, thus
> the virtio implementation in the guest has to copy data to and from
> shared pages.
> 
> With this hook, the platform code in the secure guest can force the
> use of swiotlb for virtio buffers, with a back-end for swiotlb which
> will use a pool of pre-allocated shared pages.  Thus all data being
> sent or received by virtio devices will be copied through pages which
> the host has access to.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/powerpc/include/asm/dma-mapping.h | 6 ++
>  arch/powerpc/platforms/pseries/iommu.c | 6 ++
>  drivers/virtio/virtio.c| 7 +++
>  3 files changed, 19 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> index 8fa3945..bc5a9d3 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -116,3 +116,9 @@ extern u64 __dma_get_required_mask(struct device *dev);
>  
>  #endif /* __KERNEL__ */
>  #endif   /* _ASM_DMA_MAPPING_H */
> +
> +#define platform_override_dma_ops platform_override_dma_ops
> +
> +struct virtio_device;
> +
> +extern void platform_override_dma_ops(struct virtio_device *vdev);
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 06f0296..5773bc7 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1396,3 +1397,8 @@ static int __init disable_multitce(char *str)
>  __setup("multitce=", disable_multitce);
>  
>  machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> +
> +void platform_override_dma_ops(struct virtio_device *vdev)
> +{
> + /* Override vdev->parent.dma_ops if required */
> +}
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 6b13987..432c332 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -168,6 +168,12 @@ EXPORT_SYMBOL_GPL(virtio_add_status);
>  
>  const struct dma_map_ops virtio_direct_dma_ops;
>  
> +#ifndef platform_override_dma_ops
> +static inline void platform_override_dma_ops(struct virtio_device *vdev)
> +{
> +}
> +#endif
> +
>  int virtio_finalize_features(struct virtio_device *dev)
>  {
>   int ret = dev->config->finalize_features(dev);
> @@ -179,6 +185,7 @@ int virtio_finalize_features(struct virtio_device *dev)
>   if (virtio_has_iommu_quirk(dev))
>   set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>  
> + platform_override_dma_ops(dev);

Is there a single place where virtio_has_iommu_quirk is called now?
If so, we could put this into virtio_has_iommu_quirk then.

>   if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>   return 0;
>  
> -- 
> 2.9.3
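
For readers unfamiliar with the "#define name name" override idiom used by
the quoted patch, a minimal userspace sketch follows. The enum, struct and
_sketch names are hypothetical stand-ins, and ARCH_PROVIDES_OVERRIDE is an
assumed switch playing the role of the arch header. Only the shape of the
hook mirrors the patch.

```c
/* Hypothetical stand-ins for the DMA ops a device might end up with. */
enum dma_ops_kind { DMA_OPS_ARCH_DEFAULT, DMA_OPS_DIRECT, DMA_OPS_SWIOTLB };

struct vdev_sketch {
	enum dma_ops_kind dma_ops;
};

#ifdef ARCH_PROVIDES_OVERRIDE
/* What an arch header would do: claim the name, then provide the hook. */
#define platform_override_dma_ops platform_override_dma_ops
static void platform_override_dma_ops(struct vdev_sketch *vdev)
{
	vdev->dma_ops = DMA_OPS_SWIOTLB;	/* e.g. force bounce buffering */
}
#endif

/* Core code: fall back to a no-op when no platform claimed the name. */
#ifndef platform_override_dma_ops
static inline void platform_override_dma_ops(struct vdev_sketch *vdev)
{
	(void)vdev;
}
#endif

static enum dma_ops_kind finalize_features_sketch(struct vdev_sketch *vdev,
						  int has_iommu_quirk)
{
	if (has_iommu_quirk)
		vdev->dma_ops = DMA_OPS_DIRECT;
	/* The platform gets the last word, whatever the quirk check chose. */
	platform_override_dma_ops(vdev);
	return vdev->dma_ops;
}
```

Built as is, the no-op fallback is used and the quirk decides the result;
building with -DARCH_PROVIDES_OVERRIDE makes every device end up with the
bounce-buffering ops in this sketch.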


Re: [PATCH 1/2] powerpc: Add ppc32_allmodconfig defconfig target

2018-07-20 Thread Michael Ellerman
Randy Dunlap  writes:

> On 07/09/2018 07:24 AM, Michael Ellerman wrote:
>> Because the allmodconfig logic just sets every symbol to M or Y, it
>> has the effect of always generating a 64-bit config, because
>> CONFIG_PPC64 becomes Y.
>> 
>> So to make it easier for folks to test 32-bit code, provide a phony
>> defconfig target that generates a 32-bit allmodconfig.
>> 
>> The 32-bit port has several mutually exclusive CPU types, we choose
>> the Book3S variants as that's what the help text in Kconfig says is
>> most common.
>> 
>> Signed-off-by: Michael Ellerman 
>
> Hi Michael,
>
> ppc32_allmodconfig sets CONFIG_ISA=y (and other related symbols) and
> CONFIG_PPC_CHRP=y.  But my builds are failing because they are missing
> the functions isa_bus_to_virt() and isa_virt_to_bus().
>
> Any ideas?

It's old legacy cruft that we've never implemented :)

I don't know if it's possible to implement it for CHRP, Ben implied it
might be, back in 2009:

  https://lists.ozlabs.org/pipermail/linuxppc-dev/2009-June/073232.html

But of course nothing came of it.


It looks like there's only a handful of drivers left using them, we
should probably just mark them as not buildable on PPC.

Arnd did something similar for ARM in:
  e9b106b8fbdb ("net: lance,ni64: don't build for ARM")


cheers


Re: [PATCH 1/2] powerpc: Add ppc32_allmodconfig defconfig target

2018-07-20 Thread Michael Ellerman
Randy Dunlap  writes:

> On 07/09/2018 07:24 AM, Michael Ellerman wrote:
>> Because the allmodconfig logic just sets every symbol to M or Y, it
>> has the effect of always generating a 64-bit config, because
>> CONFIG_PPC64 becomes Y.
>> 
>> So to make it easier for folks to test 32-bit code, provide a phony
>> defconfig target that generates a 32-bit allmodconfig.
>> 
>> The 32-bit port has several mutually exclusive CPU types, we choose
>> the Book3S variants as that's what the help text in Kconfig says is
>> most common.
>> 
>> Signed-off-by: Michael Ellerman 
>
> Hi Michael,
>
> Sorry for the delay.  I was traveling (out in the boonies).
>
> I'm trying to use 'make ppc32_allmodconfig'.  Cross-building on x86_64
> with crosstools from kernel.org.  (gcc 8.1.0)
>
> I'm getting build errors.  Looks like it's missing a header file or 3.
> I looked into that but it's a long and twisty maze of passages.
> Any ideas?

Urk.

That code was really written for 64-bit and we haven't ever quite made
it fully generic, as you can see.

Christophe got it working for 8xx (a different 32-bit variant), but
clearly it doesn't work for this config.

This might be the solution for now:

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index c45424c64e19..cb406d00702c 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -362,6 +362,7 @@ config FAIL_IOMMU
 config PPC_PTDUMP
 bool "Export kernel pagetable layout to userspace via debugfs"
 depends on DEBUG_KERNEL && DEBUG_FS
+depends on PPC64 || PPC_8xx
 help
  This option exports the state of the kernel pagetables to a
  debugfs file. This is only useful for kernel developers who are


cheers

>   CC  arch/powerpc/mm/dump_linuxpagetables.o
> In file included from ../arch/powerpc/include/asm/book3s/pgtable.h:8,
>  from ../arch/powerpc/include/asm/pgtable.h:18,
>  from ../include/linux/hugetlb.h:12,
>  from ../arch/powerpc/mm/dump_linuxpagetables.c:19:
> ../arch/powerpc/mm/dump_linuxpagetables.c: In function 'populate_markers':
> ../arch/powerpc/include/asm/book3s/32/pgtable.h:53:19: error: 'PKMAP_BASE' 
> undeclared (first use in this function); did you mean 'AT_BASE'?
>  #define KVIRT_TOP PKMAP_BASE
>^~
> ../arch/powerpc/include/asm/book3s/32/pgtable.h:64:23: note: in expansion of 
> macro 'KVIRT_TOP'
>  #define IOREMAP_TOP ((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
>^
> ../arch/powerpc/mm/dump_linuxpagetables.c:456:39: note: in expansion of macro 
> 'IOREMAP_TOP'
>   address_markers[i++].start_address = IOREMAP_TOP;
>^~~
> ../arch/powerpc/include/asm/book3s/32/pgtable.h:53:19: note: each undeclared 
> identifier is reported only once for each function it appears in
>  #define KVIRT_TOP PKMAP_BASE
>^~
> ../arch/powerpc/include/asm/book3s/32/pgtable.h:64:23: note: in expansion of 
> macro 'KVIRT_TOP'
>  #define IOREMAP_TOP ((KVIRT_TOP - CONFIG_CONSISTENT_SIZE) & PAGE_MASK)
>^
> ../arch/powerpc/mm/dump_linuxpagetables.c:456:39: note: in expansion of macro 
> 'IOREMAP_TOP'
>   address_markers[i++].start_address = IOREMAP_TOP;
>^~~
> ../arch/powerpc/mm/dump_linuxpagetables.c:464:39: error: implicit declaration 
> of function 'PKMAP_ADDR'; did you mean 'PCI_IO_ADDR'? 
> [-Werror=implicit-function-declaration]
>   address_markers[i++].start_address = PKMAP_ADDR(LAST_PKMAP);
>^~
>PCI_IO_ADDR
> ../arch/powerpc/mm/dump_linuxpagetables.c:464:50: error: 'LAST_PKMAP' 
> undeclared (first use in this function); did you mean 'LIST_HEAD'?
>   address_markers[i++].start_address = PKMAP_ADDR(LAST_PKMAP);
>   ^~
>   LIST_HEAD
>
>
>
> Thanks.
>
>> ---
>>  arch/powerpc/Makefile | 5 +
>>  arch/powerpc/configs/book3s_32.config | 2 ++
>>  2 files changed, 7 insertions(+)
>>  create mode 100644 arch/powerpc/configs/book3s_32.config
>> 
>> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
>> index 2ea575cb3401..2556c2182789 100644
>> --- a/arch/powerpc/Makefile
>> +++ b/arch/powerpc/Makefile
>> @@ -354,6 +354,11 @@ mpc86xx_smp_defconfig:
>>  $(call merge_into_defconfig,mpc86xx_basic_defconfig,\
>>  86xx-smp 86xx-hw fsl-emb-nonhw)
>>  
>> +PHONY += ppc32_allmodconfig
>> +ppc32_allmodconfig:
>> +$(Q)$(MAKE) KCONFIG_ALLCONFIG=$(srctree)/arch/powerpc/configs/book3s_32.config \
>> +-f $(srctree)/Makefile allmodconfig
>> +
>>  define archhelp
>>@echo '* zImage  - Build default images selected by kernel config'
>>@echo '  zImage.*- Compressed kernel image 
>> (arch/$(ARCH)/boot/zIm

Re: [PATCH 2/2] powerpc: Add ppc64le and ppc64_book3e allmodconfig targets

2018-07-20 Thread Michael Ellerman
Hi Randy,

Randy Dunlap  writes:
> On 07/09/18 07:24, Michael Ellerman wrote:
>> Similarly as we just did for 32-bit, add phony targets for generating
>> a little endian and Book3E allmodconfig. These aren't covered by the
>> regular allmodconfig, which is big endian and Book3S due to the way
>> the Kconfig symbols are structured.
>
> [adding Felipe Balbi]
>
> Is book3e allmodconfig not seen/used very much?

Seems so :)

> Besides the patches that I have already sent, I am seeing a build problem
> with ppc64_book3e_allmodconfig, where we have:
>
> CONFIG_USB_PHY=y
> CONFIG_FSL_USB2_OTG=y
> but
> CONFIG_USB_OTG_FSM=m
>
> In drivers/usb/phy/Kconfig, FSL_USB2_OTG depends on USB_OTG_FSM (among
> other things), but!  FSL_USB2_OTG is a bool symbol, depending on a
> tristate symbol.  This often causes problems.  In this case it causes errors
> with a builtin driver trying to use symbols that are built in a loadable 
> module:
>
> drivers/usb/phy/phy-fsl-usb.o: In function `.fsl_otg_ioctl':
> phy-fsl-usb.c:(.text.fsl_otg_ioctl+0xb4): undefined reference to 
> `.otg_statemachine'


Do we just need something like?

diff --git a/drivers/usb/phy/Kconfig b/drivers/usb/phy/Kconfig
index d7312eed6088..91ea3083e7ad 100644
--- a/drivers/usb/phy/Kconfig
+++ b/drivers/usb/phy/Kconfig
@@ -21,7 +21,7 @@ config AB8500_USB
 
 config FSL_USB2_OTG
bool "Freescale USB OTG Transceiver Driver"
-   depends on USB_EHCI_FSL && USB_FSL_USB2 && USB_OTG_FSM && PM
+   depends on USB_EHCI_FSL && USB_FSL_USB2 && USB_OTG_FSM=y && PM
depends on USB_GADGET || !USB_GADGET # if USB_GADGET=m, this can't be 'y'
select USB_PHY
help
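
The bool-depends-on-tristate problem described above can be sketched in C.
This is a simplified model, not the actual Kconfig implementation: the names
tri_and, tri_eq_y, bool_symbol_selectable, fsl_deps_old and fsl_deps_new are
made up for illustration, and the other dependencies of FSL_USB2_OTG are
collapsed into a single value. It assumes the usual Kconfig rules that "&&"
takes the minimum of n < m < y, that "X=y" yields y or n, and that a bool
symbol is selectable whenever its dependencies are not n.

```c
#include <stdbool.h>

/* Kconfig tristate values, ordered so that n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/* Kconfig "A && B" evaluates to the minimum of the two values. */
static enum tristate tri_and(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* Kconfig "A=y" evaluates to y when A is exactly y, otherwise n. */
static enum tristate tri_eq_y(enum tristate a)
{
	return a == TRI_Y ? TRI_Y : TRI_N;
}

/*
 * Simplified assumption: a bool symbol cannot be m, so it can be set to y
 * whenever its dependency expression evaluates to anything other than n.
 */
static bool bool_symbol_selectable(enum tristate deps)
{
	return deps != TRI_N;
}

/* Old form: "... && USB_OTG_FSM && ..." ('others' stands for the rest). */
static enum tristate fsl_deps_old(enum tristate others, enum tristate otg_fsm)
{
	return tri_and(others, otg_fsm);
}

/* Proposed form: "... && USB_OTG_FSM=y && ...". */
static enum tristate fsl_deps_new(enum tristate others, enum tristate otg_fsm)
{
	return tri_and(others, tri_eq_y(otg_fsm));
}
```

With USB_OTG_FSM=m the old expression still lets the bool be built in, which
is the link failure reported above; the "=y" form evaluates to n instead.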


cheers


[RFC PATCH 4/4] powerpc/mm:book3s: Enable THP migration support

2018-07-20 Thread Aneesh Kumar K.V
---
 arch/powerpc/include/asm/book3s/64/pgtable.h | 5 +
 arch/powerpc/platforms/Kconfig.cputype   | 1 +
 2 files changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index fce9ce8781a0..f619f3215c05 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -734,6 +734,8 @@ static inline bool pte_user(pte_t pte)
  */
 #define __pte_to_swp_entry(pte)  ((swp_entry_t) { pte_val((pte)) & ~_PAGE_PTE })
 #define __swp_entry_to_pte(x)  __pte((x).val | _PAGE_PTE)
+#define __pmd_to_swp_entry(pmd)  (__pte_to_swp_entry(pmd_pte(pmd)))
+#define __swp_entry_to_pmd(x)  (pte_pmd(__swp_entry_to_pte(x)))
 
 #ifdef CONFIG_MEM_SOFT_DIRTY
 #define _PAGE_SWP_SOFT_DIRTY   (1UL << (SWP_TYPE_BITS + _PAGE_BIT_SWAP_TYPE))
@@ -1079,6 +1081,9 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 #define pmd_mkwrite(pmd)   pte_pmd(pte_mkwrite(pmd_pte(pmd)))
 #define pmd_mk_savedwrite(pmd) pte_pmd(pte_mk_savedwrite(pmd_pte(pmd)))
 #define pmd_clear_savedwrite(pmd)  pte_pmd(pte_clear_savedwrite(pmd_pte(pmd)))
+#define pmd_swp_mksoft_dirty(pmd)  pte_pmd(pte_swp_mksoft_dirty(pmd_pte(pmd)))
+#define pmd_swp_soft_dirty(pmd)  pte_swp_soft_dirty(pmd_pte(pmd))
+#define pmd_swp_clear_soft_dirty(pmd)  pte_pmd(pte_swp_clear_soft_dirty(pmd_pte(pmd)))
 
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #define pmd_soft_dirty(pmd)pte_soft_dirty(pmd_pte(pmd))
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index e6a1de521319..2e8ee6a33587 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -72,6 +72,7 @@ config PPC_BOOK3S_64
select PPC_HAVE_PMU_SUPPORT
select SYS_SUPPORTS_HUGETLBFS
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
+   select ARCH_ENABLE_THP_MIGRATION
select ARCH_SUPPORTS_NUMA_BALANCING
select IRQ_WORK
select HAVE_KERNEL_XZ
-- 
2.17.1
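
The swap-entry conversions added by this patch can be illustrated with a small
standalone sketch. It is not the kernel code: the sketch_* names are
hypothetical, the bit position used for _PAGE_PTE is an assumption, and the
pmd/pte conversions are modelled as plain 64-bit values on the assumption that
pmd_pte() and pte_pmd() only change the type.

```c
#include <stdint.h>

/* Assumed, illustrative bit position standing in for _PAGE_PTE. */
#define SKETCH_PAGE_PTE (UINT64_C(1) << 62)

typedef struct { uint64_t val; } sketch_swp_entry_t;

/* Mirrors __pte_to_swp_entry(): strip the PTE marker bit. */
static sketch_swp_entry_t sketch_pte_to_swp_entry(uint64_t pte)
{
	return (sketch_swp_entry_t){ pte & ~SKETCH_PAGE_PTE };
}

/* Mirrors __swp_entry_to_pte(): put the PTE marker bit back. */
static uint64_t sketch_swp_entry_to_pte(sketch_swp_entry_t entry)
{
	return entry.val | SKETCH_PAGE_PTE;
}

/* The pmd variants simply reuse the pte helpers. */
static sketch_swp_entry_t sketch_pmd_to_swp_entry(uint64_t pmd)
{
	return sketch_pte_to_swp_entry(pmd);
}

static uint64_t sketch_swp_entry_to_pmd(sketch_swp_entry_t entry)
{
	return sketch_swp_entry_to_pte(entry);
}
```

A pmd that carries the marker bit survives the round trip unchanged, while the
intermediate swap entry has the bit cleared.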



[RFC PATCH 3/4] powerpc/mm/book3s: Check for pmd_large instead of pmd_trans_huge in set_pmd_at

2018-07-20 Thread Aneesh Kumar K.V
We want to use this to store swap pte at pmd level. For swap ptes we don't want
to set H_PAGE_THP_HUGE. Hence check for pmd_large in set_pmd_at. This removes
the false WARN_ON when using this with a swap pmd entry.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/pgtable-book3s64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index c1f825ff251b..9a4d4908e242 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -71,7 +71,7 @@ void set_pmd_at(struct mm_struct *mm, unsigned long addr,
 #ifdef CONFIG_DEBUG_VM
WARN_ON(pte_present(pmd_pte(*pmdp)) && !pte_protnone(pmd_pte(*pmdp)));
assert_spin_locked(pmd_lockptr(mm, pmdp));
-   WARN_ON(!(pmd_trans_huge(pmd) || pmd_devmap(pmd)));
+   WARN_ON(!(pmd_large(pmd) || pmd_devmap(pmd)));
 #endif
trace_hugepage_set_pmd(addr, pmd_val(pmd));
return set_pte_at(mm, addr, pmdp_ptep(pmdp), pmd_pte(pmd));
-- 
2.17.1



[RFC PATCH 2/4] powerpc/mm/hugetlb/book3s: add _PAGE_PRESENT to hugepd pointer.

2018-07-20 Thread Aneesh Kumar K.V
This makes the hugetlb directory pointer similar to other page table entries.
A hugepd entry is identified by the lack of the _PAGE_PTE bit and by the
directory size stored in HUGEPD_SHIFT_MASK. We update that to also look at
_PAGE_PRESENT.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-4k.h | 2 +-
 arch/powerpc/include/asm/book3s/64/hugetlb.h | 3 +++
 arch/powerpc/mm/hugetlbpage.c| 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h b/arch/powerpc/include/asm/book3s/64/hash-4k.h
index 9a3798660cef..15bc16b1dc9c 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
@@ -66,7 +66,7 @@ static inline int hash__hugepd_ok(hugepd_t hpd)
 * if it is not a pte and have hugepd shift mask
 * set, then it is a hugepd directory pointer
 */
-   if (!(hpdval & _PAGE_PTE) &&
+   if (!(hpdval & _PAGE_PTE) && (hpdval & _PAGE_PRESENT) &&
((hpdval & HUGEPD_SHIFT_MASK) != 0))
return true;
return false;
diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h b/arch/powerpc/include/asm/book3s/64/hugetlb.h
index 50888388a359..5b0177733994 100644
--- a/arch/powerpc/include/asm/book3s/64/hugetlb.h
+++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h
@@ -39,4 +39,7 @@ static inline bool gigantic_page_supported(void)
 }
 #endif
 
+/* hugepd entry valid bit */
+#define HUGEPD_VAL_BITS(0x8000UL)
+
 #endif
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index f425b5b37d58..7ae5e4bfd318 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -95,7 +95,7 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
break;
else {
 #ifdef CONFIG_PPC_BOOK3S_64
-   *hpdp = __hugepd(__pa(new) |
+   *hpdp = __hugepd(__pa(new) | HUGEPD_VAL_BITS |
 (shift_to_mmu_psize(pshift) << 2));
 #elif defined(CONFIG_PPC_8xx)
*hpdp = __hugepd(__pa(new) | _PMD_USER |
-- 
2.17.1
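
A minimal sketch of the updated hugepd check, for readers who want to see the
three conditions side by side. The sketch_* names and the bit values are
assumptions made for illustration (in particular the 0x3f shift mask and the
bit positions); only the shape of the test follows the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values, assumed for this sketch only. */
#define SKETCH_PAGE_PRESENT      (UINT64_C(1) << 63)
#define SKETCH_PAGE_PTE          (UINT64_C(1) << 62)
#define SKETCH_HUGEPD_SHIFT_MASK UINT64_C(0x3f)

/* Build a directory pointer the way the patched __hugepte_alloc() does:
 * physical address, valid bit, and the page size index shifted left by 2. */
static uint64_t sketch_make_hugepd(uint64_t phys, unsigned int psize)
{
	return phys | SKETCH_PAGE_PRESENT | ((uint64_t)psize << 2);
}

/* A hugepd is not a PTE, is marked present, and has a non-zero size field. */
static bool sketch_hugepd_ok(uint64_t hpdval)
{
	return !(hpdval & SKETCH_PAGE_PTE) &&
	       (hpdval & SKETCH_PAGE_PRESENT) &&
	       (hpdval & SKETCH_HUGEPD_SHIFT_MASK) != 0;
}
```

An entry built without the valid bit, which the old code would have accepted,
is rejected by this check.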



[RFC PATCH 1/4] powerpc/mm/book3s: Update pmd_present to look at _PAGE_PRESENT bit

2018-07-20 Thread Aneesh Kumar K.V
With this patch we use 0x8000UL (_PAGE_PRESENT) to indicate a valid
pgd/pud/pmd entry. We also switch the p**_present() to look at this bit.

With pmd_present, we have a special case. We need to make sure we consider a
pmd marked invalid during THP split as present. Right now we clear the
_PAGE_PRESENT bit during a pmdp_invalidate. In order to consider this special
case we add a new pte bit _PAGE_INVALID (mapped to _RPAGE_SW0). This bit is
only used with _PAGE_PRESENT cleared. Hence we are not really losing a pte bit
for this special case. pmd_present is also updated to look at _PAGE_INVALID.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash.h|  5 +
 arch/powerpc/include/asm/book3s/64/pgtable.h | 23 +---
 arch/powerpc/mm/hash_utils_64.c  |  6 ++---
 arch/powerpc/mm/pgtable-book3s64.c   |  2 +-
 4 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 0387b155f13d..a371ac7c3183 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -16,6 +16,11 @@
 #include 
 #endif
 
+/* Bits to set in a PMD/PUD/PGD entry valid bit*/
+#define HASH_PMD_VAL_BITS  (0x8000UL)
+#define HASH_PUD_VAL_BITS  (0x8000UL)
+#define HASH_PGD_VAL_BITS  (0x8000UL)
+
 /*
  * Size of EA range mapped by our pagetables.
  */
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 676118743a06..fce9ce8781a0 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -44,6 +44,15 @@
 
 #define _PAGE_PTE  0x4000UL /* distinguishes PTEs from pointers */
 #define _PAGE_PRESENT  0x8000UL /* pte contains a translation */
+/*
+ * We need to mark a pmd pte invalid while splitting. We can do that by clearing the
+ * _PAGE_PRESENT bit. But then that will be taken as a swap pte. In order to differentiate
+ * between the two, use a SW field when invalidating. We don't add a special bit to indicate
+ * swap pte because that is also used for migration ptes, and we go back and forth between
+ * a valid pte entry and migration ptes. So any information in the software bits will be
+ * lost if we overload those bits.
+ */
+#define _PAGE_INVALID  _RPAGE_SW0
 
 /*
  * Top and bottom bits of RPN which can be used by hash
@@ -859,8 +868,16 @@ static inline int pmd_none(pmd_t pmd)
 
 static inline int pmd_present(pmd_t pmd)
 {
+   /*
+* A pmd is considered present if _PAGE_PRESENT is set.
+* We also need to consider the pmd present which is marked
+* invalid during a split. Hence we look for _PAGE_INVALID
+* if we find _PAGE_PRESENT cleared.
+*/
+   if (pmd_raw(pmd) & cpu_to_be64(_PAGE_PRESENT | _PAGE_INVALID))
+   return true;
 
-   return !pmd_none(pmd);
+   return false;
 }
 
 static inline int pmd_bad(pmd_t pmd)
@@ -887,7 +904,7 @@ static inline int pud_none(pud_t pud)
 
 static inline int pud_present(pud_t pud)
 {
-   return !pud_none(pud);
+   return (pud_raw(pud) & cpu_to_be64(_PAGE_PRESENT));
 }
 
 extern struct page *pud_page(pud_t pud);
@@ -934,7 +951,7 @@ static inline int pgd_none(pgd_t pgd)
 
 static inline int pgd_present(pgd_t pgd)
 {
-   return !pgd_none(pgd);
+   return (pgd_raw(pgd) & cpu_to_be64(_PAGE_PRESENT));
 }
 
 static inline pte_t pgd_pte(pgd_t pgd)
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 5a72e980e25a..7ce7fa5397d5 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1002,9 +1002,9 @@ void __init hash__early_init_mmu(void)
 * 4k use hugepd format, so for hash set then to
 * zero
 */
-   __pmd_val_bits = 0;
-   __pud_val_bits = 0;
-   __pgd_val_bits = 0;
+   __pmd_val_bits = HASH_PMD_VAL_BITS;
+   __pud_val_bits = HASH_PUD_VAL_BITS;
+   __pgd_val_bits = HASH_PGD_VAL_BITS;
 
__kernel_virt_start = H_KERN_VIRT_START;
__kernel_virt_size = H_KERN_VIRT_SIZE;
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 5d2328ef7958..c1f825ff251b 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -106,7 +106,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 {
unsigned long old_pmd;
 
-   old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, 0);
+   old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
/*
 * This ensures that generic code that rely on IRQ disabling
-- 
2.17.1
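
The pmd_present() special case from the commit message can be modelled in a
few lines. This is a sketch under stated assumptions, not the kernel code: the
sketch_* names are hypothetical, the bit positions are illustrative stand-ins
for _PAGE_PRESENT and _RPAGE_SW0, and the big-endian conversion done by
cpu_to_be64() is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, assumed for this sketch only. */
#define SKETCH_PAGE_PRESENT (UINT64_C(1) << 63)
#define SKETCH_PAGE_INVALID (UINT64_C(1) << 61) /* stands in for _RPAGE_SW0 */

/* Present means: really present, or temporarily invalidated during a split. */
static bool sketch_pmd_present(uint64_t pmd)
{
	return (pmd & (SKETCH_PAGE_PRESENT | SKETCH_PAGE_INVALID)) != 0;
}

/* Models pmdp_invalidate(): clear the present bit, set the invalid marker. */
static uint64_t sketch_pmdp_invalidate(uint64_t pmd)
{
	return (pmd & ~SKETCH_PAGE_PRESENT) | SKETCH_PAGE_INVALID;
}
```

An invalidated pmd has _PAGE_PRESENT cleared yet still counts as present,
whereas an entry with neither bit set (for example a swap entry in this model)
does not.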



Re: Improvements for the PS3

2018-07-20 Thread Fredrik Noring
Hi Geert,

> > That would not work with kernel freezes unfortunately. Debugging those with
> > nondeterministicly invisible kernel prints would be painful, I believe.
> 
> Unfortunately AFAIK the PS3 doesn't have any other "synchronous" output we
> can use for debugging (like flashing the power LED).

Well, one can use lv1_panic to bisect the PowerPC boot process. It either
freezes, in which case the call to lv1_panic isn't reached, or it reboots.
This method yields 1 bit of information for every reflash and reboot, with
a procedure taking about 5 minutes per turn, so 12 bits/hour sustained. ;)

The prime advantage is that lv1_panic works everywhere, including the
earliest boot stages.
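
As a rough illustration of the cost of this one-bit-per-reboot method, the
following sketch computes how many reflash cycles a bisection needs. The
function names are made up for this example, and the 5 minutes per turn is
simply the figure quoted above.

```c
/* Number of yes/no experiments needed to isolate one of n candidate
 * points, i.e. ceil(log2(n)). Returns 0 for n <= 1. */
static unsigned int bisect_steps(unsigned long n)
{
	unsigned int steps = 0;

	if (n <= 1)
		return 0;
	/* The bit length of (n - 1) equals ceil(log2(n)) for n >= 2. */
	for (n -= 1; n != 0; n >>= 1)
		steps++;
	return steps;
}

/* Total wall-clock time when every experiment costs a full reflash. */
static unsigned int bisect_minutes(unsigned long n, unsigned int minutes_per_turn)
{
	return bisect_steps(n) * minutes_per_turn;
}
```

For instance, narrowing down one of 4096 candidate points takes 12 turns,
which at 5 minutes each is the hour implied by 12 bits/hour.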

I've since learned about the PS3GELIC_UDBG option to broadcast UDP via the
Ethernet port, but I don't know how well it works or its limitations.

One useful option for debugging is to setup a very early graphical console,
more or less from head.S, before any kernel functions are invoked. I have a
simple proof of concept implemented for the PS2 here:

https://github.com/frno7/linux/blob/ps2-v4.17-n3/arch/mips/boot/compressed/dbg.c

This could be done for the PS3 as well. The OtherOS demo

http://mc.pp.se/ps3/oodemo.xhtml

by Marcus Comstedt in 2007 contains the essentials with MMU and hypervisor
initialisation for graphics. I discovered that a small patch is required to
make his demo work with modern GCC:

--- a/source/script.lds
+++ b/source/script.lds
@@ -50,6 +50,7 @@ SECTIONS
  .opd : {
   *(.opd)
   }
+ . = ALIGN(256);
  .got : {
   __toc_start = .;
   *(.got)

Fredrik


Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier

2018-07-20 Thread Peter Zijlstra
On Thu, Jul 19, 2018 at 10:04:09AM -0700, Andy Lutomirski wrote:
> I added some more arch maintainers.  The idea here is that, on x86 at
> least, task->active_mm and all its refcounting is pure overhead.  When
> a process exits, __mmput() gets called, but the core kernel has a
> longstanding "optimization" in which other tasks (kernel threads and
> idle tasks) may have ->active_mm pointing at this mm.  This is nasty,
> complicated, and hurts performance on large systems, since it requires
> extra atomic operations whenever a CPU switches between real users
> threads and idle/kernel threads.
> 
> It's also almost completely worthless on x86 at least, since __mmput()
> frees pagetables, and that operation *already* forces a remote TLB
> flush, so we might as well zap all the active_mm references at the
> same time.

So I disagree that active_mm is complicated (the code is less than ideal
but that is actually fixable). And aside from the process exit case, it
does avoid CR3 writes when switching between user and kernel threads
(which can be far more often than exit if you have longer running
tasks).
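
A toy model of that saving, assuming a much simplified scheduler: the names
below are invented for illustration, mm id 0 stands for a kernel thread, and
refcounting, PCID and TLB flushing are all ignored. It only counts how often
the loaded address space would change.

```c
#include <stddef.h>

/* Example schedule: user task 1 interleaved with a kernel thread (id 0). */
static const int sketch_schedule[] = { 1, 0, 1, 0, 1 };

/*
 * Count address-space switches for a sequence of tasks.
 * With lazy != 0 a kernel thread borrows whatever is currently loaded;
 * otherwise it is treated as switching to its own kernel address space.
 */
static unsigned int sketch_count_mm_switches(const int *task_mm, size_t n, int lazy)
{
	int loaded = -1; /* nothing loaded yet */
	unsigned int switches = 0;

	for (size_t i = 0; i < n; i++) {
		int want = task_mm[i];

		if (want == 0 && lazy)
			continue; /* keep the previous user mm loaded */
		if (want != loaded) {
			loaded = want;
			switches++;
		}
	}
	return switches;
}
```

For the example schedule the lazy variant switches once, while the non-lazy
variant switches on every context switch.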

Now agreed, recent x86 work has made that less important.

And I of course also agree that not doing those refcount atomics is
better.


Re: Improvements for the PS3

2018-07-20 Thread Geert Uytterhoeven
Hi Fredrik,

On Thu, Jul 19, 2018 at 10:14 PM Fredrik Noring  wrote:
> > > > so I added a sleep with
> > > >
> > > > + msleep(1);
> >
> > I can't see where you added the sleep, but 10s seems excessive.
> > If the real reason is the need to wait for an interrupt for 
> > ps3fb_sync_image(),
> > then waiting for 40 ms should be sufficient? Or am I missing something?
>
> It's at the end of ps3fb_probe, as shown in the original post:
>
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-July/175771.html

Thanks, I had found that one in the mean time...

> I thought 100 ms or so would work, but evidently it didn't. In fact, 1 s
> or even 5 s didn't seem to work either. In any case, I would like to
> develop a solution that does not need to sleep at all, so that will be my
> first approach for a proper implementation.

Hmm...

> > > > I suppose the problem is that it relies on interrupts for
> > > > ps3fb_sync_image to regularly copy the image, hence without them the
> > > > screen isn't updated to show kernel panics, etc. Perhaps one way to fix
> > > > that is to implement the struct fb_tile_ops API, so that the console is
> > > > synchronously updated? Would that be acceptable?
> > >
> > > I'm not sure if that would work or not.   Maybe Geert is more familiar
> > > with it.
> >
> > That sounds like a complex solution, slowing down the console a lot.
>
> Why would that be slow? I have implemented a similar technique for the
> PlayStation 2 frame buffer, and (without any measurements at hand now) it
> appears to be about as fast as is possible, and reasonably easy too. :)

[...]

OK, I retract my statement ;-)

> > What about letting ps3fb register a panic notifier to sync the screen, like
> > hyperv_fb does?
>
> That would not work with kernel freezes unfortunately. Debugging those with
> nondeterministicly invisible kernel prints would be painful, I believe.

Unfortunately AFAIK the PS3 doesn't have any other "synchronous" output we
can use for debugging (like flashing the power LED).

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RESEND][PATCH] powerpc/powernv : Save/Restore SPRG3 on entry/exit from stop.

2018-07-20 Thread Michael Neuling
On Fri, 2018-07-20 at 16:29 +1000, Michael Ellerman wrote:
> Michael Neuling  writes:
> > On Fri, 2018-07-20 at 12:32 +1000, Michael Ellerman wrote:
> > > Michael Neuling  writes:
> > > > On Wed, 2018-07-18 at 13:42 +0530, Gautham R Shenoy wrote:
> > > > > On Wed, Jul 18, 2018 at 09:24:19AM +1000, Michael Neuling wrote:
> > > > > > 
> > > > > > >   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> > > > > > > diff --git a/arch/powerpc/kernel/idle_book3s.S
> > > > > > > b/arch/powerpc/kernel/idle_book3s.S
> > > > > > > index d85d551..5069d42 100644
> > > > > > > --- a/arch/powerpc/kernel/idle_book3s.S
> > > > > > > +++ b/arch/powerpc/kernel/idle_book3s.S
> > > > > > > @@ -120,6 +120,9 @@ power9_save_additional_sprs:
> > > > > > >   mfspr   r4, SPRN_MMCR2
> > > > > > >   std r3, STOP_MMCR1(r13)
> > > > > > >   std r4, STOP_MMCR2(r13)
> > > > > > > +
> > > > > > > + mfspr   r3, SPRN_SPRG3
> > > > > > > + std r3, STOP_SPRG3(r13)
> > > > > > 
> > > > > > We don't need to save it.  Just restore it from paca->sprg_vdso
> > > > > > which
> > > > > > should
> > > > > > never change.
> > > > > 
> > > > > Ok. I will respin a patch to restore SPRG3 from paca->sprg_vdso.
> > > > > 
> > > > > > 
> > > > > > How can we do better at catching these missing SPRGs?
> > > > > 
> > > > > We can go through the list of SPRs from the POWER9 User Manual and
> > > > > document explicitly why we don't have to save/restore certain SPRs
> > > > > during the execution of the stop instruction. Does this sound ok ?
> > > > > 
> > > > > (Ref: Table 4-8, Section 4.7.3.4 from the POWER9 User Manual
> > > > > accessible from
> > > > > https://openpowerfoundation.org/?resource_lib=power9-processor-users-m
> > > > > anua
> > > > > l)
> > > > 
> > > > I was thinking of a boot time test case built into linux. linux has some
> > > > boot
> > > > time test cases which you can enable via CONFIG options.
> > > > 
> > > > Firstly you could see if an SPR exists using the same trick xmon does in
> > > > dump_one_spr(). Then once you have a list of usable SPRs, you could
> > > > write
> > > > all
> > > > the known ones (I assume you'd have to leave out some, like the PSSCR),
> > > > then
> > > > set
> > > 
> > > Write what value?
> > > 
> > > Ideally you want to write a random bit pattern to reduce the chance
> > > that only some bits are being restored.
> > 
> > The xmon dump_one_spr() trick tries to work around that by writing one
> > random
> > value and then a different one to see if it really is a nop.
> > 
> > > But you can't do that because writing a value to an SPRs has an effect.
> > 
> > Sure that's a concern but xmon seems to get away with it.
> 
> I don't think it writes, but maybe I'm reading the code wrong.

You're right, sorry. It's the write to the GPR that becomes a NOP when the SPR is
not there. I misremembered how it worked.

Maybe that won't work for stop, since we'd need to be able to change the SPR value
to ensure we don't hit the reset value after a stop state.

We'd be able to detect SPRs that change from their reset value, but not those
that are already at their reset value.
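
The limitation can be shown with a toy simulation. Everything here is
hypothetical: the struct, the function names and the values are invented for
illustration, and a state-losing stop is modelled simply as every unrestored
SPR falling back to its reset value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_spr {
	uint64_t value;       /* contents before the stop */
	uint64_t reset_value; /* contents after a state-losing stop */
	bool restored;        /* true if the idle code saves/restores it */
};

/* Run one simulated stop and count the SPRs whose value visibly changed. */
static size_t sketch_stop_and_count_changed(struct sketch_spr *sprs, size_t n)
{
	size_t changed = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t before = sprs[i].value;

		if (!sprs[i].restored)
			sprs[i].value = sprs[i].reset_value;
		if (sprs[i].value != before)
			changed++;
	}
	return changed;
}

/* Three SPRs: one restored, two not restored (one of them at reset value). */
static size_t sketch_demo_detected(void)
{
	struct sketch_spr sprs[] = {
		{ .value = 0x1234, .reset_value = 0, .restored = true  },
		{ .value = 0x5678, .reset_value = 0, .restored = false },
		{ .value = 0,      .reset_value = 0, .restored = false },
	};

	return sketch_stop_and_count_changed(sprs, sizeof(sprs) / sizeof(sprs[0]));
}
```

Two SPRs are unrestored in the demo, but only the one holding a non-reset
value is caught, which is why the test would need to write a different value
first.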

> Writing a random value to the MSR could be fun :)

Fortunately the MSR is not an SPR :-P

> > 
> > Yeah, I'm not convinced it'll work either but it would be a nice piece of
> > test
> > infrastructure to have if it does work.
> 
> Yeah I guess I'd rather we worked on 1) and 2) below first :)

ok

> > We'd still need to marry up the SPR numbers we get from the test to what's
> > actually being restored in Linux.
> > 
> > > But there's a much simpler solution, we should 1) have a selftest for
> > > getcpu() and 2) we should be running the glibc (I think?) test suite
> > > that found this in the first place. It's frankly embarrassing that we
> > > didn't find this.
> > 
> > Yeah, we should do that also, but how do we catch the next SPR we are
> > missing.
> > I'd like some systematic way of doing that rather than wack-a-mole.
> 
> Whack-a-mole 

I preferred waking them :-)

> We could also improve things by documenting how each SPR is handled, eg.
> is it saved/restored across idle, syscall, KVM etc. And possibly that
> could even become code that defines how SPRs are handled, rather than it
> all being done ad-hoc.

Yeah.  It's complicated by linux calling opal_slw_set_reg() to change what's
saved. This was part of the reason I'd hoped doing a linux test case would help
as we could do it after those calls.

Mikey