Re: [PATCH v7 00/24] Speculative page faults

2018-02-12 Thread Laurent Dufour
On 08/02/2018 21:53, Andrew Morton wrote:
> On Tue,  6 Feb 2018 17:49:46 +0100 Laurent Dufour 
>  wrote:
> 
>> This is a port on kernel 4.15 of the work done by Peter Zijlstra to
>> handle page fault without holding the mm semaphore [1].
>>
>> The idea is to try to handle user space page faults without holding the
>> mmap_sem. This should allow better concurrency for massively threaded
>> processes, since the page fault handler will no longer wait for other
>> threads' memory layout changes to complete, assuming those changes affect
>> another part of the process's memory space. This type of page fault is
>> named a speculative page fault. If the speculative page fault fails,
>> because concurrent activity is detected or because the underlying PMD or
>> PTE tables are not yet allocated, the speculative processing is abandoned
>> and a classic page fault is tried instead.
>>
>> The speculative page fault (SPF) has to look up the VMA matching the fault
>> address without holding the mmap_sem. This is done by introducing a rwlock
>> which protects access to the mm_rb tree. Previously this was done using
>> SRCU, but that introduced a lot of scheduling work to process the VMA
>> freeing operations, which hurt performance by 20% as reported by Kemi Wang
>> [2]. Using a rwlock to protect access to the mm_rb tree limits the locking
>> contention to operations which are expected to be O(log n). In addition,
>> to ensure that the VMA is not freed behind our back, a reference count is
>> added, and two services (get_vma() and put_vma()) are introduced to handle
>> it. When a VMA is fetched from the RB tree using get_vma(), it must later
>> be released using put_vma(). Furthermore, to allow the VMA to be used
>> again by the classic page fault handler, a service named
>> can_reuse_spf_vma() is introduced. It is expected to be called with the
>> mmap_sem held; it checks that the VMA still matches the specified address
>> and releases its reference count. Since the mmap_sem is held, the VMA
>> cannot be freed behind our back. In general, the VMA's reference count may
>> be decremented while holding the mmap_sem, but it should not be increased,
>> as holding the mmap_sem already guarantees that the VMA is stable. With
>> this change I no longer see the overhead I previously measured with the
>> will-it-scale benchmark.
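
A rough sketch of the reference counting described above, assuming an
atomic_t field in the VMA and an rwlock named mm_rb_lock protecting the
mm_rb tree (the names are illustrative, not necessarily the series'):

static struct vm_area_struct *get_vma(struct mm_struct *mm,
				      unsigned long addr)
{
	struct vm_area_struct *vma = NULL;

	read_lock(&mm->mm_rb_lock);		/* protects the mm_rb tree */
	vma = find_vma(mm, addr);		/* O(log n) RB-tree walk */
	if (vma)
		atomic_inc(&vma->vm_ref_count);	/* pin against a parallel free */
	read_unlock(&mm->mm_rb_lock);

	return vma;
}

static void put_vma(struct vm_area_struct *vma)
{
	if (atomic_dec_and_test(&vma->vm_ref_count))
		__free_vma(vma);		/* hypothetical final-free helper */
}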
>>
>> The VMA's attributes checked during the speculative page fault processing
>> have to be protected against parallel changes. This is done by using a per
>> VMA sequence lock. This sequence lock allows the speculative page fault
>> handler to quickly check for parallel changes in progress and to abort the
>> speculative page fault in that case.
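
A sketch of the per-VMA sequence lock usage, assuming a seqcount_t
vm_sequence field in the VMA (the standard seqcount pattern; names
illustrative):

	unsigned int seq;

	/* Writer side (mprotect, mremap, ...) wraps attribute updates: */
	write_seqcount_begin(&vma->vm_sequence);
	/* ... modify vm_flags / vm_start / vm_end / vm_pgoff ... */
	write_seqcount_end(&vma->vm_sequence);

	/* Reader side, in the speculative fault handler: */
	seq = read_seqcount_begin(&vma->vm_sequence);
	/* ... snapshot the attributes the fault path depends on ... */
	if (read_seqcount_retry(&vma->vm_sequence, seq))
		goto abort_spf;	/* raced with a change: try the classic path */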
>>
>> Once the VMA is found, the speculative page fault handler checks the
>> VMA's attributes to verify whether the page fault can be handled this way.
>> The VMA is protected through a sequence lock which allows fast detection
>> of concurrent VMA changes. If such a change is detected, the speculative
>> page fault is aborted and a *classic* page fault is tried instead. VMA
>> sequence locking is added wherever the VMA attributes which are checked
>> during the page fault are modified.
>>
>> When the PTE is fetched, the VMA is checked again to see if it has been
>> changed, so once the page table is locked the VMA is known to be valid.
>> Any other change touching this PTE will have to take the page table lock,
>> so no parallel change is possible at this point.
>>
>> The locking of the PTE is done with interrupts disabled, which allows us
>> to check the PMD and ensure that no collapse operation is ongoing. Since
>> khugepaged first sets the PMD to pmd_none and then waits for the other
>> CPUs to acknowledge the IPI, if the PMD is valid at the time the PTE is
>> locked, we have the guarantee that the collapse operation will have to
>> wait on the PTE lock to move forward. This allows the SPF handler to map
>> the PTE safely. If the PMD value differs from the one recorded at the
>> beginning of the SPF operation, the classic page fault handler is called
>> to handle the fault while holding the mmap_sem. As the PTE is locked with
>> interrupts disabled, the lock is taken using spin_trylock() to avoid a
>> deadlock when handling a page fault while a TLB invalidate is requested
>> by another CPU holding the PTE lock.
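
Put together, the PMD check and PTE locking described above look roughly
like this sketch (spf_orig_pmd and the labels are illustrative):

	spinlock_t *ptl;

	local_irq_disable();		/* holds off the TLB-invalidate IPI */
	if (!pmd_same(*pmdp, spf_orig_pmd))
		goto abort_spf;		/* khugepaged may have collapsed it */
	ptl = pte_lockptr(mm, pmdp);
	if (!spin_trylock(ptl))
		goto abort_spf;		/* holder may be waiting for our IPI ack */
	/* safe to re-check the VMA sequence count and install the PTE */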
>>
>> Support for THP is not done because, when checking the PMD, we can be
>> confused by an in-progress collapse operation performed by khugepaged.
>> The issue is that pmd_none() could be true either because the PMD is not
>> yet populated or because the underlying PTEs are in the process of being
>> collapsed. So we cannot safely allocate a PMD if pmd_none() is true.
>>
>> This series builds on top of v4.15-mmotm-2018-01-31-16-51 and is
>> functional on x86 and PowerPC.
> 
> One question which people will want to answer is "is this thing
> working".  ie, how frequently does the code fall back to the regular
> heavyweight fault?

[RFC PATCH] powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb

2018-02-12 Thread Aneesh Kumar K.V
With 64k page size, we have hugetlb pte entries at the pmd and pud level for
book3s64, so we don't need to create a separate page table cache for them.
With 4k page size, we need to make sure the hugepd page table cache for 16M
is placed at the PUD level and the one for 16G at the PGD level.

Simplify all this by not using the HUGEPD_PGD_SHIFT/HUGEPD_PUD_SHIFT macros,
which are confusing for book3s64.

Without this patch, with 64k page size we create pagetable caches with shift
values 10 and 7 which are not used at all.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/mm/hugetlbpage.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 876da2bc1796..3b509b268030 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -122,9 +122,6 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t 
*hpdp,
 #if defined(CONFIG_PPC_FSL_BOOK3E) || defined(CONFIG_PPC_8xx)
 #define HUGEPD_PGD_SHIFT PGDIR_SHIFT
 #define HUGEPD_PUD_SHIFT PUD_SHIFT
-#else
-#define HUGEPD_PGD_SHIFT PUD_SHIFT
-#define HUGEPD_PUD_SHIFT PMD_SHIFT
 #endif
 
 /*
@@ -669,12 +666,24 @@ static int __init hugetlbpage_init(void)
if (add_huge_page_size(1ULL << shift) < 0)
continue;
 
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (shift > PGDIR_SHIFT)
+   BUG();
+   else if (shift > PUD_SHIFT)
+   pdshift = PGDIR_SHIFT;
+   else if (shift > PMD_SHIFT)
+   pdshift = PUD_SHIFT;
+   else
+   pdshift = PMD_SHIFT;
+#else
if (shift < HUGEPD_PUD_SHIFT)
pdshift = PMD_SHIFT;
else if (shift < HUGEPD_PGD_SHIFT)
pdshift = PUD_SHIFT;
else
pdshift = PGDIR_SHIFT;
+#endif
/*
 * if we have pdshift and shift value same, we don't
 * use pgt cache for hugepd.
-- 
2.14.3



[PATCH] powerpc/powernv: IMC fix out of bounds memory access at shutdown

2018-02-12 Thread Nicholas Piggin
The OPAL IMC driver's shutdown handler disables nest PMU counters by
walking nodes and taking the first CPU out of their cpumask, which is
used to index into the paca (get_hard_smp_processor_id()). This does
not always do the right thing; in particular, for CPU-less nodes
cpumask_first() returns NR_CPUS, which overruns the paca and
dereferences random memory.

Fix it by being more careful about checking the returned CPU, and by
only using online CPUs. It's not clear this shutdown code makes sense
after commit 885dcd709b ("powerpc/perf: Add nest IMC PMU support"),
but this should not make things worse.

Changing the way pacas are allocated to an array of pointers exposed
this bug:

Unable to handle kernel paging request for data at address 0x2a21af1eeb76
Faulting instruction address: 0xc00a5468
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack 
nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp tun bridge stp llc 
iptable_filter ib_ipoib ib_cm ib_core kvm_hv kvm binfmt_misc vmx_crypto 
dm_multipath scsi_dh_rdac scsi_dh_alua ip_tables x_tables autofs4 crc32c_vpmsum
CPU: 52 PID: 1 Comm: systemd-shutdow Not tainted 
4.15.0-12636-g3f1ac76cdc8f-dirty #134
NIP:  c00a5468 LR: c00a5454 CTR: 
REGS: c000200e58403870 TRAP: 0380   Not tainted  
(4.15.0-12636-g3f1ac76cdc8f-dirty)
MSR:  9280b033   CR: 28288422  XER: 
2004
CFAR: c0152354 SOFTE: 0
GPR00: c00a5454 c000200e58403af0 c1093f00 0001
GPR04: 0001 04dc c000200e609a 0001b3bc
GPR08: c10d0b98 2a21af1eeb46 c000200fff7fc000 
GPR12: 8000 c00eb800 000133f97b10 
GPR16: 72e9dcc8 000133faf4a0 000133f97310 
GPR20: 000133f97e80 000133f97d80 000133f97470 000133f97aa8
GPR24: c10cfb70 c0d20d68 c0d20d78 c0d30438
GPR28: c0d20d88 0800 c10d10b8 00fc
NIP [c00a5468] opal_imc_counters_shutdown+0x148/0x1d0
LR [c00a5454] opal_imc_counters_shutdown+0x134/0x1d0
Call Trace:
[c000200e58403af0] [c00a5454] opal_imc_counters_shutdown+0x134/0x1d0 
(unreliable)
[c000200e58403b90] [c0723734] platform_drv_shutdown+0x44/0x60
[c000200e58403bb0] [c071df58] device_shutdown+0x1f8/0x350
[c000200e58403c50] [c010bbd4] kernel_restart_prepare+0x54/0x70
[c000200e58403c70] [c010bd28] kernel_restart+0x28/0xc0
[c000200e58403ce0] [c010c210] SyS_reboot+0x1d0/0x2c0
[c000200e58403e30] [c000b920] system_call+0x58/0x6c
Instruction dump:
48512459 6000 7fe4fb78 7c7d07b4 7f63db78 7fa5eb78 480acebd 6000
e958 7ba91f24 3861 7d2a482a  4bfe84a9 6000 7fa5eb78
---[ end trace 8e58676c4eb8656a ]---

Cc: Anju T Sudhakar 
Cc: Hemant Kumar 
Cc: Madhavan Srinivasan 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/platforms/powernv/opal-imc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index dd4c9b8b8a81..f6f55ab4980e 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -199,9 +199,11 @@ static void disable_nest_pmu_counters(void)
const struct cpumask *l_cpumask;
 
get_online_cpus();
-   for_each_online_node(nid) {
+   for_each_node_with_cpus(nid) {
l_cpumask = cpumask_of_node(nid);
-   cpu = cpumask_first(l_cpumask);
+   cpu = cpumask_first_and(l_cpumask, cpu_online_mask);
+   if (cpu >= nr_cpu_ids)
+   continue;
opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
   get_hard_smp_processor_id(cpu));
}
-- 
2.16.1



[PATCH] selftests/powerpc: Fix: use ucontext_t instead of struct ucontext

2018-02-12 Thread Harish
With glibc 2.26 'struct ucontext' was removed to improve POSIX
compliance, which breaks the powerpc/alignment_handler selftest.
Fix the test by using ucontext_t. Tested on ppc; it works with older
glibc versions as well.

Fixes the following:
alignment_handler.c: In function ‘sighandler’:
alignment_handler.c:68:5: error: dereferencing pointer to incomplete type 
‘struct ucontext’
  ucp->uc_mcontext.gp_regs[PT_NIP] += 4;
 ^~

Signed-off-by: Harish 
---
 tools/testing/selftests/powerpc/alignment/alignment_handler.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 39fd362..0f2698f 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -57,7 +57,7 @@ volatile int gotsig;
 
 void sighandler(int sig, siginfo_t *info, void *ctx)
 {
-   struct ucontext *ucp = ctx;
+   ucontext_t *ucp = ctx;
 
if (!testing) {
signal(sig, SIG_DFL);
-- 
2.7.4



Re: [PATCH] powerpc/npu-dma.c: Fix deadlock in mmio_invalidate

2018-02-12 Thread Balbir Singh
On Tue, 13 Feb 2018 14:17:34 +1100
Alistair Popple  wrote:

> When sending TLB invalidates to the NPU we need to send extra flushes due
> to a hardware issue. The original implementation would lock all the
> ATSD MMIO registers sequentially before unlocking and relocking each of
> them sequentially to do the extra flush.
> 
> This introduced a deadlock: one thread can hold one ATSD register while
> waiting for another register to be freed, while the other thread holds
> that register waiting for the one held by the first thread.
> 
> For example if there are two threads and two ATSD registers:
> 
> Thread A  Thread B
> Acquire 1
> Acquire 2
> Release 1 Acquire 1
> Wait 1    Wait 2
> 
> Both threads will be stuck waiting to acquire a register resulting in an
> RCU stall warning or soft lockup.
> 
> This patch solves the deadlock by refactoring the code to ensure registers
> are not released between flushes and to ensure all registers are either
> acquired or released together and in order.
> 
> Fixes: bbd5ff50afff ("powerpc/powernv/npu-dma: Add explicit flush when 
> sending an ATSD")
> Signed-off-by: Alistair Popple 
> ---
> 
> Michael,
> 
> This should probably go to stable as well, although it's bigger than the 100
> line limit mentioned in the stable kernel rules.
> 
> - Alistair
> 
>  arch/powerpc/platforms/powernv/npu-dma.c | 195 
> +--
>  1 file changed, 109 insertions(+), 86 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index fb0a6dee9bce..5746b456dfa4 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -408,6 +408,11 @@ struct npu_context {
>   void *priv;
>  };
>  
> +struct mmio_atsd_reg {
> + struct npu *npu;
> + int reg;
> +};
> +

Is it just easier to move reg to inside of struct npu?

>  /*
>   * Find a free MMIO ATSD register and mark it in use. Return -ENOSPC
>   * if none are available.
> @@ -433,79 +438,83 @@ static void put_mmio_atsd_reg(struct npu *npu, int reg)
>  #define XTS_ATSD_AVA  1
>  #define XTS_ATSD_STAT 2
>  
> -static int mmio_launch_invalidate(struct npu *npu, unsigned long launch,
> - unsigned long va)
> +static void mmio_launch_invalidate(struct mmio_atsd_reg *mmio_atsd_reg,
> + unsigned long launch, unsigned long va)
>  {
> - int mmio_atsd_reg;
> -
> - do {
> - mmio_atsd_reg = get_mmio_atsd_reg(npu);
> - cpu_relax();
> - } while (mmio_atsd_reg < 0);
> + struct npu *npu = mmio_atsd_reg->npu;
> + int reg = mmio_atsd_reg->reg;
>  
>   __raw_writeq(cpu_to_be64(va),
> - npu->mmio_atsd_regs[mmio_atsd_reg] + XTS_ATSD_AVA);
> + npu->mmio_atsd_regs[reg] + XTS_ATSD_AVA);
>   eieio();
> - __raw_writeq(cpu_to_be64(launch), npu->mmio_atsd_regs[mmio_atsd_reg]);
> -
> - return mmio_atsd_reg;
> + __raw_writeq(cpu_to_be64(launch), npu->mmio_atsd_regs[reg]);
>  }
>  
> -static int mmio_invalidate_pid(struct npu *npu, unsigned long pid, bool 
> flush)
> +static void mmio_invalidate_pid(struct mmio_atsd_reg 
> mmio_atsd_reg[NV_MAX_NPUS],
> + unsigned long pid, bool flush)
>  {
> + int i;
>   unsigned long launch;
>  
> - /* IS set to invalidate matching PID */
> - launch = PPC_BIT(12);
> + for (i = 0; i <= max_npu2_index; i++) {
> + if (mmio_atsd_reg[i].reg < 0)
> + continue;
>  
> - /* PRS set to process-scoped */
> - launch |= PPC_BIT(13);
> + /* IS set to invalidate matching PID */
> + launch = PPC_BIT(12);
>  
> - /* AP */
> - launch |= (u64) mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
> + /* PRS set to process-scoped */
> + launch |= PPC_BIT(13);
>  
> - /* PID */
> - launch |= pid << PPC_BITLSHIFT(38);
> + /* AP */
> + launch |= (u64)
> + mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
>  
> - /* No flush */
> - launch |= !flush << PPC_BITLSHIFT(39);
> + /* PID */
> + launch |= pid << PPC_BITLSHIFT(38);
>  
> - /* Invalidating the entire process doesn't use a va */
> - return mmio_launch_invalidate(npu, launch, 0);
> + /* No flush */
> + launch |= !flush << PPC_BITLSHIFT(39);
> +
> + /* Invalidating the entire process doesn't use a va */
> + mmio_launch_invalidate(&mmio_atsd_reg[i], launch, 0);
> + }
>  }
>  
> -static int mmio_invalidate_va(struct npu *npu, unsigned long va,
> - unsigned long pid, bool flush)
> +static void mmio_invalidate_va(struct mmio_atsd_reg 
> mmio_atsd_reg[NV_MAX_NPUS],
> + unsigned long va, unsigned long pid, bool flush)
>  {
> + int i;
>   unsigned lo

Re: [PATCH v2 04/13] lpfc: Add push-to-adapter support to sli4

2018-02-12 Thread Michael Ellerman
Johannes Thumshirn  writes:

> On Wed, Feb 07, 2018 at 10:51:57AM +0100, Johannes Thumshirn wrote:
>> > +  /* Enable combined writes for DPP aperture */
>> > +  pg_addr = (unsigned long)(wq->dpp_regaddr) & PAGE_MASK;
>> > +#ifdef CONFIG_X86
>> > +  rc = set_memory_wc(pg_addr, 1);
>> > +  if (rc) {
>> > +  lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
>> > +  "3272 Cannot setup Combined "
>> > +  "Write on WQ[%d] - disable 
>> > DPP\n",
>> > +  wq->queue_id);
>> > +  phba->cfg_enable_dpp = 0;
>> > +  }
>> > +#else
>> > +  phba->cfg_enable_dpp = 0;
>> > +#endif
>> > +  } else
>> > +  wq->db_regaddr = phba->sli4_hba.WQDBregaddr;
>> 
>> I don't really like the set_memory_wc() call here. Neither do I like the 
>> ifdef
>> CONFIG_X86 special casing.
>> 
>> If you really need write combining, can't you at least use ioremap_wc()?
>
> Coming back to this again (after talking to our ARM/POWER folks internally).
> Is this really x86 specific here? I know there are servers with other 
> architectures
> using lpfcs out there.
>
> I _think_ write combining should be possible on other architectures (that have
> PCIe and aren't dead) as well.
>
> The ioremap_wc() I suggested is probably wrong.
>
> So can you please revisit this? I CCed Mark and Michael, maybe they can help
> here.

I'm not much of an I/O guy, but I do know that on powerpc we don't
implement set_memory_wc(). So if you're using that then you do need the
ifdef.

I couldn't easily find the rest of this thread, so I'm not sure if
ioremap_wc() is an option. We do implement that and on modern CPUs at
least it will give you something that's not just a plain uncached
mapping.
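
For reference, a minimal sketch of the ioremap_wc() route (bar, dpp_offset
and dpp_len are placeholders; whether a separate write-combined mapping fits
lpfc's DPP setup is exactly the open question in this thread):

	void __iomem *dpp;

	/* Map only the DPP doorbell aperture with write-combining semantics. */
	dpp = ioremap_wc(pci_resource_start(pdev, bar) + dpp_offset, dpp_len);
	if (!dpp)
		phba->cfg_enable_dpp = 0;	/* fall back to normal doorbells */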

cheers


Re: [PATCH v2] powerpc/npu: Cleanup MMIO ATSD flushing

2018-02-12 Thread Balbir Singh
On Wed, Feb 7, 2018 at 2:14 PM, Alistair Popple  wrote:
> On Tuesday, 16 January 2018 3:15:05 PM AEDT Alistair Popple wrote:
>> Thanks Balbir, one question below. I have no way of testing this at present 
>> but
>> it looks ok to me. Thanks!
>
> The below are more future optimisations once we can test. So in the meantime:
>
> Acked-by: Alistair Popple 

@aneesh can you please look at this? @mpe can we pick this up if there
are no objections?

Balbir Singh


Re: samples/seccomp/ broken when cross compiling s390, ppc allyesconfig

2018-02-12 Thread Kees Cook
On Mon, Feb 12, 2018 at 7:25 PM, Michael Ellerman  wrote:
> Michal Hocko  writes:
>> Hi,
>> my build test machinery chokes on samples/seccomp when cross compiling
>> s390 and ppc64 allyesconfig. This has been the case for quite some
>> time already but I never found time to look at the problem and report
> it. It seems this is not a new issue and a similar thing happened for
>> MIPS e9107f88c985 ("samples/seccomp/Makefile: do not build tests if
>> cross-compiling for MIPS").
>>
>> The build logs are attached.
>>
>> What is the best way around this? Should we simply skip compilation on
>> cross compile or is actually anybody relying on that? Or should I simply
>> disable it for s390 and ppc?
>
> The whole thing seems very confused. It's not building for the target,
> it's building for the host, ie. the Makefile sets hostprogs-m and
> HOSTCFLAGS etc.
>
> So it can't possibly work with cross compiling as it's currently
> written.
>
> Either the Makefile needs some serious work to properly support cross
> compiling or it should just be disabled when cross compiling.

Hrm, yeah, the goal was to entirely disable cross compiling, but I
guess we didn't hit it with a hard enough hammer. :)

-Kees

-- 
Kees Cook
Pixel Security


[PATCH kernel] powerpc/npu: Do not try invalidating 32bit table when 64bit table is enabled

2018-02-12 Thread Alexey Kardashevskiy
GPUs and the corresponding NVLink bridges get different PEs as they have
separate translation validation entries (TVEs). We put these PEs to
the same IOMMU group so they cannot be passed through separately.
So the iommu_table_group_ops::set_window/unset_window for GPUs do set
tables to the NPU PEs as well which means that iommu_table's list of
attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked.
This list is used for TCE cache invalidation.

The problem is that the NPU PE has just a single TVE and can be programmed
to point to a 32bit or 64bit window, while a GPU PE has two (as any other PCI
device). So we end up having a 32bit iommu_table struct linked to both
PEs even though only the 64bit TCE table cache can be invalidated on the NPU,
and a relatively recent skiboot detects this and prints errors.

This changes GPU's iommu_table_group_ops::set_window/unset_window to make
sure that NPU PE is only linked to the table actually used by the hardware.
If there are two tables used by an IOMMU group, the NPU PE will use
the last programmed one which with the current use scenarios is expected
to be a 64bit one.

Signed-off-by: Alexey Kardashevskiy 
---

Do we need BUG_ON(IOMMU_TABLE_GROUP_MAX_TABLES != 2)?


This is an example for:

0004:04:00.0 3D: NVIDIA Corporation Device 1db1 (rev a1)
0006:00:00.0 Bridge: IBM Device 04ea (rev 01)
0006:00:00.1 Bridge: IBM Device 04ea (rev 01)

Before the patch (npu2_tce_kill messages are from skiboot):

pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000
pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000
pci 0004:04 : [PE# 00] Setting up window#1 800..800 
pg=1
pci 0006:00:00.0: [PE# 0d] Setting up window 800..800 
pg=1
NPU6: npu2_tce_kill: Unexpected TCE size (got 0x1000 expected 0x1)
NPU6: npu2_tce_kill: Unexpected TCE size (got 0x1000 expected 0x1)
NPU6: npu2_tce_kill: Unexpected TCE size (got 0x1000 expected 0x1)
NPU6: npu2_tce_kill: Unexpected TCE size (got 0x1000 expected 0x1)
NPU6: npu2_tce_kill: Unexpected TCE size (got 0x1000 expected 0x1)
...
pci 0004:04 : [PE# 00] Removing DMA window #0
pci 0006:00:00.0: [PE# 0d] Removing DMA window
pci 0004:04 : [PE# 00] Removing DMA window #1
pci 0006:00:00.0: [PE# 0d] Removing DMA window
pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000
pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000
pci 0004:04 : [PE# 00] Setting up window#1 800..800 
pg=1
pci 0006:00:00.0: [PE# 0d] Setting up window 800..800 
pg=1

After the patch (no errors here):

pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000
pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000
pci 0004:04 : [PE# 00] Setting up window#1 800..800 
pg=1
pci 0006:00:00.0: [PE# 0d] Removing DMA window
pci 0006:00:00.0: [PE# 0d] Setting up window 800..800 
pg=1
pci 0004:04 : [PE# 00] Removing DMA window #0
pci 0004:04 : [PE# 00] Removing DMA window #1
pci 0006:00:00.0: [PE# 0d] Removing DMA window
pci 0004:04 : [PE# 00] Setting up window#0 0..3fff pg=1000
pci 0006:00:00.0: [PE# 0d] Setting up window 0..3fff pg=1000
pci 0004:04 : [PE# 00] Setting up window#1 800..800 
pg=1
pci 0006:00:00.0: [PE# 0d] Removing DMA window
pci 0006:00:00.0: [PE# 0d] Setting up window 800..800 
pg=1
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 496e476..2f91815 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2681,14 +2681,23 @@ static struct pnv_ioda_pe *gpe_table_group_to_npe(
 static long pnv_pci_ioda2_npu_set_window(struct iommu_table_group *table_group,
int num, struct iommu_table *tbl)
 {
+   struct pnv_ioda_pe *npe = gpe_table_group_to_npe(table_group);
+   int num2 = (num == 0) ? 1 : 0;
long ret = pnv_pci_ioda2_set_window(table_group, num, tbl);
 
if (ret)
return ret;
 
-   ret = pnv_npu_set_window(gpe_table_group_to_npe(table_group), num, tbl);
-   if (ret)
+   if (table_group->tables[num2])
+   pnv_npu_unset_window(npe, num2);
+
+   ret = pnv_npu_set_window(npe, num, tbl);
+   if (ret) {
pnv_pci_ioda2_unset_window(table_group, num);
+   if (table_group->tables[num2])
+   pnv_npu_set_window(npe, num2,
+   table_group->tables[num2]);
+   }
 
return ret;
 }
@@ -2697,12 +2706,24 @@ static long pnv_pci_ioda2_npu_unset_window(
struct iommu_table_group *table_group,
int num)
 

[RFC] powerpc/radix/hotunplug: Atomically replace pte entries

2018-02-12 Thread Balbir Singh
The current approach uses stop machine for atomicity while removing
a smaller range from a larger mapping. For example, while trying
to hot-unplug 256MiB from a 1GiB range, we split the mapping into
the next smaller size (2MiB) under stop machine. This patch instead
atomically replaces the pte entry by

a. Creating an array of smaller mappings
b. Ignoring the holes (the area to be hot-unplugged)
c. Atomically replacing the entry at the pud/pmd level

The code assumes that permissions in a linear mapping don't change
once set. The permissions are copied from the larger PTE to the
smaller PTE's based on this assumption.

Suggested-by: Michael Ellerman 
Signed-off-by: Balbir Singh 
---
 arch/powerpc/mm/pgtable-radix.c | 125 +---
 1 file changed, 91 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 17ae5c15a9e06..4b3642a9e8d13 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -124,6 +124,93 @@ int radix__map_kernel_page(unsigned long ea, unsigned long 
pa,
return 0;
 }
 
+static int replace_pte_entries(unsigned long start, unsigned long end,
+  unsigned long hole_start, unsigned long hole_end,
+  unsigned long map_page_size, pgprot_t flags)
+{
+   int i;
+   int rc = 0;
+   unsigned long addr, pa;
+
+   if (map_page_size == PUD_SIZE) {
+   pgd_t *pgdp;
+   pud_t *pudp;
+   pmd_t *pmdp, *new_pmdp;
+   unsigned long size = RADIX_PMD_TABLE_SIZE / sizeof(pmd_t);
+
+   pgdp = pgd_offset_k(start);
+   pudp = pud_offset(pgdp, start);
+
+   pmdp = pmd_alloc_one(&init_mm, start);
+   if (!pmdp) {
+   rc = 1;
+   goto done;
+   }
+
+   for (i = 0; i < size; i++) {
+   addr = start + i * PMD_SIZE;
+   new_pmdp = (pmd_t *)(pmdp + i);
+
+   if (addr >= hole_start &&
+   addr <= hole_end) {
+   *new_pmdp = __pmd(0ULL);
+   continue;
+   }
+
+   pa = __pa(addr);
+   *new_pmdp = pfn_pmd(pa >> PMD_SHIFT, flags);
+   *new_pmdp = pmd_mkhuge(*new_pmdp);
+   }
+
+   pud_populate(&init_mm, pudp, pmdp);
+   } else if (map_page_size == PMD_SIZE) {
+   pgd_t *pgdp;
+   pud_t *pudp;
+   pmd_t *pmdp;
+   pte_t *new_ptep, *ptep;
+   unsigned long size = RADIX_PTE_TABLE_SIZE / sizeof(pte_t);
+
+   pgdp = pgd_offset_k(start);
+   pudp = pud_offset(pgdp, start);
+   pmdp = pmd_offset(pudp, start);
+
+   ptep = pte_alloc_one(&init_mm, start);
+   if (!ptep) {
+   rc = 1;
+   goto done;
+   }
+
+   for (i = 0; i < size; i++) {
+   addr = start + i * PAGE_SIZE;
+   new_ptep = (pte_t *)(ptep + i);
+
+   if (addr >= hole_start &&
+   addr <= hole_end) {
+   *new_ptep = __pte(0ULL);
+   continue;
+   }
+
+   pa = __pa(addr);
+   *new_ptep = pfn_pte(pa >> PAGE_SHIFT, flags);
+   *new_ptep = __pte(pte_val(*new_ptep) | _PAGE_PTE);
+   }
+
+   pmd_populate_kernel(&init_mm, pmdp, ptep);
+   } else {
+   WARN_ONCE(1, "Unsupported mapping size to "
+"split %lx, ea %lx\n", map_page_size, start);
+   rc = 1;
+   }
+
+   smp_wmb();
+   if (rc == 0)
+   radix__flush_tlb_kernel_range(start, start + map_page_size);
+
+done:
+   return rc;
+
+}
+
 #ifdef CONFIG_STRICT_KERNEL_RWX
 void radix__change_memory_range(unsigned long start, unsigned long end,
unsigned long clear)
@@ -672,30 +759,6 @@ static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
pud_clear(pud);
 }
 
-struct change_mapping_params {
-   pte_t *pte;
-   unsigned long start;
-   unsigned long end;
-   unsigned long aligned_start;
-   unsigned long aligned_end;
-};
-
-static int stop_machine_change_mapping(void *data)
-{
-   struct change_mapping_params *params =
-   (struct change_mapping_params *)data;
-
-   if (!data)
-   return -1;
-
-   spin_unlock(&init_mm.page_table_lock);
-   pte_clear(&init_mm, params->aligned_start, params->pte);
-   create_physical_mapping(params->aligned_start, params->start);
-   create_physical_mapping(params->end, params->

Re: [PATCH] powerpc/npu-dma.c: Fix crash after __mmu_notifier_register failure

2018-02-12 Thread Alistair Popple
Thanks Mark, this will also fix the missing OPAL cleanup call in the unlikely
case that the kzalloc() fails.

Acked-By: Alistair Popple 

On Friday, 9 February 2018 7:20:06 PM AEDT Mark Hairgrove wrote:
> pnv_npu2_init_context wasn't checking the return code from
> __mmu_notifier_register. If  __mmu_notifier_register failed, the
> npu_context was still assigned to the mm and the caller wasn't given any
> indication that things went wrong. Later on pnv_npu2_destroy_context would
> be called, which in turn called mmu_notifier_unregister and dropped
> mm->mm_count without having incremented it in the first place. This led to
> various forms of corruption like mm use-after-free and mm double-free.
> 
> __mmu_notifier_register can fail with EINTR if a signal is pending, so
> this case can be frequent.
> 
> This patch calls opal_npu_destroy_context on the failure paths, and makes
> sure not to assign mm->context.npu_context until past the failure points.
> 
> Signed-off-by: Mark Hairgrove 
> ---
>  arch/powerpc/platforms/powernv/npu-dma.c |   32 +++--
>  1 files changed, 21 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
> index f6cbc1a..48c73aa 100644
> --- a/arch/powerpc/platforms/powernv/npu-dma.c
> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
> @@ -677,6 +677,11 @@ struct npu_context *pnv_npu2_init_context(struct pci_dev 
> *gpdev,
>   /* No nvlink associated with this GPU device */
>   return ERR_PTR(-ENODEV);
>  
> + nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
> + if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
> + &nvlink_index)))
> + return ERR_PTR(-ENODEV);
> +
>   if (!mm || mm->context.id == 0) {
>   /*
>* Kernel thread contexts are not supported and context id 0 is
> @@ -704,25 +709,30 @@ struct npu_context *pnv_npu2_init_context(struct 
> pci_dev *gpdev,
>*/
>   npu_context = mm->context.npu_context;
>   if (!npu_context) {
> + rc = -ENOMEM;
>   npu_context = kzalloc(sizeof(struct npu_context), GFP_KERNEL);
> - if (!npu_context)
> - return ERR_PTR(-ENOMEM);
> + if (npu_context) {
> + kref_init(&npu_context->kref);
> + npu_context->mm = mm;
> + npu_context->mn.ops = &nv_nmmu_notifier_ops;
> + rc = __mmu_notifier_register(&npu_context->mn, mm);
> + }
> +
> + if (rc) {
> + kfree(npu_context);
> + opal_npu_destroy_context(nphb->opal_id, mm->context.id,
> + PCI_DEVID(gpdev->bus->number,
> + gpdev->devfn));
> + return ERR_PTR(rc);
> + }
>  
>   mm->context.npu_context = npu_context;
> - npu_context->mm = mm;
> - npu_context->mn.ops = &nv_nmmu_notifier_ops;
> - __mmu_notifier_register(&npu_context->mn, mm);
> - kref_init(&npu_context->kref);
>   } else {
> - kref_get(&npu_context->kref);
> + WARN_ON(!kref_get_unless_zero(&npu_context->kref));
>   }
>  
>   npu_context->release_cb = cb;
>   npu_context->priv = priv;
> - nvlink_dn = of_parse_phandle(npdev->dev.of_node, "ibm,nvlink", 0);
> - if (WARN_ON(of_property_read_u32(nvlink_dn, "ibm,npu-link-index",
> - &nvlink_index)))
> - return ERR_PTR(-ENODEV);
>   npu_context->npdev[npu->index][nvlink_index] = npdev;
>  
>   if (!nphb->npu.nmmu_flush) {
> 




[RFC][PATCH bpf v2 2/2] bpf: powerpc64: add JIT support for multi-function programs

2018-02-12 Thread Sandipan Das
This adds support for bpf-to-bpf function calls for the powerpc64
JIT compiler. After a round of the usual JIT passes, the offsets
to callee functions from __bpf_call_base are known. To update the
target addresses for the branch instructions associated with each
BPF_CALL, an extra pass is performed.

Since the offsets may be as large as 64 bits on powerpc64, we use
the aux data associated with each caller to get the correct branch
target address rather than the imm field of the BPF_CALL instruction.

Signed-off-by: Sandipan Das 
---
v2: Use the off field of the instruction as an index for
aux->func to determine the start address of a callee
function.
---
 arch/powerpc/net/bpf_jit_comp64.c | 73 +--
 1 file changed, 63 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 0a34b0cec7b7..cf0d4e32aa52 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -290,7 +290,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32
 /* Assemble the body code between the prologue & epilogue */
 static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
  struct codegen_context *ctx,
- u32 *addrs)
+ u32 *addrs, bool extra_pass)
 {
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
@@ -746,11 +746,17 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
break;
 
/*
-* Call kernel helper
+* Call kernel helper or bpf function
 */
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
-   func = (u8 *) __bpf_call_base + imm;
+   if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
+   if (fp->aux->func && off < fp->aux->func_cnt)
+   func = (u8 *) fp->aux->func[off]->bpf_func;
+   else
+   return -EINVAL;
+   else
+   func = (u8 *) __bpf_call_base + imm;
 
/* Save skb pointer if we need to re-cache skb data */
if ((ctx->seen & SEEN_SKB) &&
@@ -970,6 +976,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
return 0;
 }
 
+struct powerpc64_jit_data {
+   struct bpf_binary_header *header;
+   u32 *addrs;
+   u8 *image;
+   u32 proglen;
+   struct codegen_context ctx;
+};
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
u32 proglen;
@@ -977,6 +991,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
+   struct powerpc64_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
@@ -984,6 +999,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
+   bool extra_pass = false;
 
if (!fp->jit_requested)
return org_fp;
@@ -997,7 +1013,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
fp = tmp_fp;
}
 
+   jit_data = fp->aux->jit_data;
+   if (!jit_data) {
+   jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+   if (!jit_data) {
+   fp = org_fp;
+   goto out;
+   }
+   fp->aux->jit_data = jit_data;
+   }
+
flen = fp->len;
+   addrs = jit_data->addrs;
+   if (addrs) {
+   cgctx = jit_data->ctx;
+   image = jit_data->image;
+   bpf_hdr = jit_data->header;
+   proglen = jit_data->proglen;
+   alloclen = proglen + FUNCTION_DESCR_SIZE;
+   extra_pass = true;
+   goto skip_init_ctx;
+   }
+
addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL) {
fp = org_fp;
@@ -1010,10 +1047,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*fp)
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
/* Scouting faux-generate pass 0 */
-   if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+   if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
/* We hit something illegal or unsupported. */
fp = org_fp;
-   goto out;
+   goto out_addrs;
}
 
/*
@@ -1031,9 +1068,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
bpf_jit_fill_ill_insns);
if (!bpf_hdr) {
fp = org_fp;
- 

[RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-12 Thread Sandipan Das
The imm field of a bpf_insn is a signed 32-bit integer. For
JIT-ed bpf-to-bpf function calls, it stores the offset from
__bpf_call_base to the start of the callee function.

For some architectures, such as powerpc64, it was found that
this offset may be as large as 64 bits, so it cannot be
accommodated in the imm field without truncation.

To resolve this, we additionally make aux->func within each
bpf_prog point to the list of all function addresses determined
by the verifier.

We keep the value assigned to the off field of the bpf_insn
as a way to index into aux->func and also set aux->func_cnt
so that this can be used for performing basic upper bound
checks for the off field.

Signed-off-by: Sandipan Das 
---
v2: Make aux->func point to the list of functions determined
by the verifier rather than allocating a separate callee
list for each function.
---
 kernel/bpf/verifier.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fb69a85d967..1c4d9cd485ed 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5288,11 +5288,25 @@ static int jit_subprogs(struct bpf_verifier_env *env)
insn->src_reg != BPF_PSEUDO_CALL)
continue;
subprog = insn->off;
-   insn->off = 0;
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
func[subprog]->bpf_func -
__bpf_call_base;
}
+
+   /* the offset to a callee function from __bpf_call_base
+* may be larger than what the 32 bit integer imm can
+* accommodate, which will truncate the higher order bits
+*
+* to avoid this, we additionally utilize the aux data
+* of each function to point to a list of all function
+* addresses determined by the verifier
+*
+* the off field of the instruction provides the index
+* in this list where the start address of a function
+* is available
+*/
+   func[i]->aux->func = func;
+   func[i]->aux->func_cnt = env->subprog_cnt + 1;
}
for (i = 0; i <= env->subprog_cnt; i++) {
old_bpf_func = func[i]->bpf_func;
-- 
2.14.3



Re: samples/seccomp/ broken when cross compiling s390, ppc allyesconfig

2018-02-12 Thread Michael Ellerman
Michal Hocko  writes:
> Hi,
> my build test machinery chokes on samples/seccomp when cross compiling
> s390 and ppc64 allyesconfig. This has been the case for quite some
> time already but I never found time to look at the problem and report
> it. It seems this is not a new issue and a similar thing happened for
> MIPS e9107f88c985 ("samples/seccomp/Makefile: do not build tests if
> cross-compiling for MIPS").
>
> The build logs are attached.
>
> What is the best way around this? Should we simply skip compilation on
> cross compile or is actually anybody relying on that? Or should I simply
> disable it for s390 and ppc?

The whole thing seems very confused. It's not building for the target,
it's building for the host, ie. the Makefile sets hostprogs-m and
HOSTCFLAGS etc.

So it can't possibly work with cross compiling as it's currently
written.

Either the Makefile needs some serious work to properly support cross
compiling or it should just be disabled when cross compiling.

cheers


[PATCH] powerpc/npu-dma.c: Fix deadlock in mmio_invalidate

2018-02-12 Thread Alistair Popple
When sending TLB invalidates to the NPU we need to send extra flushes due
to a hardware issue. The original implementation would lock all the
ATSD MMIO registers sequentially before unlocking and relocking each of
them sequentially to do the extra flush.

This introduced a deadlock: one thread can hold one ATSD register while
waiting for another register to be freed, while the other thread holds
that register waiting for the one held by the first thread.

For example if there are two threads and two ATSD registers:

Thread AThread B
Acquire 1
Acquire 2
Release 1   Acquire 1
Wait 1  Wait 2

Both threads will be stuck waiting to acquire a register resulting in an
RCU stall warning or soft lockup.

This patch solves the deadlock by refactoring the code to ensure registers
are not released between flushes and to ensure all registers are either
acquired or released together and in order.
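
The core of the fix can be pictured as taking the registers in a fixed
global order and holding them across both flushes; an illustration only,
not the actual hunk:

static void acquire_atsd_regs(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS])
{
	int i;

	/* Assumes mmio_atsd_reg[i].npu was filled in by the caller. Everyone
	 * acquires in the same index order and releases nothing until both
	 * flushes are done, so no circular wait can form.
	 */
	for (i = 0; i <= max_npu2_index; i++) {
		while ((mmio_atsd_reg[i].reg =
				get_mmio_atsd_reg(mmio_atsd_reg[i].npu)) < 0)
			cpu_relax();
	}
}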

Fixes: bbd5ff50afff ("powerpc/powernv/npu-dma: Add explicit flush when sending 
an ATSD")
Signed-off-by: Alistair Popple 
---

Michael,

This should probably go to stable as well, although it's bigger than the 100
line limit mentioned in the stable kernel rules.

- Alistair

 arch/powerpc/platforms/powernv/npu-dma.c | 195 +--
 1 file changed, 109 insertions(+), 86 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
index fb0a6dee9bce..5746b456dfa4 100644
--- a/arch/powerpc/platforms/powernv/npu-dma.c
+++ b/arch/powerpc/platforms/powernv/npu-dma.c
@@ -408,6 +408,11 @@ struct npu_context {
void *priv;
 };
 
+struct mmio_atsd_reg {
+   struct npu *npu;
+   int reg;
+};
+
 /*
  * Find a free MMIO ATSD register and mark it in use. Return -ENOSPC
  * if none are available.
@@ -433,79 +438,83 @@ static void put_mmio_atsd_reg(struct npu *npu, int reg)
 #define XTS_ATSD_AVA  1
 #define XTS_ATSD_STAT 2
 
-static int mmio_launch_invalidate(struct npu *npu, unsigned long launch,
-   unsigned long va)
+static void mmio_launch_invalidate(struct mmio_atsd_reg *mmio_atsd_reg,
+   unsigned long launch, unsigned long va)
 {
-   int mmio_atsd_reg;
-
-   do {
-   mmio_atsd_reg = get_mmio_atsd_reg(npu);
-   cpu_relax();
-   } while (mmio_atsd_reg < 0);
+   struct npu *npu = mmio_atsd_reg->npu;
+   int reg = mmio_atsd_reg->reg;
 
__raw_writeq(cpu_to_be64(va),
-   npu->mmio_atsd_regs[mmio_atsd_reg] + XTS_ATSD_AVA);
+   npu->mmio_atsd_regs[reg] + XTS_ATSD_AVA);
eieio();
-   __raw_writeq(cpu_to_be64(launch), npu->mmio_atsd_regs[mmio_atsd_reg]);
-
-   return mmio_atsd_reg;
+   __raw_writeq(cpu_to_be64(launch), npu->mmio_atsd_regs[reg]);
 }
 
-static int mmio_invalidate_pid(struct npu *npu, unsigned long pid, bool flush)
+static void mmio_invalidate_pid(struct mmio_atsd_reg 
mmio_atsd_reg[NV_MAX_NPUS],
+   unsigned long pid, bool flush)
 {
+   int i;
unsigned long launch;
 
-   /* IS set to invalidate matching PID */
-   launch = PPC_BIT(12);
+   for (i = 0; i <= max_npu2_index; i++) {
+   if (mmio_atsd_reg[i].reg < 0)
+   continue;
 
-   /* PRS set to process-scoped */
-   launch |= PPC_BIT(13);
+   /* IS set to invalidate matching PID */
+   launch = PPC_BIT(12);
 
-   /* AP */
-   launch |= (u64) mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
+   /* PRS set to process-scoped */
+   launch |= PPC_BIT(13);
 
-   /* PID */
-   launch |= pid << PPC_BITLSHIFT(38);
+   /* AP */
+   launch |= (u64)
+   mmu_get_ap(mmu_virtual_psize) << PPC_BITLSHIFT(17);
 
-   /* No flush */
-   launch |= !flush << PPC_BITLSHIFT(39);
+   /* PID */
+   launch |= pid << PPC_BITLSHIFT(38);
 
-   /* Invalidating the entire process doesn't use a va */
-   return mmio_launch_invalidate(npu, launch, 0);
+   /* No flush */
+   launch |= !flush << PPC_BITLSHIFT(39);
+
+   /* Invalidating the entire process doesn't use a va */
+   mmio_launch_invalidate(&mmio_atsd_reg[i], launch, 0);
+   }
 }
 
-static int mmio_invalidate_va(struct npu *npu, unsigned long va,
-   unsigned long pid, bool flush)
+static void mmio_invalidate_va(struct mmio_atsd_reg mmio_atsd_reg[NV_MAX_NPUS],
+   unsigned long va, unsigned long pid, bool flush)
 {
+   int i;
unsigned long launch;
 
-   /* IS set to invalidate target VA */
-   launch = 0;
+   for (i = 0; i <= max_npu2_index; i++) {
+   if (mmio_atsd_reg[i].reg < 0)
+   continue;
+
+   /* IS set to invalidate target VA */
+   launch = 0;
 
-

Re: [PATCH 2/4] powerpc/vas: Fix cleanup when VAS is not configured

2018-02-12 Thread Michael Ellerman
Sukadev Bhattiprolu  writes:

> Michael Ellerman [m...@ellerman.id.au] wrote:
>> Sukadev Bhattiprolu  writes:
>> 
>> > When VAS is not configured in the system, make sure to remove
>> > the VAS debugfs directory and unregister the platform driver.
>> >
>> > Signed-off-by: Sukadev Bhattiprolu 
>> ...
>> > diff --git a/arch/powerpc/platforms/powernv/vas.c 
>> > b/arch/powerpc/platforms/powernv/vas.c
>> > index aebbe95..f83e27d8 100644
>> > --- a/arch/powerpc/platforms/powernv/vas.c
>> > +++ b/arch/powerpc/platforms/powernv/vas.c
>> > @@ -169,8 +169,11 @@ static int __init vas_init(void)
>> >found++;
>> >}
>> >  
>> > -  if (!found)
>> > +  if (!found) {
>> > +  platform_driver_unregister(&vas_driver);
>> > +  vas_cleanup_dbgdir();
>> >return -ENODEV;
>> > +  }
>> 
>> The better patch would be to move the call to vas_init_dbgdir() down
>> here, where we know we have successfully registered the driver.
>
> Well, when VAS is configured, init_vas_instance() expects the top level
> "vas" debugfs dir to already be setup.

OK.

> We could have each init_vas_instance() assume it is the first and
> unconditionally call vas_init_dbgdir(). vas_init_dbgdir() could make
> sure to initialize only once.

Yeah that looks like a good solution.
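
Something like the following sketch would do, making the setup idempotent
so every init_vas_instance() can call it unconditionally (body
illustrative):

static struct dentry *vas_debugfs;

void vas_init_dbgdir(void)
{
	static bool first_time = true;

	if (!first_time)
		return;

	first_time = false;
	vas_debugfs = debugfs_create_dir("vas", NULL);
}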

cheers


Re: [PATCH] cxl: Enable NORST bit in PSL_DEBUG register for PSL9

2018-02-12 Thread Andrew Donnellan

On 09/02/18 15:09, Vaibhav Jain wrote:

We enable the NORST bit by default for debug afu images to prevent
reset of AFU trace-data on a PCI link drop. For production AFU images
this bit is always ignored and the PSL gets reconfigured anyway, thereby
resetting the trace data. So setting this bit for non-debug images
doesn't have any impact.

Signed-off-by: Vaibhav Jain 


Acked-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2018-02-12 Thread Michael Ellerman
Andrew Morton  writes:

> On Thu, 08 Feb 2018 12:30:45 + Punit Agrawal  
> wrote:
>
>> >
>> > So I don't think that the above test result means that errors are properly
>> > handled, and the proposed patch should help for arm64.
>> 
>> Although the deviation of pud_huge() avoids a kernel crash, the code
>> would be easier to maintain and reason about if the arm64 helpers were
>> consistent with what core code expects.
>> 
>> I'll look to update the arm64 helpers once this patch gets merged. But
>> it would be helpful if there was a clear expression of semantics for
>> pud_huge() for various cases. Is there any version that can be used as
>> reference?
>
> Is that an ack or tested-by?
>
> Mike keeps plaintively asking the powerpc developers to take a look,
> but they remain steadfastly in hiding.

Cc'ing linuxppc-dev is always a good idea :)

> Folks, this patch fixes a BUG and is marked for -stable.  Can we please
> prioritize it?

It's not crashing for me (on 4.16-rc1):

  # ./huge-poison 
  Poisoning page...once
  Poisoning page...once again
  madvise: Bad address

And I guess the above is the expected behaviour?

Looking at the function trace it looks like the 2nd madvise is going
down reasonable code paths, but I don't know for sure:

  8)   |  SyS_madvise() {
  8)   |capable() {
  8)   |  ns_capable_common() {
  8)   0.094 us|cap_capable();
  8)   0.516 us|  }
  8)   1.052 us|}
  8)   |get_user_pages_fast() {
  8)   0.354 us|  gup_pgd_range();
  8)   |  get_user_pages_unlocked() {
  8)   0.050 us|down_read();
  8)   |__get_user_pages() {
  8)   |  find_extend_vma() {
  8)   |find_vma() {
  8)   0.148 us|  vmacache_find();
  8)   0.622 us|}
  8)   1.064 us|  }
  8)   0.028 us|  arch_vma_access_permitted();
  8)   |  follow_hugetlb_page() {
  8)   |huge_pte_offset() {
  8)   0.128 us|  __find_linux_pte();
  8)   0.580 us|}
  8)   0.048 us|_raw_spin_lock();
  8)   |hugetlb_fault() {
  8)   |  huge_pte_offset() {
  8)   0.034 us|__find_linux_pte();
  8)   0.434 us|  }
  8)   0.028 us|  is_hugetlb_entry_migration();
  8)   0.032 us|  is_hugetlb_entry_hwpoisoned();
  8)   2.118 us|}
  8)   4.940 us|  }
  8)   7.468 us|}
  8)   0.056 us|up_read();
  8)   8.722 us|  }
  8) + 10.264 us   |}
  8) + 12.212 us   |  }


cheers


Re: [PATCH] cxl: Remove function write_timebase_ctrl_psl9() for PSL9

2018-02-12 Thread Andrew Donnellan

On 09/02/18 15:10, Vaibhav Jain wrote:

For PSL9 the time-base enable bit has moved from the PSL_TB_CTLSTAT
register to the PSL_CONTROL register, so we don't need an sl_ops
implementation for 'write_timebase_ctrl' for PSL9.

This patch therefore removes the function write_timebase_ctrl_psl9() and
its references from the code.

Signed-off-by: Vaibhav Jain 


Acked-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: pata-macio WARNING at dmam_alloc_coherent+0xec/0x110

2018-02-12 Thread Christoph Hellwig
On Mon, Feb 12, 2018 at 10:52:46PM +0200, Meelis Roos wrote:
> I tested 4.16-rc1 on my PowerMac G4 and got the following warning from 
> macio pata driver. Since pata-macio has no recent changes, dma-mapping.h 
> changes seem to be related.

They are, as they add just that warning. But the root cause looks
older, and that is that the macio bus doesn't seem to set the
DMA coherent mask, which could cause all kinds of hidden issues.

I'm travelling right now, but unless someone beats me to it I'll look
into a patch to properly set the coherent mask ASAP.
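
The kind of fix being suggested might look like this sketch (not an actual
patch; a 32-bit default coherent mask is assumed here):

static void macio_set_default_dma_masks(struct macio_dev *mdev)
{
	struct device *dev = &mdev->ofdev.dev;

	/* Give macio devices a sane default coherent/streaming DMA mask. */
	dev->dma_mask = &dev->coherent_dma_mask;
	dev->coherent_dma_mask = DMA_BIT_MASK(32);
}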


Re: [PATCH 1/2] powerpc/kdump: Add missing optional dummy functions

2018-02-12 Thread Guenter Roeck
On Tue, Feb 13, 2018 at 10:01:57AM +1100, Balbir Singh wrote:
> On Tue, Feb 13, 2018 at 9:34 AM, Guenter Roeck  wrote:
> > If KEXEC_CORE is not enabled, PowerNV builds fail as follows.
> >
> > arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
> > arch/powerpc/platforms/powernv/smp.c:236:4: error:
> > implicit declaration of function 'crash_ipi_callback'
> >
> > Add dummy function calls, similar to kdump_in_progress(), to solve the
> > problem.
> >
> > Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel ...")
> > Cc: Balbir Singh 
> > Cc: Michael Ellerman 
> > Cc: Nicholas Piggin 
> > Signed-off-by: Guenter Roeck 
> > ---
> 
> Thanks for working on this.
> 
> You've added two functions, I understand the crash_send_ipi() bits
> that I broke. Looks like crash_ipi_callback is broken without KEXEC_CORE?
> 

If I recall correctly, 4145f358644b introduced the call to crash_ipi_callback().
After I declared the dummy function for that, I got an error about the missing
crash_send_ipi(). I didn't spend more time on it but just added another dummy
function. It may well be that another problem was introduced in the same time
frame. On the other hand, maybe I got it all wrong, and my patch is not worth
the computer it was written on.
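
For reference, the stubs in question have roughly this shape when
CONFIG_KEXEC_CORE is off (a sketch of the approach, sitting next to the
existing kdump_in_progress() dummy):

#ifndef CONFIG_KEXEC_CORE
static inline void crash_ipi_callback(struct pt_regs *regs) { }

static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
{
}
#endif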

Thanks,
Guenter


Re: [PATCH 1/2] powerpc/kdump: Add missing optional dummy functions

2018-02-12 Thread Balbir Singh
On Tue, Feb 13, 2018 at 9:34 AM, Guenter Roeck  wrote:
> If KEXEC_CORE is not enabled, PowerNV builds fail as follows.
>
> arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
> arch/powerpc/platforms/powernv/smp.c:236:4: error:
> implicit declaration of function 'crash_ipi_callback'
>
> Add dummy function calls, similar to kdump_in_progress(), to solve the
> problem.
>
> Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel ...")
> Cc: Balbir Singh 
> Cc: Michael Ellerman 
> Cc: Nicholas Piggin 
> Signed-off-by: Guenter Roeck 
> ---

Thanks for working on this.

You've added two functions, I understand the crash_send_ipi() bits
that I broke. Looks like crash_ipi_callback is broken without KEXEC_CORE?

I am going to test these patches, for now

Acked-by: Balbir Singh 

Balbir Singh


[PATCH 5/5] mtd: Stop updating erase_info->state and calling mtd_erase_callback()

2018-02-12 Thread Boris Brezillon
MTD users are no longer checking erase_info->state to determine if the
erase operation failed or succeeded. Moreover, mtd_erase_callback() is
now a NOP.

We can safely get rid of all mtd_erase_callback() calls and all
erase_info->state assignments. While at it, get rid of the
erase_info->state field, all MTD_ERASE_XXX definitions and the
mtd_erase_callback() function.

Signed-off-by: Boris Brezillon 
---
 drivers/mtd/chips/cfi_cmdset_0001.c  | 16 ++--
 drivers/mtd/chips/cfi_cmdset_0002.c  | 26 +++---
 drivers/mtd/chips/cfi_cmdset_0020.c  |  3 ---
 drivers/mtd/chips/map_ram.c  |  2 --
 drivers/mtd/devices/bcm47xxsflash.c  |  9 +
 drivers/mtd/devices/block2mtd.c  |  7 +--
 drivers/mtd/devices/docg3.c  | 16 ++--
 drivers/mtd/devices/lart.c   |  4 
 drivers/mtd/devices/mtd_dataflash.c  |  4 
 drivers/mtd/devices/mtdram.c |  2 --
 drivers/mtd/devices/phram.c  |  2 --
 drivers/mtd/devices/pmc551.c |  2 --
 drivers/mtd/devices/powernv_flash.c  | 11 ++-
 drivers/mtd/devices/slram.c  |  2 --
 drivers/mtd/devices/spear_smi.c  |  3 ---
 drivers/mtd/devices/sst25l.c |  3 ---
 drivers/mtd/devices/st_spi_fsm.c |  4 
 drivers/mtd/lpddr/lpddr2_nvm.c   | 10 ++
 drivers/mtd/lpddr/lpddr_cmds.c   |  2 --
 drivers/mtd/mtdconcat.c  |  1 -
 drivers/mtd/mtdcore.c|  6 ++
 drivers/mtd/mtdpart.c|  5 -
 drivers/mtd/nand/nand_base.c | 15 +++
 drivers/mtd/onenand/onenand_base.c   | 17 -
 drivers/mtd/spi-nor/spi-nor.c|  3 ---
 drivers/mtd/ubi/gluebi.c |  3 ---
 drivers/net/ethernet/sfc/falcon/mtd.c| 11 +--
 drivers/net/ethernet/sfc/mtd.c   | 11 +--
 drivers/staging/goldfish/goldfish_nand.c |  3 ---
 include/linux/mtd/mtd.h  |  9 -
 30 files changed, 20 insertions(+), 192 deletions(-)

diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c
index 5e1b68cbcd0a..d4c07b85f18e 100644
--- a/drivers/mtd/chips/cfi_cmdset_0001.c
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c
@@ -1993,20 +1993,8 @@ static int __xipram do_erase_oneblock(struct map_info 
*map, struct flchip *chip,
 
 static int cfi_intelext_erase_varsize(struct mtd_info *mtd, struct erase_info 
*instr)
 {
-   unsigned long ofs, len;
-   int ret;
-
-   ofs = instr->addr;
-   len = instr->len;
-
-   ret = cfi_varsize_frob(mtd, do_erase_oneblock, ofs, len, NULL);
-   if (ret)
-   return ret;
-
-   instr->state = MTD_ERASE_DONE;
-   mtd_erase_callback(instr);
-
-   return 0;
+   return cfi_varsize_frob(mtd, do_erase_oneblock, instr->addr,
+   instr->len, NULL);
 }
 
 static void cfi_intelext_sync (struct mtd_info *mtd)
diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index 56aa6b75213d..668e2cbc155b 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -2415,20 +2415,8 @@ static int __xipram do_erase_oneblock(struct map_info 
*map, struct flchip *chip,
 
 static int cfi_amdstd_erase_varsize(struct mtd_info *mtd, struct erase_info 
*instr)
 {
-   unsigned long ofs, len;
-   int ret;
-
-   ofs = instr->addr;
-   len = instr->len;
-
-   ret = cfi_varsize_frob(mtd, do_erase_oneblock, ofs, len, NULL);
-   if (ret)
-   return ret;
-
-   instr->state = MTD_ERASE_DONE;
-   mtd_erase_callback(instr);
-
-   return 0;
+   return cfi_varsize_frob(mtd, do_erase_oneblock, instr->addr,
+   instr->len, NULL);
 }
 
 
@@ -2436,7 +2424,6 @@ static int cfi_amdstd_erase_chip(struct mtd_info *mtd, 
struct erase_info *instr)
 {
struct map_info *map = mtd->priv;
struct cfi_private *cfi = map->fldrv_priv;
-   int ret = 0;
 
if (instr->addr != 0)
return -EINVAL;
@@ -2444,14 +2431,7 @@ static int cfi_amdstd_erase_chip(struct mtd_info *mtd, 
struct erase_info *instr)
if (instr->len != mtd->size)
return -EINVAL;
 
-   ret = do_erase_chip(map, &cfi->chips[0]);
-   if (ret)
-   return ret;
-
-   instr->state = MTD_ERASE_DONE;
-   mtd_erase_callback(instr);
-
-   return 0;
+   return do_erase_chip(map, &cfi->chips[0]);
 }
 
 static int do_atmel_lock(struct map_info *map, struct flchip *chip,
diff --git a/drivers/mtd/chips/cfi_cmdset_0020.c b/drivers/mtd/chips/cfi_cmdset_0020.c
index 7d342965f392..7b7658a05036 100644
--- a/drivers/mtd/chips/cfi_cmdset_0020.c
+++ b/drivers/mtd/chips/cfi_cmdset_0020.c
@@ -965,9 +965,6 @@ static int cfi_staa_erase_varsize(struct mtd_info *mtd,
}
}
 
-   instr->state = MTD_

[PATCH 3/5] mtd: Stop assuming mtd_erase() is asynchronous

2018-02-12 Thread Boris Brezillon
None of the mtd->_erase() implementations work in an asynchronous manner,
so let's simplify MTD users that call mtd_erase(). All they need to do
is check the value returned by mtd_erase() and assume that != 0 means
failure.
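
After the conversion a caller reduces to the usual synchronous pattern,
roughly (the .mtd field still exists at this point in the series;
block_ofs is a placeholder):

	struct erase_info ei = {
		.mtd  = mtd,
		.addr = block_ofs,
		.len  = mtd->erasesize,
	};
	int ret;

	ret = mtd_erase(mtd, &ei);
	if (ret)
		pr_err("erase at %llx failed: %d\n",
		       (unsigned long long)ei.addr, ret);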

Signed-off-by: Boris Brezillon 
---
 drivers/mtd/devices/bcm47xxsflash.c |  3 --
 drivers/mtd/ftl.c   | 51 
 drivers/mtd/inftlmount.c|  5 +-
 drivers/mtd/mtdblock.c  | 20 
 drivers/mtd/mtdchar.c   | 33 +
 drivers/mtd/mtdconcat.c | 48 ++-
 drivers/mtd/mtdcore.c   |  8 ++--
 drivers/mtd/mtdoops.c   | 19 
 drivers/mtd/mtdpart.c   |  2 -
 drivers/mtd/mtdswap.c   | 32 -
 drivers/mtd/nftlmount.c |  4 +-
 drivers/mtd/rfd_ftl.c   | 92 +++--
 drivers/mtd/sm_ftl.c| 18 
 drivers/mtd/sm_ftl.h|  4 --
 drivers/mtd/tests/mtd_test.c|  4 --
 drivers/mtd/tests/speedtest.c   |  6 ---
 drivers/mtd/ubi/io.c| 35 --
 fs/jffs2/erase.c| 36 ++-
 include/linux/mtd/mtd.h |  2 -
 19 files changed, 52 insertions(+), 370 deletions(-)

diff --git a/drivers/mtd/devices/bcm47xxsflash.c b/drivers/mtd/devices/bcm47xxsflash.c
index e2bd81817df4..6b84947cfbea 100644
--- a/drivers/mtd/devices/bcm47xxsflash.c
+++ b/drivers/mtd/devices/bcm47xxsflash.c
@@ -95,9 +95,6 @@ static int bcm47xxsflash_erase(struct mtd_info *mtd, struct 
erase_info *erase)
else
erase->state = MTD_ERASE_DONE;
 
-   if (erase->callback)
-   erase->callback(erase);
-
return err;
 }
 
diff --git a/drivers/mtd/ftl.c b/drivers/mtd/ftl.c
index 664d206a4cbe..fcf9907e7987 100644
--- a/drivers/mtd/ftl.c
+++ b/drivers/mtd/ftl.c
@@ -140,12 +140,6 @@ typedef struct partition_t {
 #define XFER_PREPARED  0x03
 #define XFER_FAILED0x04
 
-/**/
-
-
-static void ftl_erase_callback(struct erase_info *done);
-
-
 /*==
 
 Scan_header() checks to see if a memory region contains an FTL
@@ -349,17 +343,19 @@ static int erase_xfer(partition_t *part,
 return -ENOMEM;
 
 erase->mtd = part->mbd.mtd;
-erase->callback = ftl_erase_callback;
 erase->addr = xfer->Offset;
 erase->len = 1 << part->header.EraseUnitSize;
-erase->priv = (u_long)part;
 
 ret = mtd_erase(part->mbd.mtd, erase);
+if (!ret) {
+   xfer->state = XFER_ERASED;
+   xfer->EraseCount++;
+} else {
+   xfer->state = XFER_FAILED;
+   pr_notice("ftl_cs: erase failed: err = %d\n", ret);
+}
 
-if (!ret)
-   xfer->EraseCount++;
-else
-   kfree(erase);
+kfree(erase);
 
 return ret;
 } /* erase_xfer */
@@ -371,37 +367,6 @@ static int erase_xfer(partition_t *part,
 
 ==*/
 
-static void ftl_erase_callback(struct erase_info *erase)
-{
-partition_t *part;
-struct xfer_info_t *xfer;
-int i;
-
-/* Look up the transfer unit */
-part = (partition_t *)(erase->priv);
-
-for (i = 0; i < part->header.NumTransferUnits; i++)
-   if (part->XferInfo[i].Offset == erase->addr) break;
-
-if (i == part->header.NumTransferUnits) {
-   printk(KERN_NOTICE "ftl_cs: internal error: "
-  "erase lookup failed!\n");
-   return;
-}
-
-xfer = &part->XferInfo[i];
-if (erase->state == MTD_ERASE_DONE)
-   xfer->state = XFER_ERASED;
-else {
-   xfer->state = XFER_FAILED;
-   printk(KERN_NOTICE "ftl_cs: erase failed: state = %d\n",
-  erase->state);
-}
-
-kfree(erase);
-
-} /* ftl_erase_callback */
-
 static int prepare_xfer(partition_t *part, int i)
 {
 erase_unit_header_t header;
diff --git a/drivers/mtd/inftlmount.c b/drivers/mtd/inftlmount.c
index 8d6bb189ea8e..0f47be4834d8 100644
--- a/drivers/mtd/inftlmount.c
+++ b/drivers/mtd/inftlmount.c
@@ -393,9 +393,10 @@ int INFTL_formatblock(struct INFTLrecord *inftl, int block)
   mark only the failed block in the bbt. */
for (physblock = 0; physblock < inftl->EraseSize;
 physblock += instr->len, instr->addr += instr->len) {
-   mtd_erase(inftl->mbd.mtd, instr);
+   int ret;
 
-   if (instr->state == MTD_ERASE_FAILED) {
+   ret = mtd_erase(inftl->mbd.mtd, instr);
+   if (ret) {
printk(KERN_WARNING "INFTL: error while formatting 
block %d\n",
block);
goto fail;
diff --git a/drivers/mtd/mtdblock.c b/drivers/mtd/mtdblock.c
index bb4c14f83c75..7b2b7f651181 100644
--- a/drivers/mtd/mtdblock.c
+++ b/drivers/mtd/mtdblock.c
@@ -55,48 +55,28 @@

[PATCH 4/5] mtd: Unconditionally update ->fail_addr and ->addr in part_erase()

2018-02-12 Thread Boris Brezillon
->fail_addr and ->addr can be updated regardless of the result of
parent->_erase(); we just need to remove the code doing the same thing
in mtd_erase_callback() to avoid adjusting those fields twice.

Note that this can be done because all MTD users have been converted
not to pass an erase_info->callback() and thus only look at the
->fail_addr and ->addr fields after part_erase() has returned.

While we're at it, get rid of the erase_info->mtd field which was only
needed to let mtd_erase_callback() get the partition device back.
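
For reference, a minimal sketch of the resulting part_erase() flow
(hedged: this assumes the mtd_to_part() helper from mtdpart.c and is
not the literal patch, whose diff follows below):

    static int part_erase(struct mtd_info *mtd, struct erase_info *instr)
    {
        struct mtd_part *part = mtd_to_part(mtd);
        int ret;

        instr->addr += part->offset;
        ret = part->parent->_erase(part->parent, instr);

        /* adjust unconditionally, no matter what _erase() returned */
        if (instr->fail_addr != MTD_FAIL_ADDR_UNKNOWN)
            instr->fail_addr -= part->offset;
        instr->addr -= part->offset;

        return ret;
    }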

Signed-off-by: Boris Brezillon 
---
 drivers/mtd/ftl.c |  1 -
 drivers/mtd/inftlmount.c  |  3 ---
 drivers/mtd/mtdblock.c|  1 -
 drivers/mtd/mtdchar.c |  1 -
 drivers/mtd/mtdconcat.c   |  1 -
 drivers/mtd/mtdoops.c |  1 -
 drivers/mtd/mtdpart.c | 16 
 drivers/mtd/mtdswap.c |  2 --
 drivers/mtd/nand/nand_base.c  |  1 -
 drivers/mtd/nand/nand_bbt.c   |  1 -
 drivers/mtd/nftlmount.c   |  1 -
 drivers/mtd/rfd_ftl.c |  1 -
 drivers/mtd/sm_ftl.c  |  1 -
 drivers/mtd/tests/mtd_test.c  |  1 -
 drivers/mtd/tests/speedtest.c |  1 -
 drivers/mtd/ubi/io.c  |  1 -
 fs/jffs2/erase.c  |  1 -
 include/linux/mtd/mtd.h   |  3 ++-
 18 files changed, 6 insertions(+), 32 deletions(-)

diff --git a/drivers/mtd/ftl.c b/drivers/mtd/ftl.c
index fcf9907e7987..0a6adfaec7b5 100644
--- a/drivers/mtd/ftl.c
+++ b/drivers/mtd/ftl.c
@@ -342,7 +342,6 @@ static int erase_xfer(partition_t *part,
 if (!erase)
 return -ENOMEM;
 
-erase->mtd = part->mbd.mtd;
 erase->addr = xfer->Offset;
 erase->len = 1 << part->header.EraseUnitSize;
 
diff --git a/drivers/mtd/inftlmount.c b/drivers/mtd/inftlmount.c
index 0f47be4834d8..aab4f68bd36f 100644
--- a/drivers/mtd/inftlmount.c
+++ b/drivers/mtd/inftlmount.c
@@ -208,8 +208,6 @@ static int find_boot_record(struct INFTLrecord *inftl)
if (ip->Reserved0 != ip->firstUnit) {
struct erase_info *instr = &inftl->instr;
 
-   instr->mtd = inftl->mbd.mtd;
-
/*
 *  Most likely this is using the
 *  undocumented qiuck mount feature.
@@ -385,7 +383,6 @@ int INFTL_formatblock(struct INFTLrecord *inftl, int block)
   _first_? */
 
/* Use async erase interface, test return code */
-   instr->mtd = inftl->mbd.mtd;
instr->addr = block * inftl->EraseSize;
instr->len = inftl->mbd.mtd->erasesize;
/* Erase one physical eraseblock at a time, even though the NAND api
diff --git a/drivers/mtd/mtdblock.c b/drivers/mtd/mtdblock.c
index 7b2b7f651181..a5b1933c0490 100644
--- a/drivers/mtd/mtdblock.c
+++ b/drivers/mtd/mtdblock.c
@@ -65,7 +65,6 @@ static int erase_write (struct mtd_info *mtd, unsigned long 
pos,
/*
 * First, let's erase the flash block.
 */
-   erase.mtd = mtd;
erase.addr = pos;
erase.len = len;
 
diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index 2beb22dd6bbb..c06b33f80e75 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -726,7 +726,6 @@ static int mtdchar_ioctl(struct file *file, u_int cmd, 
u_long arg)
erase->addr = einfo32.start;
erase->len = einfo32.length;
}
-   erase->mtd = mtd;
 
ret = mtd_erase(mtd, erase);
kfree(erase);
diff --git a/drivers/mtd/mtdconcat.c b/drivers/mtd/mtdconcat.c
index caa09bf6e572..93c47e56d9d8 100644
--- a/drivers/mtd/mtdconcat.c
+++ b/drivers/mtd/mtdconcat.c
@@ -427,7 +427,6 @@ static int concat_erase(struct mtd_info *mtd, struct 
erase_info *instr)
erase->len = length;
 
length -= erase->len;
-   erase->mtd = subdev;
if ((err = mtd_erase(subdev, erase))) {
/* sanity check: should never happen since
 * block alignment has been checked above */
diff --git a/drivers/mtd/mtdoops.c b/drivers/mtd/mtdoops.c
index 028ded59297b..9f25111fd559 100644
--- a/drivers/mtd/mtdoops.c
+++ b/drivers/mtd/mtdoops.c
@@ -94,7 +94,6 @@ static int mtdoops_erase_block(struct mtdoops_context *cxt, 
int offset)
int ret;
int page;
 
-   erase.mtd = mtd;
erase.addr = offset;
erase.len = mtd->erasesize;
 
diff --git a/drivers/mtd/mtdpart.c b/drivers/mtd/mtdpart.c
index ae1206633d9d..1c07a6f0dfe5 100644
--- a/drivers/mtd/mtdpart.c
+++ b/drivers/mtd/mtdpart.c
@@ -205,23 +205,15 @@ static int part_erase(struct mtd_info *mtd, struct 
erase_info *instr)
 
instr->addr += part->offset;
ret = part->parent->_erase(part->parent, instr);
-   if (ret) {
-   if (instr->fail_addr != MTD_FAIL_ADDR_UNKNOWN)
-   

[PATCH 0/5] mtd: Simplify erase handling

2018-02-12 Thread Boris Brezillon
Hello,

This series aims at simplifying erase handling in both MTD drivers and
MTD users' code.

Historically, the erase operation has been designed to be asynchronous,
which, in theory, is a good thing since erasing a block usually takes
longer than reading from or writing to flash. In practice, though, all
drivers implement ->_erase() synchronously. Moreover, both drivers and
users update/check the erase_info fields inconsistently.

In order to simplify things, let's assume ->_erase() is and will always
be synchronous. This also makes error code checking more consistent and
allows us to get rid of a few hundred lines of code.
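
To illustrate the change for users, a hedged before/after sketch (the
callback style shown is the ftl.c one removed later in this series;
my_erase_callback, ctx and handle_erase_failure are placeholders):

    /* before: completion was reported through a callback */
    erase->callback = my_erase_callback;  /* ran when the erase ended */
    erase->priv = (u_long)ctx;
    ret = mtd_erase(mtd, erase);     /* could return before completion */

    /* after: mtd_erase() is synchronous, so just check its return */
    ret = mtd_erase(mtd, erase);
    if (ret)
        handle_erase_failure(erase);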

Regards,

Boris

Boris Brezillon (5):
  mtd: Initialize ->fail_addr early in mtd_erase()
  mtd: Get rid of unused fields in struct erase_info
  mtd: Stop assuming mtd_erase() is asynchronous
  mtd: Unconditionally update ->fail_addr and ->addr in part_erase()
  mtd: Stop updating erase_info->state and calling mtd_erase_callback()

 drivers/mtd/chips/cfi_cmdset_0001.c  | 16 +-
 drivers/mtd/chips/cfi_cmdset_0002.c  | 26 ++---
 drivers/mtd/chips/cfi_cmdset_0020.c  |  3 --
 drivers/mtd/chips/map_ram.c  |  2 -
 drivers/mtd/devices/bcm47xxsflash.c  | 12 +
 drivers/mtd/devices/block2mtd.c  |  7 +--
 drivers/mtd/devices/docg3.c  | 16 +-
 drivers/mtd/devices/lart.c   |  4 --
 drivers/mtd/devices/mtd_dataflash.c  |  4 --
 drivers/mtd/devices/mtdram.c |  2 -
 drivers/mtd/devices/phram.c  |  2 -
 drivers/mtd/devices/pmc551.c |  2 -
 drivers/mtd/devices/powernv_flash.c  | 11 +---
 drivers/mtd/devices/slram.c  |  2 -
 drivers/mtd/devices/spear_smi.c  |  3 --
 drivers/mtd/devices/sst25l.c |  3 --
 drivers/mtd/devices/st_spi_fsm.c |  4 --
 drivers/mtd/ftl.c| 52 +++---
 drivers/mtd/inftlmount.c |  8 ++-
 drivers/mtd/lpddr/lpddr2_nvm.c   | 10 +---
 drivers/mtd/lpddr/lpddr_cmds.c   |  2 -
 drivers/mtd/mtdblock.c   | 21 
 drivers/mtd/mtdchar.c| 34 +---
 drivers/mtd/mtdconcat.c  | 48 +
 drivers/mtd/mtdcore.c| 17 +++---
 drivers/mtd/mtdoops.c| 20 ---
 drivers/mtd/mtdpart.c| 23 ++--
 drivers/mtd/mtdswap.c| 34 
 drivers/mtd/nand/nand_base.c | 16 ++
 drivers/mtd/nand/nand_bbt.c  |  1 -
 drivers/mtd/nftlmount.c  |  5 +-
 drivers/mtd/onenand/onenand_base.c   | 17 --
 drivers/mtd/rfd_ftl.c| 93 ++--
 drivers/mtd/sm_ftl.c | 19 ---
 drivers/mtd/sm_ftl.h |  4 --
 drivers/mtd/spi-nor/spi-nor.c|  3 --
 drivers/mtd/tests/mtd_test.c |  5 --
 drivers/mtd/tests/speedtest.c|  7 ---
 drivers/mtd/ubi/gluebi.c |  3 --
 drivers/mtd/ubi/io.c | 36 -
 drivers/net/ethernet/sfc/falcon/mtd.c| 11 +---
 drivers/net/ethernet/sfc/mtd.c   | 11 +---
 drivers/staging/goldfish/goldfish_nand.c |  3 --
 fs/jffs2/erase.c | 37 ++---
 include/linux/mtd/mtd.h  | 19 +--
 45 files changed, 79 insertions(+), 599 deletions(-)

-- 
2.14.1



[PATCH 1/5] mtd: Initialize ->fail_addr early in mtd_erase()

2018-02-12 Thread Boris Brezillon
mtd_erase() can return an error before ->fail_addr is initialized to
MTD_FAIL_ADDR_UNKNOWN. Move this initialization to the very beginning
of the function.
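
A hedged sketch of the hazard being fixed (hypothetical caller, not
code from the patch):

    struct erase_info ei = { .addr = 0, .len = mtd->erasesize };

    /* before this patch: if mtd_erase() bailed out early (e.g. with
     * -EROFS), ei.fail_addr was never written, so a caller inspecting
     * it would read stale data instead of MTD_FAIL_ADDR_UNKNOWN */
    if (mtd_erase(mtd, &ei))
        pr_info("fail_addr = 0x%llx\n",
                (unsigned long long)ei.fail_addr);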

Signed-off-by: Boris Brezillon 
---
 drivers/mtd/mtdcore.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index a1c94526fb88..c87859ff338b 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -953,6 +953,8 @@ EXPORT_SYMBOL_GPL(__put_mtd_device);
  */
 int mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
 {
+   instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
+
if (!mtd->erasesize || !mtd->_erase)
return -ENOTSUPP;
 
@@ -961,7 +963,6 @@ int mtd_erase(struct mtd_info *mtd, struct erase_info 
*instr)
if (!(mtd->flags & MTD_WRITEABLE))
return -EROFS;
 
-   instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
if (!instr->len) {
instr->state = MTD_ERASE_DONE;
mtd_erase_callback(instr);
-- 
2.14.1



[PATCH 2/5] mtd: Get rid of unused fields in struct erase_info

2018-02-12 Thread Boris Brezillon
Some fields are not used by MTD drivers, users or core code. Moreover,
those fields are not documented, so get rid of them to avoid any
confusion.

Signed-off-by: Boris Brezillon 
---
 include/linux/mtd/mtd.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
index 205ededccc60..2a407dc9beaa 100644
--- a/include/linux/mtd/mtd.h
+++ b/include/linux/mtd/mtd.h
@@ -48,14 +48,9 @@ struct erase_info {
uint64_t addr;
uint64_t len;
uint64_t fail_addr;
-   u_long time;
-   u_long retries;
-   unsigned dev;
-   unsigned cell;
void (*callback) (struct erase_info *self);
u_long priv;
u_char state;
-   struct erase_info *next;
 };
 
 struct mtd_erase_region_info {
-- 
2.14.1



Re: KVM compile error

2018-02-12 Thread Christian Zigotzky
Hello Michael,

I compiled the RC1 of kernel 4.16 today. Unfortunately, the issue with
KVM still exists. I get the error "label 'out' defined but not used"
(see the error messages below).
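
For reference, a minimal, hypothetical reproducer of this class of
warning (not the actual kvm code) -- a label left behind after its
last goto was removed:

    static int f(int x)
    {
        if (x)
            return -EINVAL;
    out:                  /* no goto targets this label anymore */
        return 0;
    }

With -Werror=unused-label, as in the build log below, this fails to
compile.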

Link to the rc1 kernel config without KVM: 
http://www.xenosoft.de/cyrus-4.16-rc1.config

Link to the git kernel config with KVM: 
http://www.xenosoft.de/cyrus-4.16-alpha10.config

Thanks,
Christian

Sent from my iPhone

> On 12. Feb 2018, at 13:04, Christian Zigotzky  wrote:
> 
> It‘s only an info. I tried to compile the latest git version yesterday and I 
> got this error. I will try to compile the RC1 today and test if this error 
> still exists.
> 
> Cheers,
> Christian
> 
> Sent from my iPhone
> 
>> On 12. Feb 2018, at 12:08, Michael Ellerman  wrote:
>> 
>> Christian Zigotzky  writes:
>> 
>>> Just for info: KVM doesn’t compile currently.
>>> 
>>> Error messages:
>>> 
>>> CC  arch/powerpc/kvm/powerpc.o
>>> arch/powerpc/kvm/powerpc.c: In function 'kvm_arch_vcpu_ioctl_run':
>>> arch/powerpc/kvm/powerpc.c:1611:1: error: label 'out' defined but not used 
>>> [-Werror=unused-label]
>>> out:
>>> ^
>>> cc1: all warnings being treated as errors
>> 
>> I don't see this, which compiler/config/commit is that?
>> 
>> cheers
> 


Re: Build regressions/improvements in v4.16-rc1

2018-02-12 Thread James Hogan
On Mon, Feb 12, 2018 at 11:28:32AM +0100, Geert Uytterhoeven wrote:
> On Mon, Feb 12, 2018 at 11:17 AM, Geert Uytterhoeven
>  wrote:
> > Below is the list of build error/warning regressions/improvements in
> > v4.16-rc1[1] compared to v4.15[2].
> >
> > Summarized:
> >   - build errors: +13/-5
> >   - build warnings: +1653/-1537
> >
> > Note that there may be false regressions, as some logs are incomplete.
> > Still, they're build errors/warnings.
> >
> > Happy fixing! ;-)
> >
> > Thanks to the linux-next team for providing the build service.
> >
> > [1] 
> > http://kisskb.ellerman.id.au/kisskb/head/7928b2cbe55b2a410a0f5c1f154610059c57b1b2/
> >  (all 273 configs)
> > [2] 
> > http://kisskb.ellerman.id.au/kisskb/head/d8a5b80568a9cb66810e75b182018e9edb68e8ff/
> >  (271 out of 273 configs)
> >
> >
> > *** ERRORS ***
...
> >   + /home/kisskb/slave/src/drivers/net/ethernet/intel/i40e/i40e_ethtool.c: 
> > error: implicit declaration of function 'cmpxchg64' 
> > [-Werror=implicit-function-declaration]:  => 4443:6, 4443:2
> 
> mips{,el}-allmodconfig

FYI I reported this here:
https://lkml.kernel.org/r/20180207150907.GB5092@saruman

but I haven't seen any action on it yet.

Cheers
James


signature.asc
Description: Digital signature


Re: [PATCH] headers: untangle kmemleak.h from mm.h

2018-02-12 Thread Randy Dunlap
On 02/12/2018 04:28 AM, Michael Ellerman wrote:
> Randy Dunlap  writes:
> 
>> From: Randy Dunlap 
>>
>> Currently  #includes  for no obvious
>> reason. It looks like it's only a convenience, so remove kmemleak.h
>> from slab.h and add  to any users of kmemleak_*
>> that don't already #include it.
>> Also remove  from source files that do not use it.
>>
>> This is tested on i386 allmodconfig and x86_64 allmodconfig. It
>> would be good to run it through the 0day bot for other $ARCHes.
>> I have neither the horsepower nor the storage space for the other
>> $ARCHes.
>>
>> [slab.h is the second most used header file after module.h; kernel.h
>> is right there with slab.h. There could be some minor error in the
>> counting due to some #includes having comments after them and I
>> didn't combine all of those.]
>>
>> This is Lingchi patch #1 (death by a thousand cuts, applied to kernel
>> header files).
>>
>> Signed-off-by: Randy Dunlap 
> 
> I threw it at a random selection of configs and so far the only failures
> I'm seeing are:
> 
>   lib/test_firmware.c:134:2: error: implicit declaration of function 'vfree' 
> [-Werror=implicit-function-declaration]   
>
>   lib/test_firmware.c:620:25: error: implicit declaration of function 
> 'vzalloc' [-Werror=implicit-function-declaration]
>   lib/test_firmware.c:620:2: error: implicit declaration of function 
> 'vzalloc' [-Werror=implicit-function-declaration]
>   security/integrity/digsig.c:146:2: error: implicit declaration of function 
> 'vfree' [-Werror=implicit-function-declaration]
> 
> Full results trickling in here, not all the failures there are caused by
> this patch, ie. some configs are broken in mainline:
> 
>   http://kisskb.ellerman.id.au/kisskb/head/13396/

That's very useful, thanks.

I'll send a few patches for those.

-- 
~Randy


[PATCH 2/2] powerpc/pseries: Declare optional dummy function for find_and_online_cpu_nid

2018-02-12 Thread Guenter Roeck
Commit e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with
memoryless nodes") adds an unconditional call to find_and_online_cpu_nid(),
which is only declared if CONFIG_PPC_SPLPAR is enabled. This results in
the following build error if this is not the case.

arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
arch/powerpc/platforms/pseries/hotplug-cpu.c:369:
undefined reference to `.find_and_online_cpu_nid'

Follow the convention used by similar functions and provide a dummy
function if CONFIG_PPC_SPLPAR is not enabled. Also move the external
function declaration into a header file, where it belongs.

Fixes: e67e02a544e9 ("powerpc/pseries: Fix cpu hotplug crash with ...")
Cc: Michael Bringmann 
Cc: Michael Ellerman 
Cc: Nathan Fontenot 
Signed-off-by: Guenter Roeck 
---
 arch/powerpc/include/asm/topology.h  | 5 +
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 --
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index 88187c285c70..52815982436f 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -82,6 +82,7 @@ static inline int numa_update_cpu_topology(bool cpus_locked)
 extern int start_topology_update(void);
 extern int stop_topology_update(void);
 extern int prrn_is_enabled(void);
+extern int find_and_online_cpu_nid(int cpu);
 #else
 static inline int start_topology_update(void)
 {
@@ -95,6 +96,10 @@ static inline int prrn_is_enabled(void)
 {
return 0;
 }
+static inline int find_and_online_cpu_nid(int cpu)
+{
+   return 0;
+}
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_NEED_MULTIPLE_NODES)
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index dceb51454d8d..f5c6a8cd2926 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -340,8 +340,6 @@ static void pseries_remove_processor(struct device_node *np)
cpu_maps_update_done();
 }
 
-extern int find_and_online_cpu_nid(int cpu);
-
 static int dlpar_online_cpu(struct device_node *dn)
 {
int rc = 0;
-- 
2.7.4



[PATCH 1/2] powerpc/kdump: Add missing optional dummy functions

2018-02-12 Thread Guenter Roeck
If KEXEC_CORE is not enabled, PowerNV builds fail as follows.

arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self':
arch/powerpc/platforms/powernv/smp.c:236:4: error:
implicit declaration of function 'crash_ipi_callback'

Add dummy functions, similar to kdump_in_progress(), to solve the
problem.

Fixes: 4145f358644b ("powernv/kdump: Fix cases where the kdump kernel ...")
Cc: Balbir Singh 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Signed-off-by: Guenter Roeck 
---
 arch/powerpc/include/asm/kexec.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 9dcbfa6bbb91..d8b1e8e7e035 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -140,6 +140,12 @@ static inline bool kdump_in_progress(void)
return false;
 }
 
+static inline void crash_ipi_callback(struct pt_regs *regs) { }
+
+static inline void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
+{
+}
+
 #endif /* CONFIG_KEXEC_CORE */
 #endif /* ! __ASSEMBLY__ */
 #endif /* __KERNEL__ */
-- 
2.7.4



Re: [PATCH 5/5] mtd: Stop updating erase_info->state and calling mtd_erase_callback()

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:11 CET schrieb Boris Brezillon:
> MTD users are no longer checking erase_info->state to determine if the
> erase operation failed or succeeded. Moreover, mtd_erase_callback() is
> now a NOP.
> 
> We can safely get rid of all mtd_erase_callback() calls and all
> erase_info->state assignments. While at it, get rid of the
> erase_info->state field, all MTD_ERASE_XXX definitions and the
> mtd_erase_callback() function.
> 
> Signed-off-by: Boris Brezillon 

Reviewed-by: Richard Weinberger 

Thanks,
//richard



usleep_range without a range

2018-02-12 Thread Joe Perches
Scheduling can generally be better when these values are not
identical. Perhaps these ranges should be expanded.
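
For example (a hedged illustration, not an actual patch -- the right
upper bound would need per-driver judgement):

-       usleep_range(1000, 1000);
+       usleep_range(1000, 2000);       /* give the scheduler some slack */

Every match below passes min == max, which defeats the purpose of
usleep_range().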

$ git grep -P -n "usleep_range\s*\(\s*([\w\.\>\-]+)\s*,\s*\1\s*\)"
drivers/clk/ux500/clk-sysctrl.c:45:     usleep_range(clk->enable_delay_us, clk->enable_delay_us);
drivers/cpufreq/pmac64-cpufreq.c:140:   usleep_range(1000, 1000);
drivers/cpufreq/pmac64-cpufreq.c:239:   usleep_range(1, 1); /* should be faster , to fix */
drivers/cpufreq/pmac64-cpufreq.c:284:   usleep_range(500, 500);
drivers/media/i2c/smiapp/smiapp-core.c:1228:    usleep_range(1000, 1000);
drivers/media/i2c/smiapp/smiapp-core.c:1235:    usleep_range(1000, 1000);
drivers/media/i2c/smiapp/smiapp-core.c:1240:    usleep_range(sleep, sleep);
drivers/media/i2c/smiapp/smiapp-core.c:1387:    usleep_range(5000, 5000);
drivers/media/i2c/smiapp/smiapp-quirk.c:205:    usleep_range(2000, 2000);
drivers/media/i2c/smiapp/smiapp-regs.c:279:     usleep_range(2000, 2000);
drivers/power/supply/ab8500_fg.c:643:   usleep_range(100, 100);
drivers/staging/rtl8192u/r819xU_phy.c:180:      usleep_range(1000, 1000);
drivers/staging/rtl8192u/r819xU_phy.c:736:      usleep_range(1000, 1000);
drivers/staging/rtl8192u/r819xU_phy.c:740:      usleep_range(1000, 1000);
sound/soc/codecs/ab8500-codec.c:1065:   usleep_range(AB8500_ANC_SM_DELAY, AB8500_ANC_SM_DELAY);
sound/soc/codecs/ab8500-codec.c:1068:   usleep_range(AB8500_ANC_SM_DELAY, AB8500_ANC_SM_DELAY);




Re: [PATCH 4/5] mtd: Unconditionally update ->fail_addr and ->addr in part_erase()

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:10 CET schrieb Boris Brezillon:
> ->fail_addr and ->addr can be updated no matter the result of
> parent->_erase(), we just need to remove the code doing the same thing
> in mtd_erase_callback() to avoid adjusting those fields twice.
> 
> Note that this can be done because all MTD users have been converted to
> not pass an erase_info->callback() and are thus only taking the
> ->addr_fail and ->addr fields into account after part_erase() has
> returned.
> 
> While we're at it, get rid of the erase_info->mtd field which was only
> needed to let mtd_erase_callback() get the partition device back.
> 
> Signed-off-by: Boris Brezillon 

Reviewed-by: Richard Weinberger 

Thanks,
//richard


[PATCH v6 17/17] ASoC: fsl_ssi: Use ssi->streams instead of reading register

2018-02-12 Thread Nicolin Chen
Since ssi->streams is updated along with the SCR register and its
SSIEN bit, it's simpler to use it instead of reading the register back.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 5bc67ad..0823b08 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -803,11 +803,6 @@ static int fsl_ssi_hw_params(struct snd_pcm_substream 
*substream,
unsigned int sample_size = params_width(hw_params);
u32 wl = SSI_SxCCR_WL(sample_size);
int ret;
-   u32 scr;
-   int enabled;
-
-   regmap_read(regs, REG_SSI_SCR, &scr);
-   enabled = scr & SSI_SCR_SSIEN;
 
/*
 * SSI is properly configured if it is enabled and running in
@@ -815,7 +810,7 @@ static int fsl_ssi_hw_params(struct snd_pcm_substream 
*substream,
 * that should set separate configurations for STCCR and SRCCR
 * despite running in the synchronous mode.
 */
-   if (enabled && ssi->synchronous)
+   if (ssi->streams && ssi->synchronous)
return 0;
 
if (fsl_ssi_is_i2s_master(ssi)) {
-- 
2.1.4



[PATCH v6 16/17] ASoC: fsl_ssi: Move DT related code to a separate probe()

2018-02-12 Thread Nicolin Chen
This patch cleans up the probe() function by moving all Device Tree
related code into a separate function, which makes probe() independent
of the Device Tree. This will be very useful for a future integration
of the imx-ssi driver, which has similar functionality but exists only
because it supports non-DT cases.

This patch also moves the AC97 symmetric_channels setting from the
probe() into the snd_soc_dai_driver structure for simplification.

Additionally, since PowerPC and AC97 use the same pdev pointer
to register a platform device, this patch also unifies related
code.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 219 +++-
 1 file changed, 124 insertions(+), 95 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index b58fabe..5bc67ad 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -239,8 +239,12 @@ struct fsl_ssi_soc_data {
  *
  * @fiq_params: FIQ stream filtering parameters
  *
- * @pdev: Pointer to pdev when using fsl-ssi as sound card (ppc only)
- *TODO: Should be replaced with simple-sound-card
+ * @card_pdev: Platform_device pointer to register a sound card for PowerPC or
+ * to register a CODEC platform device for AC97
+ * @card_name: Platform_device name to register a sound card for PowerPC or
+ * to register a CODEC platform device for AC97
+ * @card_idx: The index of SSI to register a sound card for PowerPC or
+ *to register a CODEC platform device for AC97
  *
  * @dbg_stats: Debugging statistics
  *
@@ -285,7 +289,9 @@ struct fsl_ssi {
 
struct imx_pcm_fiq_params fiq_params;
 
-   struct platform_device *pdev;
+   struct platform_device *card_pdev;
+   char card_name[32];
+   u32 card_idx;
 
struct fsl_ssi_dbg dbg_stats;
 
@@ -1134,6 +1140,7 @@ static const struct snd_soc_component_driver 
fsl_ssi_component = {
 
 static struct snd_soc_dai_driver fsl_ssi_ac97_dai = {
.bus_control = true,
+   .symmetric_channels = 1,
.probe = fsl_ssi_dai_probe,
.playback = {
.stream_name = "AC97 Playback",
@@ -1291,9 +1298,7 @@ static void make_lowercase(char *s)
 static int fsl_ssi_imx_probe(struct platform_device *pdev,
 struct fsl_ssi *ssi, void __iomem *iomem)
 {
-   struct device_node *np = pdev->dev.of_node;
struct device *dev = &pdev->dev;
-   u32 dmas[4];
int ret;
 
/* Backward compatible for a DT without ipg clock name assigned */
@@ -1327,14 +1332,8 @@ static int fsl_ssi_imx_probe(struct platform_device 
*pdev,
ssi->dma_params_tx.addr = ssi->ssi_phys + REG_SSI_STX0;
ssi->dma_params_rx.addr = ssi->ssi_phys + REG_SSI_SRX0;
 
-   /* Set to dual FIFO mode according to the SDMA sciprt */
-   ret = of_property_read_u32_array(np, "dmas", dmas, 4);
-   if (ssi->use_dma && !ret && dmas[2] == IMX_DMATYPE_SSI_DUAL) {
-   ssi->use_dual_fifo = true;
-   /*
-* Use even numbers to avoid channel swap due to SDMA
-* script design
-*/
+   /* Use even numbers to avoid channel swap due to SDMA script design */
+   if (ssi->use_dual_fifo) {
ssi->dma_params_tx.maxburst &= ~0x1;
ssi->dma_params_rx.maxburst &= ~0x1;
}
@@ -1375,41 +1374,109 @@ static void fsl_ssi_imx_clean(struct platform_device 
*pdev, struct fsl_ssi *ssi)
clk_disable_unprepare(ssi->clk);
 }
 
-static int fsl_ssi_probe(struct platform_device *pdev)
+static int fsl_ssi_probe_from_dt(struct fsl_ssi *ssi)
 {
-   struct fsl_ssi *ssi;
-   int ret = 0;
-   struct device_node *np = pdev->dev.of_node;
-   struct device *dev = &pdev->dev;
+   struct device *dev = ssi->dev;
+   struct device_node *np = dev->of_node;
const struct of_device_id *of_id;
const char *p, *sprop;
const __be32 *iprop;
-   struct resource *res;
-   void __iomem *iomem;
-   char name[64];
-   struct regmap_config regconfig = fsl_ssi_regconfig;
+   u32 dmas[4];
+   int ret;
 
of_id = of_match_device(fsl_ssi_ids, dev);
if (!of_id || !of_id->data)
return -EINVAL;
 
-   ssi = devm_kzalloc(dev, sizeof(*ssi), GFP_KERNEL);
-   if (!ssi)
-   return -ENOMEM;
-
ssi->soc = of_id->data;
-   ssi->dev = dev;
+
+   ret = of_property_match_string(np, "clock-names", "ipg");
+   /* Get error code if not found */
+   ssi->has_ipg_clk_name = ret >= 0;
 
/* Check if being used in AC97 mode */
sprop = of_get_property(np, "fsl,mode", NULL);
-   if (sprop) {
-   if (!strcmp(sprop, "ac97-slave"))
-   ssi->dai_fmt = FSLSSI_AC97_DAIFMT;
+   if (sprop && !strcmp(sprop, "ac97-slave")) {
+   ssi->

[PATCH v6 15/17] ASoC: fsl_ssi: Add bool synchronous to mark synchronous mode

2018-02-12 Thread Nicolin Chen
Using symmetric_rates in the cpu_dai_drv is a bit implicit,
so this patch adds a bool synchronous instead.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index ed9102d..b58fabe 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -217,6 +217,7 @@ struct fsl_ssi_soc_data {
  * @dai_fmt: DAI configuration this device is currently used with
  * @streams: Mask of current active streams: BIT(TX) and BIT(RX)
  * @i2s_net: I2S and Network mode configurations of SCR register
+ * @synchronous: Use synchronous mode - both of TX and RX use STCK and SFCK
  * @use_dma: DMA is used or FIQ with stream filter
  * @use_dual_fifo: DMA with support for dual FIFO mode
  * @has_ipg_clk_name: If "ipg" is in the clock name list of device tree
@@ -262,6 +263,7 @@ struct fsl_ssi {
unsigned int dai_fmt;
u8 streams;
u8 i2s_net;
+   bool synchronous;
bool use_dma;
bool use_dual_fifo;
bool has_ipg_clk_name;
@@ -673,7 +675,6 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream 
*substream,
bool tx2, tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
struct fsl_ssi *ssi = snd_soc_dai_get_drvdata(dai);
struct regmap *regs = ssi->regs;
-   int synchronous = ssi->cpu_dai_drv.symmetric_rates, ret;
u32 pm = 999, div2, psr, stccr, mask, afreq, factor, i;
unsigned long clkrate, baudrate, tmprate;
unsigned int slots = params_channels(hw_params);
@@ -681,6 +682,7 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream 
*substream,
u64 sub, savesub = 10;
unsigned int freq;
bool baudclk_is_used;
+   int ret;
 
/* Override slots and slot_width if being specifically set... */
if (ssi->slots)
@@ -759,7 +761,7 @@ static int fsl_ssi_set_bclk(struct snd_pcm_substream 
*substream,
mask = SSI_SxCCR_PM_MASK | SSI_SxCCR_DIV2 | SSI_SxCCR_PSR;
 
/* STCCR is used for RX in synchronous mode */
-   tx2 = tx || synchronous;
+   tx2 = tx || ssi->synchronous;
regmap_update_bits(regs, REG_SSI_SxCCR(tx2), mask, stccr);
 
if (!baudclk_is_used) {
@@ -807,7 +809,7 @@ static int fsl_ssi_hw_params(struct snd_pcm_substream 
*substream,
 * that should set separate configurations for STCCR and SRCCR
 * despite running in the synchronous mode.
 */
-   if (enabled && ssi->cpu_dai_drv.symmetric_rates)
+   if (enabled && ssi->synchronous)
return 0;
 
if (fsl_ssi_is_i2s_master(ssi)) {
@@ -839,7 +841,7 @@ static int fsl_ssi_hw_params(struct snd_pcm_substream 
*substream,
}
 
/* In synchronous mode, the SSI uses STCCR for capture */
-   tx2 = tx || ssi->cpu_dai_drv.symmetric_rates;
+   tx2 = tx || ssi->synchronous;
regmap_update_bits(regs, REG_SSI_SxCCR(tx2), SSI_SxCCR_WL_MASK, wl);
 
return 0;
@@ -968,7 +970,7 @@ static int _fsl_ssi_set_dai_fmt(struct fsl_ssi *ssi, 
unsigned int fmt)
srcr = strcr;
 
/* Set SYN mode and clear RXDIR bit when using SYN or AC97 mode */
-   if (ssi->cpu_dai_drv.symmetric_rates || fsl_ssi_is_ac97(ssi)) {
+   if (ssi->synchronous || fsl_ssi_is_ac97(ssi)) {
srcr &= ~SSI_SRCR_RXDIR;
scr |= SSI_SCR_SYN;
}
@@ -1456,6 +1458,7 @@ static int fsl_ssi_probe(struct platform_device *pdev)
if (!fsl_ssi_is_ac97(ssi)) {
ssi->cpu_dai_drv.symmetric_rates = 1;
ssi->cpu_dai_drv.symmetric_samplebits = 1;
+   ssi->synchronous = true;
}
 
ssi->cpu_dai_drv.symmetric_channels = 1;
-- 
2.1.4



[PATCH v6 14/17] ASoC: fsl_ssi: Clean up _fsl_ssi_set_dai_fmt()

2018-02-12 Thread Nicolin Chen
The _fsl_ssi_set_dai_fmt() is a helper function being called from
fsl_ssi_set_dai_fmt() as an ASoC operation and fsl_ssi_hw_init()
mainly for AC97 format initialization.

This patch cleans up _fsl_ssi_set_dai_fmt() in the following ways:
* Removing *dev pointer in the parameters as it's included in the
  *ssi pointer of struct fsl_ssi.
* Using regmap_update_bits() instead of regmap_read() with masking
  the value manually.
* Moving baudclk check to the switch-case routine to skip the I2S
  master check. And moving SxCCR.DC settings after baudclk check.
* Adding format settings for SND_SOC_DAIFMT_AC97 like others.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 74 +++--
 1 file changed, 35 insertions(+), 39 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index dfb0da3..ed9102d 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -860,42 +860,31 @@ static int fsl_ssi_hw_free(struct snd_pcm_substream 
*substream,
return 0;
 }
 
-static int _fsl_ssi_set_dai_fmt(struct device *dev,
-   struct fsl_ssi *ssi, unsigned int fmt)
+static int _fsl_ssi_set_dai_fmt(struct fsl_ssi *ssi, unsigned int fmt)
 {
-   struct regmap *regs = ssi->regs;
-   u32 strcr = 0, stcr, srcr, scr, mask;
+   u32 strcr = 0, scr = 0, stcr, srcr, mask;
 
ssi->dai_fmt = fmt;
 
-   if (fsl_ssi_is_i2s_master(ssi) && IS_ERR(ssi->baudclk)) {
-   dev_err(dev, "missing baudclk for master mode\n");
-   return -EINVAL;
-   }
-
-   regmap_read(regs, REG_SSI_SCR, &scr);
-   scr &= ~(SSI_SCR_SYN | SSI_SCR_I2S_MODE_MASK);
/* Synchronize frame sync clock for TE to avoid data slipping */
scr |= SSI_SCR_SYNC_TX_FS;
 
-   mask = SSI_STCR_TXBIT0 | SSI_STCR_TFDIR | SSI_STCR_TXDIR |
-  SSI_STCR_TSCKP | SSI_STCR_TFSI | SSI_STCR_TFSL | SSI_STCR_TEFS;
-   regmap_read(regs, REG_SSI_STCR, &stcr);
-   regmap_read(regs, REG_SSI_SRCR, &srcr);
-   stcr &= ~mask;
-   srcr &= ~mask;
+   /* Set to default shifting settings: LSB_ALIGNED */
+   strcr |= SSI_STCR_TXBIT0;
 
/* Use Network mode as default */
ssi->i2s_net = SSI_SCR_NET;
switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) {
case SND_SOC_DAIFMT_I2S:
-   regmap_update_bits(regs, REG_SSI_STCCR,
-  SSI_SxCCR_DC_MASK, SSI_SxCCR_DC(2));
-   regmap_update_bits(regs, REG_SSI_SRCCR,
-  SSI_SxCCR_DC_MASK, SSI_SxCCR_DC(2));
switch (fmt & SND_SOC_DAIFMT_MASTER_MASK) {
-   case SND_SOC_DAIFMT_CBM_CFS:
case SND_SOC_DAIFMT_CBS_CFS:
+   if (IS_ERR(ssi->baudclk)) {
+   dev_err(ssi->dev,
+   "missing baudclk for master mode\n");
+   return -EINVAL;
+   }
+   /* fall through */
+   case SND_SOC_DAIFMT_CBM_CFS:
ssi->i2s_net |= SSI_SCR_I2S_MODE_MASTER;
break;
case SND_SOC_DAIFMT_CBM_CFM:
@@ -905,30 +894,34 @@ static int _fsl_ssi_set_dai_fmt(struct device *dev,
return -EINVAL;
}
 
+   regmap_update_bits(ssi->regs, REG_SSI_STCCR,
+  SSI_SxCCR_DC_MASK, SSI_SxCCR_DC(2));
+   regmap_update_bits(ssi->regs, REG_SSI_SRCCR,
+  SSI_SxCCR_DC_MASK, SSI_SxCCR_DC(2));
+
/* Data on rising edge of bclk, frame low, 1clk before data */
-   strcr |= SSI_STCR_TFSI | SSI_STCR_TSCKP |
-SSI_STCR_TXBIT0 | SSI_STCR_TEFS;
+   strcr |= SSI_STCR_TFSI | SSI_STCR_TSCKP | SSI_STCR_TEFS;
break;
case SND_SOC_DAIFMT_LEFT_J:
/* Data on rising edge of bclk, frame high */
-   strcr |= SSI_STCR_TXBIT0 | SSI_STCR_TSCKP;
+   strcr |= SSI_STCR_TSCKP;
break;
case SND_SOC_DAIFMT_DSP_A:
/* Data on rising edge of bclk, frame high, 1clk before data */
-   strcr |= SSI_STCR_TFSL | SSI_STCR_TSCKP |
-SSI_STCR_TXBIT0 | SSI_STCR_TEFS;
+   strcr |= SSI_STCR_TFSL | SSI_STCR_TSCKP | SSI_STCR_TEFS;
break;
case SND_SOC_DAIFMT_DSP_B:
/* Data on rising edge of bclk, frame high */
-   strcr |= SSI_STCR_TFSL | SSI_STCR_TSCKP | SSI_STCR_TXBIT0;
+   strcr |= SSI_STCR_TFSL | SSI_STCR_TSCKP;
break;
case SND_SOC_DAIFMT_AC97:
/* Data on falling edge of bclk, frame high, 1clk before data */
-   ssi->i2s_net |= SSI_SCR_I2S_MODE_NORMAL;
+

[PATCH v6 13/17] ASoC: fsl_ssi: Setup AC97 in fsl_ssi_hw_init()

2018-02-12 Thread Nicolin Chen
AC97 configures most of the registers early, to start communication
with CODECs and successfully initialize the CODEC. Currently,
_fsl_ssi_set_dai_fmt() and fsl_ssi_setup_ac97() are called to get all
SSI registers properly set.

Since the driver now has fsl_ssi_hw_init() to handle all initial
register settings, this patch moves those AC97 register settings to
fsl_ssi_hw_init() as well.

Meanwhile, it applies the _fsl_ssi_set_dai_fmt() call to AC97 only,
since other formats are configured directly via the normal
set_dai_fmt().

This patch also adds fsl_ssi_hw_clean() to clean up the control bits
for AC97 in the platform remove() function.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 33 +++--
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 36f3d51..dfb0da3 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -987,9 +987,6 @@ static int _fsl_ssi_set_dai_fmt(struct device *dev,
regmap_write(regs, REG_SSI_SRCR, srcr);
regmap_write(regs, REG_SSI_SCR, scr);
 
-   if ((fmt & SND_SOC_DAIFMT_FORMAT_MASK) == SND_SOC_DAIFMT_AC97)
-   fsl_ssi_setup_ac97(ssi);
-
return 0;
 }
 
@@ -1255,10 +1252,34 @@ static int fsl_ssi_hw_init(struct fsl_ssi *ssi)
regmap_update_bits(ssi->regs, REG_SSI_SCR,
   SSI_SCR_TCH_EN, SSI_SCR_TCH_EN);
 
+   /* AC97 should start earlier to communicate with CODECs */
+   if (fsl_ssi_is_ac97(ssi)) {
+   _fsl_ssi_set_dai_fmt(ssi->dev, ssi, ssi->dai_fmt);
+   fsl_ssi_setup_ac97(ssi);
+   }
+
return 0;
 }
 
 /**
+ * Clear SSI registers
+ */
+static void fsl_ssi_hw_clean(struct fsl_ssi *ssi)
+{
+   /* Disable registers for AC97 */
+   if (fsl_ssi_is_ac97(ssi)) {
+   /* Disable TE and RE bits first */
+   regmap_update_bits(ssi->regs, REG_SSI_SCR,
+  SSI_SCR_TE | SSI_SCR_RE, 0);
+   /* Disable AC97 mode */
+   regmap_write(ssi->regs, REG_SSI_SACNT, 0);
+   /* Unset WAIT bits */
+   regmap_write(ssi->regs, REG_SSI_SOR, 0);
+   /* Disable SSI -- software reset */
+   regmap_update_bits(ssi->regs, REG_SSI_SCR, SSI_SCR_SSIEN, 0);
+   }
+}
+/**
  * Make every character in a string lower-case
  */
 static void make_lowercase(char *s)
@@ -1540,9 +1561,6 @@ static int fsl_ssi_probe(struct platform_device *pdev)
}
 
 done:
-   if (ssi->dai_fmt)
-   _fsl_ssi_set_dai_fmt(dev, ssi, ssi->dai_fmt);
-
/* Initially configures SSI registers */
fsl_ssi_hw_init(ssi);
 
@@ -1592,6 +1610,9 @@ static int fsl_ssi_remove(struct platform_device *pdev)
if (ssi->pdev)
platform_device_unregister(ssi->pdev);
 
+   /* Clean up SSI registers */
+   fsl_ssi_hw_clean(ssi);
+
if (ssi->soc->imx)
fsl_ssi_imx_clean(pdev, ssi);
 
-- 
2.1.4



[PATCH v6 12/17] ASoC: fsl_ssi: Move one-time configurations to probe()

2018-02-12 Thread Nicolin Chen
The probe() can handle some one-time configurations, since they will
not be changed once configured.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 39 ++-
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 7e15b30..36f3d51 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -865,7 +865,6 @@ static int _fsl_ssi_set_dai_fmt(struct device *dev,
 {
struct regmap *regs = ssi->regs;
u32 strcr = 0, stcr, srcr, scr, mask;
-   u8 wm;
 
ssi->dai_fmt = fmt;
 
@@ -874,8 +873,6 @@ static int _fsl_ssi_set_dai_fmt(struct device *dev,
return -EINVAL;
}
 
-   fsl_ssi_setup_regvals(ssi);
-
regmap_read(regs, REG_SSI_SCR, &scr);
scr &= ~(SSI_SCR_SYN | SSI_SCR_I2S_MODE_MASK);
/* Synchronize frame sync clock for TE to avoid data slipping */
@@ -990,16 +987,6 @@ static int _fsl_ssi_set_dai_fmt(struct device *dev,
regmap_write(regs, REG_SSI_SRCR, srcr);
regmap_write(regs, REG_SSI_SCR, scr);
 
-   wm = ssi->fifo_watermark;
-
-   regmap_write(regs, REG_SSI_SFCSR,
-SSI_SFCSR_TFWM0(wm) | SSI_SFCSR_RFWM0(wm) |
-SSI_SFCSR_TFWM1(wm) | SSI_SFCSR_RFWM1(wm));
-
-   if (ssi->use_dual_fifo)
-   regmap_update_bits(regs, REG_SSI_SCR,
-  SSI_SCR_TCH_EN, SSI_SCR_TCH_EN);
-
if ((fmt & SND_SOC_DAIFMT_FORMAT_MASK) == SND_SOC_DAIFMT_AC97)
fsl_ssi_setup_ac97(ssi);
 
@@ -1249,6 +1236,29 @@ static struct snd_ac97_bus_ops fsl_ssi_ac97_ops = {
 };
 
 /**
+ * Initialize SSI registers
+ */
+static int fsl_ssi_hw_init(struct fsl_ssi *ssi)
+{
+   u32 wm = ssi->fifo_watermark;
+
+   /* Initialize regvals */
+   fsl_ssi_setup_regvals(ssi);
+
+   /* Set watermarks */
+   regmap_write(ssi->regs, REG_SSI_SFCSR,
+SSI_SFCSR_TFWM0(wm) | SSI_SFCSR_RFWM0(wm) |
+SSI_SFCSR_TFWM1(wm) | SSI_SFCSR_RFWM1(wm));
+
+   /* Enable Dual FIFO mode */
+   if (ssi->use_dual_fifo)
+   regmap_update_bits(ssi->regs, REG_SSI_SCR,
+  SSI_SCR_TCH_EN, SSI_SCR_TCH_EN);
+
+   return 0;
+}
+
+/**
  * Make every character in a string lower-case
  */
 static void make_lowercase(char *s)
@@ -1533,6 +1543,9 @@ static int fsl_ssi_probe(struct platform_device *pdev)
if (ssi->dai_fmt)
_fsl_ssi_set_dai_fmt(dev, ssi, ssi->dai_fmt);
 
+   /* Initially configures SSI registers */
+   fsl_ssi_hw_init(ssi);
+
if (fsl_ssi_is_ac97(ssi)) {
u32 ssi_idx;
 
-- 
2.1.4



[PATCH v6 11/17] ASoC: fsl_ssi: Use snd_soc_init_dma_data instead

2018-02-12 Thread Nicolin Chen
Since there is a helper function for this, use it to improve readability.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 00dfdc7..7e15b30 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1110,10 +1110,9 @@ static int fsl_ssi_dai_probe(struct snd_soc_dai *dai)
 {
struct fsl_ssi *ssi = snd_soc_dai_get_drvdata(dai);
 
-   if (ssi->soc->imx && ssi->use_dma) {
-   dai->playback_dma_data = &ssi->dma_params_tx;
-   dai->capture_dma_data = &ssi->dma_params_rx;
-   }
+   if (ssi->soc->imx && ssi->use_dma)
+   snd_soc_dai_init_dma_data(dai, &ssi->dma_params_tx,
+ &ssi->dma_params_rx);
 
return 0;
 }
-- 
2.1.4



[PATCH v6 10/17] ASoC: fsl_ssi: Set xFEN0 and xFEN1 together

2018-02-12 Thread Nicolin Chen
It'd be safer to enable both FIFOs for TX or RX at the same time.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 156f5132..00dfdc7 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -591,6 +591,11 @@ static void fsl_ssi_setup_regvals(struct fsl_ssi *ssi)
if (fsl_ssi_is_ac97(ssi))
vals[RX].scr = vals[TX].scr = 0;
 
+   if (ssi->use_dual_fifo) {
+   vals[RX].srcr |= SSI_SRCR_RFEN1;
+   vals[TX].stcr |= SSI_STCR_TFEN1;
+   }
+
if (ssi->use_dma) {
vals[RX].sier |= SSI_SIER_RDMAE;
vals[TX].sier |= SSI_SIER_TDMAE;
@@ -991,14 +996,9 @@ static int _fsl_ssi_set_dai_fmt(struct device *dev,
 SSI_SFCSR_TFWM0(wm) | SSI_SFCSR_RFWM0(wm) |
 SSI_SFCSR_TFWM1(wm) | SSI_SFCSR_RFWM1(wm));
 
-   if (ssi->use_dual_fifo) {
-   regmap_update_bits(regs, REG_SSI_SRCR,
-  SSI_SRCR_RFEN1, SSI_SRCR_RFEN1);
-   regmap_update_bits(regs, REG_SSI_STCR,
-  SSI_STCR_TFEN1, SSI_STCR_TFEN1);
+   if (ssi->use_dual_fifo)
regmap_update_bits(regs, REG_SSI_SCR,
   SSI_SCR_TCH_EN, SSI_SCR_TCH_EN);
-   }
 
if ((fmt & SND_SOC_DAIFMT_FORMAT_MASK) == SND_SOC_DAIFMT_AC97)
fsl_ssi_setup_ac97(ssi);
-- 
2.1.4



[PATCH v6 09/17] ASoC: fsl_ssi: Clean up fsl_ssi_setup_regvals()

2018-02-12 Thread Nicolin Chen
This patch cleans up fsl_ssi_setup_regvals() with the following changes:
1) Moving the DBG bits to the first lines.
2) Setting SSIEN, RE/TE as default and clearing them for AC97

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index fc5768d..156f5132 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -580,18 +580,16 @@ static void fsl_ssi_setup_regvals(struct fsl_ssi *ssi)
 {
struct fsl_ssi_regvals *vals = ssi->regvals;
 
-   vals[RX].sier = SSI_SIER_RFF0_EN;
+   vals[RX].sier = SSI_SIER_RFF0_EN | FSLSSI_SIER_DBG_RX_FLAGS;
vals[RX].srcr = SSI_SRCR_RFEN0;
-   vals[RX].scr = 0;
-   vals[TX].sier = SSI_SIER_TFE0_EN;
+   vals[RX].scr = SSI_SCR_SSIEN | SSI_SCR_RE;
+   vals[TX].sier = SSI_SIER_TFE0_EN | FSLSSI_SIER_DBG_TX_FLAGS;
vals[TX].stcr = SSI_STCR_TFEN0;
-   vals[TX].scr = 0;
+   vals[TX].scr = SSI_SCR_SSIEN | SSI_SCR_TE;
 
/* AC97 has already enabled SSIEN, RE and TE, so ignore them */
-   if (!fsl_ssi_is_ac97(ssi)) {
-   vals[RX].scr = SSI_SCR_SSIEN | SSI_SCR_RE;
-   vals[TX].scr = SSI_SCR_SSIEN | SSI_SCR_TE;
-   }
+   if (fsl_ssi_is_ac97(ssi))
+   vals[RX].scr = vals[TX].scr = 0;
 
if (ssi->use_dma) {
vals[RX].sier |= SSI_SIER_RDMAE;
@@ -600,9 +598,6 @@ static void fsl_ssi_setup_regvals(struct fsl_ssi *ssi)
vals[RX].sier |= SSI_SIER_RIE;
vals[TX].sier |= SSI_SIER_TIE;
}
-
-   vals[RX].sier |= FSLSSI_SIER_DBG_RX_FLAGS;
-   vals[TX].sier |= FSLSSI_SIER_DBG_TX_FLAGS;
 }
 
 static void fsl_ssi_setup_ac97(struct fsl_ssi *ssi)
-- 
2.1.4



[PATCH v6 08/17] ASoC: fsl_ssi: Add DAIFMT define for AC97

2018-02-12 Thread Nicolin Chen
The _fsl_ssi_set_dai_fmt() bypasses an undefined format for AC97
mode. However, this is not really necessary if AC97 has its complete
format defined.

So this patch adds a DAIFMT macro of the complete format, including
clock direction and polarity.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 9f024a9..fc5768d 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -90,6 +90,16 @@
 SNDRV_PCM_FMTBIT_S24_LE)
 #endif
 
+/*
+ * In AC97 mode, TXDIR bit is forced to 0 and TFDIR bit is forced to 1:
+ *  - SSI inputs external bit clock and outputs frame sync clock -- CBM_CFS
+ *  - Also have NB_NF to mark these two clocks will not be inverted
+ */
+#define FSLSSI_AC97_DAIFMT \
+   (SND_SOC_DAIFMT_AC97 | \
+SND_SOC_DAIFMT_CBM_CFS | \
+SND_SOC_DAIFMT_NB_NF)
+
 #define FSLSSI_SIER_DBG_RX_FLAGS \
(SSI_SIER_RFF0_EN | \
 SSI_SIER_RLS_EN | \
@@ -964,8 +974,7 @@ static int _fsl_ssi_set_dai_fmt(struct device *dev,
scr &= ~SSI_SCR_SYS_CLK_EN;
break;
default:
-   if (!fsl_ssi_is_ac97(ssi))
-   return -EINVAL;
+   return -EINVAL;
}
 
stcr |= strcr;
@@ -1372,7 +1381,7 @@ static int fsl_ssi_probe(struct platform_device *pdev)
sprop = of_get_property(np, "fsl,mode", NULL);
if (sprop) {
if (!strcmp(sprop, "ac97-slave"))
-   ssi->dai_fmt = SND_SOC_DAIFMT_AC97;
+   ssi->dai_fmt = FSLSSI_AC97_DAIFMT;
}
 
/* Select DMA or FIQ */
-- 
2.1.4



[PATCH v6 07/17] ASoC: fsl_ssi: Clean up helper functions of trigger()

2018-02-12 Thread Nicolin Chen
The trigger() calls fsl_ssi_tx_config() and fsl_ssi_rx_config(), and
both of them jump to fsl_ssi_config(), which in turn calls yet another
helper, fsl_ssi_rxtx_config().

However, the whole routine, especially the fsl_ssi_config() function,
is too complicated, for the following reasons:
1) It has to handle the concerns of the opposite stream.
2) It has to handle cases of offline configuration support.
3) It has to handle enable and disable operations, which are mostly
   different.

Since the enable and disable routines differ more than the TX and RX
routines do, this patch simplifies these helper functions with the
following changes:
- Changing to two helper functions of enable and disable instead
  of TX and RX.
- Removing fsl_ssi_rxtx_config() by separately integrating it into
  the two newly introduced enable & disable functions.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 256 +++-
 1 file changed, 122 insertions(+), 134 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index d276b78..9f024a9 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -382,31 +382,83 @@ static irqreturn_t fsl_ssi_isr(int irq, void *dev_id)
 }
 
 /**
- * Enable or disable all rx/tx config flags at once
+ * Set SCR, SIER, STCR and SRCR registers with cached values in regvals
+ *
+ * Notes:
+ * 1) For offline_config SoCs, enable all necessary bits of both streams
+ *when 1st stream starts, even if the opposite stream will not start
+ * 2) It also clears FIFO before setting regvals; SOR is safe to set online
  */
-static void fsl_ssi_rxtx_config(struct fsl_ssi *ssi, bool enable)
+static void fsl_ssi_config_enable(struct fsl_ssi *ssi, bool tx)
 {
-   struct regmap *regs = ssi->regs;
struct fsl_ssi_regvals *vals = ssi->regvals;
+   int dir = tx ? TX : RX;
+   u32 sier, srcr, stcr;
 
-   if (enable) {
-   regmap_update_bits(regs, REG_SSI_SIER,
-  vals[RX].sier | vals[TX].sier,
-  vals[RX].sier | vals[TX].sier);
-   regmap_update_bits(regs, REG_SSI_SRCR,
-  vals[RX].srcr | vals[TX].srcr,
-  vals[RX].srcr | vals[TX].srcr);
-   regmap_update_bits(regs, REG_SSI_STCR,
-  vals[RX].stcr | vals[TX].stcr,
-  vals[RX].stcr | vals[TX].stcr);
+   /* Clear dirty data in the FIFO; It also prevents channel slipping */
+   regmap_update_bits(ssi->regs, REG_SSI_SOR,
+  SSI_SOR_xX_CLR(tx), SSI_SOR_xX_CLR(tx));
+
+   /*
+* On offline_config SoCs, SxCR and SIER are already configured when
+* the previous stream started. So skip all SxCR and SIER settings
+* to prevent online reconfigurations, then jump to set SCR directly
+*/
+   if (ssi->soc->offline_config && ssi->streams)
+   goto enable_scr;
+
+   if (ssi->soc->offline_config) {
+   /*
+* Online reconfiguration not supported, so enable all bits for
+* both streams at once to avoid necessity of reconfigurations
+*/
+   srcr = vals[RX].srcr | vals[TX].srcr;
+   stcr = vals[RX].stcr | vals[TX].stcr;
+   sier = vals[RX].sier | vals[TX].sier;
} else {
-   regmap_update_bits(regs, REG_SSI_SRCR,
-  vals[RX].srcr | vals[TX].srcr, 0);
-   regmap_update_bits(regs, REG_SSI_STCR,
-  vals[RX].stcr | vals[TX].stcr, 0);
-   regmap_update_bits(regs, REG_SSI_SIER,
-  vals[RX].sier | vals[TX].sier, 0);
+   /* Otherwise, only set bits for the current stream */
+   srcr = vals[dir].srcr;
+   stcr = vals[dir].stcr;
+   sier = vals[dir].sier;
}
+
+   /* Configure SRCR, STCR and SIER at once */
+   regmap_update_bits(ssi->regs, REG_SSI_SRCR, srcr, srcr);
+   regmap_update_bits(ssi->regs, REG_SSI_STCR, stcr, stcr);
+   regmap_update_bits(ssi->regs, REG_SSI_SIER, sier, sier);
+
+enable_scr:
+   /*
+* Start DMA before setting TE to avoid FIFO underrun
+* which may cause a channel slip or a channel swap
+*
+* TODO: FIQ cases might also need this upon testing
+*/
+   if (ssi->use_dma && tx) {
+   int try = 100;
+   u32 sfcsr;
+
+   /* Enable SSI first to send TX DMA request */
+   regmap_update_bits(ssi->regs, REG_SSI_SCR,
+  SSI_SCR_SSIEN, SSI_SCR_SSIEN);
+
+   /* Busy wait until TX FIFO not empty -- DMA working */
+   do {
+

[PATCH v6 06/17] ASoC: fsl_ssi: Clear FIFO directly in fsl_ssi_config()

2018-02-12 Thread Nicolin Chen
The FIFO clear helper function is just one line of code now, so it
can be cleaned up by removing it and calling regmap directly.

Meanwhile, the FIFO clear can be applied to all use cases, not just
AC97. So this patch also moves the FIFO clear from trigger() to
fsl_ssi_config() and removes the AC97 check.

Note that the SOR register is safe from the offline_config HW limit.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 0d8c800..d276b78 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -410,17 +410,6 @@ static void fsl_ssi_rxtx_config(struct fsl_ssi *ssi, bool 
enable)
 }
 
 /**
- * Clear remaining data in the FIFO to avoid dirty data or channel slipping
- */
-static void fsl_ssi_fifo_clear(struct fsl_ssi *ssi, bool is_rx)
-{
-   bool tx = !is_rx;
-
-   regmap_update_bits(ssi->regs, REG_SSI_SOR,
-  SSI_SOR_xX_CLR(tx), SSI_SOR_xX_CLR(tx));
-}
-
-/**
  * Exclude bits that are used by the opposite stream
  *
  * When both streams are active, disabling some bits for the current stream
@@ -446,10 +435,11 @@ static void fsl_ssi_fifo_clear(struct fsl_ssi *ssi, bool 
is_rx)
 static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
   struct fsl_ssi_regvals *vals)
 {
-   int adir = (&ssi->regvals[TX] == vals) ? RX : TX;
-   int dir = (&ssi->regvals[TX] == vals) ? TX : RX;
+   bool tx = &ssi->regvals[TX] == vals;
struct regmap *regs = ssi->regs;
struct fsl_ssi_regvals *avals;
+   int adir = tx ? RX : TX;
+   int dir = tx ? TX : RX;
bool aactive;
 
/* Check if the opposite stream is active */
@@ -489,7 +479,9 @@ static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
 
/* Online configure single direction while SSI is running */
if (enable) {
-   fsl_ssi_fifo_clear(ssi, vals->scr & SSI_SCR_RE);
+   /* Clear FIFO to prevent dirty data or channel slipping */
+   regmap_update_bits(ssi->regs, REG_SSI_SOR,
+  SSI_SOR_xX_CLR(tx), SSI_SOR_xX_CLR(tx));
 
regmap_update_bits(regs, REG_SSI_SRCR, vals->srcr, vals->srcr);
regmap_update_bits(regs, REG_SSI_STCR, vals->stcr, vals->stcr);
@@ -511,6 +503,10 @@ static void fsl_ssi_config(struct fsl_ssi *ssi, bool 
enable,
regmap_update_bits(regs, REG_SSI_SRCR, srcr, 0);
regmap_update_bits(regs, REG_SSI_STCR, stcr, 0);
regmap_update_bits(regs, REG_SSI_SIER, sier, 0);
+
+   /* Clear FIFO to prevent dirty data or channel slipping */
+   regmap_update_bits(ssi->regs, REG_SSI_SOR,
+  SSI_SOR_xX_CLR(tx), SSI_SOR_xX_CLR(tx));
}
 
 config_done:
@@ -1091,7 +1087,6 @@ static int fsl_ssi_trigger(struct snd_pcm_substream 
*substream, int cmd,
 {
struct snd_soc_pcm_runtime *rtd = substream->private_data;
struct fsl_ssi *ssi = snd_soc_dai_get_drvdata(rtd->cpu_dai);
-   struct regmap *regs = ssi->regs;
 
switch (cmd) {
case SNDRV_PCM_TRIGGER_START:
@@ -1116,14 +,6 @@ static int fsl_ssi_trigger(struct snd_pcm_substream 
*substream, int cmd,
return -EINVAL;
}
 
-   /* Clear corresponding FIFO */
-   if (fsl_ssi_is_ac97(ssi)) {
-   if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
-   regmap_write(regs, REG_SSI_SOR, SSI_SOR_TX_CLR);
-   else
-   regmap_write(regs, REG_SSI_SOR, SSI_SOR_RX_CLR);
-   }
-
return 0;
 }
 
-- 
2.1.4



[PATCH v6 05/17] ASoC: fsl_ssi: Rename fsl_ssi_disable_val macro

2018-02-12 Thread Nicolin Chen
The definition of fsl_ssi_disable_val is not very clear, as it mixes
two calculation steps together, and its parameter names are also a
bit long to read.

Since it just tries to exclude the shared bits from the regvals of
the current stream while the opposite stream is active, a name like
ssi_excl_shared_bits is better.

This patch splits fsl_ssi_disable_val into two macros for the two
corresponding steps and shortens the parameter names. It also
updates callers in the fsl_ssi_config() accordingly.
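
A worked example of the two steps, assuming vals = 0b0110,
avals = 0b0101 and aactive = 1:

    vals ^ (avals * aactive) = 0b0110 ^ 0b0101 = 0b0011  (bits that differ)
    vals & 0b0011            = 0b0010            (bits only this stream uses)

So bit 0b0100, shared with the active opposite stream, stays enabled,
while bit 0b0010 can be safely cleared.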

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 55 +
 1 file changed, 23 insertions(+), 32 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index b277a56..0d8c800 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -421,24 +421,24 @@ static void fsl_ssi_fifo_clear(struct fsl_ssi *ssi, bool 
is_rx)
 }
 
 /**
- * Calculate the bits that have to be disabled for the current stream that is
- * getting disabled. This keeps the bits enabled that are necessary for the
- * second stream to work if 'stream_active' is true.
+ * Exclude bits that are used by the opposite stream
  *
- * Detailed calculation:
- * These are the values that need to be active after disabling. For non-active
- * second stream, this is 0:
- * vals_stream * !!stream_active
+ * When both streams are active, disabling some bits for the current stream
+ * might break the other stream if these bits are used by it.
  *
- * The following computes the overall differences between the setup for the
- * to-disable stream and the active stream, a simple XOR:
- * vals_disable ^ (vals_stream * !!(stream_active))
+ * @vals : regvals of the current stream
+ * @avals: regvals of the opposite stream
+ * @aactive: active state of the opposite stream
  *
- * The full expression adds a mask on all values we care about
+ *  1) XOR vals and avals to get the differences if the other stream is active;
+ * Otherwise, return current vals if the other stream is not active
+ *  2) AND the result of 1) with the current vals
  */
-#define fsl_ssi_disable_val(vals_disable, vals_stream, stream_active) \
-   ((vals_disable) & \
-((vals_disable) ^ ((vals_stream) * (u32)!!(stream_active))))
+#define _ssi_xor_shared_bits(vals, avals, aactive) \
+   ((vals) ^ ((avals) * (aactive)))
+
+#define ssi_excl_shared_bits(vals, avals, aactive) \
+   ((vals) & _ssi_xor_shared_bits(vals, avals, aactive))
 
 /**
  * Enable or disable SSI configuration.
@@ -446,19 +446,14 @@ static void fsl_ssi_fifo_clear(struct fsl_ssi *ssi, bool is_rx)
 static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
   struct fsl_ssi_regvals *vals)
 {
+   int adir = (&ssi->regvals[TX] == vals) ? RX : TX;
int dir = (&ssi->regvals[TX] == vals) ? TX : RX;
struct regmap *regs = ssi->regs;
struct fsl_ssi_regvals *avals;
-   int nr_active_streams;
-   int keep_active;
-
-   nr_active_streams = !!(ssi->streams & BIT(TX)) +
-   !!(ssi->streams & BIT(RX));
+   bool aactive;
 
-   if (nr_active_streams - 1 > 0)
-   keep_active = 1;
-   else
-   keep_active = 0;
+   /* Check if the opposite stream is active */
+   aactive = ssi->streams & BIT(adir);
 
/* Get the opposite direction to keep its values untouched */
if (&ssi->regvals[RX] == vals)
@@ -471,8 +466,7 @@ static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
 * To keep the other stream safe, exclude shared bits between
 * both streams, and get safe bits to disable current stream
 */
-   u32 scr = fsl_ssi_disable_val(vals->scr, avals->scr,
- keep_active);
+   u32 scr = ssi_excl_shared_bits(vals->scr, avals->scr, aactive);
/* Safely disable SCR register for the stream */
regmap_update_bits(regs, REG_SSI_SCR, scr, 0);
 
@@ -487,7 +481,7 @@ static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
 * 2) Disable all remaining bits of both streams when last stream ends
 */
if (ssi->soc->offline_config) {
-   if ((enable && !nr_active_streams) || (!enable && !keep_active))
+   if ((enable && !ssi->streams) || (!enable && !aactive))
fsl_ssi_rxtx_config(ssi, enable);
 
goto config_done;
@@ -509,12 +503,9 @@ static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
 * To keep the other stream safe, exclude shared bits between
 * both streams, and get safe bits to disable current stream
 */
-   sier = fsl_ssi_disable_val(vals->sier, avals->sier,
-  keep

[PATCH v6 04/17] ASoC: fsl_ssi: Maintain a mask of active streams

2018-02-12 Thread Nicolin Chen
Checking the TE and RE bits in the SCR register doesn't work for AC97
mode, which enables SSIEN, TE and RE in fsl_ssi_setup_ac97(), called
during probe().

So by the time trigger() runs, it always finds both TE and RE already
enabled, even when no stream is actually active.

This patch fixes the issue by adding a variable that tracks the
active streams manually.
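
A minimal sketch of the bookkeeping this adds (condensed from the
diff below):

	ssi->streams |= BIT(dir);	/* after enabling a stream  */
	ssi->streams &= ~BIT(dir);	/* after disabling a stream */

	/* replaces reading SSI_SCR_TE/RE, which AC97 keeps set */
	nr_active_streams = !!(ssi->streams & BIT(TX)) +
			    !!(ssi->streams & BIT(RX));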

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 14046c3..b277a56 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -205,6 +205,7 @@ struct fsl_ssi_soc_data {
  * @cpu_dai_drv: CPU DAI driver for this device
  *
  * @dai_fmt: DAI configuration this device is currently used with
+ * @streams: Mask of current active streams: BIT(TX) and BIT(RX)
  * @i2s_net: I2S and Network mode configurations of SCR register
  * @use_dma: DMA is used or FIQ with stream filter
  * @use_dual_fifo: DMA with support for dual FIFO mode
@@ -249,6 +250,7 @@ struct fsl_ssi {
struct snd_soc_dai_driver cpu_dai_drv;
 
unsigned int dai_fmt;
+   u8 streams;
u8 i2s_net;
bool use_dma;
bool use_dual_fifo;
@@ -444,15 +446,14 @@ static void fsl_ssi_fifo_clear(struct fsl_ssi *ssi, bool is_rx)
 static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
   struct fsl_ssi_regvals *vals)
 {
+   int dir = (&ssi->regvals[TX] == vals) ? TX : RX;
struct regmap *regs = ssi->regs;
struct fsl_ssi_regvals *avals;
int nr_active_streams;
-   u32 scr;
int keep_active;
 
-   regmap_read(regs, REG_SSI_SCR, &scr);
-
-   nr_active_streams = !!(scr & SSI_SCR_TE) + !!(scr & SSI_SCR_RE);
+   nr_active_streams = !!(ssi->streams & BIT(TX)) +
+   !!(ssi->streams & BIT(RX));
 
if (nr_active_streams - 1 > 0)
keep_active = 1;
@@ -474,6 +475,9 @@ static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
  keep_active);
/* Safely disable SCR register for the stream */
regmap_update_bits(regs, REG_SSI_SCR, scr, 0);
+
+   /* Log the disabled stream to the mask */
+   ssi->streams &= ~BIT(dir);
}
 
/*
@@ -549,6 +553,9 @@ static void fsl_ssi_config(struct fsl_ssi *ssi, bool enable,
}
/* Enable all remaining bits */
regmap_update_bits(regs, REG_SSI_SCR, vals->scr, vals->scr);
+
+   /* Log the enabled stream to the mask */
+   ssi->streams |= BIT(dir);
}
 }
 
-- 
2.1.4



[PATCH v6 01/17] ASoC: fsl_ssi: Redefine RX and TX macros

2018-02-12 Thread Nicolin Chen
The RX and TX macros were defined without any explanation, and there
was a potential risk if someone changed their values.

Since they are used to index the two-element array ssi->regvals[2],
this patch moves the two macros to fsl_ssi.c, closer to their owner
ssi->regvals, and adds a comment limiting their values to [0, 1].
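
A condensed sketch of why the values are constrained (regvals[] as
declared in this driver):

	#define RX 0
	#define TX 1

	struct fsl_ssi_regvals regvals[2];	/* indexed by RX / TX only;
						 * any other value would be
						 * out of bounds */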

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 4 
 sound/soc/fsl/fsl_ssi.h | 3 ---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 24fb672..3c8dd60 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -56,6 +56,10 @@
 #include "fsl_ssi.h"
 #include "imx-pcm.h"
 
+/* Define RX and TX to index ssi->regvals array; Can be 0 or 1 only */
+#define RX 0
+#define TX 1
+
 /**
  * FSLSSI_I2S_FORMATS: audio formats supported by the SSI
  *
diff --git a/sound/soc/fsl/fsl_ssi.h b/sound/soc/fsl/fsl_ssi.h
index de2fdc5..18f8dd5 100644
--- a/sound/soc/fsl/fsl_ssi.h
+++ b/sound/soc/fsl/fsl_ssi.h
@@ -12,9 +12,6 @@
 #ifndef _MPC8610_I2S_H
 #define _MPC8610_I2S_H
 
-#define RX 0
-#define TX 1
-
 /* -- SSI Register Map -- */
 
 /* SSI Transmit Data Register 0 */
-- 
2.1.4



[PATCH v6 03/17] ASoC: fsl_ssi: Clean up set_dai_tdm_slot()

2018-02-12 Thread Nicolin Chen
This patch replaces the register read with the cached ssi->i2s_net
for simplification. It also removes the masking of SSIEN from the
saved scr value, since the later regmap_update_bits() call that
writes this value back only looks at the SSIEN bit anyway.
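
Roughly, the resulting sequence is (a condensed sketch, not the full
function):

	/* save the whole SCR value, SSIEN included */
	regmap_read(regs, REG_SSI_SCR, &val);
	/* temporarily enable SSI so the slot mask registers accept writes */
	regmap_update_bits(regs, REG_SSI_SCR, SSI_SCR_SSIEN, SSI_SCR_SSIEN);
	/* ... program the slot masks ... */
	/* restoring from val only touches SSIEN, so pre-masking it
	 * is unnecessary */
	regmap_update_bits(regs, REG_SSI_SCR, SSI_SCR_SSIEN, val);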

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index d4f1f0d..14046c3 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1055,9 +1055,7 @@ static int fsl_ssi_set_dai_tdm_slot(struct snd_soc_dai *dai, u32 tx_mask,
}
 
/* The slot number should be >= 2 if using Network mode or I2S mode */
-   regmap_read(regs, REG_SSI_SCR, &val);
-   val &= SSI_SCR_I2S_MODE_MASK | SSI_SCR_NET;
-   if (val && slots < 2) {
+   if (ssi->i2s_net && slots < 2) {
dev_err(dai->dev, "slot number should be >= 2 in I2S or NET\n");
return -EINVAL;
}
@@ -1067,9 +1065,8 @@ static int fsl_ssi_set_dai_tdm_slot(struct snd_soc_dai *dai, u32 tx_mask,
regmap_update_bits(regs, REG_SSI_SRCCR,
   SSI_SxCCR_DC_MASK, SSI_SxCCR_DC(slots));
 
-   /* Save SSIEN bit of the SCR register */
+   /* Save the SCR register value */
regmap_read(regs, REG_SSI_SCR, &val);
-   val &= SSI_SCR_SSIEN;
/* Temporarily enable SSI to allow SxMSKs to be configurable */
regmap_update_bits(regs, REG_SSI_SCR, SSI_SCR_SSIEN, SSI_SCR_SSIEN);
 
-- 
2.1.4



[PATCH v6 00/17] ASoC: fsl_ssi: Clean up - program flow level

2018-02-12 Thread Nicolin Chen
[ The v6 just rebased the series and fixed the comments in probe().
  There is no need to re-test it except the uncovered tests listed
  at the end of this cover letter.
  
  Timur, these patches have been on the list for quite a long time,
  and there has been no review for nearly a month since the last version.
  Would you please give a review and send an ack? Thanks a lot. ]

==Change log==
v6
 * Added Tested-by and Reviewed-by from Maciej.
 * Fixed one line of comments in the probe() suggested by Maciej.
 * Rebased all patches since there's a change of fsl_ssi merged
   recently.
v5
 * Reworked the series by taking suggestions from Maciej for AC97
   + Fixed SSI lockup issue by changing cleanup sequence in PATCH-13
   + Moved fsl_ssi_hw_clean() after unregistering the CODEC device
 in PATCH-13
   + Set NULL as the parent of CODEC platform device to fix a NULL
 pointer dereference bug in PATCH-16
 * Updated comments of three variables/pointers in struct fsl_ssi
   to describe them more accurately in PATCH-16
v4
 * Reworked the series by taking suggestions from Maciej
   + Added TXBIT0 bit back to play safe in PATCH-14
   + Made bool synchronous exclusive with AC97 mode in PATCH-16
v3
 * Reworked the series by taking suggestions from Maciej
   + Added PATCH-01 to make RX and TX more clearly defined
   + Replaced "bool dir" with "int dir" in PATCH-04
   + Replaced "!dir" with "int adir" in PATCH-05
   + Put CBM_CFS behind the baudclk check to keep the same
 program flow in PATCH-14
   + Removed all cpu_dai_drv changes in PATCH-15
v2
 * Reworked the series by taking suggestions from Maciej
   + Added PATCH-01 to keep all ssi->i2s_net updated
   + Replaced bool tx with bool dir in PATCH-03 and PATCH-06
   + Moved all initial register configurations from dai probe() to
 platform probe() so as to let AC97 CODEC successfully probe.
 * Added Tested-by from Caleb for TDM test cases.

==Background==
The fsl_ssi driver was originally designed for PPC and has since been
updated to support different modes on the i.MX series, including
SDMA, I2S Master mode, AC97 and older i.MXs with FIQ, by different
contributors for different use cases in different coding styles.

Additionally, in order to fix or work around hardware bugs and design
flaws, the driver has made a lot of compromises, so its program flow
now looks very complicated and is getting hard to maintain or update.

So I am going to clean up the driver at both the coding style level
and the program flow level.

==Introduction==
This series of patches is the second set, cleaning up the fsl_ssi
driver at the program flow level. Any patch here may impact a
fundamental test case like playback or record.

==Verification==
This series of patches requires full testing. I have done such
tests on i.MX6SoloX with WM8962 using imx_v6_v7_defconfig as:
 - Playback via I2S Master and Slave mode
 - Record via I2S Master and Slave mode
 - Simultaneous playback and record via I2S Master and Slave mode
 - Background playback with foreground record (starting at different
   time) via I2S Master and Slave mode
 - Background record with foreground playback (starting at different
   time) via I2S Master and Slave mode
 * All tests above by hacking offline_config to true in imx51.

Caleb has tested all versions with TDM loopback tests on i.MX6.

Maciej has tested v5 with AC97 tests on i.MX6.

Example of uncovered tests: PowerPC and FIQ.

Nicolin Chen (17):
  ASoC: fsl_ssi: Redefine RX and TX macros
  ASoC: fsl_ssi: Keep ssi->i2s_net updated
  ASoC: fsl_ssi: Clean up set_dai_tdm_slot()
  ASoC: fsl_ssi: Maintain a mask of active streams
  ASoC: fsl_ssi: Rename fsl_ssi_disable_val macro
  ASoC: fsl_ssi: Clear FIFO directly in fsl_ssi_config()
  ASoC: fsl_ssi: Clean up helper functions of trigger()
  ASoC: fsl_ssi: Add DAIFMT define for AC97
  ASoC: fsl_ssi: Clean up fsl_ssi_setup_regvals()
  ASoC: fsl_ssi: Set xFEN0 and xFEN1 together
  ASoC: fsl_ssi: Use snd_soc_init_dma_data instead
  ASoC: fsl_ssi: Move one-time configurations to probe()
  ASoC: fsl_ssi: Setup AC97 in fsl_ssi_hw_init()
  ASoC: fsl_ssi: Clean up _fsl_ssi_set_dai_fmt()
  ASoC: fsl_ssi: Add bool synchronous to mark synchronous mode
  ASoC: fsl_ssi: Move DT related code to a separate probe()
  ASoC: fsl_ssi: Use ssi->streams instead of reading register

 sound/soc/fsl/fsl_ssi.c | 756 +---
 sound/soc/fsl/fsl_ssi.h |   3 -
 2 files changed, 395 insertions(+), 364 deletions(-)

-- 
2.1.4



[PATCH v6 02/17] ASoC: fsl_ssi: Keep ssi->i2s_net updated

2018-02-12 Thread Nicolin Chen
The hw_params() overwrites the i2s_net settings for special cases
like mono-channel support; however, it doesn't update ssi->i2s_net
the way set_dai_fmt() does.

This patch removes the local i2s_net variable and directly updates
ssi->i2s_net in hw_params() so that the driver can simply look up
ssi->i2s_net instead of reading the register.

Signed-off-by: Nicolin Chen 
Tested-by: Caleb Crome 
Tested-by: Maciej S. Szmigiero 
Reviewed-by: Maciej S. Szmigiero 
---
 sound/soc/fsl/fsl_ssi.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 3c8dd60..d4f1f0d 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -838,16 +838,16 @@ static int fsl_ssi_hw_params(struct snd_pcm_substream *substream,
}
 
if (!fsl_ssi_is_ac97(ssi)) {
-   u8 i2s_net;
/* Normal + Network mode to send 16-bit data in 32-bit frames */
if (fsl_ssi_is_i2s_cbm_cfs(ssi) && sample_size == 16)
-   i2s_net = SSI_SCR_I2S_MODE_NORMAL | SSI_SCR_NET;
-   else
-   i2s_net = ssi->i2s_net;
+   ssi->i2s_net = SSI_SCR_I2S_MODE_NORMAL | SSI_SCR_NET;
+
+   /* Use Normal mode to send mono data at 1st slot of 2 slots */
+   if (channels == 1)
+   ssi->i2s_net = SSI_SCR_I2S_MODE_NORMAL;
 
regmap_update_bits(regs, REG_SSI_SCR,
-  SSI_SCR_I2S_NET_MASK,
-  channels == 1 ? 0 : i2s_net);
+  SSI_SCR_I2S_NET_MASK, ssi->i2s_net);
}
 
/* In synchronous mode, the SSI uses STCCR for capture */
-- 
2.1.4



Re: [PATCH 3/5] mtd: Stop assuming mtd_erase() is asynchronous

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:09 CET schrieb Boris Brezillon:
> None of the mtd->_erase() implementations work in an asynchronous manner,
> so let's simplify MTD users that call mtd_erase(). All they need to do
> is check the value returned by mtd_erase() and assume that != 0 means
> failure.
> 
> Signed-off-by: Boris Brezillon 

Reviewed-by: Richard Weinberger 

Thanks,
//richard
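
For an MTD user, the simplified convention looks roughly like this (a
sketch; 'ofs' is a made-up variable for the example):

	struct erase_info ei = {};
	int ret;

	ei.mtd  = mtd;
	ei.addr = ofs;
	ei.len  = mtd->erasesize;

	ret = mtd_erase(mtd, &ei);
	if (ret)
		return ret;	/* non-zero return simply means failure */
	/* no callback or wait-for-completion dance needed anymore */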




Re: [PATCH 2/5] mtd: Get rid of unused fields in struct erase_info

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:08 CET schrieb Boris Brezillon:
> Some fields are not used by MTD drivers, users or core code. Moreover,
> those fields are not documented, so get rid of them to avoid any
> confusion.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  include/linux/mtd/mtd.h | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> index 205ededccc60..2a407dc9beaa 100644
> --- a/include/linux/mtd/mtd.h
> +++ b/include/linux/mtd/mtd.h
> @@ -48,14 +48,9 @@ struct erase_info {
>   uint64_t addr;
>   uint64_t len;
>   uint64_t fail_addr;
> - u_long time;
> - u_long retries;
> - unsigned dev;
> - unsigned cell;
>   void (*callback) (struct erase_info *self);
>   u_long priv;
>   u_char state;
> - struct erase_info *next;
>  };
> 
>  struct mtd_erase_region_info {

Reviewed-by: Richard Weinberger 

Thanks,
//richard



Re: [PATCH 1/5] mtd: Initialize ->fail_addr early in mtd_erase()

2018-02-12 Thread Richard Weinberger
Am Montag, 12. Februar 2018, 22:03:07 CET schrieb Boris Brezillon:
> mtd_erase() can return an error before ->fail_addr is initialized to
> MTD_FAIL_ADDR_UNKNOWN. Move this initialization at the very beginning
> of the function.
> 
> Signed-off-by: Boris Brezillon 
> ---
>  drivers/mtd/mtdcore.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index a1c94526fb88..c87859ff338b 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -953,6 +953,8 @@ EXPORT_SYMBOL_GPL(__put_mtd_device);
>   */
>  int mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
>  {
> + instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
> +
>   if (!mtd->erasesize || !mtd->_erase)
>   return -ENOTSUPP;
> 
> @@ -961,7 +963,6 @@ int mtd_erase(struct mtd_info *mtd, struct erase_info
> *instr) if (!(mtd->flags & MTD_WRITEABLE))
>   return -EROFS;
> 
> - instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
>   if (!instr->len) {
>   instr->state = MTD_ERASE_DONE;
>   mtd_erase_callback(instr);

Reviewed-by: Richard Weinberger 

Thanks,
//richard
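
The caller-visible effect, in a sketch: fail_addr is now defined even
when mtd_erase() rejects the request up front:

	int ret = mtd_erase(mtd, &ei);

	if (ret && ei.fail_addr != MTD_FAIL_ADDR_UNKNOWN)
		pr_err("erase failed at 0x%llx\n",
		       (unsigned long long)ei.fail_addr);
	/* before the fix, ei.fail_addr held stale data when mtd_erase()
	 * returned -ENOTSUPP/-EINVAL/-EROFS before initializing it */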


pata-macio WARNING at dmam_alloc_coherent+0xec/0x110

2018-02-12 Thread Meelis Roos
I tested 4.16-rc1 on my PowerMac G4 and got the following warning from 
the macio pata driver. Since pata-macio has had no recent changes, the 
dma-mapping.h changes seem to be related.

[0.228408] MacIO PCI driver attached to Keylargo chipset
[1.283931] pata-macio 0.0001f000:ata-4: Activating pata-macio chipset 
KeyLargo ATA-4, Apple bus ID 2
[1.284398] WARNING: CPU: 0 PID: 1 at ./include/linux/dma-mapping.h:516 
dmam_alloc_coherent+0xec/0x110
[1.284689] Modules linked in:
[1.284797] CPU: 0 PID: 1 Comm: swapper Not tainted 4.16.0-rc1 #60
[1.284991] NIP:  c03259ec LR: c0325948 CTR: 
[1.285150] REGS: ef047c10 TRAP: 0700   Not tainted  (4.16.0-rc1)
[1.285337] MSR:  00029032   CR: 24fff228  XER: 2000
[1.285559] 
   GPR00: c0325948 ef047cc0 ef048000 ef1321b0   
ef1321bc  
   GPR08:   c04f1bd0  22fff884  
c0004c80  
   GPR16:       
c066 c05f0960 
   GPR24: 0007 c063d7a8  ef1e59ac 1020 ef1321b0 
ef135c18 014000c0 
[1.303085] NIP [c03259ec] dmam_alloc_coherent+0xec/0x110
[1.308751] LR [c0325948] dmam_alloc_coherent+0x48/0x110
[1.314511] Call Trace:
[1.320187] [ef047cc0] [c0325948] dmam_alloc_coherent+0x48/0x110 (unreliable)
[1.326133] [ef047ce0] [c0370a90] pata_macio_port_start+0x44/0xb8
[1.332110] [ef047d00] [c0355ed4] ata_host_start.part.5+0x138/0x254
[1.338100] [ef047d30] [c035c1e8] ata_host_activate+0x84/0x1a0
[1.344007] [ef047d50] [c0371214] pata_macio_common_init+0x3b0/0x608
[1.349890] [ef047db0] [c0336f9c] macio_device_probe+0x60/0x120
[1.355761] [ef047dd0] [c031868c] driver_probe_device+0x25c/0x35c
[1.361576] [ef047e00] [c031887c] __driver_attach+0xf0/0xf4
[1.367320] [ef047e20] [c0316340] bus_for_each_dev+0x80/0xc0
[1.373051] [ef047e50] [c031782c] bus_add_driver+0x144/0x258
[1.378805] [ef047e70] [c03190dc] driver_register+0x8c/0x140
[1.384580] [ef047e80] [c060ce14] pata_macio_init+0x5c/0x8c
[1.390303] [ef047ea0] [c0004aa0] do_one_initcall+0x48/0x18c
[1.396000] [ef047f00] [c05f1214] kernel_init_freeable+0x12c/0x1ec
[1.401615] [ef047f30] [c0004c98] kernel_init+0x18/0x128
[1.407208] [ef047f40] [c00122e4] ret_from_kernel_thread+0x5c/0x64
[1.412829] Instruction dump:
[1.418409] 939d 4bff6329 80010024 7fe3fb78 8361000c 83810010 7c0803a6 
83a10014
[1.424201] 83c10018 83e1001c 38210020 4e800020 <0fe0> 4b84 7fa3eb78 
3be0 
[1.430020] ---[ end trace 89c0f4a91a110769 ]---


-- 
Meelis Roos (mr...@linux.ee)


Re: [PATCH 2/4] powerpc/vas: Fix cleanup when VAS is not configured

2018-02-12 Thread Sukadev Bhattiprolu
Michael Ellerman [m...@ellerman.id.au] wrote:
> Sukadev Bhattiprolu  writes:
> 
> > When VAS is not configured in the system, make sure to remove
> > the VAS debugfs directory and unregister the platform driver.
> >
> > Signed-off-by: Sukadev Bhattiprolu 
> ...
> > diff --git a/arch/powerpc/platforms/powernv/vas.c 
> > b/arch/powerpc/platforms/powernv/vas.c
> > index aebbe95..f83e27d8 100644
> > --- a/arch/powerpc/platforms/powernv/vas.c
> > +++ b/arch/powerpc/platforms/powernv/vas.c
> > @@ -169,8 +169,11 @@ static int __init vas_init(void)
> > found++;
> > }
> >  
> > -   if (!found)
> > +   if (!found) {
> > +   platform_driver_unregister(&vas_driver);
> > +   vas_cleanup_dbgdir();
> > return -ENODEV;
> > +   }
> 
> The better patch would be to move the call to vas_init_dbgdir() down
> here, where we know we have successfully registered the driver.

Well, when VAS is configured, init_vas_instance() expects the top level
"vas" debugfs dir to already be setup.

We could have each init_vas_instance() assume it is the first and
unconditionally call vas_init_dbgdir(). vas_init_dbgdir() could make
sure to initialize only once.
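
The first option could look roughly like this (a hypothetical sketch;
the guard variable is made up):

	static struct dentry *vas_debugfs;	/* assumed top-level dir */

	void vas_init_dbgdir(void)
	{
		if (vas_debugfs)	/* already set up by a prior call */
			return;
		vas_debugfs = debugfs_create_dir("vas", NULL);
	}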

Or, we could make a separate pass counting "ibm,vas" nodes. If there are
none, skip both steps (dbgdir and registering the platform driver).

Sukadev



Re: [PATCH] headers: untangle kmemleak.h from mm.h

2018-02-12 Thread Randy Dunlap
On 02/12/2018 04:28 AM, Michael Ellerman wrote:
> Randy Dunlap  writes:
> 
>> From: Randy Dunlap 
>>
>> Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
>> reason. It looks like it's only a convenience, so remove kmemleak.h
>> from slab.h and add <linux/kmemleak.h> to any users of kmemleak_*
>> that don't already #include it.
>> Also remove <linux/kmemleak.h> from source files that do not use it.
>>
>> This is tested on i386 allmodconfig and x86_64 allmodconfig. It
>> would be good to run it through the 0day bot for other $ARCHes.
>> I have neither the horsepower nor the storage space for the other
>> $ARCHes.
>>
>> [slab.h is the second most used header file after module.h; kernel.h
>> is right there with slab.h. There could be some minor error in the
>> counting due to some #includes having comments after them and I
>> didn't combine all of those.]
>>
>> This is Lingchi patch #1 (death by a thousand cuts, applied to kernel
>> header files).
>>
>> Signed-off-by: Randy Dunlap 
> 
> I threw it at a random selection of configs and so far the only failures
> I'm seeing are:
> 
>   lib/test_firmware.c:134:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
>   lib/test_firmware.c:620:25: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
>   lib/test_firmware.c:620:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
>   security/integrity/digsig.c:146:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
> 

Both of those source files need to #include <linux/vmalloc.h>.
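
i.e. roughly:

	#include <linux/vmalloc.h>	/* for vfree(), vzalloc() */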

> Full results trickling in here, not all the failures there are caused by
> this patch, ie. some configs are broken in mainline:
> 
>   http://kisskb.ellerman.id.au/kisskb/head/13396/
> 
> cheers

:)

-- 
~Randy


samples/seccomp/ broken when cross compiling s390, ppc allyesconfig

2018-02-12 Thread Michal Hocko
Hi,
my build test machinery chokes on samples/seccomp when cross compiling
s390 and ppc64 allyesconfig. This has been the case for quite some
time already but I never found time to look at the problem and report
it. It seems this is not a new issue; a similar thing happened for
MIPS in e9107f88c985 ("samples/seccomp/Makefile: do not build tests if
cross-compiling for MIPS").

The build logs are attached.

What is the best way around this? Should we simply skip compilation
when cross compiling, or is anybody actually relying on that? Or should
I simply disable it for s390 and ppc?
-- 
Michal Hocko
SUSE Labs
=== Config /home/mhocko/work/build-test/configs/s390/allyesconfig
security/integrity/ima/ima_api.c: In function 'ima_audit_measurement':
security/integrity/ima/ima_api.c:337:1: warning: 'ima_audit_measurement' uses 
dynamic stack allocation
 }
 ^
arch/s390/crypto/aes_s390.c: In function 'fallback_blk_dec':
arch/s390/crypto/aes_s390.c:217:1: warning: 'fallback_blk_dec' uses dynamic 
stack allocation
 }
 ^
arch/s390/crypto/aes_s390.c: In function 'fallback_blk_enc':
arch/s390/crypto/aes_s390.c:234:1: warning: 'fallback_blk_enc' uses dynamic 
stack allocation
 }
 ^
arch/s390/crypto/aes_s390.c: In function 'xts_aes_decrypt':
arch/s390/crypto/aes_s390.c:607:1: warning: 'xts_aes_decrypt' uses dynamic 
stack allocation
 }
 ^
arch/s390/crypto/aes_s390.c: In function 'xts_aes_encrypt':
arch/s390/crypto/aes_s390.c:593:1: warning: 'xts_aes_encrypt' uses dynamic 
stack allocation
 }
 ^
security/integrity/ima/ima_crypto.c: In function 
'ima_calc_field_array_hash_tfm.isra.3':
security/integrity/ima/ima_crypto.c:491:1: warning: 
'ima_calc_field_array_hash_tfm.isra.3' uses dynamic stack allocation
 }
 ^
security/integrity/ima/ima_crypto.c: In function 'ima_calc_file_hash':
security/integrity/ima/ima_crypto.c:441:1: warning: 'ima_calc_file_hash' uses 
dynamic stack allocation
 }
 ^
security/integrity/ima/ima_crypto.c: In function 'ima_calc_buffer_hash':
security/integrity/ima/ima_crypto.c:628:1: warning: 'ima_calc_buffer_hash' uses 
dynamic stack allocation
 }
 ^
security/integrity/ima/ima_crypto.c: In function 'ima_calc_boot_aggregate':
security/integrity/ima/ima_crypto.c:682:1: warning: 'ima_calc_boot_aggregate' 
uses dynamic stack allocation
 }
 ^
security/keys/dh.c: In function 'keyctl_dh_compute_kdf':
security/keys/dh.c:237:1: warning: 'keyctl_dh_compute_kdf' uses dynamic stack 
allocation
 }
 ^
security/keys/big_key.c: In function 'big_key_crypt':
security/keys/big_key.c:130:1: warning: 'big_key_crypt' uses dynamic stack 
allocation
 }
 ^
security/apparmor/crypto.c: In function 'aa_calc_hash':
security/apparmor/crypto.c:64:1: warning: 'aa_calc_hash' uses dynamic stack 
allocation
 }
 ^
security/apparmor/crypto.c: In function 'aa_calc_profile_hash':
security/apparmor/crypto.c:106:1: warning: 'aa_calc_profile_hash' uses dynamic 
stack allocation
 }
 ^
security/keys/encrypted-keys/encrypted.c: In function 'calc_hash':
security/keys/encrypted-keys/encrypted.c:337:1: warning: 'calc_hash' uses 
dynamic stack allocation
 }
 ^
crypto/cipher.c: In function 'cipher_crypt_unaligned':
crypto/cipher.c:76:1: warning: 'cipher_crypt_unaligned' uses dynamic stack 
allocation
 }
 ^
drivers/android/binder_alloc.c: In function 'binder_alloc_shrinker_init':
drivers/android/binder_alloc.c:1008:2: warning: ignoring return value of 
'register_shrinker', declared with attribute warn_unused_result 
[-Wunused-result]
  register_shrinker(&binder_shrinker);
  ^
drivers/gpio/gpiolib.c: In function 'gpiod_get_array_value_complex':
drivers/gpio/gpiolib.c:2644:1: warning: 'gpiod_get_array_value_complex' uses 
dynamic stack allocation
 }
 ^
drivers/gpio/gpiolib.c: In function 'gpiod_set_array_value_complex':
drivers/gpio/gpiolib.c:2873:1: warning: 'gpiod_set_array_value_complex' uses 
dynamic stack allocation
 }
 ^
drivers/atm/ambassador.c: In function 'do_loader_command':
drivers/atm/ambassador.c:1762:45: warning: passing argument 1 of 'virt_to_bus' 
discards 'volatile' qualifier from pointer target type
   wr_mem (dev, offsetof(amb_mem, doorbell), virt_to_bus (lb) & ~onegigmask);
 ^
In file included from ./arch/s390/include/asm/io.h:79:0,
 from ./include/linux/io.h:25,
 from ./include/linux/pci.h:33,
 from drivers/atm/ambassador.c:27:
./include/asm-generic/io.h:946:29: note: expected 'void *' but argument is of 
type 'volatile struct loader_block *'
 static inline unsigned long virt_to_bus(void *address)
 ^
In file included from samples/seccomp/bpf-fancy.c:21:0:
samples/seccomp/bpf-helper.h:135:2: error: #error __BITS_PER_LONG value 
unusable.
 #error __BITS_PER_LONG value unusable.
  ^
In file included from samples/seccomp/bpf-fancy.c:13:0:
samples/seccomp/bpf-fancy.c: In function ‘main’:
samples/seccomp/bpf-fancy.c:38:11: error: ‘__NR_exit’ undeclared (first use in 
this function)
   SYSCALL(__NR_exit, ALLOW),
   ^
./usr/include

[RFC REBASED 5/5] powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

2018-02-12 Thread Christophe Leroy
The number of high slices a process might use now depends on its
address space size and on what allocation address it has requested.

This patch uses that limit throughout the call chains where possible,
rather than using the fixed SLICE_NUM_HIGH, for bitmap operations.
This saves some cost for processes that don't use very large address
spaces.
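
The shape of the change, in a minimal sketch: callers compute the
dynamic limit once and pass it down, so the bitmap helpers only touch
slices that can actually exist:

	unsigned long high_slices =
		GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);

	/* was: slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH); */
	slice_bitmap_zero(ret->high_slices, high_slices);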

Signed-off-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/slice.c | 111 ++--
 1 file changed, 60 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index b8b691369c29..683ff4604ab4 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -61,13 +61,12 @@ static void slice_print_mask(const char *label, const struct slice_mask *mask) {
 #endif
 
 static void slice_range_to_mask(unsigned long start, unsigned long len,
-   struct slice_mask *ret)
+   struct slice_mask *ret,
+   unsigned long high_slices)
 {
unsigned long end = start + len - 1;
 
ret->low_slices = 0;
-   slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
if (start < SLICE_LOW_TOP) {
unsigned long mend = min(end,
 (unsigned long)(SLICE_LOW_TOP - 1));
@@ -76,6 +75,7 @@ static void slice_range_to_mask(unsigned long start, unsigned long len,
- (1u << GET_LOW_SLICE_INDEX(start));
}
 
+   slice_bitmap_zero(ret->high_slices, high_slices);
if ((start + len) > SLICE_LOW_TOP) {
unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
@@ -119,28 +119,27 @@ static int slice_high_has_vma(struct mm_struct *mm, unsigned long slice)
 }
 
 static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
-   unsigned long high_limit)
+   unsigned long high_slices)
 {
unsigned long i;
 
ret->low_slices = 0;
-   slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
for (i = 0; i < SLICE_NUM_LOW; i++)
if (!slice_low_has_vma(mm, i))
ret->low_slices |= 1u << i;
 
-   if (high_limit <= SLICE_LOW_TOP)
+   if (!high_slices)
return;
 
-   for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++)
+   slice_bitmap_zero(ret->high_slices, high_slices);
+   for (i = 0; i < high_slices; i++)
if (!slice_high_has_vma(mm, i))
__set_bit(i, ret->high_slices);
 }
 
 static void calc_slice_mask_for_size(struct mm_struct *mm, int psize,
struct slice_mask *ret,
-   unsigned long high_limit)
+   unsigned long high_slices)
 {
unsigned char *hpsizes;
int index, mask_index;
@@ -148,18 +147,17 @@ static void calc_slice_mask_for_size(struct mm_struct *mm, int psize,
u64 lpsizes;
 
ret->low_slices = 0;
-   slice_bitmap_zero(ret->high_slices, SLICE_NUM_HIGH);
-
lpsizes = mm->context.low_slices_psize;
for (i = 0; i < SLICE_NUM_LOW; i++)
if (((lpsizes >> (i * 4)) & 0xf) == psize)
ret->low_slices |= 1u << i;
 
-   if (high_limit <= SLICE_LOW_TOP)
+   if (!high_slices)
return;
 
+   slice_bitmap_zero(ret->high_slices, high_slices);
hpsizes = mm->context.high_slices_psize;
-   for (i = 0; i < GET_HIGH_SLICE_INDEX(high_limit); i++) {
+   for (i = 0; i < high_slices; i++) {
mask_index = i & 0x1;
index = i >> 1;
if (((hpsizes[index] >> (mask_index * 4)) & 0xf) == psize)
@@ -168,16 +166,15 @@ static void calc_slice_mask_for_size(struct mm_struct *mm, int psize,
 }
 
 #ifdef CONFIG_PPC_BOOK3S_64
-static void recalc_slice_mask_cache(struct mm_struct *mm)
+static void recalc_slice_mask_cache(struct mm_struct *mm, unsigned long high_slices)
 {
-   unsigned long l = mm->context.slb_addr_limit;
-   calc_slice_mask_for_size(mm, MMU_PAGE_4K, &mm->context.mask_4k, l);
+   calc_slice_mask_for_size(mm, MMU_PAGE_4K, &mm->context.mask_4k, high_slices);
 #ifdef CONFIG_PPC_64K_PAGES
-   calc_slice_mask_for_size(mm, MMU_PAGE_64K, &mm->context.mask_64k, l);
+   calc_slice_mask_for_size(mm, MMU_PAGE_64K, &mm->context.mask_64k, high_slices);
 #endif
 #ifdef CONFIG_HUGETLB_PAGE
-   calc_slice_mask_for_size(mm, MMU_PAGE_16M, &mm->context.mask_16m, l);
-   calc_slice_mask_for_size(mm, MMU_PAGE_16G, &mm->context.mask_16g, l);
+   calc_slice_mask_for_size(mm, MMU_PAGE_16M, &mm->context.mask_16m, high_slices);
+   calc_slice_mask_for_size(mm, MMU_PAGE_16G, &mm->context.mask_16g, high_slices);
 #endif
 }
 
@@ -198,17 +195,16 @@ static const s

[RFC REBASED 4/5] powerpc/mm/slice: Use const pointers to cached slice masks where possible

2018-02-12 Thread Christophe Leroy
The slice_mask cache was a basic conversion which copied the slice
mask into the caller's structures, because that's how the original
code worked. In most cases the pointer can be used directly instead,
saving a copy and an on-stack structure.

This also converts the slice_mask bit operation helpers to the usual
3-operand kind, which is clearer to work with. We also remove some
unnecessary intermediate bitmaps, reducing stack and copy overhead
further.
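
The helper conversion in a nutshell (a sketch based on the hunks
below):

	/* 2-operand form: dst is read, modified and written back */
	dst->low_slices |= src->low_slices;

	/* 3-operand form: dst is a pure output, inputs stay const */
	dst->low_slices = src1->low_slices | src2->low_slices;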

Signed-off-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/slice.h |  7 +++
 arch/powerpc/include/asm/nohash/32/slice.h |  6 +++
 arch/powerpc/mm/slice.c| 77 ++
 3 files changed, 59 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/slice.h b/arch/powerpc/include/asm/book3s/64/slice.h
index f9a2c8bd7a77..be1ce8e91ad1 100644
--- a/arch/powerpc/include/asm/book3s/64/slice.h
+++ b/arch/powerpc/include/asm/book3s/64/slice.h
@@ -63,6 +63,13 @@ static inline void slice_bitmap_set(unsigned long *map, unsigned int start,
 {
bitmap_set(map, start, nbits);
 }
+
+static inline void slice_bitmap_copy(unsigned long *dst,
+const unsigned long *src,
+unsigned int nbits)
+{
+   bitmap_copy(dst, src, nbits);
+}
 #endif /* __ASSEMBLY__ */
 
 #else /* CONFIG_PPC_MM_SLICES */
diff --git a/arch/powerpc/include/asm/nohash/32/slice.h b/arch/powerpc/include/asm/nohash/32/slice.h
index bcb4924f7d22..38f041e01a0a 100644
--- a/arch/powerpc/include/asm/nohash/32/slice.h
+++ b/arch/powerpc/include/asm/nohash/32/slice.h
@@ -58,6 +58,12 @@ static inline void slice_bitmap_set(unsigned long *map, unsigned int start,
unsigned int nbits)
 {
 }
+
+static inline void slice_bitmap_copy(unsigned long *dst,
+const unsigned long *src,
+unsigned int nbits)
+{
+}
 #endif /* __ASSEMBLY__ */
 
 #endif /* CONFIG_PPC_MM_SLICES */
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 311168ca3939..b8b691369c29 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -468,21 +468,30 @@ static unsigned long slice_find_area(struct mm_struct *mm, unsigned long len,
	return slice_find_area_bottomup(mm, len, mask, psize, high_limit);
 }
 
-static inline void slice_or_mask(struct slice_mask *dst,
+static inline void slice_copy_mask(struct slice_mask *dst,
const struct slice_mask *src)
 {
-   dst->low_slices |= src->low_slices;
-   slice_bitmap_or(dst->high_slices, dst->high_slices, src->high_slices,
+   dst->low_slices = src->low_slices;
+   slice_bitmap_copy(dst->high_slices, src->high_slices, SLICE_NUM_HIGH);
+}
+
+static inline void slice_or_mask(struct slice_mask *dst,
+const struct slice_mask *src1,
+const struct slice_mask *src2)
+{
+   dst->low_slices = src1->low_slices | src2->low_slices;
+   slice_bitmap_or(dst->high_slices, src1->high_slices, src2->high_slices,
SLICE_NUM_HIGH);
 }
 
 static inline void slice_andnot_mask(struct slice_mask *dst,
-   const struct slice_mask *src)
+const struct slice_mask *src1,
+const struct slice_mask *src2)
 {
-   dst->low_slices &= ~src->low_slices;
+   dst->low_slices = src1->low_slices & ~src2->low_slices;
 
-   slice_bitmap_andnot(dst->high_slices, dst->high_slices,
-   src->high_slices, SLICE_NUM_HIGH);
+   slice_bitmap_andnot(dst->high_slices, src1->high_slices,
+   src2->high_slices, SLICE_NUM_HIGH);
 }
 
 #ifdef CONFIG_PPC_64K_PAGES
@@ -495,10 +504,10 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
  unsigned long flags, unsigned int psize,
  int topdown)
 {
-   struct slice_mask mask;
struct slice_mask good_mask;
struct slice_mask potential_mask;
-   struct slice_mask compat_mask;
+   const struct slice_mask *maskp;
+   const struct slice_mask *compat_maskp = NULL;
int fixed = (flags & MAP_FIXED);
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
unsigned long page_size = 1UL << pshift;
@@ -537,9 +546,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
potential_mask.low_slices = 0;
slice_bitmap_zero(potential_mask.high_slices, SLICE_NUM_HIGH);
 
-   compat_mask.low_slices = 0;
-   slice_bitmap_zero(compat_mask.high_slices, SLICE_NUM_HIGH);
-
/* Sanity checks */
BUG_ON(mm->task_size == 0);
BUG_ON(mm->context.slb_addr_limit == 0);
@@ -56

[RFC REBASED 3/5] powerpc/mm/slice: implement slice_check_range_fits

2018-02-12 Thread Christophe Leroy
Rather than building a slice mask from a range and then using that to
check for fit in a candidate mask, implement slice_check_range_fits(),
which checks whether a range fits in a mask directly.

This allows several structures to be removed from stacks, and we also
don't expect a huge range in most of these cases, so building and
comparing a full mask is going to be more expensive than testing just
one or two bits of the range.
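
For a small request the new check reduces to a bit test or two, e.g.
(a sketch assuming a one-page range inside a single high slice):

	unsigned long i = GET_HIGH_SLICE_INDEX(start);

	if (!test_bit(i, available->high_slices))
		return false;	/* no mask built, no bitmap_and/equal */
	return true;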

Signed-off-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/slice.c | 68 ++---
 1 file changed, 36 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index ddf015d2d05b..311168ca3939 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -233,22 +233,36 @@ static const struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int ps
 #error "Must define the slice masks for page sizes supported by the platform"
 #endif
 
-static int slice_check_fit(struct mm_struct *mm,
-  const struct slice_mask *mask,
-  const struct slice_mask *available)
+static bool slice_check_range_fits(struct mm_struct *mm,
+  const struct slice_mask *available,
+  unsigned long start, unsigned long len)
 {
-   DECLARE_BITMAP(result, SLICE_NUM_HIGH);
-   /*
-* Make sure we just do bit compare only to the max
-* addr limit and not the full bit map size.
-*/
-   unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
+   unsigned long end = start + len - 1;
+   u64 low_slices = 0;
+
+   if (start < SLICE_LOW_TOP) {
+   unsigned long mend = min(end,
+(unsigned long)(SLICE_LOW_TOP - 1));
+
+   low_slices = (1u << (GET_LOW_SLICE_INDEX(mend) + 1))
+   - (1u << GET_LOW_SLICE_INDEX(start));
+   }
+   if ((low_slices & available->low_slices) != low_slices)
+   return false;
+
+   if ((start + len) > SLICE_LOW_TOP) {
+   unsigned long start_index = GET_HIGH_SLICE_INDEX(start);
+   unsigned long align_end = ALIGN(end, (1UL << SLICE_HIGH_SHIFT));
+   unsigned long count = GET_HIGH_SLICE_INDEX(align_end) - start_index;
+   unsigned long i;
 
-   slice_bitmap_and(result, mask->high_slices, available->high_slices,
-slice_count);
+   for (i = start_index; i < start_index + count; i++) {
+   if (!test_bit(i, available->high_slices))
+   return false;
+   }
+   }
 
-   return (mask->low_slices & available->low_slices) == mask->low_slices &&
-   slice_bitmap_equal(result, mask->high_slices, slice_count);
+   return true;
 }
 
 static void slice_flush_segments(void *parm)
@@ -519,12 +533,6 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
on_each_cpu(slice_flush_segments, mm, 1);
}
 
-   /*
-* init different masks
-*/
-   mask.low_slices = 0;
-   slice_bitmap_zero(mask.high_slices, SLICE_NUM_HIGH);
-
/* silence stupid warning */;
potential_mask.low_slices = 0;
slice_bitmap_zero(potential_mask.high_slices, SLICE_NUM_HIGH);
@@ -586,15 +594,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
 #endif
 
/* First check hint if it's valid or if we have MAP_FIXED */
-   if (addr != 0 || fixed) {
-   /* Build a mask for the requested range */
-   slice_range_to_mask(addr, len, &mask);
-   slice_print_mask(" mask", &mask);
-
+   if (addr || fixed) {
/* Check if we fit in the good mask. If we do, we just return,
 * nothing else to do
 */
-   if (slice_check_fit(mm, &mask, &good_mask)) {
+   if (slice_check_range_fits(mm, &good_mask, addr, len)) {
slice_dbg(" fits good !\n");
return addr;
}
@@ -620,10 +624,11 @@ unsigned long slice_get_unmapped_area(unsigned long addr, unsigned long len,
slice_or_mask(&potential_mask, &good_mask);
slice_print_mask(" potential", &potential_mask);
 
-   if ((addr != 0 || fixed) &&
-   slice_check_fit(mm, &mask, &potential_mask)) {
-   slice_dbg(" fits potential !\n");
-   goto convert;
+   if (addr || fixed) {
+   if (slice_check_range_fits(mm, &potential_mask, addr, len)) {
+   slice_dbg(" fits potential !\n");
+   goto convert;
+   }
}
 
/* If we have MAP_FIXED and failed the above steps, then error out */
@@ -829,13 +834,12 @@ void slice_set_range_psize(struct mm_struct *mm, uns

[RFC REBASED 2/5] powerpc/mm/slice: implement a slice mask cache

2018-02-12 Thread Christophe Leroy
Calculating the slice mask can become a significant overhead for
get_unmapped_area. This patch adds a struct slice_mask for
each page size in the mm_context, and keeps these in sync with
the slices psize arrays and slb_addr_limit.

This saves about 30% kernel time on a single-page mmap/munmap micro
benchmark.
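
With the cache, the hot path becomes a lookup instead of a
recalculation (a sketch of the accessor added below):

	/* was: slice_mask_for_size(mm, psize, &mask, high_limit); */
	const struct slice_mask *maskp = slice_mask_for_size(mm, psize);
	/* now just returns &mm->context.mask_4k / mask_64k / ... */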

Signed-off-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  20 ++-
 arch/powerpc/include/asm/mmu-8xx.h   |  16 -
 arch/powerpc/mm/slice.c  | 100 ++-
 3 files changed, 118 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 0abeb0e2d616..b6d136fd8ffd 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -80,6 +80,16 @@ struct spinlock;
 /* Maximum possible number of NPUs in a system. */
 #define NV_MAX_NPUS 8
 
+/*
+ * One bit per slice. We have lower slices which cover 256MB segments
+ * upto 4G range. That gets us 16 low slices. For the rest we track slices
+ * in 1TB size.
+ */
+struct slice_mask {
+   u64 low_slices;
+   DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
+};
+
 typedef struct {
mm_context_id_t id;
u16 user_psize; /* page size index */
@@ -91,9 +101,17 @@ typedef struct {
struct npu_context *npu_context;
 
 #ifdef CONFIG_PPC_MM_SLICES
+   unsigned long slb_addr_limit;
u64 low_slices_psize;   /* SLB page size encodings */
unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
-   unsigned long slb_addr_limit;
+# ifdef CONFIG_PPC_64K_PAGES
+   struct slice_mask mask_64k;
+# endif
+   struct slice_mask mask_4k;
+# ifdef CONFIG_HUGETLB_PAGE
+   struct slice_mask mask_16m;
+   struct slice_mask mask_16g;
+# endif
 #else
u16 sllp;   /* SLB page size encoding */
 #endif
diff --git a/arch/powerpc/include/asm/mmu-8xx.h b/arch/powerpc/include/asm/mmu-8xx.h
index b324ab46d838..b97d4ed3dddf 100644
--- a/arch/powerpc/include/asm/mmu-8xx.h
+++ b/arch/powerpc/include/asm/mmu-8xx.h
@@ -187,15 +187,29 @@
 #define M_APG3 0x0060
 
 #ifndef __ASSEMBLY__
+struct slice_mask {
+   u64 low_slices;
+   DECLARE_BITMAP(high_slices, 0);
+};
+
 typedef struct {
unsigned int id;
unsigned int active;
unsigned long vdso_base;
 #ifdef CONFIG_PPC_MM_SLICES
+   unsigned long slb_addr_limit;
u16 user_psize; /* page size index */
u64 low_slices_psize;   /* page size encodings */
unsigned char high_slices_psize[0];
-   unsigned long slb_addr_limit;
+# ifdef CONFIG_PPC_16K_PAGES
+   struct slice_mask mask_16k;
+# else
+   struct slice_mask mask_4k;
+# endif
+# ifdef CONFIG_HUGETLB_PAGE
+   struct slice_mask mask_512k;
+   struct slice_mask mask_8m;
+# endif
 #endif
 } mm_context_t;
 
diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index db1278ac21c2..ddf015d2d05b 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -37,15 +37,6 @@
 #include 
 
 static DEFINE_SPINLOCK(slice_convert_lock);
-/*
- * One bit per slice. We have lower slices which cover 256MB segments
- * upto 4G range. That gets us 16 low slices. For the rest we track slices
- * in 1TB size.
- */
-struct slice_mask {
-   u64 low_slices;
-   DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
-};
 
 #ifdef DEBUG
 int _slice_debug = 1;
@@ -147,7 +138,7 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
__set_bit(i, ret->high_slices);
 }
 
-static void slice_mask_for_size(struct mm_struct *mm, int psize,
+static void calc_slice_mask_for_size(struct mm_struct *mm, int psize,
struct slice_mask *ret,
unsigned long high_limit)
 {
@@ -176,6 +167,72 @@ static void slice_mask_for_size(struct mm_struct *mm, int psize,
}
 }
 
+#ifdef CONFIG_PPC_BOOK3S_64
+static void recalc_slice_mask_cache(struct mm_struct *mm)
+{
+   unsigned long l = mm->context.slb_addr_limit;
+   calc_slice_mask_for_size(mm, MMU_PAGE_4K, &mm->context.mask_4k, l);
+#ifdef CONFIG_PPC_64K_PAGES
+   calc_slice_mask_for_size(mm, MMU_PAGE_64K, &mm->context.mask_64k, l);
+#endif
+#ifdef CONFIG_HUGETLB_PAGE
+   calc_slice_mask_for_size(mm, MMU_PAGE_16M, &mm->context.mask_16m, l);
+   calc_slice_mask_for_size(mm, MMU_PAGE_16G, &mm->context.mask_16g, l);
+#endif
+}
+
+static const struct slice_mask *slice_mask_for_size(struct mm_struct *mm, int psize)
+{
+#ifdef CONFIG_PPC_64K_PAGES
+   if (psize == MMU_PAGE_64K)
+   return &mm->context.mask_64k;
+#endif
+   if (psize == MMU_PAGE_4K)
+   return &mm->context.mask_4k;
+#ifdef CONFIG_HUGETLB_PAGE
+   if (psize == MMU_PAGE_16M)
+   return &mm->context.mask_16m;
+   if (psize == MMU_PAGE_16G)
+  

[RFC REBASED 1/5] powerpc/mm/slice: pass pointers to struct slice_mask where possible

2018-02-12 Thread Christophe Leroy
Pass around const pointers to struct slice_mask where possible, rather
than copies of slice_mask, to reduce stack and call overhead.
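
The conversion pattern, in a sketch:

	/* before: a whole struct copied at every call */
	static int slice_check_fit(struct mm_struct *mm,
				   struct slice_mask mask,
				   struct slice_mask available);

	/* after: two pointer-sized arguments, no on-stack copies */
	static int slice_check_fit(struct mm_struct *mm,
				   const struct slice_mask *mask,
				   const struct slice_mask *available);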

checkstack.pl gives, before:
0x0de4 slice_get_unmapped_area [slice.o]:   656
0x1b4c is_hugepage_only_range [slice.o]:512
0x075c slice_find_area_topdown [slice.o]:   416
0x04c8 slice_find_area_bottomup.isra.1 [slice.o]:   272
0x1aa0 slice_set_range_psize [slice.o]: 240
0x0a64 slice_find_area [slice.o]:   176
0x0174 slice_check_fit [slice.o]:   112

after:
0x0bd4 slice_get_unmapped_area [slice.o]:   496
0x17cc is_hugepage_only_range [slice.o]:352
0x0758 slice_find_area [slice.o]:   144
0x1750 slice_set_range_psize [slice.o]: 144
0x0180 slice_check_fit [slice.o]:   128
0x05b0 slice_find_area_bottomup.isra.2 [slice.o]:   128

Signed-off-by: Nicholas Piggin 
Signed-off-by: Christophe Leroy 
---
 rebased on top of "[v4,3/5] powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx" (https://patchwork.ozlabs.org/patch/871675/)

 arch/powerpc/mm/slice.c | 81 +++--
 1 file changed, 44 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/mm/slice.c b/arch/powerpc/mm/slice.c
index 549704dfa777..db1278ac21c2 100644
--- a/arch/powerpc/mm/slice.c
+++ b/arch/powerpc/mm/slice.c
@@ -50,19 +50,21 @@ struct slice_mask {
 #ifdef DEBUG
 int _slice_debug = 1;
 
-static void slice_print_mask(const char *label, struct slice_mask mask)
+static void slice_print_mask(const char *label, const struct slice_mask *mask)
 {
if (!_slice_debug)
return;
-   pr_devel("%s low_slice: %*pbl\n", label, (int)SLICE_NUM_LOW, 
&mask.low_slices);
-   pr_devel("%s high_slice: %*pbl\n", label, (int)SLICE_NUM_HIGH, 
mask.high_slices);
+   pr_devel("%s low_slice: %*pbl\n", label,
+   (int)SLICE_NUM_LOW, &mask->low_slices);
+   pr_devel("%s high_slice: %*pbl\n", label,
+   (int)SLICE_NUM_HIGH, mask->high_slices);
 }
 
 #define slice_dbg(fmt...) do { if (_slice_debug) pr_devel(fmt); } while (0)
 
 #else
 
-static void slice_print_mask(const char *label, struct slice_mask mask) {}
+static void slice_print_mask(const char *label, const struct slice_mask *mask) {}
 #define slice_dbg(fmt...)
 
 #endif
@@ -145,7 +147,8 @@ static void slice_mask_for_free(struct mm_struct *mm, struct slice_mask *ret,
__set_bit(i, ret->high_slices);
 }
 
-static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_mask *ret,
+static void slice_mask_for_size(struct mm_struct *mm, int psize,
+   struct slice_mask *ret,
unsigned long high_limit)
 {
unsigned char *hpsizes;
@@ -174,7 +177,8 @@ static void slice_mask_for_size(struct mm_struct *mm, int psize, struct slice_ma
 }
 
 static int slice_check_fit(struct mm_struct *mm,
-  struct slice_mask mask, struct slice_mask available)
+  const struct slice_mask *mask,
+  const struct slice_mask *available)
 {
DECLARE_BITMAP(result, SLICE_NUM_HIGH);
/*
@@ -183,11 +187,11 @@ static int slice_check_fit(struct mm_struct *mm,
 */
	unsigned long slice_count = GET_HIGH_SLICE_INDEX(mm->context.slb_addr_limit);
 
-   slice_bitmap_and(result, mask.high_slices, available.high_slices,
+   slice_bitmap_and(result, mask->high_slices, available->high_slices,
 slice_count);
 
-   return (mask.low_slices & available.low_slices) == mask.low_slices &&
-   slice_bitmap_equal(result, mask.high_slices, slice_count);
+   return (mask->low_slices & available->low_slices) == mask->low_slices &&
+   slice_bitmap_equal(result, mask->high_slices, slice_count);
 }
 
 static void slice_flush_segments(void *parm)
@@ -207,7 +211,8 @@ static void slice_flush_segments(void *parm)
 #endif
 }
 
-static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psize)
+static void slice_convert(struct mm_struct *mm,
+   const struct slice_mask *mask, int psize)
 {
int index, mask_index;
/* Write the new slice psize bits */
@@ -225,7 +230,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
 
lpsizes = mm->context.low_slices_psize;
for (i = 0; i < SLICE_NUM_LOW; i++)
-   if (mask.low_slices & (1u << i))
+   if (mask->low_slices & (1u << i))
lpsizes = (lpsizes & ~(0xful << (i * 4))) |
(((unsigned long)psize) << (i * 4));
 
@@ -236,7 +241,7 @@ static void slice_convert(struct mm_struct *mm, struct slice_mask mask, int psiz
for (i = 0; i < GET_HIGH_SLICE_INDEX(m

Re: [RFC PATCH 0/5] powerpc/mm/slice: improve slice speed and stack use

2018-02-12 Thread Christophe LEROY



Le 12/02/2018 à 16:24, Nicholas Piggin a écrit :

On Mon, 12 Feb 2018 16:02:23 +0100
Christophe LEROY  wrote:


Le 10/02/2018 à 09:11, Nicholas Piggin a écrit :

This series intends to improve performance and reduce stack
consumption in the slice allocation code. It does this by keeping slice
masks in the mm_context rather than computing them for each allocation,
and by removing bitmaps and slice_masks from stacks, using pointers
instead where possible.

checkstack.pl gives, before:
0x0de4 slice_get_unmapped_area [slice.o]:   656
0x1b4c is_hugepage_only_range [slice.o]:512
0x075c slice_find_area_topdown [slice.o]:   416
0x04c8 slice_find_area_bottomup.isra.1 [slice.o]:   272
0x1aa0 slice_set_range_psize [slice.o]: 240
0x0a64 slice_find_area [slice.o]:   176
0x0174 slice_check_fit [slice.o]:   112

after:
0x0d70 slice_get_unmapped_area [slice.o]:   320
0x08f8 slice_find_area [slice.o]:   144
0x1860 slice_set_range_psize [slice.o]: 144
0x18ec is_hugepage_only_range [slice.o]:144
0x0750 slice_find_area_bottomup.isra.4 [slice.o]:   128

The benchmark in https://github.com/linuxppc/linux/issues/49 gives, before:
$ time ./slicemask
real0m20.712s
user0m5.830s
sys 0m15.105s

after:
$ time ./slicemask
real0m13.197s
user0m5.409s
sys 0m7.779s


Hi,

I tested your series on an 8xx, on top of patch
https://patchwork.ozlabs.org/patch/871675/

I don't get a result as significant as yours, but there is some
improvement anyway:

ITERATION 50

Before:

root@vgoip:~# time ./slicemask
real0m 33.26s
user0m 1.94s
sys 0m 30.85s

After:
root@vgoip:~# time ./slicemask
real0m 29.69s
user0m 2.11s
sys 0m 27.15s

The most significant improvement is obtained with the first patch of your series:
root@vgoip:~# time ./slicemask
real0m 30.85s
user0m 1.80s
sys 0m 28.57s


Okay, thanks. Are you still spending significant time in the slice
code?


Do you mean, am I still updating my patches? No, I hope v4 is the last 
run, now that Aneesh has tagged all of them with his Reviewed-by.
Once the series has been accepted, my next step will be to backport at 
least the first three of them to kernel 4.14






I had to modify your series a bit; if you are interested I can post it.



Sure, that would be good.


Ok, let's share it. The patches are not 100% clean.

Christophe


[RFC PATCH 12/12] powerpc/64s/radix: allocate kernel page tables node-local if possible

2018-02-12 Thread Nicholas Piggin
Try to allocate kernel page tables according to the node of
the memory they will map.
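
The allocation pattern this relies on, in a sketch (mirroring the
node-first-with-fallback approach used elsewhere in this series):

	/* try the node that the mapped memory belongs to first */
	pa = memblock_alloc_base_nid(size, align, limit, nid, MEMBLOCK_NONE);
	if (!pa)	/* node-local memory unavailable: take any node */
		pa = memblock_alloc_base(size, align, limit);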

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/book3s/64/hash.h  |   2 +-
 arch/powerpc/include/asm/book3s/64/radix.h |   2 +-
 arch/powerpc/include/asm/sparsemem.h   |   2 +-
 arch/powerpc/mm/hash_utils_64.c|   2 +-
 arch/powerpc/mm/mem.c  |   4 +-
 arch/powerpc/mm/pgtable-book3s64.c |   6 +-
 arch/powerpc/mm/pgtable-radix.c| 178 +++--
 7 files changed, 128 insertions(+), 68 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 0920eff731b3..b1ace9619e94 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
extern int __meminit hash__vmemmap_create_mapping(unsigned long start,
 extern void hash__vmemmap_remove_mapping(unsigned long start,
 unsigned long page_size);
 
-int hash__create_section_mapping(unsigned long start, unsigned long end);
+int hash__create_section_mapping(unsigned long start, unsigned long end, int nid);
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index 19c44e1495ae..4edcc797cf43 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -319,7 +319,7 @@ static inline unsigned long radix__get_tree_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int radix__create_section_mapping(unsigned long start, unsigned long end);
+int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
 int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index a7916ee6dfb6..bc66712bdc3c 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -17,7 +17,7 @@
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-extern int create_section_mapping(unsigned long start, unsigned long end);
+extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 7d07c7e17db6..ceb5494804b2 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -781,7 +781,7 @@ void resize_hpt_for_hotplug(unsigned long new_mem_size)
}
 }
 
-int hash__create_section_mapping(unsigned long start, unsigned long end)
+int hash__create_section_mapping(unsigned long start, unsigned long end, int nid)
 {
int rc = htab_bolt_mapping(start, end, __pa(start),
   pgprot_val(PAGE_KERNEL), mmu_linear_psize,
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 70f7b6426a15..3f75fbb10c87 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -117,7 +117,7 @@ int memory_add_physaddr_to_nid(u64 start)
 }
 #endif
 
-int __weak create_section_mapping(unsigned long start, unsigned long end)
+int __weak create_section_mapping(unsigned long start, unsigned long end, int nid)
 {
return -ENODEV;
 }
@@ -136,7 +136,7 @@ int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock)
resize_hpt_for_hotplug(memblock_phys_mem_size());
 
start = (unsigned long)__va(start);
-   rc = create_section_mapping(start, start + size);
+   rc = create_section_mapping(start, start + size, nid);
if (rc) {
pr_warn("Unable to create mapping for hot added memory 
0x%llx..0x%llx: %d\n",
start, start + size, rc);
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 3b65917785a5..2b7375f05408 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -152,12 +152,12 @@ void mmu_cleanup_all(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int create_section_mapping(unsigned long start, unsigned long end)
+int create_section_mapping(unsigned long start, unsigned long end, int nid)
 {
if (radix_enabled())
-   return radix__create_section_mapping(start, end);
+   return radix__create_section_mapping(start, end, nid);
 
-   return hash__create_section_mapping(start, end);
+   return hash__create_section_mapping(start, end, nid);
 }
 
 int remove_section_mapping(unsigned long start, unsigned long end)
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 573a9a2ee455..716a68baf137 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -46,

[RFC PATCH 11/12] powerpc/64: allocate per-cpu stacks node-local if possible

2018-02-12 Thread Nicholas Piggin
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/setup_64.c | 51 ++
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 02fa358982e6..16ea71fa1ead 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -611,6 +611,21 @@ __init u64 ppc64_bolted_size(void)
 #endif
 }
 
+static void *__init alloc_stack(unsigned long limit, int cpu)
+{
+   unsigned long pa;
+
+   pa = memblock_alloc_base_nid(THREAD_SIZE, THREAD_SIZE, limit,
+   early_cpu_to_node(cpu), MEMBLOCK_NONE);
+   if (!pa) {
+   pa = memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit);
+   if (!pa)
+   panic("cannot allocate stacks");
+   }
+
+   return __va(pa);
+}
+
 void __init irqstack_early_init(void)
 {
u64 limit = ppc64_bolted_size();
@@ -622,12 +637,8 @@ void __init irqstack_early_init(void)
 * accessed in realmode.
 */
for_each_possible_cpu(i) {
-   softirq_ctx[i] = (struct thread_info *)
-   __va(memblock_alloc_base(THREAD_SIZE,
-   THREAD_SIZE, limit));
-   hardirq_ctx[i] = (struct thread_info *)
-   __va(memblock_alloc_base(THREAD_SIZE,
-   THREAD_SIZE, limit));
+   softirq_ctx[i] = alloc_stack(limit, i);
+   hardirq_ctx[i] = alloc_stack(limit, i);
}
 }
 
@@ -635,20 +646,21 @@ void __init irqstack_early_init(void)
 void __init exc_lvl_early_init(void)
 {
unsigned int i;
-   unsigned long sp;
 
for_each_possible_cpu(i) {
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   critirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->crit_kstack = __va(sp + THREAD_SIZE);
+   void *sp;
 
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   dbgirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->dbg_kstack = __va(sp + THREAD_SIZE);
+   sp = alloc_stack(ULONG_MAX, i);
+   critirq_ctx[i] = sp;
+   paca_ptrs[i]->crit_kstack = sp + THREAD_SIZE;
 
-   sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
-   mcheckirq_ctx[i] = (struct thread_info *)__va(sp);
-   paca_ptrs[i]->mc_kstack = __va(sp + THREAD_SIZE);
+   sp = alloc_stack(ULONG_MAX, i);
+   dbgirq_ctx[i] = sp;
+   paca_ptrs[i]->dbg_kstack = sp + THREAD_SIZE;
+
+   sp = alloc_stack(ULONG_MAX, i);
+   mcheckirq_ctx[i] = sp;
+   paca_ptrs[i]->mc_kstack = sp + THREAD_SIZE;
}
 
if (cpu_has_feature(CPU_FTR_DEBUG_LVL_EXC))
@@ -702,20 +714,21 @@ void __init emergency_stack_init(void)
 
for_each_possible_cpu(i) {
struct thread_info *ti;
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->emergency_sp = (void *)ti + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
/* emergency stack for NMI exception handling. */
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->nmi_emergency_sp = (void *)ti + THREAD_SIZE;
 
/* emergency stack for machine check exception handling. */
-   ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
+   ti = alloc_stack(limit, i);
memset(ti, 0, THREAD_SIZE);
emerg_stack_init_thread_info(ti, i);
paca_ptrs[i]->mc_emergency_sp = (void *)ti + THREAD_SIZE;
-- 
2.16.1



[RFC PATCH 10/12] powerpc/64: allocate pacas per node

2018-02-12 Thread Nicholas Piggin
Per-node allocations are possible on 64s with radix, which does not
have the bolted SLB limitation.

Hash would be able to do the same if all CPUs had the bottom of
their node-local memory bolted as well. This is left as an
exercise for the reader.
---
 arch/powerpc/kernel/paca.c | 41 +++--
 arch/powerpc/kernel/setup_64.c |  4 
 2 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 981886293369..f455e39e51e8 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -20,6 +20,37 @@
 
 #include "setup.h"
 
+static void *__init alloc_paca_data(unsigned long size, unsigned long align,
+   unsigned long limit, int cpu)
+{
+   unsigned long pa;
+   int nid;
+
+   /*
+* boot_cpuid paca is allocated very early before cpu_to_node is up.
+* Set bottom-up mode, because the boot CPU should be on node-0,
+* which will put its paca in the right place.
+*/
+   if (cpu == boot_cpuid) {
+   nid = -1;
+   memblock_set_bottom_up(true);
+   } else {
+   nid = early_cpu_to_node(cpu);
+   }
+
+   pa = memblock_alloc_base_nid(size, align, limit, nid, MEMBLOCK_NONE);
+   if (!pa) {
+   pa = memblock_alloc_base(size, align, limit);
+   if (!pa)
+   panic("cannot allocate paca data");
+   }
+
+   if (cpu == boot_cpuid)
+   memblock_set_bottom_up(false);
+
+   return __va(pa);
+}
+
 #ifdef CONFIG_PPC_PSERIES
 
 /*
@@ -52,7 +83,7 @@ static struct lppaca * __init new_lppaca(int cpu, unsigned long limit)
if (early_cpu_has_feature(CPU_FTR_HVMODE))
return NULL;
 
-   lp = __va(memblock_alloc_base(size, 0x400, limit));
+   lp = alloc_paca_data(size, 0x400, limit, cpu);
init_lppaca(lp);
 
return lp;
@@ -92,7 +123,7 @@ static struct slb_shadow * __init new_slb_shadow(int cpu, unsigned long limit)
return NULL;
}
 
-   s = __va(memblock_alloc_base(sizeof(*s), L1_CACHE_BYTES, limit));
+   s = alloc_paca_data(sizeof(*s), L1_CACHE_BYTES, limit, cpu);
memset(s, 0, sizeof(*s));
 
s->persistent = cpu_to_be32(SLB_NUM_BOLTED);
@@ -183,7 +214,6 @@ void __init allocate_paca_ptrs(void)
 void __init allocate_paca(int cpu)
 {
u64 limit;
-   unsigned long pa;
struct paca_struct *paca;
 
BUG_ON(cpu >= paca_nr_cpu_ids);
@@ -198,9 +228,8 @@ void __init allocate_paca(int cpu)
limit = ppc64_rma_size;
 #endif
 
-   pa = memblock_alloc_base(sizeof(struct paca_struct),
-   L1_CACHE_BYTES, limit);
-   paca = __va(pa);
+   paca = alloc_paca_data(sizeof(struct paca_struct), L1_CACHE_BYTES,
+   limit, cpu);
paca_ptrs[cpu] = paca;
memset(paca, 0, sizeof(struct paca_struct));
 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index dde34d35d1e7..02fa358982e6 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -312,6 +312,10 @@ void __init early_setup(unsigned long dt_ptr)
early_init_devtree(__va(dt_ptr));
 
/* Now we know the logical id of our boot cpu, setup the paca. */
+   if (boot_cpuid != 0) {
+   /* Poison paca_ptrs[0] again if it's not the boot cpu */
+   memset(&paca_ptrs[0], 0x88, sizeof(paca_ptrs[0]));
+   }
setup_paca(paca_ptrs[boot_cpuid]);
fixup_boot_paca();
 
-- 
2.16.1



[RFC PATCH 09/12] powerpc/64: defer paca allocation until memory topology is discovered

2018-02-12 Thread Nicholas Piggin
---
 arch/powerpc/include/asm/paca.h|  3 +-
 arch/powerpc/kernel/paca.c | 80 +-
 arch/powerpc/kernel/prom.c |  5 ++-
 arch/powerpc/kernel/setup-common.c |  2 +
 4 files changed, 35 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index f266b0a7be95..407a8076edd7 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -252,7 +252,8 @@ extern void copy_mm_to_paca(struct mm_struct *mm);
 extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
 extern void setup_paca(struct paca_struct *new_paca);
-extern void allocate_pacas(void);
+extern void allocate_paca_ptrs(void);
+extern void allocate_paca(int cpu);
 extern void free_unused_pacas(void);
 
 #else /* CONFIG_PPC64 */
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index e560072f122b..981886293369 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -169,12 +169,24 @@ void setup_paca(struct paca_struct *new_paca)
 
 static int __initdata paca_nr_cpu_ids;
 static int __initdata paca_ptrs_size;
+static int __initdata paca_struct_size;
 
-void __init allocate_pacas(void)
+void __init allocate_paca_ptrs(void)
+{
+   paca_nr_cpu_ids = nr_cpu_ids;
+
+   paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+   paca_ptrs = __va(memblock_alloc(paca_ptrs_size, 0));
+   memset(paca_ptrs, 0x88, paca_ptrs_size);
+}
+
+void __init allocate_paca(int cpu)
 {
u64 limit;
-   unsigned long size = 0;
-   int cpu;
+   unsigned long pa;
+   struct paca_struct *paca;
+
+   BUG_ON(cpu >= paca_nr_cpu_ids);
 
 #ifdef CONFIG_PPC_BOOK3S_64
/*
@@ -186,69 +198,30 @@ void __init allocate_pacas(void)
limit = ppc64_rma_size;
 #endif
 
-   paca_nr_cpu_ids = nr_cpu_ids;
-
-   paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
-   paca_ptrs = __va(memblock_alloc_base(paca_ptrs_size, 0, limit));
-   memset(paca_ptrs, 0, paca_ptrs_size);
-
-   size += paca_ptrs_size;
-
-   for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
-   unsigned long pa;
-
-   pa = memblock_alloc_base(sizeof(struct paca_struct),
-   L1_CACHE_BYTES, limit);
-   paca_ptrs[cpu] = __va(pa);
-   memset(paca_ptrs[cpu], 0, sizeof(struct paca_struct));
+   pa = memblock_alloc_base(sizeof(struct paca_struct),
+   L1_CACHE_BYTES, limit);
+   paca = __va(pa);
+   paca_ptrs[cpu] = paca;
+   memset(paca, 0, sizeof(struct paca_struct));
 
-   size += sizeof(struct paca_struct);
-   }
-
-   printk(KERN_DEBUG "Allocated %lu bytes for %u pacas\n",
-   size, nr_cpu_ids);
-
-   /* Can't use for_each_*_cpu, as they aren't functional yet */
-   for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
-   struct paca_struct *paca = paca_ptrs[cpu];
-
-   initialise_paca(paca, cpu);
+   initialise_paca(paca, cpu);
 #ifdef CONFIG_PPC_PSERIES
-   paca->lppaca_ptr = new_lppaca(cpu, limit);
+   paca->lppaca_ptr = new_lppaca(cpu, limit);
 #endif
 #ifdef CONFIG_PPC_BOOK3S_64
-   paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
+   paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
 #endif
-   }
+   paca_struct_size += sizeof(struct paca_struct);
 }
 
 void __init free_unused_pacas(void)
 {
-   unsigned long size = 0;
int new_ptrs_size;
-   int cpu;
-
-   for (cpu = 0; cpu < paca_nr_cpu_ids; cpu++) {
-   if (!cpu_possible(cpu)) {
-   unsigned long pa = __pa(paca_ptrs[cpu]);
-#ifdef CONFIG_PPC_PSERIES
-   free_lppaca(paca_ptrs[cpu]->lppaca_ptr);
-#endif
-   memblock_free(pa, sizeof(struct paca_struct));
-   paca_ptrs[cpu] = NULL;
-   size += sizeof(struct paca_struct);
-   }
-   }
 
new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
-   if (new_ptrs_size < paca_ptrs_size) {
+   if (new_ptrs_size < paca_ptrs_size)
memblock_free(__pa(paca_ptrs) + new_ptrs_size,
paca_ptrs_size - new_ptrs_size);
-   size += paca_ptrs_size - new_ptrs_size;
-   }
-
-   if (size)
-   printk(KERN_DEBUG "Freed %lu bytes for unused pacas\n", size);
 
paca_nr_cpu_ids = nr_cpu_ids;
paca_ptrs_size = new_ptrs_size;
@@ -261,6 +234,9 @@ void __init free_unused_pacas(void)
paca_ptrs[boot_cpuid]->slb_shadow_ptr = NULL;
}
 #endif
+
+   printk(KERN_DEBUG "Allocated %u bytes for %u pacas\n",
+   paca_ptrs_size + paca_struct_size, nr_cpu_ids);
 }
 
 void copy_mm_to_paca(struct mm_struct *mm)
diff --git a

[RFC PATCH 08/12] powerpc/setup: cpu_to_phys_id array

2018-02-12 Thread Nicholas Piggin
Build an array that maps logical CPU number to hardware CPU number
during firmware CPU discovery. Use that rather than setting the paca
of other CPUs directly, to begin with. A subsequent patch will not
have pacas allocated at this point.
---
 arch/powerpc/include/asm/smp.h |  1 +
 arch/powerpc/kernel/prom.c |  9 -
 arch/powerpc/kernel/setup-common.c | 25 -
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ec7b299350d9..cfecfee1194b 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -31,6 +31,7 @@
 
 extern int boot_cpuid;
 extern int spinning_secondaries;
+extern u32 *cpu_to_phys_id;
 
 extern void cpu_die(void);
 extern int cpu_to_chip_id(int cpu);
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 4dffef947b8a..6b29eb1d06f4 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -872,7 +872,14 @@ int cpu_to_chip_id(int cpu)
 }
 EXPORT_SYMBOL(cpu_to_chip_id);
 
-bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
+bool arch_match_ucpu_phys_id(int cpu, u64 phys_id)
 {
+   /*
+* Early firmware scanning must use this rather than
+* get_hard_smp_processor_id because we don't have pacas allocated
+* until memory topology is discovered.
+*/
+   if (cpu_to_phys_id != NULL)
+   return (int)phys_id == cpu_to_phys_id[cpu];
return (int)phys_id == get_hard_smp_processor_id(cpu);
 }
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index f8a6b8ad13b4..169d7e730aa4 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -437,6 +437,8 @@ static void __init cpu_init_thread_core_maps(int tpc)
 }
 
 
+u32 *cpu_to_phys_id = NULL;
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *  cpu_possible_mask
@@ -463,6 +465,10 @@ void __init smp_setup_cpu_maps(void)
 
DBG("smp_setup_cpu_maps()\n");
 
+   cpu_to_phys_id = __va(memblock_alloc(nr_cpu_ids * sizeof(u32),
+   __alignof__(u32)));
+   memset(cpu_to_phys_id, 0, nr_cpu_ids * sizeof(u32));
+
for_each_node_by_type(dn, "cpu") {
const __be32 *intserv;
__be32 cpu_be;
@@ -480,6 +486,7 @@ void __init smp_setup_cpu_maps(void)
intserv = of_get_property(dn, "reg", &len);
if (!intserv) {
cpu_be = cpu_to_be32(cpu);
+   /* XXX: what is this? uninitialized?? */
intserv = &cpu_be;	/* assume logical == phys */
len = 4;
}
@@ -499,8 +506,8 @@ void __init smp_setup_cpu_maps(void)
"enable-method", "spin-table");
 
set_cpu_present(cpu, avail);
-   set_hard_smp_processor_id(cpu, be32_to_cpu(intserv[j]));
set_cpu_possible(cpu, true);
+   cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
cpu++;
}
 
@@ -835,6 +842,22 @@ static __init void print_system_info(void)
pr_info("-\n");
 }
 
+#ifdef CONFIG_SMP
+static void smp_setup_pacas(void)
+{
+   int cpu;
+
+   for_each_possible_cpu(cpu) {
+   if (cpu == smp_processor_id())
+   continue;
+   set_hard_smp_processor_id(cpu, cpu_to_phys_id[cpu]);
+   }
+
+   memblock_free(__pa(cpu_to_phys_id), nr_cpu_ids * sizeof(u32));
+   cpu_to_phys_id = NULL;
+}
+#endif
+
 /*
  * Called into from start_kernel this initializes memblock, which is used
  * to manage page allocation until mem_init is called.
-- 
2.16.1



[RFC PATCH 07/12] powerpc/64: move default SPR recording

2018-02-12 Thread Nicholas Piggin
Move this into the early setup code, and don't iterate over CPU masks.
We don't want to call into sysfs so early from setup, and a future patch
won't initialize CPU masks by the time this is called.
---
 arch/powerpc/kernel/paca.c |  3 +++
 arch/powerpc/kernel/setup.h|  9 +++--
 arch/powerpc/kernel/setup_64.c |  8 
 arch/powerpc/kernel/sysfs.c| 18 +++---
 4 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 2699f9009286..e560072f122b 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -133,6 +133,9 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
new_paca->kexec_state = KEXEC_STATE_NONE;
new_paca->__current = &init_task;
new_paca->data_offset = 0xfeeeULL;
+#ifdef CONFIG_PPC64
+   new_paca->dscr_default = spr_default_dscr;
+#endif
 #ifdef CONFIG_PPC_BOOK3S_64
new_paca->slb_shadow_ptr = NULL;
 #endif
diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h
index 3fc11e30308f..d144df54ad40 100644
--- a/arch/powerpc/kernel/setup.h
+++ b/arch/powerpc/kernel/setup.h
@@ -45,14 +45,11 @@ void emergency_stack_init(void);
 static inline void emergency_stack_init(void) { };
 #endif
 
-#ifdef CONFIG_PPC64
-void record_spr_defaults(void);
-#else
-static inline void record_spr_defaults(void) { };
-#endif
-
 #ifdef CONFIG_PPC64
 u64 ppc64_bolted_size(void);
+
+/* Default SPR values from firmware/kexec */
+extern unsigned long spr_default_dscr;
 #endif
 
 /*
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3ce12af4906f..dde34d35d1e7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -254,6 +254,14 @@ static void cpu_ready_for_interrupts(void)
get_paca()->kernel_msr = MSR_KERNEL;
 }
 
+unsigned long spr_default_dscr = 0;
+
+void __init record_spr_defaults(void)
+{
+   if (early_cpu_has_feature(CPU_FTR_DSCR))
+   spr_default_dscr = mfspr(SPRN_DSCR);
+}
+
 /*
  * Early initialization entry point. This is called by head.S
  * with MMU translation disabled. We rely on the "feature" of
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 1f9d94dac3a6..ab4eb61fe659 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -588,21 +588,17 @@ static DEVICE_ATTR(dscr_default, 0600,
 
 static void sysfs_create_dscr_default(void)
 {
-   int err = 0;
-   if (cpu_has_feature(CPU_FTR_DSCR))
-   err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
-}
-
-void __init record_spr_defaults(void)
-{
-   int cpu;
-
if (cpu_has_feature(CPU_FTR_DSCR)) {
-   dscr_default = mfspr(SPRN_DSCR);
-   for (cpu = 0; cpu < nr_cpu_ids; cpu++)
+   int err = 0;
+   int cpu;
+
+   for_each_possible_cpu(cpu)
paca_ptrs[cpu]->dscr_default = dscr_default;
+
+   err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
}
 }
+
 #endif /* CONFIG_PPC64 */
 
 #ifdef HAS_PPC_PMC_PA6T
-- 
2.16.1



[RFC PATCH 06/12] powerpc/mm/numa: move numa topology discovery earlier

2018-02-12 Thread Nicholas Piggin
Split sparsemem initialisation from basic numa topology discovery.

XXX: untested with lpars
---
 arch/powerpc/include/asm/setup.h   |  1 +
 arch/powerpc/kernel/setup-common.c |  3 +++
 arch/powerpc/mm/mem.c  |  5 -
 arch/powerpc/mm/numa.c | 32 +++-
 4 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 469b7fdc9be4..d2bf233aebd5 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -23,6 +23,7 @@ extern void reloc_got2(unsigned long);
 #define PTRRELOC(x)((typeof(x)) add_reloc_offset((unsigned long)(x)))
 
 void check_for_initrd(void);
+void mem_topology_setup(void);
 void initmem_init(void);
 void setup_panic(void);
 #define ARCH_PANIC_TIMEOUT 180
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index af128ee67248..f8a6b8ad13b4 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -888,6 +888,9 @@ void __init setup_arch(char **cmdline_p)
/* Check the SMT related command line arguments (ppc64). */
check_smt_enabled();
 
+   /* Parse memory topology */
+   mem_topology_setup();
+
/* On BookE, setup per-core TLB data structures. */
setup_tlb_core_data();
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 1281c6eb3a85..70f7b6426a15 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -213,7 +213,7 @@ walk_system_ram_range(unsigned long start_pfn, unsigned long nr_pages,
 EXPORT_SYMBOL_GPL(walk_system_ram_range);
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
-void __init initmem_init(void)
+void __init mem_topology_setup(void)
 {
max_low_pfn = max_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
min_low_pfn = MEMORY_START >> PAGE_SHIFT;
@@ -225,7 +225,10 @@ void __init initmem_init(void)
 * memblock_regions
 */
memblock_set_node(0, (phys_addr_t)ULLONG_MAX, &memblock.memory, 0);
+}
 
+void __init initmem_init(void)
+{
/* XXX need to clip this if using highmem? */
sparse_memory_present_with_active_regions(0);
sparse_init();
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index e9ec465068f1..1eec1bcc03a6 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -836,18 +836,13 @@ static void __init find_possible_nodes(void)
of_node_put(rtas);
 }
 
-void __init initmem_init(void)
+void __init mem_topology_setup(void)
 {
-   int nid, cpu;
-
-   max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
-   max_pfn = max_low_pfn;
+   int cpu;
 
if (parse_numa_properties())
setup_nonnuma();
 
-   memblock_dump_all();
-
/*
 * Modify the set of possible NUMA nodes to reflect information
 * available about the set of online nodes, and the set of nodes
@@ -858,6 +853,23 @@ void __init initmem_init(void)
 
find_possible_nodes();
 
+   setup_node_to_cpumask_map();
+
+   reset_numa_cpu_lookup_table();
+
+   for_each_present_cpu(cpu)
+   numa_setup_cpu(cpu);
+}
+
+void __init initmem_init(void)
+{
+   int nid;
+
+   max_low_pfn = memblock_end_of_DRAM() >> PAGE_SHIFT;
+   max_pfn = max_low_pfn;
+
+   memblock_dump_all();
+
for_each_online_node(nid) {
unsigned long start_pfn, end_pfn;
 
@@ -868,10 +880,6 @@ void __init initmem_init(void)
 
sparse_init();
 
-   setup_node_to_cpumask_map();
-
-   reset_numa_cpu_lookup_table();
-
/*
 * We need the numa_cpu_lookup_table to be accurate for all CPUs,
 * even before we online them, so that we can use cpu_to_{node,mem}
@@ -881,8 +889,6 @@ void __init initmem_init(void)
 */
cpuhp_setup_state_nocalls(CPUHP_POWER_NUMA_PREPARE, "powerpc/numa:prepare",
  ppc_numa_cpu_prepare, ppc_numa_cpu_dead);
-   for_each_present_cpu(cpu)
-   numa_setup_cpu(cpu);
 }
 
 static int __init early_numa(char *p)
-- 
2.16.1



[RFC PATCH 05/12] mm: make memblock_alloc_base_nid non-static

2018-02-12 Thread Nicholas Piggin
This will be used by powerpc to allocate per-cpu stacks and other
data structures node-local where possible.
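
The intended caller pattern, as the later patches in this series use it,
is roughly (a sketch based on alloc_stack() in patch 11/12; size, align,
limit and nid are whatever the caller needs):

	/* Try node-local first, fall back to anywhere below the limit. */
	pa = memblock_alloc_base_nid(size, align, limit, nid, MEMBLOCK_NONE);
	if (!pa)
		pa = memblock_alloc_base(size, align, limit);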

Signed-off-by: Nicholas Piggin 
---
 include/linux/memblock.h | 5 -
 mm/memblock.c| 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 7ed0f7782d16..c0a729c77340 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -316,9 +316,12 @@ static inline bool memblock_bottom_up(void)
 #define MEMBLOCK_ALLOC_ANYWHERE(~(phys_addr_t)0)
 #define MEMBLOCK_ALLOC_ACCESSIBLE  0
 
-phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
+phys_addr_t memblock_alloc_range(phys_addr_t size, phys_addr_t align,
phys_addr_t start, phys_addr_t end,
ulong flags);
+phys_addr_t memblock_alloc_base_nid(phys_addr_t size,
+   phys_addr_t align, phys_addr_t max_addr,
+   int nid, ulong flags);
 phys_addr_t memblock_alloc_base(phys_addr_t size, phys_addr_t align,
phys_addr_t max_addr);
 phys_addr_t __memblock_alloc_base(phys_addr_t size, phys_addr_t align,
diff --git a/mm/memblock.c b/mm/memblock.c
index 46aacdfa4f4d..12e5e685e585 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1190,7 +1190,7 @@ phys_addr_t __init memblock_alloc_range(phys_addr_t size, phys_addr_t align,
flags);
 }
 
-static phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
+phys_addr_t __init memblock_alloc_base_nid(phys_addr_t size,
phys_addr_t align, phys_addr_t max_addr,
int nid, ulong flags)
 {
-- 
2.16.1



[RFC PATCH 04/12] powerpc/64s: allocate slb_shadow structures individually

2018-02-12 Thread Nicholas Piggin
Allocate slb_shadow structures individually.

slb_shadow structures are avoided in a radix environment.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/paca.c | 65 +-
 1 file changed, 30 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 6cddb9bdc151..2699f9009286 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -72,41 +72,28 @@ static void __init free_lppaca(struct lppaca *lp)
 #ifdef CONFIG_PPC_BOOK3S_64
 
 /*
- * 3 persistent SLBs are registered here.  The buffer will be zero
+ * 3 persistent SLBs are allocated here.  The buffer will be zero
  * initially, hence will all be invaild until we actually write them.
  *
  * If you make the number of persistent SLB entries dynamic, please also
  * update PR KVM to flush and restore them accordingly.
  */
-static struct slb_shadow * __initdata slb_shadow;
-
-static void __init allocate_slb_shadows(int nr_cpus, int limit)
-{
-   int size = PAGE_ALIGN(sizeof(struct slb_shadow) * nr_cpus);
-
-   if (early_radix_enabled())
-   return;
-
-   slb_shadow = __va(memblock_alloc_base(size, PAGE_SIZE, limit));
-   memset(slb_shadow, 0, size);
-}
-
-static struct slb_shadow * __init init_slb_shadow(int cpu)
+static struct slb_shadow * __init new_slb_shadow(int cpu, unsigned long limit)
 {
struct slb_shadow *s;
 
-   if (early_radix_enabled())
-   return NULL;
+   if (cpu != boot_cpuid) {
+   /*
+* Boot CPU comes here before early_radix_enabled
+* is parsed (e.g., for disable_radix). So allocate
+* always and this will be fixed up in free_unused_pacas.
+*/
+   if (early_radix_enabled())
+   return NULL;
+   }
 
-   s = &slb_shadow[cpu];
-
-   /*
-* When we come through here to initialise boot_paca, the slb_shadow
-* buffers are not allocated yet. That's OK, we'll get one later in
-* boot, but make sure we don't corrupt memory at 0.
-*/
-   if (!slb_shadow)
-   return NULL;
+   s = __va(memblock_alloc_base(sizeof(*s), L1_CACHE_BYTES, limit));
+   memset(s, 0, sizeof(*s));
 
s->persistent = cpu_to_be32(SLB_NUM_BOLTED);
s->buffer_length = cpu_to_be32(sizeof(*s));
@@ -114,10 +101,6 @@ static struct slb_shadow * __init init_slb_shadow(int cpu)
return s;
 }
 
-#else /* !CONFIG_PPC_BOOK3S_64 */
-
-static void __init allocate_slb_shadows(int nr_cpus, int limit) { }
-
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 /* The Paca is an array with one entry per processor.  Each contains an
@@ -151,7 +134,7 @@ void __init initialise_paca(struct paca_struct *new_paca, int cpu)
new_paca->__current = &init_task;
new_paca->data_offset = 0xfeeeULL;
 #ifdef CONFIG_PPC_BOOK3S_64
-   new_paca->slb_shadow_ptr = init_slb_shadow(cpu);
+   new_paca->slb_shadow_ptr = NULL;
 #endif
 
 #ifdef CONFIG_PPC_BOOK3E
@@ -222,13 +205,16 @@ void __init allocate_pacas(void)
printk(KERN_DEBUG "Allocated %lu bytes for %u pacas\n",
size, nr_cpu_ids);
 
-   allocate_slb_shadows(nr_cpu_ids, limit);
-
/* Can't use for_each_*_cpu, as they aren't functional yet */
for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
-   initialise_paca(paca_ptrs[cpu], cpu);
+   struct paca_struct *paca = paca_ptrs[cpu];
+
+   initialise_paca(paca, cpu);
 #ifdef CONFIG_PPC_PSERIES
-   paca_ptrs[cpu]->lppaca_ptr = new_lppaca(cpu, limit);
+   paca->lppaca_ptr = new_lppaca(cpu, limit);
+#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+   paca->slb_shadow_ptr = new_slb_shadow(cpu, limit);
 #endif
}
 }
@@ -263,6 +249,15 @@ void __init free_unused_pacas(void)
 
paca_nr_cpu_ids = nr_cpu_ids;
paca_ptrs_size = new_ptrs_size;
+
+#ifdef CONFIG_PPC_BOOK3S_64
+   if (early_radix_enabled()) {
+   /* Ugly fixup, see new_slb_shadow() */
+   memblock_free(__pa(paca_ptrs[boot_cpuid]->slb_shadow_ptr),
+   sizeof(struct slb_shadow));
+   paca_ptrs[boot_cpuid]->slb_shadow_ptr = NULL;
+   }
+#endif
 }
 
 void copy_mm_to_paca(struct mm_struct *mm)
-- 
2.16.1



[RFC PATCH 03/12] powerpc/64s: allocate lppacas individually

2018-02-12 Thread Nicholas Piggin
Allocate LPPACAs individually.

We no longer allocate lppacas in an array, so this patch removes the 1kB
static alignment for the structure, and enforces the PAPR alignment
requirements at allocation time. We cannot reduce the 1kB allocation size,
however, due to existing KVM hypervisors.
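
At allocation time this works out to roughly the following (a sketch
reconstructed from the context quoted in patch 04/12, since the hunk is
truncated in this archive, not the actual diff):

	static struct lppaca * __init new_lppaca(int cpu, unsigned long limit)
	{
		struct lppaca *lp;
		size_t size = 0x400;	/* 1kB, the minimum pre-v4.14 KVM accepts */

		BUILD_BUG_ON(size < sizeof(struct lppaca));

		if (early_cpu_has_feature(CPU_FTR_HVMODE))
			return NULL;	/* not virtualized, see patch 01/12 */

		/* 1kB alignment keeps the 640-byte VPA L1 cache line
		 * aligned and within a single 4kB page, as PAPR requires. */
		lp = __va(memblock_alloc_base(size, 0x400, limit));
		init_lppaca(lp);

		return lp;
	}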

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/lppaca.h  | 24 -
 arch/powerpc/kernel/machine_kexec_64.c | 15 --
 arch/powerpc/kernel/paca.c | 89 --
 arch/powerpc/kvm/book3s_hv.c   |  3 +-
 arch/powerpc/mm/numa.c |  4 +-
 arch/powerpc/platforms/pseries/kexec.c |  7 ++-
 6 files changed, 63 insertions(+), 79 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 6e4589eee2da..65d589689f01 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -36,14 +36,16 @@
 #include 
 
 /*
- * We only have to have statically allocated lppaca structs on
- * legacy iSeries, which supports at most 64 cpus.
- */
-#define NR_LPPACAS 1
-
-/*
- * The Hypervisor barfs if the lppaca crosses a page boundary.  A 1k
- * alignment is sufficient to prevent this
+ * The lppaca is the "virtual processor area" registered with the hypervisor,
+ * H_REGISTER_VPA etc.
+ *
+ * According to PAPR, the structure is 640 bytes long, must be L1 cache line
+ * aligned, and must not cross a 4kB boundary. Its size field must be at
+ * least 640 bytes (but may be more).
+ *
+ * Pre-v4.14 KVM hypervisors reject the VPA if its size field is smaller than
+ * 1kB, so we dynamically allocate 1kB and advertise size as 1kB, but keep
+ * this structure as the canonical 640 byte size.
  */
 struct lppaca {
/* cacheline 1 contains read-only data */
@@ -97,11 +99,9 @@ struct lppaca {
 
__be32  page_ins;   /* CMO Hint - # page ins by OS */
u8  reserved11[148];
-   volatile __be64 dtl_idx;	/* Dispatch Trace Log head index */
+   volatile __be64 dtl_idx;/* Dispatch Trace Log head index */
u8  reserved12[96];
-} __attribute__((__aligned__(0x400)));
-
-extern struct lppaca lppaca[];
+} cacheline_aligned;
 
 #define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index a250e3331f94..1044bf15d5ed 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -323,17 +323,24 @@ void default_machine_kexec(struct kimage *image)
kexec_stack.thread_info.cpu = current_thread_info()->cpu;
 
/* We need a static PACA, too; copy this CPU's PACA over and switch to
-* it.  Also poison per_cpu_offset to catch anyone using non-static
-* data.
+* it. Also poison per_cpu_offset and NULL lppaca to catch anyone using
+* non-static data.
 */
memcpy(&kexec_paca, get_paca(), sizeof(struct paca_struct));
kexec_paca.data_offset = 0xedeaddeadeeeUL;
+#ifdef CONFIG_PPC_PSERIES
+   kexec_paca.lppaca_ptr = NULL;
+#endif
paca_ptrs[kexec_paca.paca_index] = &kexec_paca;
+
setup_paca(&kexec_paca);
 
-   /* XXX: If anyone does 'dynamic lppacas' this will also need to be
-* switched to a static version!
+   /*
+* The lppaca should be unregistered at this point so the HV won't
+* touch it. In the case of a crash, none of the lppacas are
+* unregistered so there is not much we can do about it here.
 */
+
/*
 * On Book3S, the copy must happen with the MMU off if we are either
 * using Radix page tables or we are not in an LPAR since we can
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index eef4891c9af6..6cddb9bdc151 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -23,82 +23,50 @@
 #ifdef CONFIG_PPC_PSERIES
 
 /*
- * The structure which the hypervisor knows about - this structure
- * should not cross a page boundary.  The vpa_init/register_vpa call
- * is now known to fail if the lppaca structure crosses a page
- * boundary.  The lppaca is also used on POWER5 pSeries boxes.
- * The lppaca is 640 bytes long, and cannot readily
- * change since the hypervisor knows its layout, so a 1kB alignment
- * will suffice to ensure that it doesn't cross a page boundary.
+ * See asm/lppaca.h for more detail.
+ *
+ * lppaca structures must must be 1kB in size, L1 cache line aligned,
+ * and not cross 4kB boundary. A 1kB size and 1kB alignment will satisfy
+ * these requirements.
  */
-struct lppaca lppaca[] = {
-   [0 ... (NR_LPPACAS-1)] = {
+static inline void init_lppaca(struct lppaca *lppaca)
+{
+   BUILD_BUG_ON(sizeof(struct lppaca) != 640);
+
+   *lppaca = (struct lppaca) {
.desc = cpu_to_be32(0xd397d781),/* "LpPa" */
-   .size = cpu_to_be16(sizeof(struct lppaca)),
+

[RFC PATCH 02/12] powerpc/64: Use array of paca pointers and allocate pacas individually

2018-02-12 Thread Nicholas Piggin
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front and then free the
unused ones.

This adds slightly more overhead (one additional indirection) for
cross-CPU paca references, but those aren't too common.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/kvm_ppc.h   |  8 ++--
 arch/powerpc/include/asm/lppaca.h|  2 +-
 arch/powerpc/include/asm/paca.h  |  4 +-
 arch/powerpc/include/asm/smp.h   |  4 +-
 arch/powerpc/kernel/crash.c  |  2 +-
 arch/powerpc/kernel/head_64.S| 19 
 arch/powerpc/kernel/machine_kexec_64.c   | 22 -
 arch/powerpc/kernel/paca.c   | 70 +++-
 arch/powerpc/kernel/setup_64.c   | 23 -
 arch/powerpc/kernel/smp.c| 10 ++--
 arch/powerpc/kernel/sysfs.c  |  2 +-
 arch/powerpc/kvm/book3s_hv.c | 31 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c |  2 +-
 arch/powerpc/mm/tlb-radix.c  |  2 +-
 arch/powerpc/platforms/85xx/smp.c|  8 ++--
 arch/powerpc/platforms/cell/smp.c|  4 +-
 arch/powerpc/platforms/powernv/idle.c| 13 +++---
 arch/powerpc/platforms/powernv/setup.c   |  4 +-
 arch/powerpc/platforms/powernv/smp.c |  2 +-
 arch/powerpc/platforms/powernv/subcore.c |  2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  2 +-
 arch/powerpc/platforms/pseries/lpar.c|  4 +-
 arch/powerpc/platforms/pseries/setup.c   |  2 +-
 arch/powerpc/platforms/pseries/smp.c |  4 +-
 arch/powerpc/sysdev/xics/icp-native.c|  2 +-
 arch/powerpc/xmon/xmon.c |  2 +-
 26 files changed, 143 insertions(+), 107 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 9db18287b5f4..8908481cdfd7 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -432,15 +432,15 @@ struct openpic;
 extern void kvm_cma_reserve(void) __init;
 static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
 {
-   paca[cpu].kvm_hstate.xics_phys = (void __iomem *)addr;
+   paca_ptrs[cpu]->kvm_hstate.xics_phys = (void __iomem *)addr;
 }
 
 static inline void kvmppc_set_xive_tima(int cpu,
unsigned long phys_addr,
void __iomem *virt_addr)
 {
-   paca[cpu].kvm_hstate.xive_tima_phys = (void __iomem *)phys_addr;
-   paca[cpu].kvm_hstate.xive_tima_virt = virt_addr;
+   paca_ptrs[cpu]->kvm_hstate.xive_tima_phys = (void __iomem *)phys_addr;
+   paca_ptrs[cpu]->kvm_hstate.xive_tima_virt = virt_addr;
 }
 
 static inline u32 kvmppc_get_xics_latch(void)
@@ -454,7 +454,7 @@ static inline u32 kvmppc_get_xics_latch(void)
 
 static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
 {
-   paca[cpu].kvm_hstate.host_ipi = host_ipi;
+   paca_ptrs[cpu]->kvm_hstate.host_ipi = host_ipi;
 }
 
 static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index d0a2a2f99564..6e4589eee2da 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -103,7 +103,7 @@ struct lppaca {
 
 extern struct lppaca lppaca[];
 
-#define lppaca_of(cpu) (*paca[cpu].lppaca_ptr)
+#define lppaca_of(cpu) (*paca_ptrs[cpu]->lppaca_ptr)
 
 /*
  * We are using a non architected field to determine if a partition is
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 57fe8aa0c257..f266b0a7be95 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -246,10 +246,10 @@ struct paca_struct {
void *rfi_flush_fallback_area;
u64 l1d_flush_size;
 #endif
-};
+} cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
-extern struct paca_struct *paca;
+extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
 extern void setup_paca(struct paca_struct *new_paca);
 extern void allocate_pacas(void);
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index fac963e10d39..ec7b299350d9 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -170,12 +170,12 @@ static inline const struct cpumask *cpu_sibling_mask(int cpu)
 #ifdef CONFIG_PPC64
 static inline int get_hard_smp_processor_id(int cpu)
 {
-   return paca[cpu].hw_cpu_id;
+   return paca_ptrs[cpu]->hw_cpu_id;
 }
 
 static inline void set_hard_smp_processor_id(int cpu, int phys)
 {
-   paca[cpu].hw_cpu_id = phys;
+

[RFC PATCH 01/12] powerpc/64s: do not allocate lppaca if we are not virtualized

2018-02-12 Thread Nicholas Piggin
The "lppaca" is a structure registered with the hypervisor. This
is unnecessary when running on non-virtualised platforms. One field
from the lppaca (pmcregs_in_use) is also used by the host, so move
the host part out into the paca (lppaca field is still updated in
guest mode).

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/paca.h |  9 +++--
 arch/powerpc/include/asm/pmc.h  | 13 -
 arch/powerpc/kernel/asm-offsets.c   |  5 +
 arch/powerpc/kernel/paca.c  | 16 +---
 arch/powerpc/kvm/book3s_hv_interrupts.S |  3 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  3 +--
 6 files changed, 39 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index b62c31037cad..57fe8aa0c257 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -58,7 +58,7 @@ struct task_struct;
  * processor.
  */
 struct paca_struct {
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
/*
 * Because hw_cpu_id, unlike other paca fields, is accessed
 * routinely from other CPUs (from the IRQ code), we stick to
@@ -67,7 +67,8 @@ struct paca_struct {
 */
 
struct lppaca *lppaca_ptr;  /* Pointer to LpPaca for PLIC */
-#endif /* CONFIG_PPC_BOOK3S */
+#endif /* CONFIG_PPC_PSERIES */
+
/*
 * MAGIC: the spinlock functions in arch/powerpc/lib/locks.c 
 * load lock_token and paca_index with a single lwz
@@ -160,10 +161,14 @@ struct paca_struct {
u64 saved_msr;  /* MSR saved here by enter_rtas */
u16 trap_save;  /* Used when bad stack is encountered */
u8 irq_soft_mask;   /* mask for irq soft masking */
+   u8 soft_enabled;/* irq soft-enable flag */
u8 irq_happened;/* irq happened while soft-disabled */
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending;	/* IRQ_WORK interrupt while soft-disable */
u8 nap_state_lost;  /* NV GPR values lost in power7_idle */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   u8 pmcregs_in_use;  /* pseries puts this in lppaca */
+#endif
u64 sprg_vdso;  /* Saved user-visible sprg */
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
u64 tm_scratch; /* TM scratch area for reclaim */
diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h
index 5a9ede4962cb..7ac3586c38ab 100644
--- a/arch/powerpc/include/asm/pmc.h
+++ b/arch/powerpc/include/asm/pmc.h
@@ -31,10 +31,21 @@ void ppc_enable_pmcs(void);
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #include <asm/lppaca.h>
+#include <asm/firmware.h>
 
 static inline void ppc_set_pmu_inuse(int inuse)
 {
-   get_lppaca()->pmcregs_in_use = inuse;
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+   if (firmware_has_feature(FW_FEATURE_LPAR)) {
+#ifdef CONFIG_PPC_PSERIES
+   get_lppaca()->pmcregs_in_use = inuse;
+#endif
+   } else {
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   get_paca()->pmcregs_in_use = inuse;
+#endif
+   }
+#endif
 }
 
 extern void power4_enable_pmcs(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 88b84ac76b53..b9b52490acfd 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -221,12 +221,17 @@ int main(void)
OFFSET(PACA_EXMC, paca_struct, exmc);
OFFSET(PACA_EXSLB, paca_struct, exslb);
OFFSET(PACA_EXNMI, paca_struct, exnmi);
+#ifdef CONFIG_PPC_PSERIES
OFFSET(PACALPPACAPTR, paca_struct, lppaca_ptr);
+#endif
OFFSET(PACA_SLBSHADOWPTR, paca_struct, slb_shadow_ptr);
OFFSET(SLBSHADOW_STACKVSID, slb_shadow, save_area[SLB_NUM_BOLTED - 1].vsid);
OFFSET(SLBSHADOW_STACKESID, slb_shadow, save_area[SLB_NUM_BOLTED - 1].esid);
OFFSET(SLBSHADOW_SAVEAREA, slb_shadow, save_area);
OFFSET(LPPACA_PMCINUSE, lppaca, pmcregs_in_use);
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+   OFFSET(PACA_PMCINUSE, paca_struct, pmcregs_in_use);
+#endif
OFFSET(LPPACA_DTLIDX, lppaca, dtl_idx);
OFFSET(LPPACA_YIELDCOUNT, lppaca, yield_count);
OFFSET(PACA_DTL_RIDX, paca_struct, dtl_ridx);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 95ffedf14885..5900540e2ff8 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -20,7 +20,7 @@
 
 #include "setup.h"
 
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
 
 /*
  * The structure which the hypervisor knows about - this structure
@@ -47,6 +47,9 @@ static long __initdata lppaca_size;
 
 static void __init allocate_lppacas(int nr_cpus, unsigned long limit)
 {
+   if (early_cpu_has_feature(CPU_FTR_HVMODE))
+   return;
+
if (nr_cpus <= NR_LPPACAS)
return;
 
@@ -60,6 +63,9 @@ static struct lppaca 

[RFC PATCH 00/12] numa aware allocation for pacas, stacks,

2018-02-12 Thread Nicholas Piggin
This series allows numa aware allocations for various early data
structures for radix. Hash still has a bolted SLB limitation that
prevents at least pacas and stacks from node-affine allocations.

Since I last posted a feeble attempt at this, I went back and tried
to cover the setup / topology discovery code a bit more thoroughly.
Paca allocation is deferred until quite late, and numa discovery is
moved slightly earlier.

Still requires more testing with different platforms, BookE, pseries,
etc., but it seems to work with powernv so far.

Thanks,
Nick

Nicholas Piggin (12):
  powerpc/64s: do not allocate lppaca if we are not virtualized
  powerpc/64: Use array of paca pointers and allocate pacas individually
  powerpc/64s: allocate lppacas individually
  powerpc/64s: allocate slb_shadow structures individually
  mm: make memblock_alloc_base_nid non-static
  powerpc/mm/numa: move numa topology discovery earlier
  powerpc/64: move default SPR recording
  powerpc/setup: cpu_to_phys_id array
  powerpc/64: defer paca allocation until memory topology is discovered
  powerpc/64: allocate pacas per node
  powerpc/64: allocate per-cpu stacks node-local if possible
  powerpc/64s/radix: allocate kernel page tables node-local if possible

 arch/powerpc/include/asm/book3s/64/hash.h|   2 +-
 arch/powerpc/include/asm/book3s/64/radix.h   |   2 +-
 arch/powerpc/include/asm/kvm_ppc.h   |   8 +-
 arch/powerpc/include/asm/lppaca.h|  26 +--
 arch/powerpc/include/asm/paca.h  |  16 +-
 arch/powerpc/include/asm/pmc.h   |  13 +-
 arch/powerpc/include/asm/setup.h |   1 +
 arch/powerpc/include/asm/smp.h   |   5 +-
 arch/powerpc/include/asm/sparsemem.h |   2 +-
 arch/powerpc/kernel/asm-offsets.c|   5 +
 arch/powerpc/kernel/crash.c  |   2 +-
 arch/powerpc/kernel/head_64.S|  19 ++-
 arch/powerpc/kernel/machine_kexec_64.c   |  37 +++--
 arch/powerpc/kernel/paca.c   | 236 ++-
 arch/powerpc/kernel/prom.c   |  14 +-
 arch/powerpc/kernel/setup-common.c   |  30 +++-
 arch/powerpc/kernel/setup.h  |   9 +-
 arch/powerpc/kernel/setup_64.c   |  80 ++---
 arch/powerpc/kernel/smp.c|  10 +-
 arch/powerpc/kernel/sysfs.c  |  18 +-
 arch/powerpc/kvm/book3s_hv.c |  34 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c |   2 +-
 arch/powerpc/kvm/book3s_hv_interrupts.S  |   3 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |   3 +-
 arch/powerpc/mm/hash_utils_64.c  |   2 +-
 arch/powerpc/mm/mem.c|   9 +-
 arch/powerpc/mm/numa.c   |  36 ++--
 arch/powerpc/mm/pgtable-book3s64.c   |   6 +-
 arch/powerpc/mm/pgtable-radix.c  | 178 +---
 arch/powerpc/mm/tlb-radix.c  |   2 +-
 arch/powerpc/platforms/85xx/smp.c|   8 +-
 arch/powerpc/platforms/cell/smp.c|   4 +-
 arch/powerpc/platforms/powernv/idle.c|  13 +-
 arch/powerpc/platforms/powernv/setup.c   |   4 +-
 arch/powerpc/platforms/powernv/smp.c |   2 +-
 arch/powerpc/platforms/powernv/subcore.c |   2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |   2 +-
 arch/powerpc/platforms/pseries/kexec.c   |   7 +-
 arch/powerpc/platforms/pseries/lpar.c|   4 +-
 arch/powerpc/platforms/pseries/setup.c   |   2 +-
 arch/powerpc/platforms/pseries/smp.c |   4 +-
 arch/powerpc/sysdev/xics/icp-native.c|   2 +-
 arch/powerpc/xmon/xmon.c |   2 +-
 include/linux/memblock.h |   5 +-
 mm/memblock.c|   2 +-
 45 files changed, 527 insertions(+), 346 deletions(-)

-- 
2.16.1



Re: [RFC PATCH 0/5] powerpc/mm/slice: improve slice speed and stack use

2018-02-12 Thread Nicholas Piggin
On Mon, 12 Feb 2018 16:02:23 +0100
Christophe LEROY  wrote:

> On 10/02/2018 at 09:11, Nicholas Piggin wrote:
> > This series intends to improve performance and reduce stack
> > consumption in the slice allocation code. It does it by keeping slice
> > masks in the mm_context rather than compute them for each allocation,
> > and by reducing bitmaps and slice_masks from stacks, using pointers
> > instead where possible.
> > 
> > checkstack.pl gives, before:
> > 0x0de4 slice_get_unmapped_area [slice.o]:   656
> > 0x1b4c is_hugepage_only_range [slice.o]:512
> > 0x075c slice_find_area_topdown [slice.o]:   416
> > 0x04c8 slice_find_area_bottomup.isra.1 [slice.o]:   272
> > 0x1aa0 slice_set_range_psize [slice.o]: 240
> > 0x0a64 slice_find_area [slice.o]:   176
> > 0x0174 slice_check_fit [slice.o]:   112
> > 
> > after:
> > 0x0d70 slice_get_unmapped_area [slice.o]:   320
> > 0x08f8 slice_find_area [slice.o]:   144
> > 0x1860 slice_set_range_psize [slice.o]: 144
> > 0x18ec is_hugepage_only_range [slice.o]:144
> > 0x0750 slice_find_area_bottomup.isra.4 [slice.o]:   128
> > 
> > The benchmark in https://github.com/linuxppc/linux/issues/49 gives, before:
> > $ time ./slicemask
> > real0m20.712s
> > user0m5.830s
> > sys 0m15.105s
> > 
> > after:
> > $ time ./slicemask
> > real0m13.197s
> > user0m5.409s
> > sys 0m7.779s  
> 
> Hi,
> 
> I tested your series on an 8xx, on top of patch
> https://patchwork.ozlabs.org/patch/871675/
> 
> I don't get a result as significant as yours, but there is some
> improvement anyway:
> 
> ITERATION 50
> 
> Before:
> 
> root@vgoip:~# time ./slicemask
> real0m 33.26s
> user0m 1.94s
> sys 0m 30.85s
> 
> After:
> root@vgoip:~# time ./slicemask
> real0m 29.69s
> user0m 2.11s
> sys 0m 27.15s
> 
> The most significant improvement is obtained with the first patch of your series:
> root@vgoip:~# time ./slicemask
> real0m 30.85s
> user0m 1.80s
> sys 0m 28.57s

Okay, thanks. Are you still spending significant time in the slice
code?

> 
> I had to modify your series a bit; if you are interested I can post it.
> 

Sure, that would be good.

Thanks,
Nick


Re: [RFC PATCH 0/5] powerpc/mm/slice: improve slice speed and stack use

2018-02-12 Thread Christophe LEROY



On 10/02/2018 at 09:11, Nicholas Piggin wrote:

This series intends to improve performance and reduce stack
consumption in the slice allocation code. It does it by keeping slice
masks in the mm_context rather than compute them for each allocation,
and by reducing bitmaps and slice_masks from stacks, using pointers
instead where possible.

checkstack.pl gives, before:
0x0de4 slice_get_unmapped_area [slice.o]:   656
0x1b4c is_hugepage_only_range [slice.o]:512
0x075c slice_find_area_topdown [slice.o]:   416
0x04c8 slice_find_area_bottomup.isra.1 [slice.o]:   272
0x1aa0 slice_set_range_psize [slice.o]: 240
0x0a64 slice_find_area [slice.o]:   176
0x0174 slice_check_fit [slice.o]:   112

after:
0x0d70 slice_get_unmapped_area [slice.o]:   320
0x08f8 slice_find_area [slice.o]:   144
0x1860 slice_set_range_psize [slice.o]: 144
0x18ec is_hugepage_only_range [slice.o]:144
0x0750 slice_find_area_bottomup.isra.4 [slice.o]:   128

The benchmark in https://github.com/linuxppc/linux/issues/49 gives, before:
$ time ./slicemask
real0m20.712s
user0m5.830s
sys 0m15.105s

after:
$ time ./slicemask
real0m13.197s
user0m5.409s
sys 0m7.779s


Hi,

I tested your series on an 8xx, on top of patch
https://patchwork.ozlabs.org/patch/871675/


I don't get a result as significant as yours, but there is some
improvement anyway:


ITERATION 50

Before:

root@vgoip:~# time ./slicemask
real0m 33.26s
user0m 1.94s
sys 0m 30.85s

After:
root@vgoip:~# time ./slicemask
real0m 29.69s
user0m 2.11s
sys 0m 27.15s

The most significant improvement is obtained with the first patch of your series:
root@vgoip:~# time ./slicemask
real0m 30.85s
user0m 1.80s
sys 0m 28.57s

I had to modify your series a bit; if you are interested I can post it.

Christophe




Thanks,
Nick

Nicholas Piggin (5):
   powerpc/mm/slice: pass pointers to struct slice_mask where possible
   powerpc/mm/slice: implement a slice mask cache
   powerpc/mm/slice: implement slice_check_range_fits
   powerpc/mm/slice: Use const pointers to cached slice masks where
 possible
   powerpc/mm/slice: use the dynamic high slice size to limit bitmap
 operations

  arch/powerpc/include/asm/book3s/64/mmu.h |  20 +-
  arch/powerpc/mm/slice.c  | 302 +++
  2 files changed, 204 insertions(+), 118 deletions(-)
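
As a sketch of the caching idea described in the cover letter (the field
and helper names here are illustrative, not the actual slice.c code):
one slice_mask per page size is kept in the context and recalculated
only when the slice map changes, instead of being rebuilt on the stack
for every allocation.

	struct slice_mask {
		u64 low_slices;
		DECLARE_BITMAP(high_slices, SLICE_NUM_HIGH);
	};

	/* hypothetical cache in mm_context_t, one mask per psize */
	static const struct slice_mask *slice_mask_for_size(
			struct mm_struct *mm, int psize)
	{
		return &mm->context.slice_masks[psize];
	}

Callers then pass these const pointers around rather than copying whole
masks, which is where the stack savings in the checkstack numbers come
from.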



Re: [PATCH 2/3] cxl: Introduce module parameter 'enable_psltrace'

2018-02-12 Thread Frederic Barrat



On 11/02/2018 at 18:10, Vaibhav Jain wrote:

Thanks for reviewing the patch, Christophe,

christophe lombard  writes:

+bool cxl_enable_psltrace = true;
+module_param_named(enable_psltrace, cxl_enable_psltrace, bool, 0600);
+MODULE_PARM_DESC(enable_psltrace, "Set PSL traces on probe. default: on");
+

I don't really agree with adding a new parameter. This can cause doubts.
The PSL team has confirmed that enabling traces has no impact.
Do you see any reason to disable the traces ?


Traces on PSL follow a 'set and fetch' model. So once the trace buffer for
a specific array is full it will stop and switch to 'FIN' state and at
that point we need to fetch the trace-data and reinit the array to
re-arm it.


If the PSL trace arrays don't wrap, is there anything to gain by 
enabling tracing by default instead of letting the developer handle it 
through sysfs? I was under the (now wrong) impression that the PSL would 
wrap.
I'm not a big fan of the module parameter. It seems we're giving a 
second way of activating traces on top of sysfs, more cumbersome and 
limited.


  Fred


There might be some circumstances where this model may lead to confusion,
specifically when AFU developers assume that the trace arrays are
already armed and don't re-arm them, causing trace data to be missed.

So this module param is a compromise to keep the old behaviour of the
trace arrays intact, wherein the arming/disarming of the trace arrays is
controlled completely by userspace tooling and not by cxl.





Re: [PATCH] powerpc/xmon: Dont register sysrq key when kernel param xmon=off

2018-02-12 Thread Vaibhav Jain
Thanks for reviewing this patch Balbir

Balbir Singh  writes:

> Any specific issue you've run into without this patch? 
Without this patch, xmon is still accessible via sysrq and there is
no indication/warning on the xmon console mentioning that it is not
fully functional. Specifically, the xmon console would still allow the
user to set instruction/data breakpoints even though they won't work
and will result in a kernel oops.

Below is command log illustrating this problem on one of my test system
where I tried setting an instruction breakpoint on cmdline_proc_show()
with xmon=off:

~# cat /proc/cmdline 
root=UUID=248ad10e-a272-4187-8672-5b25f701e8b9 ro xmon=off

~# echo 'x' > /proc/sysrq-trigger

[  458.904802] sysrq: SysRq : Entering xmon

[ snip ]

78:mon> ls cmdline_proc_show
cmdline_proc_show: c04196e0
78:mon> bi c04196e0
78:mon> x

~# cat /proc/cmdline
[  505.618702] Oops: Exception in kernel mode, sig: 5 [#1]
[ snip ]
[  505.620082] NIP [c04196e4] cmdline_proc_show+0x4/0x60
[  505.620136] LR [c03b1db0] seq_read+0x130/0x5e0
[  505.620177] Call Trace:
[  505.620202] [c000200e5078fc00] [c03b1d74] seq_read+0xf4/0x5e0 (unreliable)
[  505.620267] [c000200e5078fca0] [c040cae0] proc_reg_read+0xb0/0x110
[  505.620322] [c000200e5078fcf0] [c037687c] __vfs_read+0x6c/0x1b0
[  505.620376] [c000200e5078fd90] [c0376a7c] vfs_read+0xbc/0x1b0
[  505.620430] [c000200e5078fde0] [c037724c] SyS_read+0x6c/0x110
[  505.620485] [c000200e5078fe30] [c000b320] system_call+0x58/0x6c
[  505.620536] Instruction dump:
[  505.620570] 3c82ff2a 7fe3fb78 38a0 3884dee0 4bf98c05 6000 38210030 e8010010 
[  505.620656] ebe1fff8 7c0803a6 4e800020 3c4c00d6 <38422120> 7c0802a6 f8010010 f821ff91 
[  505.620728] ---[ end trace eaf583921860b3de ]---
[  506.629019] 
Trace/breakpoint trap
~#


> I presume running xmon=off indicates we don't want xmon to take over in case 
> of
> panic/die/oops, 
I believe that when the xmon console is available it should be fully
functional rather than partially functional, otherwise it gets really
confusing to the user as to why instruction/data breakpoints aren't
working.

> why are we tying this to sysrq?
With xmon=off, sysrq seems to be the only way to enter the xmon console.
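
For reference, the guarded registration being discussed would look
roughly like this (a sketch assuming the existing xmon_on flag and
sysrq_xmon_op; the actual diff is not quoted in this thread):

	static int __init setup_xmon_sysrq(void)
	{
		/* Don't offer the sysrq entry point when xmon=off. */
		if (xmon_on)
			register_sysrq_key('x', &sysrq_xmon_op);
		return 0;
	}
	device_initcall(setup_xmon_sysrq);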

-- 
Vaibhav Jain 
Linux Technology Center, IBM India Pvt. Ltd.



Re: [PATCH] headers: untangle kmemleak.h from mm.h

2018-02-12 Thread Michael Ellerman
Randy Dunlap  writes:

> From: Randy Dunlap 
>
> Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
> reason. It looks like it's only a convenience, so remove kmemleak.h
> from slab.h and add <linux/kmemleak.h> to any users of kmemleak_*
> that don't already #include it.
> Also remove <linux/kmemleak.h> from source files that do not use it.
>
> This is tested on i386 allmodconfig and x86_64 allmodconfig. It
> would be good to run it through the 0day bot for other $ARCHes.
> I have neither the horsepower nor the storage space for the other
> $ARCHes.
>
> [slab.h is the second most used header file after module.h; kernel.h
> is right there with slab.h. There could be some minor error in the
> counting due to some #includes having comments after them and I
> didn't combine all of those.]
>
> This is Lingchi patch #1 (death by a thousand cuts, applied to kernel
> header files).
>
> Signed-off-by: Randy Dunlap 
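
For illustration, the change pattern for a kmemleak_* user is simply to
include the header itself (a hypothetical file, not a hunk from the
patch):

	#include <linux/slab.h>
	#include <linux/kmemleak.h>	/* now needed explicitly */

	void *p = kmalloc(len, GFP_KERNEL);
	kmemleak_not_leak(p);	/* no longer gets the header via slab.h */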

I threw it at a random selection of configs and so far the only failures
I'm seeing are:

  lib/test_firmware.c:134:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
  lib/test_firmware.c:620:25: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
  lib/test_firmware.c:620:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
  security/integrity/digsig.c:146:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]

Full results trickling in here, not all the failures there are caused by
this patch, ie. some configs are broken in mainline:

  http://kisskb.ellerman.id.au/kisskb/head/13396/

cheers


Re: KVM compile error

2018-02-12 Thread Christian Zigotzky
It's only for info. I tried to compile the latest git version yesterday and I
got this error. I will try to compile RC1 today and test whether this error
still exists.

Cheers,
Christian

Sent from my iPhone

> On 12. Feb 2018, at 12:08, Michael Ellerman  wrote:
> 
> Christian Zigotzky  writes:
> 
>> Just for info: KVM doesn’t compile currently.
>> 
>> Error messages:
>> 
>> CC  arch/powerpc/kvm/powerpc.o
>> arch/powerpc/kvm/powerpc.c: In function 'kvm_arch_vcpu_ioctl_run':
>> arch/powerpc/kvm/powerpc.c:1611:1: error: label 'out' defined but not used [-Werror=unused-label]
>> out:
>> ^
>> cc1: all warnings being treated as errors
> 
> I don't see this, which compiler/config/commit is that?
> 
> cheers



Re: [PATCH] powerpc/xmon: Dont register sysrq key when kernel param xmon=off

2018-02-12 Thread Balbir Singh
On Mon, Feb 12, 2018 at 7:59 PM, Vaibhav Jain
 wrote:
> Presently the sysrq key for xmon ('x') is registered during kernel init
> irrespective of the value of the kernel param 'xmon'. Thus xmon is
> enabled even if 'xmon=off' is passed on the kernel command line.
>
> This minor patch updates setup_xmon_sysrq() to register
> 'sysrq_xmon_op' only when variable 'xmon_on' is set.
>
> Signed-off-by: Vaibhav Jain 
> ---

Any specific issue you've run into without this patch? I presume
running xmon=off indicates we don't want xmon to take over in case of
panic/die/oops; why are we tying this to sysrq?

Balbir Singh.


Re: KVM compile error

2018-02-12 Thread Michael Ellerman
Christian Zigotzky  writes:

> Just for info: KVM doesn’t compile currently.
>
> Error messages:
>
> CC  arch/powerpc/kvm/powerpc.o
> arch/powerpc/kvm/powerpc.c: In function 'kvm_arch_vcpu_ioctl_run':
> arch/powerpc/kvm/powerpc.c:1611:1: error: label 'out' defined but not used [-Werror=unused-label]
>  out:
>  ^
> cc1: all warnings being treated as errors

I don't see this, which compiler/config/commit is that?

cheers


Re: [PATCH 2/3] cxl: Introduce module parameter 'enable_psltrace'

2018-02-12 Thread christophe lombard

On 11/02/2018 at 18:10, Vaibhav Jain wrote:

Thanks for reviewing the patch, Christophe,

christophe lombard  writes:

+bool cxl_enable_psltrace = true;
+module_param_named(enable_psltrace, cxl_enable_psltrace, bool, 0600);
+MODULE_PARM_DESC(enable_psltrace, "Set PSL traces on probe. default: on");
+

I don't really agree with adding a new parameter. This can cause doubts.
The PSL team has confirmed that enabling traces has no impact.
Do you see any reason to disable the traces ?


Traces on the PSL follow a 'set and fetch' model. Once the trace buffer for
a specific array is full, it stops and switches to the 'FIN' state, and at
that point we need to fetch the trace data and reinitialize the array to
re-arm it.

There might be circumstances where this model leads to confusion,
specifically when AFU developers assume that the trace arrays are
already armed and don't re-arm them, causing trace data to be missed.

So this module param is a compromise to keep the old behaviour of the trace
arrays intact, wherein the arming/disarming of the trace arrays is
controlled completely by userspace tooling and not by cxl.
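
As a toy illustration of that 'set and fetch' model (standalone C; the
names and structure here are purely illustrative, not the cxl driver's
API), note how events are lost once an array hits FIN and nobody
re-arms it:

  #include <stdio.h>

  enum trace_state { TRACE_ARMED, TRACE_FIN };

  struct trace_array {
  	enum trace_state state;
  	int fill;	/* entries currently buffered */
  	int capacity;	/* buffer size */
  	int dropped;	/* events lost while un-armed */
  };

  /* An event is only recorded while the array is armed. */
  static void trace_event(struct trace_array *t)
  {
  	if (t->state != TRACE_ARMED) {
  		t->dropped++;	/* lost: nobody re-armed the array */
  		return;
  	}
  	if (++t->fill == t->capacity)
  		t->state = TRACE_FIN;	/* buffer full: the array stops */
  }

  /* Userspace tooling drains the trace-data and re-arms the array. */
  static void fetch_and_rearm(struct trace_array *t)
  {
  	printf("fetched %d entries, %d dropped meanwhile\n",
  	       t->fill, t->dropped);
  	t->fill = 0;
  	t->dropped = 0;
  	t->state = TRACE_ARMED;
  }

  int main(void)
  {
  	struct trace_array t = { TRACE_ARMED, 0, 4, 0 };
  	int i;

  	for (i = 0; i < 10; i++)
  		trace_event(&t);	/* events 5..10 find a FIN array */
  	fetch_and_rearm(&t);	/* prints: fetched 4 entries, 6 dropped meanwhile */
  	return 0;
  }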

And what about P8? This new parameter is only useful for P9, so it will be
confusing.




Re: [PATCH] cpufreq: powernv: Check negative value returned by cpufreq_table_find_index_dl()

2018-02-12 Thread Viresh Kumar
On 12-02-18, 16:03, Shilpasri G Bhat wrote:
> I agree too. There is no way we can get -1 with an initialized CPU
> frequency table. We don't initialize powernv-cpufreq if we don't have
> valid CPU frequency entries. Is there any other way to suppress the
> Coverity tool warning apart from ignoring it?

So IIUC, this warning is generated by an external tool after static
analysis of the code?

If yes, then just ignore the warning. We shouldn't try to fix the
kernel because a tool isn't smart enough to recognize that the return
value is deliberately left unchecked here.
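
To make that concrete, here is a small standalone model (plain C, not
the actual static inline from include/linux/cpufreq.h) of the
descending-table lookup: -1 can only escape when the loop body never
runs, i.e. when the table has no valid entries at all.

  #include <stdio.h>

  #define TABLE_END 0xFFFFFFFFu

  struct freq_entry { unsigned int frequency; };

  /* Lowest frequency at or above target in a table sorted in
   * descending order; same shape as the kernel helper. */
  static int find_index_dl(const struct freq_entry *t, unsigned int target)
  {
  	int i, best = -1;

  	for (i = 0; t[i].frequency != TABLE_END; i++) {
  		if (t[i].frequency == target)
  			return i;
  		if (t[i].frequency > target) {
  			best = i;	/* later entries are lower; keep looking */
  			continue;
  		}
  		/* first entry below target: best match so far, or this one */
  		return best >= 0 ? best : i;
  	}
  	return best;	/* -1 only if the loop body never ran */
  }

  int main(void)
  {
  	const struct freq_entry table[] = {
  		{ 3000000 }, { 2500000 }, { 2000000 }, { TABLE_END },
  	};
  	const struct freq_entry empty[] = { { TABLE_END } };

  	printf("%d\n", find_index_dl(table, 2200000));	/* 1 (2500000) */
  	printf("%d\n", find_index_dl(empty, 2200000));	/* -1: no valid entries */
  	return 0;
  }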

-- 
viresh


Re: [PATCH] cpufreq: powernv: Check negative value returned by cpufreq_table_find_index_dl()

2018-02-12 Thread Shilpasri G Bhat
Hi,

On 02/12/2018 03:59 PM, Viresh Kumar wrote:
> On 12-02-18, 15:51, Shilpasri G Bhat wrote:
>> This patch fixes the below Coverity warning:
>>
>> *** CID 182816:  Memory - illegal accesses  (NEGATIVE_RETURNS)
>> /drivers/cpufreq/powernv-cpufreq.c: 1008 in powernv_fast_switch()
>> 1002 unsigned int target_freq)
>> 1003 {
>> 1004 int index;
>> 1005 struct powernv_smp_call_data freq_data;
>> 1006
>> 1007 index = cpufreq_table_find_index_dl(policy, target_freq);
>> >>> CID 182816:  Memory - illegal accesses  (NEGATIVE_RETURNS)
>> >>> Using variable "index" as an index to array "powernv_freqs".
>> 1008 freq_data.pstate_id = powernv_freqs[index].driver_data;
>> 1009 freq_data.gpstate_id = powernv_freqs[index].driver_data;
>> 1010 set_pstate(&freq_data);
>> 1011
>> 1012 return powernv_freqs[index].frequency;
>> 1013 }
>>
>> Signed-off-by: Shilpasri G Bhat 
>> ---
>>  drivers/cpufreq/powernv-cpufreq.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
>> b/drivers/cpufreq/powernv-cpufreq.c
>> index 29cdec1..69edfe9 100644
>> --- a/drivers/cpufreq/powernv-cpufreq.c
>> +++ b/drivers/cpufreq/powernv-cpufreq.c
>> @@ -1005,6 +1005,9 @@ static unsigned int powernv_fast_switch(struct 
>> cpufreq_policy *policy,
>>  struct powernv_smp_call_data freq_data;
>>  
>>  index = cpufreq_table_find_index_dl(policy, target_freq);
>> +if (unlikely(index < 0))
>> +index = get_nominal_index();
>> +
> 
> AFAICT, you will get -1 here only if the freq table had no valid
> frequencies (or the freq table is empty). Why would that happen?

I agree too. There is no way we can get -1 with an initialized CPU frequency
table. We don't initialize powernv-cpufreq if we don't have valid CPU
frequency entries. Is there any other way to suppress the Coverity tool
warning apart from ignoring it?
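
One option, assuming the Coverity deployment honors inline code
annotations (an assumption on my part), is a suppression comment naming
the checker on the line before the flagged access, e.g.:

  index = cpufreq_table_find_index_dl(policy, target_freq);
  /* coverity[negative_returns] */
  freq_data.pstate_id = powernv_freqs[index].driver_data;

Whether such annotations are acceptable in upstream kernel code is a
separate question, of course.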

Thanks and Regards,
Shilpa

> 
>>  freq_data.pstate_id = powernv_freqs[index].driver_data;
>>  freq_data.gpstate_id = powernv_freqs[index].driver_data;
>>  set_pstate(&freq_data);
>> -- 
>> 1.8.3.1
> 



Re: [PATCH] cpufreq: powernv: Check negative value returned by cpufreq_table_find_index_dl()

2018-02-12 Thread Viresh Kumar
On 12-02-18, 15:51, Shilpasri G Bhat wrote:
> This patch fixes the below Coverity warning:
> 
> *** CID 182816:  Memory - illegal accesses  (NEGATIVE_RETURNS)
> /drivers/cpufreq/powernv-cpufreq.c: 1008 in powernv_fast_switch()
> 1002  unsigned int target_freq)
> 1003 {
> 1004  int index;
> 1005  struct powernv_smp_call_data freq_data;
> 1006
> 1007  index = cpufreq_table_find_index_dl(policy, target_freq);
> >>> CID 182816:  Memory - illegal accesses  (NEGATIVE_RETURNS)
> >>> Using variable "index" as an index to array "powernv_freqs".
> 1008  freq_data.pstate_id = powernv_freqs[index].driver_data;
> 1009  freq_data.gpstate_id = powernv_freqs[index].driver_data;
> 1010  set_pstate(&freq_data);
> 1011
> 1012  return powernv_freqs[index].frequency;
> 1013 }
> 
> Signed-off-by: Shilpasri G Bhat 
> ---
>  drivers/cpufreq/powernv-cpufreq.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/cpufreq/powernv-cpufreq.c 
> b/drivers/cpufreq/powernv-cpufreq.c
> index 29cdec1..69edfe9 100644
> --- a/drivers/cpufreq/powernv-cpufreq.c
> +++ b/drivers/cpufreq/powernv-cpufreq.c
> @@ -1005,6 +1005,9 @@ static unsigned int powernv_fast_switch(struct 
> cpufreq_policy *policy,
>   struct powernv_smp_call_data freq_data;
>  
>   index = cpufreq_table_find_index_dl(policy, target_freq);
> + if (unlikely(index < 0))
> + index = get_nominal_index();
> +

AFAICT, you will get -1 here only if the freq table had no valid
frequencies (or the freq table is empty). Why would that happen?

>   freq_data.pstate_id = powernv_freqs[index].driver_data;
>   freq_data.gpstate_id = powernv_freqs[index].driver_data;
>   set_pstate(&freq_data);
> -- 
> 1.8.3.1

-- 
viresh


Re: Build regressions/improvements in v4.16-rc1

2018-02-12 Thread Geert Uytterhoeven
On Mon, Feb 12, 2018 at 11:17 AM, Geert Uytterhoeven
 wrote:
> Below is the list of build error/warning regressions/improvements in
> v4.16-rc1[1] compared to v4.15[2].
>
> Summarized:
>   - build errors: +13/-5
>   - build warnings: +1653/-1537
>
> Note that there may be false regressions, as some logs are incomplete.
> Still, they're build errors/warnings.
>
> Happy fixing! ;-)
>
> Thanks to the linux-next team for providing the build service.
>
> [1] 
> http://kisskb.ellerman.id.au/kisskb/head/7928b2cbe55b2a410a0f5c1f154610059c57b1b2/
>  (all 273 configs)
> [2] 
> http://kisskb.ellerman.id.au/kisskb/head/d8a5b80568a9cb66810e75b182018e9edb68e8ff/
>  (271 out of 273 configs)
>
>
> *** ERRORS ***
>
> 13 error regressions:
>   + /home/kisskb/slave/src/arch/powerpc/kvm/powerpc.c: error: 'emulated' may 
> be used uninitialized in this function [-Werror=uninitialized]:  => 1361:2

Lots of powerpc configs

>   + /home/kisskb/slave/src/drivers/net/ethernet/intel/i40e/i40e_ethtool.c: 
> error: implicit declaration of function 'cmpxchg64' 
> [-Werror=implicit-function-declaration]:  => 4443:6, 4443:2

mips{,el}-allmodconfig

>   + /home/kisskb/slave/src/fs/signalfd.c: error: 'BUS_MCEERR_AR' undeclared 
> (first use in this function):  => 126:26

Lots of blackfin configs

>   + error: "mdesc_get_property" [drivers/sbus/char/oradax.ko] undefined!:  => 
> N/A
>   + error: "mdesc_grab" [drivers/sbus/char/oradax.ko] undefined!:  => N/A
>   + error: "mdesc_node_by_name" [drivers/sbus/char/oradax.ko] undefined!:  => 
> N/A
>   + error: "mdesc_release" [drivers/sbus/char/oradax.ko] undefined!:  => N/A
>   + error: "sun4v_ccb_info" [drivers/sbus/char/oradax.ko] undefined!:  => N/A
>   + error: "sun4v_ccb_kill" [drivers/sbus/char/oradax.ko] undefined!:  => N/A
>   + error: "sun4v_ccb_submit" [drivers/sbus/char/oradax.ko] undefined!:  => 
> N/A
>   + error: "sun4v_hvapi_register" [drivers/sbus/char/oradax.ko] undefined!:  
> => N/A

sparc-allmodconfig (i.e. sparc32)

>   + error: No rule to make target arch/ia64/kernel/pci-swiotlb.o:  => N/A

ia64-defconfig (patch available)

>   + error: hotplug-cpu.c: undefined reference to `find_and_online_cpu_nid':  
> => .text+0x13c)

ppc64le/pseries_le_defconfig

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH] cpufreq: powernv: Check negative value returned by cpufreq_table_find_index_dl()

2018-02-12 Thread Shilpasri G Bhat
This patch fixes the below Coverity warning:

*** CID 182816:  Memory - illegal accesses  (NEGATIVE_RETURNS)
/drivers/cpufreq/powernv-cpufreq.c: 1008 in powernv_fast_switch()
1002 					unsigned int target_freq)
1003 {
1004 	int index;
1005 	struct powernv_smp_call_data freq_data;
1006 
1007 	index = cpufreq_table_find_index_dl(policy, target_freq);
>>> CID 182816:  Memory - illegal accesses  (NEGATIVE_RETURNS)
>>> Using variable "index" as an index to array "powernv_freqs".
1008 	freq_data.pstate_id = powernv_freqs[index].driver_data;
1009 	freq_data.gpstate_id = powernv_freqs[index].driver_data;
1010 	set_pstate(&freq_data);
1011 
1012 	return powernv_freqs[index].frequency;
1013 }

Signed-off-by: Shilpasri G Bhat 
---
 drivers/cpufreq/powernv-cpufreq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index 29cdec1..69edfe9 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -1005,6 +1005,9 @@ static unsigned int powernv_fast_switch(struct 
cpufreq_policy *policy,
struct powernv_smp_call_data freq_data;
 
index = cpufreq_table_find_index_dl(policy, target_freq);
+   if (unlikely(index < 0))
+   index = get_nominal_index();
+
freq_data.pstate_id = powernv_freqs[index].driver_data;
freq_data.gpstate_id = powernv_freqs[index].driver_data;
set_pstate(&freq_data);
-- 
1.8.3.1



linux-4.16-rc1/drivers/misc/ocxl/file.c:320:broken error checking ?

2018-02-12 Thread David Binderman
Hello there,


[linux-4.16-rc1/drivers/misc/ocxl/file.c:320]: (style) Checking if unsigned
variable 'used' is less than zero.

Source code is


	used = append_xsl_error(ctx, &header, buf + sizeof(header));
	if (used < 0)
		return used;

Suggestion: put the return value from the function into a signed variable,
sanity check it, then assign it to an unsigned variable.
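
A minimal sketch of that suggestion (assuming 'used' is an unsigned
size_t in the surrounding function; the full context of file.c is not
shown here):

	ssize_t rc;

	rc = append_xsl_error(ctx, &header, buf + sizeof(header));
	if (rc < 0)
		return rc;	/* the sign check now happens on a signed type */
	used = rc;	/* only then assign to the unsigned 'used' */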


Also, using the gcc compiler flag -Wtype-limits will show up this kind of
problem in the future.


Regards


David Binderman



Re: linux-4.16-rc1/drivers/misc/ocxl/file.c:320:broken error checking ?

2018-02-12 Thread Frederic Barrat



On 12/02/2018 at 09:58, David Binderman wrote:

Hello there,


[linux-4.16-rc1/drivers/misc/ocxl/file.c:320]: (style) Checking if unsigned
variable 'used' is less than zero.


Source code is


	used = append_xsl_error(ctx, &header, buf + sizeof(header));
	if (used < 0)
		return used;

Suggestion: put the return value from the function into a signed variable,
sanity check it, then assign it to an unsigned variable.


Also, using the gcc compiler flag -Wtype-limits will show up this kind of
problem in the future.


Thanks for reporting it. A patch to address it is working its way up and 
should land in the next rc release.


  Fred




Regards


David Binderman






[PATCH] powerpc/xmon: Dont register sysrq key when kernel param xmon=off

2018-02-12 Thread Vaibhav Jain
Presently sysrq key for xmon('x') is registered during kernel init
irrespective of the value of kernel param 'xmon'. Thus xmon is enabled
even if 'xmon=off' is passed on the kernel command line.

This minor patch updates setup_xmon_sysrq() to register
'sysrq_xmon_op' only when variable 'xmon_on' is set.

Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/xmon/xmon.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 82e1a3ee6e0f..3b995474b102 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -3642,8 +3642,7 @@ static struct sysrq_key_op sysrq_xmon_op = {
 
 static int __init setup_xmon_sysrq(void)
 {
-   register_sysrq_key('x', &sysrq_xmon_op);
-   return 0;
+   return xmon_on ? register_sysrq_key('x', &sysrq_xmon_op) : 0;
 }
 device_initcall(setup_xmon_sysrq);
 #endif /* CONFIG_MAGIC_SYSRQ */
-- 
2.14.3
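
For context, the resulting command-line behaviour should be (assuming
the usual values accepted by the xmon parameter; 'early' also sets
xmon_on):

  xmon=off	# xmon_on stays 0; with this patch sysrq-x is not registered
  xmon=on	# xmon_on set; sysrq-x registered as before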



Re: [RFC PATCH] powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix

2018-02-12 Thread Aneesh Kumar K.V
"Aneesh Kumar K.V"  writes:

> This needs more performance testing, but right now we are wasting a lot of
> space in the level 4 page table.
>
> Signed-off-by: Aneesh Kumar K.V 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-64k.h  | 9 -
>  arch/powerpc/include/asm/book3s/64/radix-64k.h | 8 
>  2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
> b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 3bcf269f8f55..688f9018302e 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> @@ -24,16 +24,15 @@
>
>  /* PTE flags to conserve for HPTE identification */
>  #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | H_PAGE_COMBO)
> -/*
> - * we support 16 fragments per PTE page of 64K size.
> - */
> -#define H_PTE_FRAG_NR	16
>  /*
>   * We use a 2K PTE page fragment and another 2K for storing
>   * real_pte_t hash index
>   */
>  #define H_PTE_FRAG_SIZE_SHIFT  12
> -#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
> +/*
> + * we support 16 fragments per PTE page of 64K size.
> + */
> +#define H_PTE_FRAG_NR	(PAGE_SIZE >> H_PTE_FRAG_SIZE_SHIFT)
>
>  #ifndef __ASSEMBLY__
>  #include 
> diff --git a/arch/powerpc/include/asm/book3s/64/radix-64k.h 
> b/arch/powerpc/include/asm/book3s/64/radix-64k.h
> index c7e71ba29555..8029732bb6c4 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix-64k.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix-64k.h
> @@ -10,4 +10,12 @@
>  #define RADIX_PUD_INDEX_SIZE  9
>  #define RADIX_PGD_INDEX_SIZE  13
>
> +/*
> + * We use a 256 byte PTE page fragment in radix
> + */
> +#define RADIX_PTE_FRAG_SIZE_SHIFT  8
> +/*
> + * we support 256 fragments per PTE page of 64K size.
> + */
> +#define RADIX_PTE_FRAG_NR	(PAGE_SIZE >> RADIX_PTE_FRAG_SIZE_SHIFT)
>  #endif /* _ASM_POWERPC_PGTABLE_RADIX_64K_H */


Missed a git refresh; the updated hunk is below:

diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 27d096610369..404cdd74bc9c 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -561,8 +561,8 @@ void __init radix__early_init_mmu(void)
/*
 * For now radix also use the same frag size
 */
-   __pte_frag_nr = H_PTE_FRAG_NR;
-   __pte_frag_size_shift = H_PTE_FRAG_SIZE_SHIFT;
+   __pte_frag_nr = RADIX_PTE_FRAG_NR;
+   __pte_frag_size_shift = RADIX_PTE_FRAG_SIZE_SHIFT;
 
if (!firmware_has_feature(FW_FEATURE_LPAR)) {
radix_init_native();
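
For reference, with a 64K PAGE_SIZE the definitions above work out to
(simple arithmetic, not part of the patch):

  fragment size     = 1UL << RADIX_PTE_FRAG_SIZE_SHIFT = 1 << 8       = 256 bytes
  RADIX_PTE_FRAG_NR = PAGE_SIZE >> RADIX_PTE_FRAG_SIZE_SHIFT = 65536 >> 8 = 256

versus hash's H_PTE_FRAG_SIZE_SHIFT of 12, i.e. 16 fragments of 4K each,
which is where the subject's "16 to 256" comes from.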



[RFC PATCH] powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix

2018-02-12 Thread Aneesh Kumar K.V
This needs more performance testing, but right now we are wasting a lot of
space in the level 4 page table.

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/hash-64k.h  | 9 -
 arch/powerpc/include/asm/book3s/64/radix-64k.h | 8 
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 3bcf269f8f55..688f9018302e 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -24,16 +24,15 @@
 
 /* PTE flags to conserve for HPTE identification */
 #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | H_PAGE_COMBO)
-/*
- * we support 16 fragments per PTE page of 64K size.
- */
-#define H_PTE_FRAG_NR  16
 /*
  * We use a 2K PTE page fragment and another 2K for storing
  * real_pte_t hash index
  */
 #define H_PTE_FRAG_SIZE_SHIFT  12
-#define PTE_FRAG_SIZE (1UL << PTE_FRAG_SIZE_SHIFT)
+/*
+ * we support 16 fragments per PTE page of 64K size.
+ */
+#define H_PTE_FRAG_NR	(PAGE_SIZE >> H_PTE_FRAG_SIZE_SHIFT)
 
 #ifndef __ASSEMBLY__
 #include 
diff --git a/arch/powerpc/include/asm/book3s/64/radix-64k.h 
b/arch/powerpc/include/asm/book3s/64/radix-64k.h
index c7e71ba29555..8029732bb6c4 100644
--- a/arch/powerpc/include/asm/book3s/64/radix-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/radix-64k.h
@@ -10,4 +10,12 @@
 #define RADIX_PUD_INDEX_SIZE	9
 #define RADIX_PGD_INDEX_SIZE  13
 
+/*
+ * We use a 256 byte PTE page fragment in radix
+ */
+#define RADIX_PTE_FRAG_SIZE_SHIFT  8
+/*
+ * we support 256 fragments per PTE page of 64K size.
+ */
+#define RADIX_PTE_FRAG_NR  (PAGE_SIZE >> RADIX_PTE_FRAG_SIZE_SHIFT)
 #endif /* _ASM_POWERPC_PGTABLE_RADIX_64K_H */
-- 
2.14.3