Re: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-18 Thread Sergey Senozhatsky
On (09/18/17 10:44), Luck, Tony wrote:
[..]
> 
> A few new warnings when building on ia64:
> 
> arch/ia64/kernel/module.c:931: warning: passing argument 1 of 'dereference_function_descriptor' makes pointer from integer without a cast
> arch/ia64/kernel/module.c:931: warning: return makes integer from pointer without a cast
> kernel/kallsyms.c:325: warning: assignment makes integer from pointer without a cast
> kernel/kallsyms.c:325: warning: passing argument 1 of 'dereference_kernel_function_descriptor' makes pointer from integer without a cast

got it, will address in v2.

[..]
> Which looks like what you wanted. People unaware of the vagaries
> of ppc64/ia64/parisc64 can use the wrong %p[SF] variant, but still
> get the right output.

thanks!

-ss


Re: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-18 Thread Sergey Senozhatsky
On (09/18/17 20:39), Helge Deller wrote:
[..]
> > A few new warnings when building on ia64:
> > 
> > arch/ia64/kernel/module.c:931: warning: passing argument 1 of 'dereference_function_descriptor' makes pointer from integer without a cast
> > arch/ia64/kernel/module.c:931: warning: return makes integer from pointer without a cast
> > kernel/kallsyms.c:325: warning: assignment makes integer from pointer without a cast
> > kernel/kallsyms.c:325: warning: passing argument 1 of 'dereference_kernel_function_descriptor' makes pointer from integer without a cast
> 
> 
> I got similar warnings on parisc.
> This patch on top of yours fixed those:
> 

Tony, Helge,

thanks for the reports!

I'll simply convert everything to `unsigned long', including the
dereference_function_descriptor() function. [I believe there are
still some casts happening when we pass addr from the kernel/module
dereference functions to dereference_function_descriptor(), or
when we return `void *' back to the symbol resolution code, etc.]
Besides, it seems that everything that uses
dereference_function_descriptor() wants `unsigned long' anyway:

drivers/misc/kgdbts.c:  addr = (unsigned long) dereference_function_descriptor((void *)addr);
init/main.c:addr = (unsigned long) dereference_function_descriptor(fn);
kernel/extable.c:   addr = (unsigned long) dereference_function_descriptor(ptr);
kernel/module.c:unsigned long a = (unsigned long)dereference_function_descriptor(addr);

so I'll just switch it to ulong.
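
For illustration, a minimal sketch of the `unsigned long' flavour (the
descriptor layout and the probe_kernel_read()-based body are assumptions for
this sketch, not the actual v2 code):

    #include <linux/uaccess.h>

    /*
     * On descriptor ABIs (ia64/ppc64/parisc64) a "function pointer" points
     * at a small record (descriptor) whose first word is the real entry
     * address, e.g.:
     *
     *     struct func_desc {
     *             unsigned long addr;     // actual code address
     *             unsigned long toc;      // TOC/GP value, arch specific
     *     };
     */
    unsigned long dereference_function_descriptor(unsigned long ptr)
    {
            unsigned long addr;

            /* probe_kernel_read() keeps this safe for bogus pointers */
            if (!probe_kernel_read(&addr, (void *)ptr, sizeof(addr)))
                    ptr = addr;
            return ptr;
    }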


> I did try your testcases too.
> 
> "echo 1 > /proc/sys/vm/drop_caches" gave correct output:
>  printk#1 schedule_timeout+0x0/0x4a8
>  printk#2 schedule_timeout+0x0/0x4a8
>  printk#3 proc_sys_call_handler+0x120/0x180
>  printk#4 proc_sys_call_handler+0x120/0x180
>  printk#5 proc_sys_call_handler+0x120/0x180
>  printk#6 proc_sys_call_handler+0x120/0x180
> 
> and here is "modprobe zram":
>  printk#7 __UNIQUE_ID_vermagic8+0xb9a4/0xbd04 [zram]
>  printk#8 __UNIQUE_ID_vermagic8+0xb9a4/0xbd04 [zram]
>  printk#9 do_one_initcall+0x194/0x290
>  printk#10 do_one_initcall+0x194/0x290
>  printk#11 do_one_initcall+0x194/0x290
>  printk#12 do_one_initcall+0x194/0x290
>  printk#13 zram_init+0x22c/0x2a0 [zram]
>  printk#14 zram_init+0x22c/0x2a0 [zram]
>  printk#15 zram_init+0x22c/0x2a0 [zram]
>  printk#16 zram_init+0x22c/0x2a0 [zram]
> 
> I wonder why printk#7 and printk#8 don't show "zram_init"...

interesting... what does the unpatched kernel show?


> Regarding your patches:
> 
> In arch/parisc/kernel/process.c:
> +void *dereference_kernel_function_descriptor(void *ptr)
> +{
> +   if (ptr < (void *)__start_opd || (void *)__end_opd < ptr)
> 
> This needs to be (__end_opd is outside):
> +   if (ptr < (void *)__start_opd || (void *)__end_opd <= ptr)
> 
> The same is true for the checks in the other arches.

um... yeah. __end_opd is definitely not a valid place for a descriptor!
I think I had `if (!(ptr >= __start_opd && ptr < __end_opd))' which I
wrongly converted. "shame, shame, shame".
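
In other words, with an exclusive upper bound the parisc helper would look
something like this (sketch only, assuming the __start_opd[]/__end_opd[]
externs; everything around the quoted check is my assumption):

    void *dereference_kernel_function_descriptor(void *ptr)
    {
            if (ptr < (void *)__start_opd || ptr >= (void *)__end_opd)
                    return ptr;     /* not inside .opd, not a descriptor */

            return dereference_function_descriptor(ptr);
    }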

thanks!


> I'd suggest to move the various
>   extern char __start_opd[], __end_opd[];
> out of arch//include/asm/sections.h and into 

ok, will take a look.

-ss


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Eric Sandeen
On 9/18/17 4:31 PM, Dave Chinner wrote:
> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
>> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
>>> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
 Hi,

 A warning is triggered from:

 file fs/iomap.c in function iomap_dio_rw

 if (ret)
 goto out_free_dio;

 ret = invalidate_inode_pages2_range(mapping,
 start >> PAGE_SHIFT, end >> PAGE_SHIFT);
>>  WARN_ON_ONCE(ret);
 ret = 0;

 inode_dio_begin(inode);
>>>
>>> This is expected and an indication of a problematic workload - which
>>> may be triggered by a fuzzer.
>>
>> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
>> the time running xfstests as well.
> 
> Because when a user reports a data corruption, the only evidence we
> have that they are running an app that does something stupid is this
> warning in their syslogs.  Tracepoints are not useful for replacing
> warnings about data corruption vectors being triggered.

Is the full WARN_ON spew really helpful to us, though?  Certainly
the user has no idea what it means, and will come away terrified
but none the wiser.

Would a more informative printk_once() still give us the evidence
without the ZOMG I THINK I OOPSED that a WARN_ON produces?  Or do we 
want/need the backtrace?

-Eric

> It needs to be on by default, but I'm sure we can wrap it with
> something like an xfs_alert_tag() type of construct so the tag can
> be set in /proc/fs/xfs/panic_mask to suppress it if testers so
> desire.
> 
> Cheers,
> 
> Dave.
> 


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
On Mon, Sep 18, 2017 at 05:00:58PM -0500, Eric Sandeen wrote:
> On 9/18/17 4:31 PM, Dave Chinner wrote:
> > On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> >> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> >>> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
>  Hi,
> 
>  A warning is triggered from:
> 
>  file fs/iomap.c in function iomap_dio_rw
> 
>  if (ret)
>  goto out_free_dio;
> 
>  ret = invalidate_inode_pages2_range(mapping,
>  start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> >>  WARN_ON_ONCE(ret);
>  ret = 0;
> 
>  inode_dio_begin(inode);
> >>>
> >>> This is expected and an indication of a problematic workload - which
> >>> may be triggered by a fuzzer.
> >>
> >> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> >> the time running xfstests as well.
> > 
> > Because when a user reports a data corruption, the only evidence we
> > have that they are running an app that does something stupid is this
> > warning in their syslogs.  Tracepoints are not useful for replacing
> > warnings about data corruption vectors being triggered.
> 
> Is the full WARN_ON spew really helpful to us, though?  Certainly
> the user has no idea what it means, and will come away terrified
> but none the wiser.
> 
> Would a more informative printk_once() still give us the evidence
> without the ZOMG I THINK I OOPSED that a WARN_ON produces?  Or do we 
> want/need the backtrace?

backtrace is actually useful - that's how I recently learnt that
splice now supports direct IO.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Darrick J. Wong
On Mon, Sep 18, 2017 at 05:00:58PM -0500, Eric Sandeen wrote:
> On 9/18/17 4:31 PM, Dave Chinner wrote:
> > On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> >> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> >>> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
>  Hi,
> 
>  A warning is triggered from:
> 
>  file fs/iomap.c in function iomap_dio_rw
> 
>  if (ret)
>  goto out_free_dio;
> 
>  ret = invalidate_inode_pages2_range(mapping,
>  start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> >>  WARN_ON_ONCE(ret);
>  ret = 0;
> 
>  inode_dio_begin(inode);
> >>>
> >>> This is expected and an indication of a problematic workload - which
> >>> may be triggered by a fuzzer.
> >>
> >> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> >> the time running xfstests as well.
> > 
> > Because when a user reports a data corruption, the only evidence we
> > have that they are running an app that does something stupid is this
> > warning in their syslogs.  Tracepoints are not useful for replacing
> > warnings about data corruption vectors being triggered.
> 
> Is the full WARN_ON spew really helpful to us, though?  Certainly
> the user has no idea what it means, and will come away terrified
> but none the wiser.
> 
> Would a more informative printk_once() still give us the evidence
> without the ZOMG I THINK I OOPSED that a WARN_ON produces?  Or do we 
> want/need the backtrace?

Maybe we could state a little more directly what's going on:

if (err)
	printk_once(KERN_INFO "Urk, collision detected between direct IO and page cache, YHL. HAND.\n"); ?

8-)

--D

> 
> -Eric
> 
> > It needs to be on by default, but I'm sure we can wrap it with
> > something like an xfs_alert_tag() type of construct so the tag can
> > be set in /proc/fs/xfs/panic_mask to suppress it if testers so
> > desire.
> > 
> > Cheers,
> > 
> > Dave.
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-next" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] net/ethernet/freescale: fix warning for ucc_geth

2017-09-18 Thread David Miller
From: Valentin Longchamp 
Date: Fri, 15 Sep 2017 07:58:47 +0200

> uf_info.regs is resource_size_t i.e. phys_addr_t that can be either u32
> or u64 according to CONFIG_PHYS_ADDR_T_64BIT.
> 
> The printk format is thus adapted to u64 and the regs value cast to u64
> to take both u32 and u64 into account.
> 
> Signed-off-by: Valentin Longchamp 

Applied to net-next, thanks.
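
(For reference, the cast pattern described in the commit message above is
essentially the following sketch; uf_info.regs is taken from the commit
message, the surrounding message text is illustrative:)

    /*
     * resource_size_t is u32 or u64 depending on CONFIG_PHYS_ADDR_T_64BIT,
     * so widen explicitly and use a 64-bit format specifier:
     */
    pr_info("ucc_geth: regs at 0x%llx\n", (u64)uf_info.regs);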


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
On Mon, Sep 18, 2017 at 09:51:29AM -0600, Jens Axboe wrote:
> On 09/18/2017 09:43 AM, Al Viro wrote:
> > On Mon, Sep 18, 2017 at 05:39:47PM +0200, Christoph Hellwig wrote:
> >> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> >>> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> >>> the time running xfstests as well.
> >>
> >> Dave insisted on it to discourage users/applications from mixing
> >> mmap and direct I/O.
> >>
> >> In many ways a tracepoint might be the better way to diagnose these.
> > 
> > sysctl suppressing those two, perhaps?
> 
> I'd rather just make it a trace point, but don't care too much.
> 
> The code doesn't even have a comment as to why that WARN_ON() is
> there or expected.

The big comment about how bad cache invalidation failures are is on
the second, post-io invocation of the page cache flush. That's the
failure that exposes the data coherency problem to userspace:

/*
 * Try again to invalidate clean pages which might have been cached by
 * non-direct readahead, or faulted in by get_user_pages() if the source
 * of the write was an mmap'ed region of the file we're writing.  Either
 * one is a pretty crazy thing to do, so we don't support it 100%.  If
 * this invalidation fails, tough, the write still worked...
 */
if (iov_iter_rw(iter) == WRITE) {
int err = invalidate_inode_pages2_range(mapping,
start >> PAGE_SHIFT, end >> PAGE_SHIFT);
WARN_ON_ONCE(err);
}

IOWs, the first warning is a "bad things might be about to
happen" warning, the second is "bad things have happened".

> Seems pretty sloppy to me, not a great way
> to "discourage" users from mixing mmap/dio.

Again, it has nothing to do with "discouraging users" and everything
to do with post-bug-report problem triage.

Yes, the first invalidation should also have a comment like the post
IO invalidation - the comment probably got dropped and not noticed
when the changeover from internal XFS code to generic iomap code was
made...
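
Something along these lines for the pre-IO call site, say (sketch only, not
an actual patch):

    /*
     * Invalidate any pages the range may have in the page cache before we
     * start the direct I/O.  If this fails, warn: it means buffered/mmap
     * access is being mixed with direct I/O on the same range, which is a
     * known data corruption vector ("bad things might be about to happen").
     */
    ret = invalidate_inode_pages2_range(mapping,
                    start >> PAGE_SHIFT, end >> PAGE_SHIFT);
    WARN_ON_ONCE(ret);
    ret = 0;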

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Dave Chinner
On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> > On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
> >> Hi,
> >>
> >> A warning is triggered from:
> >>
> >> file fs/iomap.c in function iomap_dio_rw
> >>
> >> if (ret)
> >> goto out_free_dio;
> >>
> >> ret = invalidate_inode_pages2_range(mapping,
> >> start >> PAGE_SHIFT, end >> PAGE_SHIFT);
>   WARN_ON_ONCE(ret);
> >> ret = 0;
> >>
> >> inode_dio_begin(inode);
> > 
> > This is expected and an indication of a problematic workload - which
> > may be triggered by a fuzzer.
> 
> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> the time running xfstests as well.

Because when a user reports a data corruption, the only evidence we
have that they are running an app that does something stupid is this
warning in their syslogs.  Tracepoints are not useful for replacing
warnings about data corruption vectors being triggered.

It needs to be on by default, but I'm sure we can wrap it with
something like an xfs_alert_tag() type of construct so the tag can
be set in /proc/fs/xfs/panic_mask to suppress it if testers so
desire.

Cheers,

Dave.

-- 
Dave Chinner
da...@fromorbit.com


linux-4.14-rc1/arch/powerpc/perf/hv-24x7.c:541: bad condition ?

2017-09-18 Thread David Binderman
Hello there,

linux-4.14-rc1/arch/powerpc/perf/hv-24x7.c:543]: (warning) Identical condition
's1<s2', second condition is always false

	if (s1 < s2)
		return 1;
	if (s2 > s1)
		return -1;

Suggest code rework.
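
One possible rework, making the second test the mirror of the first instead
of a dead duplicate (sketch only -- the surrounding memord() helper is an
assumption about the file in question, not a tested patch):

    static int memord(const void *d1, size_t s1, const void *d2, size_t s2)
    {
            if (s1 < s2)
                    return 1;
            if (s1 > s2)            /* was "s2 > s1", identical to the above */
                    return -1;

            return memcmp(d1, d2, s1);
    }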

Regards

David Binderman

[v5 12/12] soc/fsl/qbman: Enable FSL_LAYERSCAPE config on ARM

2017-09-18 Thread Roy Pledge
From: Madalin Bucur 

Signed-off-by: Madalin Bucur 
Signed-off-by: Claudiu Manoil 
[Stuart: changed to use ARCH_LAYERSCAPE]
Signed-off-by: Stuart Yoder 
Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/Kconfig b/drivers/soc/fsl/qbman/Kconfig
index 757033c..fb4e6bf 100644
--- a/drivers/soc/fsl/qbman/Kconfig
+++ b/drivers/soc/fsl/qbman/Kconfig
@@ -1,6 +1,6 @@
 menuconfig FSL_DPAA
bool "Freescale DPAA 1.x support"
-   depends on FSL_SOC_BOOKE
+   depends on (FSL_SOC_BOOKE || ARCH_LAYERSCAPE)
select GENERIC_ALLOCATOR
help
  The Freescale Data Path Acceleration Architecture (DPAA) is a set of
-- 
2.7.4



[v5 11/12] soc/fsl/qbman: Add missing headers on ARM

2017-09-18 Thread Roy Pledge
From: Claudiu Manoil 

Unlike PPC builds, ARM builds need the following headers
explicitly:
+#include <linux/io.h>     for ioread32be()
+#include <linux/delay.h>  for udelay()

Signed-off-by: Claudiu Manoil 
Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/dpaa_sys.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h
index 5a2c0af..9f37900 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.h
+++ b/drivers/soc/fsl/qbman/dpaa_sys.h
@@ -44,6 +44,8 @@
 #include 
 #include 
 #include 
+#include <linux/io.h>
+#include <linux/delay.h>
 
 /* For 2-element tables related to cache-inhibited and cache-enabled mappings 
*/
 #define DPAA_PORTAL_CE 0
-- 
2.7.4



[v5 10/12] soc/fsl/qbman: different register offsets on ARM

2017-09-18 Thread Roy Pledge
From: Madalin Bucur 

Signed-off-by: Madalin Bucur 
Signed-off-by: Claudiu Manoil 
Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/bman.c | 22 ++
 drivers/soc/fsl/qbman/qman.c | 38 ++
 2 files changed, 60 insertions(+)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index 5dbb5cc..2e6e682 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -35,6 +35,27 @@
 
 /* Portal register assists */
 
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+/* Cache-inhibited register offsets */
+#define BM_REG_RCR_PI_CINH 0x3000
+#define BM_REG_RCR_CI_CINH 0x3100
+#define BM_REG_RCR_ITR 0x3200
+#define BM_REG_CFG 0x3300
+#define BM_REG_SCN(n)  (0x3400 + ((n) << 6))
+#define BM_REG_ISR 0x3e00
+#define BM_REG_IER 0x3e40
+#define BM_REG_ISDR0x3e80
+#define BM_REG_IIR 0x3ec0
+
+/* Cache-enabled register offsets */
+#define BM_CL_CR   0x
+#define BM_CL_RR0  0x0100
+#define BM_CL_RR1  0x0140
+#define BM_CL_RCR  0x1000
+#define BM_CL_RCR_PI_CENA  0x3000
+#define BM_CL_RCR_CI_CENA  0x3100
+
+#else
 /* Cache-inhibited register offsets */
 #define BM_REG_RCR_PI_CINH 0x
 #define BM_REG_RCR_CI_CINH 0x0004
@@ -53,6 +74,7 @@
 #define BM_CL_RCR  0x1000
 #define BM_CL_RCR_PI_CENA  0x3000
 #define BM_CL_RCR_CI_CENA  0x3100
+#endif
 
 /*
  * Portal modes.
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 8934c27..7cb7bad 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -41,6 +41,43 @@
 
 /* Portal register assists */
 
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+/* Cache-inhibited register offsets */
+#define QM_REG_EQCR_PI_CINH0x3000
+#define QM_REG_EQCR_CI_CINH0x3040
+#define QM_REG_EQCR_ITR0x3080
+#define QM_REG_DQRR_PI_CINH0x3100
+#define QM_REG_DQRR_CI_CINH0x3140
+#define QM_REG_DQRR_ITR0x3180
+#define QM_REG_DQRR_DCAP   0x31C0
+#define QM_REG_DQRR_SDQCR  0x3200
+#define QM_REG_DQRR_VDQCR  0x3240
+#define QM_REG_DQRR_PDQCR  0x3280
+#define QM_REG_MR_PI_CINH  0x3300
+#define QM_REG_MR_CI_CINH  0x3340
+#define QM_REG_MR_ITR  0x3380
+#define QM_REG_CFG 0x3500
+#define QM_REG_ISR 0x3600
+#define QM_REG_IER 0x3640
+#define QM_REG_ISDR0x3680
+#define QM_REG_IIR 0x36C0
+#define QM_REG_ITPR0x3740
+
+/* Cache-enabled register offsets */
+#define QM_CL_EQCR 0x
+#define QM_CL_DQRR 0x1000
+#define QM_CL_MR   0x2000
+#define QM_CL_EQCR_PI_CENA 0x3000
+#define QM_CL_EQCR_CI_CENA 0x3040
+#define QM_CL_DQRR_PI_CENA 0x3100
+#define QM_CL_DQRR_CI_CENA 0x3140
+#define QM_CL_MR_PI_CENA   0x3300
+#define QM_CL_MR_CI_CENA   0x3340
+#define QM_CL_CR   0x3800
+#define QM_CL_RR0  0x3900
+#define QM_CL_RR1  0x3940
+
+#else
 /* Cache-inhibited register offsets */
 #define QM_REG_EQCR_PI_CINH0x
 #define QM_REG_EQCR_CI_CINH0x0004
@@ -75,6 +112,7 @@
 #define QM_CL_CR   0x3800
 #define QM_CL_RR0  0x3900
 #define QM_CL_RR1  0x3940
+#endif
 
 /*
  * BTW, the drivers (and h/w programming model) already obtain the required
-- 
2.7.4



[v5 08/12] soc/fsl/qbman: Rework portal mapping calls for ARM/PPC

2017-09-18 Thread Roy Pledge
Rework portal mapping for PPC and ARM. The PPC devices require a
cacheable coherent mapping while ARM will work with a non-cacheable/write
combine mapping. This also eliminates the need for manual cache
flushes on ARM. This also fixes the code so sparse checking is clean.
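
Roughly, the split described above looks like this (sketch: the name
QBMAN_MEMREMAP_ATTR appears in the patch below, but the WB/WC values and the
helper here are illustrative assumptions, not the literal change):

    #include <linux/io.h>
    #include <linux/ioport.h>

    /*
     * CE (cache-enabled) portal area: cacheable on PPC, write-combine on
     * ARM/ARM64, so no manual flush/invalidate is needed on ARM.
     */
    #ifdef CONFIG_PPC
    #define QBMAN_MEMREMAP_ATTR     MEMREMAP_WB
    #else
    #define QBMAN_MEMREMAP_ATTR     MEMREMAP_WC
    #endif

    static int map_portal_regions(struct bm_portal_config *pcfg,
                                  struct resource *ce, struct resource *ci)
    {
            pcfg->addr_virt_ce = memremap(ce->start, resource_size(ce),
                                          QBMAN_MEMREMAP_ATTR);
            if (!pcfg->addr_virt_ce)
                    return -ENOMEM;

            /* CI (cache-inhibited) registers stay plain uncached MMIO */
            pcfg->addr_virt_ci = ioremap(ci->start, resource_size(ci));
            if (!pcfg->addr_virt_ci) {
                    memunmap(pcfg->addr_virt_ce);
                    return -ENOMEM;
            }
            return 0;
    }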

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/bman.c| 18 ++
 drivers/soc/fsl/qbman/bman_portal.c | 23 ++-
 drivers/soc/fsl/qbman/bman_priv.h   |  8 +++-
 drivers/soc/fsl/qbman/dpaa_sys.h| 15 +++
 drivers/soc/fsl/qbman/qman.c| 31 +--
 drivers/soc/fsl/qbman/qman_portal.c | 23 ++-
 drivers/soc/fsl/qbman/qman_priv.h   |  8 +++-
 7 files changed, 60 insertions(+), 66 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index ff8998f..5dbb5cc 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -154,7 +154,8 @@ struct bm_mc {
 };
 
 struct bm_addr {
-   void __iomem *ce;   /* cache-enabled */
+   void *ce;   /* cache-enabled */
+   __be32 *ce_be;  /* Same as above but for direct access */
void __iomem *ci;   /* cache-inhibited */
 };
 
@@ -167,12 +168,12 @@ struct bm_portal {
 /* Cache-inhibited register access. */
 static inline u32 bm_in(struct bm_portal *p, u32 offset)
 {
-   return be32_to_cpu(__raw_readl(p->addr.ci + offset));
+   return ioread32be(p->addr.ci + offset);
 }
 
 static inline void bm_out(struct bm_portal *p, u32 offset, u32 val)
 {
-   __raw_writel(cpu_to_be32(val), p->addr.ci + offset);
+   iowrite32be(val, p->addr.ci + offset);
 }
 
 /* Cache Enabled Portal Access */
@@ -188,7 +189,7 @@ static inline void bm_cl_touch_ro(struct bm_portal *p, u32 
offset)
 
 static inline u32 bm_ce_in(struct bm_portal *p, u32 offset)
 {
-   return be32_to_cpu(__raw_readl(p->addr.ce + offset));
+   return be32_to_cpu(*(p->addr.ce_be + (offset/4)));
 }
 
 struct bman_portal {
@@ -408,7 +409,7 @@ static int bm_mc_init(struct bm_portal *portal)
 
mc->cr = portal->addr.ce + BM_CL_CR;
mc->rr = portal->addr.ce + BM_CL_RR0;
-   mc->rridx = (__raw_readb(&mc->cr->_ncw_verb) & BM_MCC_VERB_VBIT) ?
+   mc->rridx = (mc->cr->_ncw_verb & BM_MCC_VERB_VBIT) ?
0 : 1;
mc->vbit = mc->rridx ? BM_MCC_VERB_VBIT : 0;
 #ifdef CONFIG_FSL_DPAA_CHECKING
@@ -466,7 +467,7 @@ static inline union bm_mc_result *bm_mc_result(struct 
bm_portal *portal)
 * its command is submitted and completed. This includes the valid-bit,
 * in case you were wondering...
 */
-   if (!__raw_readb(&rr->verb)) {
+   if (!rr->verb) {
dpaa_invalidate_touch_ro(rr);
return NULL;
}
@@ -512,8 +513,9 @@ static int bman_create_portal(struct bman_portal *portal,
 * config, everything that follows depends on it and "config" is more
 * for (de)reference...
 */
-   p->addr.ce = c->addr_virt[DPAA_PORTAL_CE];
-   p->addr.ci = c->addr_virt[DPAA_PORTAL_CI];
+   p->addr.ce = c->addr_virt_ce;
+   p->addr.ce_be = c->addr_virt_ce;
+   p->addr.ci = c->addr_virt_ci;
if (bm_rcr_init(p, bm_rcr_pvb, bm_rcr_cce)) {
dev_err(c->dev, "RCR initialisation failed\n");
goto fail_rcr;
diff --git a/drivers/soc/fsl/qbman/bman_portal.c 
b/drivers/soc/fsl/qbman/bman_portal.c
index 39b39c8..2f71f7d 100644
--- a/drivers/soc/fsl/qbman/bman_portal.c
+++ b/drivers/soc/fsl/qbman/bman_portal.c
@@ -91,7 +91,6 @@ static int bman_portal_probe(struct platform_device *pdev)
struct device_node *node = dev->of_node;
struct bm_portal_config *pcfg;
struct resource *addr_phys[2];
-   void __iomem *va;
int irq, cpu;
 
pcfg = devm_kmalloc(dev, sizeof(*pcfg), GFP_KERNEL);
@@ -123,23 +122,21 @@ static int bman_portal_probe(struct platform_device *pdev)
}
pcfg->irq = irq;
 
-   va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]), 0);
-   if (!va) {
-   dev_err(dev, "ioremap::CE failed\n");
+   pcfg->addr_virt_ce = memremap(addr_phys[0]->start,
+   resource_size(addr_phys[0]),
+   QBMAN_MEMREMAP_ATTR);
+   if (!pcfg->addr_virt_ce) {
+   dev_err(dev, "memremap::CE failed\n");
goto err_ioremap1;
}
 
-   pcfg->addr_virt[DPAA_PORTAL_CE] = va;
-
-   va = ioremap_prot(addr_phys[1]->start, resource_size(addr_phys[1]),
- _PAGE_GUARDED | _PAGE_NO_CACHE);
-   if (!va) {
+   pcfg->addr_virt_ci = ioremap(addr_phys[1]->start,
+   resource_size(addr_phys[1]));
+   if (!pcfg->addr_virt_ci) {
dev_err(dev, "ioremap::CI failed\n");
goto err_ioremap2;
}
 
-   

[v5 07/12] soc/fsl/qbman: Fix ARM32 typo

2017-09-18 Thread Roy Pledge
From: Valentin Rothberg 

The Kconfig symbol for 32bit ARM is 'ARM', not 'ARM32'.

Signed-off-by: Valentin Rothberg 
Signed-off-by: Claudiu Manoil 
Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/dpaa_sys.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h
index 4b1a467..61cfdb3 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.h
+++ b/drivers/soc/fsl/qbman/dpaa_sys.h
@@ -53,7 +53,7 @@ static inline void dpaa_flush(void *p)
 {
 #ifdef CONFIG_PPC
flush_dcache_range((unsigned long)p, (unsigned long)p+64);
-#elif defined(CONFIG_ARM32)
+#elif defined(CONFIG_ARM)
__cpuc_flush_dcache_area(p, 64);
 #elif defined(CONFIG_ARM64)
__flush_dcache_area(p, 64);
-- 
2.7.4



[v5 06/12] soc/fsl/qbman: Drop L1_CACHE_BYTES compile time check

2017-09-18 Thread Roy Pledge
From: Claudiu Manoil 

Not relevant and arch dependent. Overkill for PPC.

Signed-off-by: Claudiu Manoil 
Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/dpaa_sys.h | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h
index 676af82..4b1a467 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.h
+++ b/drivers/soc/fsl/qbman/dpaa_sys.h
@@ -49,10 +49,6 @@
 #define DPAA_PORTAL_CE 0
 #define DPAA_PORTAL_CI 1
 
-#if (L1_CACHE_BYTES != 32) && (L1_CACHE_BYTES != 64)
-#error "Unsupported Cacheline Size"
-#endif
-
 static inline void dpaa_flush(void *p)
 {
 #ifdef CONFIG_PPC
-- 
2.7.4



[v5 04/12] dt-bindings: soc/fsl: Update reserved memory binding for QBMan

2017-09-18 Thread Roy Pledge
Updates the QMan and BMan device tree bindings for reserved memory
nodes. This makes the reserved memory allocation compatible with
the shared-dma-pool usage.

Signed-off-by: Roy Pledge 
---
 Documentation/devicetree/bindings/soc/fsl/bman.txt | 12 +-
 Documentation/devicetree/bindings/soc/fsl/qman.txt | 26 --
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/Documentation/devicetree/bindings/soc/fsl/bman.txt 
b/Documentation/devicetree/bindings/soc/fsl/bman.txt
index 47ac834..48eed14 100644
--- a/Documentation/devicetree/bindings/soc/fsl/bman.txt
+++ b/Documentation/devicetree/bindings/soc/fsl/bman.txt
@@ -65,8 +65,8 @@ to the respective BMan instance
 BMan Private Memory Node
 
 BMan requires a contiguous range of physical memory used for the backing store
-for BMan Free Buffer Proxy Records (FBPR). This memory is reserved/allocated 
as a
-node under the /reserved-memory node
+for BMan Free Buffer Proxy Records (FBPR). This memory is reserved/allocated as
+a node under the /reserved-memory node.
 
 The BMan FBPR memory node must be named "bman-fbpr"
 
@@ -75,7 +75,9 @@ PROPERTIES
 - compatible
Usage:  required
Value type: 
-   Definition: Must inclide "fsl,bman-fbpr"
+   Definition: PPC platforms: Must include "fsl,bman-fbpr"
+   ARM platforms: Must include "shared-dma-pool"
+  as well as the "no-map" property
 
 The following constraints are relevant to the FBPR private memory:
- The size must be 2^(size + 1), with size = 11..33. That is 4 KiB to
@@ -100,10 +102,10 @@ The example below shows a BMan FBPR dynamic allocation 
memory node
ranges;
 
bman_fbpr: bman-fbpr {
-   compatible = "fsl,bman-fbpr";
-   alloc-ranges = <0 0 0x10 0>;
+   compatible = "shared-mem-pool";
size = <0 0x100>;
alignment = <0 0x100>;
+   no-map;
};
};
 
diff --git a/Documentation/devicetree/bindings/soc/fsl/qman.txt 
b/Documentation/devicetree/bindings/soc/fsl/qman.txt
index 556ebb8..ee96afd 100644
--- a/Documentation/devicetree/bindings/soc/fsl/qman.txt
+++ b/Documentation/devicetree/bindings/soc/fsl/qman.txt
@@ -60,6 +60,12 @@ are located at offsets 0xbf8 and 0xbfc
Value type: 
Definition: Reference input clock. Its frequency is half of the
platform clock
+- memory-regions
+   Usage:  Required for ARM
+   Value type: 
+   Definition: List of phandles referencing the QMan private memory
+   nodes (described below). The qman-fqd node must be
+   first followed by qman-pfdr node. Only used on ARM
 
 Devices connected to a QMan instance via Direct Connect Portals (DCP) must link
 to the respective QMan instance
@@ -74,7 +80,9 @@ QMan Private Memory Nodes
 
 QMan requires two contiguous range of physical memory used for the backing 
store
 for QMan Frame Queue Descriptor (FQD) and Packed Frame Descriptor Record 
(PFDR).
-This memory is reserved/allocated as a nodes under the /reserved-memory node
+This memory is reserved/allocated as a node under the /reserved-memory node.
+
+For additional details about reserved memory regions see reserved-memory.txt
 
 The QMan FQD memory node must be named "qman-fqd"
 
@@ -83,7 +91,9 @@ PROPERTIES
 - compatible
Usage:  required
Value type: 
-   Definition: Must inclide "fsl,qman-fqd"
+   Definition: PPC platforms: Must include "fsl,qman-fqd"
+   ARM platforms: Must include "shared-dma-pool"
+  as well as the "no-map" property
 
 The QMan PFDR memory node must be named "qman-pfdr"
 
@@ -92,7 +102,9 @@ PROPERTIES
 - compatible
Usage:  required
Value type: 
-   Definition: Must inclide "fsl,qman-pfdr"
+   Definition: PPC platforms: Must include "fsl,qman-pfdr"
+   ARM platforms: Must include "shared-dma-pool"
+  as well as the "no-map" property
 
 The following constraints are relevant to the FQD and PFDR private memory:
- The size must be 2^(size + 1), with size = 11..29. That is 4 KiB to
@@ -117,16 +129,16 @@ The example below shows a QMan FQD and a PFDR dynamic 
allocation memory nodes
ranges;
 
qman_fqd: qman-fqd {
-   compatible = "fsl,qman-fqd";
-   alloc-ranges = <0 0 0x10 0>;
+   compatible = "shared-dma-pool";
size = <0 0x40>;
alignment = <0 0x40>;
+   no-map;
};
qman_pfdr: qman-pfdr {
-   

[v5 09/12] soc/fsl/qbman: add QMAN_REV32

2017-09-18 Thread Roy Pledge
From: Madalin Bucur 

Add revision 3.2 of the QBMan block.  This is the version
for LS1043A and LS1046A SoCs.

Signed-off-by: Madalin Bucur 
Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/qman_ccsr.c | 2 ++
 drivers/soc/fsl/qbman/qman_priv.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c 
b/drivers/soc/fsl/qbman/qman_ccsr.c
index 607355b9..79cba58 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -719,6 +719,8 @@ static int fsl_qman_probe(struct platform_device *pdev)
qman_ip_rev = QMAN_REV30;
else if (major == 3 && minor == 1)
qman_ip_rev = QMAN_REV31;
+   else if (major == 3 && minor == 2)
+   qman_ip_rev = QMAN_REV32;
else {
dev_err(dev, "Unknown QMan version\n");
return -ENODEV;
diff --git a/drivers/soc/fsl/qbman/qman_priv.h 
b/drivers/soc/fsl/qbman/qman_priv.h
index 9407d2e..75a8f90 100644
--- a/drivers/soc/fsl/qbman/qman_priv.h
+++ b/drivers/soc/fsl/qbman/qman_priv.h
@@ -183,6 +183,7 @@ struct qm_portal_config {
 #define QMAN_REV20 0x0200
 #define QMAN_REV30 0x0300
 #define QMAN_REV31 0x0301
+#define QMAN_REV32 0x0302
 extern u16 qman_ip_rev; /* 0 if uninitialised, otherwise QMAN_REVx */
 
 #define QM_FQID_RANGE_START 1 /* FQID 0 reserved for internal use */
-- 
2.7.4



[v5 05/12] soc/fsl/qbman: Drop set/clear_bits usage

2017-09-18 Thread Roy Pledge
From: Madalin Bucur 

Replace the PPC-specific set/clear_bits API with standard
bit twiddling so the driver is portable outside PPC.
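
(The callers already serialise these updates -- the portal irq_sources
changes below run under local_irq_save() -- so a plain read-modify-write is
sufficient. A sketch of the resulting pattern:)

    unsigned long irqflags;

    local_irq_save(irqflags);
    /* was: set_bits(bits & BM_PIRQ_VISIBLE, &p->irq_sources); */
    p->irq_sources |= bits & BM_PIRQ_VISIBLE;
    bm_out(&p->p, BM_REG_IER, p->irq_sources);
    local_irq_restore(irqflags);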

Signed-off-by: Madalin Bucur 
Signed-off-by: Claudiu Manoil 
Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/bman.c | 2 +-
 drivers/soc/fsl/qbman/qman.c | 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/soc/fsl/qbman/bman.c b/drivers/soc/fsl/qbman/bman.c
index 604e45c..ff8998f 100644
--- a/drivers/soc/fsl/qbman/bman.c
+++ b/drivers/soc/fsl/qbman/bman.c
@@ -616,7 +616,7 @@ int bman_p_irqsource_add(struct bman_portal *p, u32 bits)
unsigned long irqflags;
 
local_irq_save(irqflags);
-   set_bits(bits & BM_PIRQ_VISIBLE, &p->irq_sources);
+   p->irq_sources |= bits & BM_PIRQ_VISIBLE;
bm_out(&p->p, BM_REG_IER, p->irq_sources);
local_irq_restore(irqflags);
return 0;
diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index 1bcfc51..25419e1 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -908,12 +908,12 @@ static inline int qm_mc_result_timeout(struct qm_portal 
*portal,
 
 static inline void fq_set(struct qman_fq *fq, u32 mask)
 {
-   set_bits(mask, &fq->flags);
+   fq->flags |= mask;
 }
 
 static inline void fq_clear(struct qman_fq *fq, u32 mask)
 {
-   clear_bits(mask, &fq->flags);
+   fq->flags &= ~mask;
 }
 
 static inline int fq_isset(struct qman_fq *fq, u32 mask)
@@ -1574,7 +1574,7 @@ void qman_p_irqsource_add(struct qman_portal *p, u32 bits)
unsigned long irqflags;
 
local_irq_save(irqflags);
-   set_bits(bits & QM_PIRQ_VISIBLE, &p->irq_sources);
+   p->irq_sources |= bits & QM_PIRQ_VISIBLE;
qm_out(&p->p, QM_REG_IER, p->irq_sources);
local_irq_restore(irqflags);
 }
@@ -1597,7 +1597,7 @@ void qman_p_irqsource_remove(struct qman_portal *p, u32 
bits)
 */
local_irq_save(irqflags);
bits &= QM_PIRQ_VISIBLE;
-   clear_bits(bits, &p->irq_sources);
+   p->irq_sources &= ~bits;
qm_out(&p->p, QM_REG_IER, p->irq_sources);
ier = qm_in(&p->p, QM_REG_IER);
/*
-- 
2.7.4



[v5 01/12] soc/fsl/qbman: Add common routine for QBMan private allocations

2017-09-18 Thread Roy Pledge
The QBMan device uses several memory regions to manage frame
queues and buffers. Add a common routine for extracting and
initializing these reserved memory areas.
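
A minimal usage sketch of the new helper from a probe path (the surrounding
error handling here is illustrative, not part of this patch):

    dma_addr_t fbpr_a;
    size_t fbpr_sz;
    int ret;

    /* index 0 = first "memory-region" phandle of the device node */
    ret = qbman_init_private_mem(dev, 0, &fbpr_a, &fbpr_sz);
    if (ret) {
            dev_err(dev, "qbman_init_private_mem() failed %d\n", ret);
            return -ENODEV;
    }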

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/Makefile   |  2 +-
 drivers/soc/fsl/qbman/dpaa_sys.c | 78 
 drivers/soc/fsl/qbman/dpaa_sys.h |  4 +++
 3 files changed, 83 insertions(+), 1 deletion(-)
 create mode 100644 drivers/soc/fsl/qbman/dpaa_sys.c

diff --git a/drivers/soc/fsl/qbman/Makefile b/drivers/soc/fsl/qbman/Makefile
index 7ae199f..3cbd08a 100644
--- a/drivers/soc/fsl/qbman/Makefile
+++ b/drivers/soc/fsl/qbman/Makefile
@@ -1,6 +1,6 @@
 obj-$(CONFIG_FSL_DPAA)  += bman_ccsr.o qman_ccsr.o \
   bman_portal.o qman_portal.o \
-  bman.o qman.o
+  bman.o qman.o dpaa_sys.o
 
 obj-$(CONFIG_FSL_BMAN_TEST) += bman-test.o
 bman-test-y  = bman_test.o
diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
new file mode 100644
index 000..9436aa8
--- /dev/null
+++ b/drivers/soc/fsl/qbman/dpaa_sys.c
@@ -0,0 +1,78 @@
+/* Copyright 2017 NXP Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of NXP Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NXP Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL NXP Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF 
THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include "dpaa_sys.h"
+
+/*
+ * Initialize a devices private memory region
+ */
+int qbman_init_private_mem(struct device *dev, int idx, dma_addr_t *addr,
+   size_t *size)
+{
+   int ret;
+   struct device_node *mem_node;
+   u64 size64;
+
+   ret = of_reserved_mem_device_init_by_idx(dev, dev->of_node, idx);
+   if (ret) {
+   dev_err(dev,
+   "of_reserved_mem_device_init_by_idx(%d) failed 0x%x\n",
+   idx, ret);
+   return -ENODEV;
+   }
+   mem_node = of_parse_phandle(dev->of_node, "memory-region", 0);
+   if (mem_node) {
+   ret = of_property_read_u64(mem_node, "size", &size64);
+   if (ret) {
+   dev_err(dev, "of_address_to_resource fails 0x%x\n",
+   ret);
+   return -ENODEV;
+   }
+   *size = size64;
+   } else {
+   dev_err(dev, "No memory-region found for index %d\n", idx);
+   return -ENODEV;
+   }
+
+   if (!dma_zalloc_coherent(dev, *size, addr, 0)) {
+   dev_err(dev, "DMA Alloc memory failed\n");
+   return -ENODEV;
+   }
+
+   /*
+* Disassociate the reserved memory area from the device
+* because a device can only have one DMA memory area. This
+* should be fine since the memory is allocated and initialized
+* and only ever accessed by the QBMan device from now on
+*/
+   of_reserved_mem_device_release(dev);
+   return 0;
+}
diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h b/drivers/soc/fsl/qbman/dpaa_sys.h
index 2ce394a..676af82 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.h
+++ b/drivers/soc/fsl/qbman/dpaa_sys.h
@@ -102,4 +102,8 @@ static inline u8 dpaa_cyc_diff(u8 ringsize, u8 first, u8 
last)
 

[v5 00/12] soc/fsl/qbman: Enable QBMan on ARM Platforms

2017-09-18 Thread Roy Pledge
This patch series enables DPAA1 QBMan devices for ARM and
ARM64 architectures. This allows the LS1043A and LS1046A to use
QBMan functionality, which provides access to ethernet and cryptographic
devices, for example.

Changes since v4:
- Introduce a common function for QBMan private memory initialization
- Fix sparse warnings making sure that __iomem and __be32 are respected
- Control different memremap() attributes using a #define

Changes since v3:
- Use memremap() instead of ioremap() for non iomem QBMan portal regions
- Ensured the __iomem attribute is respected when accessing iomem mapped regions
- Removed calls to flush/invalidate/prefetch for ARM/ARM64 since mapping is 
done as write combine

Changes since v2:
- Fixed some misspellings
- Added 'no-map' constraint to device tree bindings
- Described ordering constraint on regions in the device tree
- Removed confusing comment regarding non-shareable mappings
- Added warning if old reserved-memory technique is used on ARM

Changes since v1:
- Reworked private memory allocations to use shared-dma-pool on ARM platforms


Claudiu Manoil (2):
  soc/fsl/qbman: Drop L1_CACHE_BYTES compile time check
  soc/fsl/qbman: Add missing headers on ARM

Madalin Bucur (4):
  soc/fsl/qbman: Drop set/clear_bits usage
  soc/fsl/qbman: add QMAN_REV32
  soc/fsl/qbman: different register offsets on ARM
  soc/fsl/qbman: Enable FSL_LAYERSCAPE config on ARM

Roy Pledge (5):
  soc/fsl/qbman: Add common routine for QBMan private allocations
  soc/fsl/qbman: Use shared-dma-pool for BMan private memory allocations
  soc/fsl/qbman: Use shared-dma-pool for QMan private memory allocations
  dt-bindings: soc/fsl: Update reserved memory binding for QBMan
  soc/fsl/qbman: Rework portal mapping calls for ARM/PPC

Valentin Rothberg (1):
  soc/fsl/qbman: Fix ARM32 typo

 Documentation/devicetree/bindings/soc/fsl/bman.txt | 12 +--
 Documentation/devicetree/bindings/soc/fsl/qman.txt | 26 --
 drivers/soc/fsl/qbman/Kconfig  |  2 +-
 drivers/soc/fsl/qbman/Makefile |  2 +-
 drivers/soc/fsl/qbman/bman.c   | 42 --
 drivers/soc/fsl/qbman/bman_ccsr.c  | 15 
 drivers/soc/fsl/qbman/bman_portal.c| 23 +++---
 drivers/soc/fsl/qbman/bman_priv.h  |  8 +-
 drivers/soc/fsl/qbman/dpaa_sys.c   | 78 ++
 drivers/soc/fsl/qbman/dpaa_sys.h   | 25 --
 drivers/soc/fsl/qbman/qman.c   | 77 +-
 drivers/soc/fsl/qbman/qman_ccsr.c  | 95 +++---
 drivers/soc/fsl/qbman/qman_portal.c| 23 +++---
 drivers/soc/fsl/qbman/qman_priv.h  | 11 +--
 drivers/soc/fsl/qbman/qman_test.h  |  2 -
 15 files changed, 318 insertions(+), 123 deletions(-)
 create mode 100644 drivers/soc/fsl/qbman/dpaa_sys.c

--
2.7.4



[v5 02/12] soc/fsl/qbman: Use shared-dma-pool for BMan private memory allocations

2017-09-18 Thread Roy Pledge
Use the shared-memory-pool mechanism for free buffer proxy record
area allocation.

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/bman_ccsr.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/soc/fsl/qbman/bman_ccsr.c 
b/drivers/soc/fsl/qbman/bman_ccsr.c
index eaa9585..05c4223 100644
--- a/drivers/soc/fsl/qbman/bman_ccsr.c
+++ b/drivers/soc/fsl/qbman/bman_ccsr.c
@@ -201,6 +201,21 @@ static int fsl_bman_probe(struct platform_device *pdev)
return -ENODEV;
}
 
+   /*
+* If FBPR memory wasn't defined using the qbman compatible string
+* try using the of_reserved_mem_device method
+*/
+   if (!fbpr_a) {
+   ret = qbman_init_private_mem(dev, 0, &fbpr_a, &fbpr_sz);
+   if (ret) {
+   dev_err(dev, "qbman_init_private_mem() failed 0x%x\n",
+   ret);
+   return -ENODEV;
+   }
+   }
+
+   dev_dbg(dev, "Allocated FBPR 0x%llx 0x%zx\n", fbpr_a, fbpr_sz);
+
bm_set_memory(fbpr_a, fbpr_sz);
 
err_irq = platform_get_irq(pdev, 0);
-- 
2.7.4



[v5 03/12] soc/fsl/qbman: Use shared-dma-pool for QMan private memory allocations

2017-09-18 Thread Roy Pledge
Use the shared-memory-pool mechanism for frame queue descriptor and
packed frame descriptor record area allocations.

Signed-off-by: Roy Pledge 
---
 drivers/soc/fsl/qbman/qman_ccsr.c | 93 ++-
 drivers/soc/fsl/qbman/qman_priv.h |  2 -
 drivers/soc/fsl/qbman/qman_test.h |  2 -
 3 files changed, 63 insertions(+), 34 deletions(-)

diff --git a/drivers/soc/fsl/qbman/qman_ccsr.c 
b/drivers/soc/fsl/qbman/qman_ccsr.c
index 835ce94..607355b9 100644
--- a/drivers/soc/fsl/qbman/qman_ccsr.c
+++ b/drivers/soc/fsl/qbman/qman_ccsr.c
@@ -401,21 +401,42 @@ static int qm_init_pfdr(struct device *dev, u32 
pfdr_start, u32 num)
 }
 
 /*
- * Ideally we would use the DMA API to turn rmem->base into a DMA address
- * (especially if iommu translations ever get involved).  Unfortunately, the
- * DMA API currently does not allow mapping anything that is not backed with
- * a struct page.
+ * QMan needs two global memory areas initialized at boot time:
+ *  1) FQD: Frame Queue Descriptors used to manage frame queues
+ *  2) PFDR: Packed Frame Queue Descriptor Records used to store frames
+ * Both areas are reserved using the device tree reserved memory framework
+ * and the addresses and sizes are initialized when the QMan device is probed
  */
 static dma_addr_t fqd_a, pfdr_a;
 static size_t fqd_sz, pfdr_sz;
 
+#ifdef CONFIG_PPC
+/*
+ * Support for PPC Device Tree backward compatibility when compatible
+ * string is set to fsl-qman-fqd and fsl-qman-pfdr
+ */
+static int zero_priv_mem(phys_addr_t addr, size_t sz)
+{
+   /* map as cacheable, non-guarded */
+   void __iomem *tmpp = ioremap_prot(addr, sz, 0);
+
+   if (!tmpp)
+   return -ENOMEM;
+
+   memset_io(tmpp, 0, sz);
+   flush_dcache_range((unsigned long)tmpp,
+  (unsigned long)tmpp + sz);
+   iounmap(tmpp);
+
+   return 0;
+}
+
 static int qman_fqd(struct reserved_mem *rmem)
 {
fqd_a = rmem->base;
fqd_sz = rmem->size;
 
WARN_ON(!(fqd_a && fqd_sz));
-
return 0;
 }
 RESERVEDMEM_OF_DECLARE(qman_fqd, "fsl,qman-fqd", qman_fqd);
@@ -431,32 +452,13 @@ static int qman_pfdr(struct reserved_mem *rmem)
 }
 RESERVEDMEM_OF_DECLARE(qman_pfdr, "fsl,qman-pfdr", qman_pfdr);
 
+#endif
+
 static unsigned int qm_get_fqid_maxcnt(void)
 {
return fqd_sz / 64;
 }
 
-/*
- * Flush this memory range from data cache so that QMAN originated
- * transactions for this memory region could be marked non-coherent.
- */
-static int zero_priv_mem(struct device *dev, struct device_node *node,
-phys_addr_t addr, size_t sz)
-{
-   /* map as cacheable, non-guarded */
-   void __iomem *tmpp = ioremap_prot(addr, sz, 0);
-
-   if (!tmpp)
-   return -ENOMEM;
-
-   memset_io(tmpp, 0, sz);
-   flush_dcache_range((unsigned long)tmpp,
-  (unsigned long)tmpp + sz);
-   iounmap(tmpp);
-
-   return 0;
-}
-
 static void log_edata_bits(struct device *dev, u32 bit_count)
 {
u32 i, j, mask = 0x;
@@ -727,10 +729,41 @@ static int fsl_qman_probe(struct platform_device *pdev)
qm_channel_caam = QMAN_CHANNEL_CAAM_REV3;
}
 
-   ret = zero_priv_mem(dev, node, fqd_a, fqd_sz);
-   WARN_ON(ret);
-   if (ret)
-   return -ENODEV;
+   if (fqd_a) {
+#ifdef CONFIG_PPC
+   /*
+* For PPC backward DT compatibility
+* FQD memory MUST be zero'd by software
+*/
+   zero_priv_mem(fqd_a, fqd_sz);
+#else
+   WARN(1, "Unexpected architecture using non shared-dma-mem 
reservations");
+#endif
+   } else {
+   /*
+* Order of memory regions is assumed as FQD followed by PFDR
+* in order to ensure allocations from the correct regions the
+* driver initializes then allocates each piece in order
+*/
+   ret = qbman_init_private_mem(dev, 0, &fqd_a, &fqd_sz);
+   if (ret) {
+   dev_err(dev, "qbman_init_private_mem() for FQD failed 
0x%x\n",
+   ret);
+   return -ENODEV;
+   }
+   }
+   dev_dbg(dev, "Allocated FQD 0x%llx 0x%zx\n", fqd_a, fqd_sz);
+
+   if (!pfdr_a) {
+   /* Setup PFDR memory */
+   ret = qbman_init_private_mem(dev, 1, &pfdr_a, &pfdr_sz);
+   if (ret) {
+   dev_err(dev, "qbman_init_private_mem() for PFDR failed 
0x%x\n",
+   ret);
+   return -ENODEV;
+   }
+   }
+   dev_dbg(dev, "Allocated PFDR 0x%llx 0x%zx\n", pfdr_a, pfdr_sz);
 
ret = qman_init_ccsr(dev);
if (ret) {
diff --git a/drivers/soc/fsl/qbman/qman_priv.h 
b/drivers/soc/fsl/qbman/qman_priv.h
index 5fe9faf..b1e2cbf 100644
--- a/drivers/soc/fsl/qbman/qman_priv.h
+++ 

[PATCH v1 3/3] powerpc/kernel: Separate SR-IOV Calls

2017-09-18 Thread Bryant G. Ly
SR-IOV can now be enabled on PowerNV and pseries platforms.
Therefore, the appropriate calls were moved to machine-dependent
code instead of being defined at compile time.

Signed-off-by: Bryant G. Ly 
Signed-off-by: Juan J. Alvarez 
---
 arch/powerpc/include/asm/machdep.h   |  7 ++
 arch/powerpc/include/asm/pci-bridge.h|  4 +---
 arch/powerpc/kernel/eeh_driver.c |  4 ++--
 arch/powerpc/kernel/pci-common.c | 23 +++
 arch/powerpc/kernel/pci_dn.c |  6 -
 arch/powerpc/platforms/powernv/eeh-powernv.c | 34 +++-
 arch/powerpc/platforms/powernv/pci-ioda.c|  6 +++--
 7 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 73b92017b6d7..20f68d36af8c 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -172,11 +172,18 @@ struct machdep_calls {
/* Called after scan and before resource survey */
void (*pcibios_fixup_phb)(struct pci_controller *hose);
 
+   /* Called after device has been added to bus and
+* before sysfs has been created
+*/
+   void (*pcibios_bus_add_device)(struct pci_dev *pdev);
+
resource_size_t (*pcibios_default_alignment)(void);
 
 #ifdef CONFIG_PCI_IOV
void (*pcibios_fixup_sriov)(struct pci_dev *pdev);
resource_size_t (*pcibios_iov_resource_alignment)(struct pci_dev *, int 
resno);
+   int (*pcibios_sriov_enable)(struct pci_dev *pdev, u16 num_vfs);
+   int (*pcibios_sriov_disable)(struct pci_dev *pdev);
 #endif /* CONFIG_PCI_IOV */
 
/* Called to shutdown machine specific hardware not already controlled
diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 0b8aa1fe2d5f..323628ca4d6d 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -203,10 +203,9 @@ struct pci_dn {
struct eeh_dev *edev;   /* eeh device */
 #endif
 #define IODA_INVALID_PE0x
-#ifdef CONFIG_PPC_POWERNV
unsigned int pe_number;
-   int vf_index;   /* VF index in the PF */
 #ifdef CONFIG_PCI_IOV
+   int vf_index;   /* VF index in the PF */
u16 vfs_expanded;   /* number of VFs IOV BAR expanded */
u16 num_vfs;/* number of VFs enabled*/
unsigned int *pe_num_map;   /* PE# for the first VF PE or array */
@@ -215,7 +214,6 @@ struct pci_dn {
int (*m64_map)[PCI_SRIOV_NUM_BARS];
 #endif /* CONFIG_PCI_IOV */
int mps;/* Maximum Payload Size */
-#endif
struct list_head child_list;
struct list_head list;
 };
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 8b840191df59..f2d1b369974d 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -440,7 +440,7 @@ static void *eeh_add_virt_device(void *data, void *userdata)
return NULL;
}
 
-#ifdef CONFIG_PPC_POWERNV
+#ifdef CONFIG_PCI_IOV
pci_iov_add_virtfn(edev->physfn, pdn->vf_index, 0);
 #endif
return NULL;
@@ -496,7 +496,7 @@ static void *eeh_rmv_device(void *data, void *userdata)
(*removed)++;
 
if (edev->physfn) {
-#ifdef CONFIG_PPC_POWERNV
+#ifdef CONFIG_PCI_IOV
struct pci_dn *pdn = eeh_dev_to_pdn(edev);
 
pci_iov_remove_virtfn(edev->physfn, pdn->vf_index, 0);
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 02831a396419..d45b956d2e3a 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -249,8 +249,31 @@ resource_size_t pcibios_iov_resource_alignment(struct 
pci_dev *pdev, int resno)
 
return pci_iov_resource_size(pdev, resno);
 }
+
+int pcibios_sriov_enable(struct pci_dev *pdev, u16 num_vfs)
+{
+   if (ppc_md.pcibios_sriov_enable)
+   return ppc_md.pcibios_sriov_enable(pdev, num_vfs);
+
+   return 0;
+}
+
+int pcibios_sriov_disable(struct pci_dev *pdev)
+{
+   if (ppc_md.pcibios_sriov_disable)
+   return ppc_md.pcibios_sriov_disable(pdev);
+
+   return 0;
+}
+
 #endif /* CONFIG_PCI_IOV */
 
+void pcibios_bus_add_device(struct pci_dev *pdev)
+{
+   if (ppc_md.pcibios_bus_add_device)
+   ppc_md.pcibios_bus_add_device(pdev);
+}
+
 static resource_size_t pcibios_io_size(const struct pci_controller *hose)
 {
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 0e395afbf0f4..ab147a1909c8 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -156,10 +156,8 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn 
*parent,
pdn->parent = parent;

[PATCH v1 2/3] pseries: Override pci_bus_match_virtfn_driver

2017-09-18 Thread Bryant G. Ly
For PowerVM SR-IOV (pseries) enablement we don't want to match
the virtual function's device drivers since firmware
plans to load the device node in the device tree
dynamically when Novalink assigns the VF to a partition.

Signed-off-by: Bryant G. Ly 
Signed-off-by: Juan J. Alvarez 
---
 arch/powerpc/platforms/pseries/pci.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/pci.c 
b/arch/powerpc/platforms/pseries/pci.c
index 09eba5a9929a..15d5145a622d 100644
--- a/arch/powerpc/platforms/pseries/pci.c
+++ b/arch/powerpc/platforms/pseries/pci.c
@@ -58,6 +58,23 @@ void pcibios_name_device(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_HEADER(PCI_ANY_ID, PCI_ANY_ID, pcibios_name_device);
 #endif
 
+#ifdef CONFIG_PCI_IOV
+void pci_bus_match_virtfn_driver(struct pci_dev *dev)
+{
+   /*
+* Per PSeries SR-IOV requirement there is no need to
+* match virtual function device driver as firmware
+* will load the device node in the device tree dynamically.
+* Since there is no matching of device driver there is
+* no failure when attaching driver, therefore there is no
+* need to remove sysfs file. Furthermore, the VF platform
+* management still needs to exist in sysfs files to be used
+* by management.
+*/
+   dev->is_added = 1;
+}
+#endif
+
 static void __init pSeries_request_regions(void)
 {
if (!isa_io_base)
-- 
2.11.0 (Apple Git-81)



[PATCH v1 1/3] powerpc/kernel: Split up pci_bus_add_device

2017-09-18 Thread Bryant G. Ly
When enabling SR-IOV one might want to have their
own version of starting device drivers for the VFs.
This patch allows for SR-IOV callers to use
pci_bus_add_virtfn_device instead of generic
pci_bus_add_device.

When enabling SR-IOV in PSeries architecture the
dynamic VFs created within the sriov_configure sysfs call
will not load the device driver as firmware will load
the device node when the VF device is assigned to the
logical partition. So we needed a way to override the
way device driver matching is done for virtual functions
on PowerVM.

Signed-off-by: Bryant G. Ly 
Signed-off-by: Juan J. Alvarez 
---
 drivers/pci/bus.c   | 51 ++-
 drivers/pci/iov.c   |  2 +-
 include/linux/pci.h |  3 +++
 3 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index bc56cf19afd3..86daf62c4048 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -302,16 +302,9 @@ void __weak pcibios_resource_survey_bus(struct pci_bus 
*bus) { }
 
 void __weak pcibios_bus_add_device(struct pci_dev *pdev) { }
 
-/**
- * pci_bus_add_device - start driver for a single device
- * @dev: device to add
- *
- * This adds add sysfs entries and start device drivers
- */
-void pci_bus_add_device(struct pci_dev *dev)
-{
-   int retval;
 
+void pci_bus_add_sysfs_entries(struct pci_dev *dev)
+{
/*
 * Can not put in pci_device_add yet because resources
 * are not assigned yet for some devices.
@@ -321,6 +314,11 @@ void pci_bus_add_device(struct pci_dev *dev)
pci_create_sysfs_dev_files(dev);
pci_proc_attach_device(dev);
pci_bridge_d3_update(dev);
+}
+
+void pci_bus_match_device_driver(struct pci_dev *dev)
+{
+   int retval;
 
dev->match_driver = true;
retval = device_attach(&dev->dev);
@@ -333,6 +331,41 @@ void pci_bus_add_device(struct pci_dev *dev)
 
dev->is_added = 1;
 }
+
+#ifdef CONFIG_PCI_IOV
+void __weak pci_bus_match_virtfn_driver(struct pci_dev *dev)
+{
+   pci_bus_match_device_driver(dev);
+}
+
+/**
+ * pci_bus_add_virtfn_device - start driver for a virtual function device
+ * @dev: device to add
+ *
+ * This adds add sysfs entries and start device drivers for
+ * virtual function devices
+ *
+ */
+void pci_bus_add_virtfn_device(struct pci_dev *pdev)
+{
+   pci_bus_add_sysfs_entries(pdev);
+   pci_bus_match_virtfn_driver(pdev);
+}
+EXPORT_SYMBOL_GPL(pci_bus_add_virtfn_device);
+#endif
+
+/**
+ * pci_bus_add_device - start driver for a single device
+ * @dev: device to add
+ *
+ * This adds add sysfs entries and start device drivers
+ */
+void pci_bus_add_device(struct pci_dev *dev)
+{
+   pci_bus_add_sysfs_entries(dev);
+   pci_bus_match_device_driver(dev);
+}
+
 EXPORT_SYMBOL_GPL(pci_bus_add_device);
 
 /**
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ac41c8be9200..16cc72545847 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -162,7 +162,7 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id, int 
reset)
 
pci_device_add(virtfn, virtfn->bus);
 
-   pci_bus_add_device(virtfn);
+   pci_bus_add_virtfn_device(virtfn);
sprintf(buf, "virtfn%u", id);
rc = sysfs_create_link(&dev->dev.kobj, &virtfn->dev.kobj, buf);
if (rc)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f68c58a93dd0..39f5c0b4bf23 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -911,6 +911,9 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, 
int devfn);
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
 unsigned int pci_scan_child_bus(struct pci_bus *bus);
 void pci_bus_add_device(struct pci_dev *dev);
+#ifdef CONFIG_PCI_IOV
+void pci_bus_add_virtfn_device(struct pci_dev *dev);
+#endif
 void pci_read_bridge_bases(struct pci_bus *child);
 struct resource *pci_find_parent_resource(const struct pci_dev *dev,
  struct resource *res);
-- 
2.11.0 (Apple Git-81)



[PATCH v1 0/3] Preparation for SR-IOV PowerVM Enablement

2017-09-18 Thread Bryant G. Ly
This patch series prepares for enabling SR-IOV
on pseries. It makes the relevant calls machine dependent
and does not change any current functionality.

Bryant G. Ly (3):
  powerpc/kernel: Split up pci_bus_add_device
  pseries: Override pci_bus_match_virtfn_driver
  powerpc/kernel: Separate SR-IOV Calls

 arch/powerpc/include/asm/machdep.h   |  7 
 arch/powerpc/include/asm/pci-bridge.h|  4 +--
 arch/powerpc/kernel/eeh_driver.c |  4 +--
 arch/powerpc/kernel/pci-common.c | 23 +
 arch/powerpc/kernel/pci_dn.c |  6 
 arch/powerpc/platforms/powernv/eeh-powernv.c | 34 ++-
 arch/powerpc/platforms/powernv/pci-ioda.c|  6 ++--
 arch/powerpc/platforms/pseries/pci.c | 17 ++
 drivers/pci/bus.c| 51 +++-
 drivers/pci/iov.c|  2 +-
 include/linux/pci.h  |  3 ++
 11 files changed, 118 insertions(+), 39 deletions(-)

-- 
2.11.0 (Apple Git-81)



Re: [v4 07/11] soc/fsl/qbman: Rework portal mapping calls for ARM/PPC

2017-09-18 Thread Roy Pledge
On 9/15/2017 5:49 PM, Catalin Marinas wrote:
> On Thu, Sep 14, 2017 at 07:07:50PM +, Roy Pledge wrote:
>> On 9/14/2017 10:00 AM, Catalin Marinas wrote:
>>> On Thu, Aug 24, 2017 at 04:37:51PM -0400, Roy Pledge wrote:
 @@ -123,23 +122,34 @@ static int bman_portal_probe(struct platform_device 
 *pdev)
}
pcfg->irq = irq;

 -  va = ioremap_prot(addr_phys[0]->start, resource_size(addr_phys[0]), 0);
 -  if (!va) {
 -  dev_err(dev, "ioremap::CE failed\n");
 +  /*
 +   * TODO: Ultimately we would like to use a cacheable/non-shareable
 +   * (coherent) mapping for the portal on both architectures but that
 +   * isn't currently available in the kernel.  Because of HW differences
 +   * PPC needs to be mapped cacheable while ARM SoCs will work with non
 +   * cacheable mappings
 +   */
>>>
>>> This comment mentions "cacheable/non-shareable (coherent)". Was this
>>> meant for ARM platforms? Because non-shareable is not coherent, nor is
>>> this combination guaranteed to work with different CPUs and
>>> interconnects.
>>
>> My wording is poor; I should have been clearer that non-shareable ==
>> non-coherent.  I will fix this.
>>
>> We do understand that cacheable/non-shareable isn't supported on all
>> CPU/interconnect combinations, but we have verified with ARM that our
>> use is OK for the CPU/interconnects we have integrated QBMan on. The note
>> is here to try to explain why the mapping is different right now. Once we
>> get the basic QBMan support integrated for ARM we do plan to try to have
>> patches integrated that enable the cacheable mapping as it gives a
>> significant performance boost.
> 
> I will definitely not ack those patches (at least not in the form I've
> seen, assuming certain eviction order of the bytes in a cacheline). The
> reason is that it is incredibly fragile, highly dependent on the CPU
> microarchitecture and interconnects. Assuming that you ever only have a
> single SoC with this device, you may get away with #ifdefs in the
> driver. But if you support two or more SoCs with different behaviours,
> you'd have to make run-time decisions in the driver or run-time code
> patching. We are very keen on single kernel binary image/drivers and
> architecturally compliant code (the cacheable mapping hacks are well
> outside the architecture behaviour).
> 

Let's put this particular point on hold for now; I would like to focus 
on getting the basic functions merged in ASAP. I removed the comment in 
question (it sort of happened naturally when I applied your other 
comments) in the next revision of the patchset.  I have submitted the 
patches to our automated test system for sanity checking and I will send 

Thanks again for your comments - they have been very useful and have 
improved the quality of the code for sure.

 diff --git a/drivers/soc/fsl/qbman/dpaa_sys.h 
 b/drivers/soc/fsl/qbman/dpaa_sys.h
 index 81a9a5e..0a1d573 100644
 --- a/drivers/soc/fsl/qbman/dpaa_sys.h
 +++ b/drivers/soc/fsl/qbman/dpaa_sys.h
 @@ -51,12 +51,12 @@

static inline void dpaa_flush(void *p)
{
 +  /*
 +   * Only PPC needs to flush the cache currently - on ARM the mapping
 +   * is non cacheable
 +   */
#ifdef CONFIG_PPC
flush_dcache_range((unsigned long)p, (unsigned long)p+64);
 -#elif defined(CONFIG_ARM)
 -  __cpuc_flush_dcache_area(p, 64);
 -#elif defined(CONFIG_ARM64)
 -  __flush_dcache_area(p, 64);
#endif
}
>>>
>>> Dropping the private API cache maintenance is fine and the memory is WC
>>> now for ARM (mapping to Normal NonCacheable). However, do you require
>>> any barriers here? Normal NC doesn't guarantee any ordering.
>>
>> The barrier is done in the code where the command is formed. We follow
>> this pattern
>> a) Zero the command cache line (the device never reacts to a 0 command
>> verb so a cast out of this will have no effect)
>> b) Fill in everything in the command except the command verb (byte 0)
>> c) Execute a memory barrier
>> d) Set the command verb (byte 0)
>> e) Flush the command
>> If a castout happens between d) and e) doesn't matter since it was about
>> to be flushed anyway .  Any castout before d) will not cause HW to
>> process the command because verb is still 0. The barrier at c) prevents
>> reordering so the HW cannot see the verb set before the command is formed.
> 
> I think that's fine, the dpaa_flush() can be a no-op with non-cacheable
> memory (I had forgotten the details).
> 
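
For readers following along, here is a minimal C sketch of the (a)-(e) ordering
pattern described in the quoted exchange above. The structure layout and the
issue_cmd() helper are illustrative only and are not the actual QBMan portal
API; dpaa_flush() is the helper from the patch under discussion and dma_wmb()
is the generic kernel barrier.

struct portal_cmd {
        u8 verb;                /* byte 0: HW ignores the command while this is 0 */
        u8 params[63];          /* rest of the 64-byte command cache line */
};

static void issue_cmd(struct portal_cmd *cmd, const u8 *params, u8 verb)
{
        memset(cmd, 0, sizeof(*cmd));                     /* (a) zero the command line */
        memcpy(cmd->params, params, sizeof(cmd->params)); /* (b) fill all but the verb */
        dma_wmb();                                        /* (c) order the fill before the verb write */
        WRITE_ONCE(cmd->verb, verb);                      /* (d) HW may now consume the command */
        dpaa_flush(cmd);                                  /* (e) flush on PPC, no-op on an ARM NC mapping */
}

A castout before (d) is harmless because the verb is still zero, and a castout
between (d) and (e) only does early what (e) would do anyway.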



Re: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-18 Thread Helge Deller
* Luck, Tony :
> On Sat, Sep 16, 2017 at 12:53:42PM +0900, Sergey Senozhatsky wrote:
> > Hello
> > 
> > RFC
> > 
> > On some arches C function pointers are indirect and point to
> > a function descriptor, which contains the actual pointer to the code.
> > This mostly doesn't matter, except for cases when people want to print
> > out function pointers in symbolic format, because the usual '%pS/%ps'
> > does not work on those arches as expected. That's the reason why we
> > have '%pF/%pf', but since it's here because of a subtle ABI detail
> > specific to some arches (ppc64/ia64/parisc64) it's easy to misuse
> > '%pF/%pf' and '%pS/%ps' (see [1], for example).
> 
> A few new warnings when building on ia64:
> 
> arch/ia64/kernel/module.c:931: warning: passing argument 1 of 
> 'dereference_function_descriptor' makes pointer from integer without a cast
> arch/ia64/kernel/module.c:931: warning: return makes integer from pointer 
> without a cast
> kernel/kallsyms.c:325: warning: assignment makes integer from pointer without 
> a cast
> kernel/kallsyms.c:325: warning: passing argument 1 of 
> 'dereference_kernel_function_descriptor' makes pointer from integer without a 
> cast


I got similar warnings on parisc.
This patch on top of yours fixed those:

diff --git a/arch/parisc/kernel/module.c b/arch/parisc/kernel/module.c
index bc2eae8..4f34b46 100644
--- a/arch/parisc/kernel/module.c
+++ b/arch/parisc/kernel/module.c
@@ -66,6 +66,7 @@
 
 #include 
 #include 
+#include 
 
 #if 0
 #define DEBUGP printk
@@ -959,12 +960,12 @@ void module_arch_cleanup(struct module *mod)
 unsigned long dereference_module_function_descriptor(struct module *mod,
 unsigned long addr)
 {
-   void *opd_sz = mod->arch.fdesc_offset +
+   unsigned long opd_sz = mod->arch.fdesc_offset +
   mod->arch.fdesc_max * sizeof(Elf64_Fdesc);
 
if (addr < mod->arch.fdesc_offset || opd_sz < addr)
return addr;
 
-   return dereference_function_descriptor(addr);
+   return (unsigned long) dereference_function_descriptor((void *) addr);
 }
 #endif
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index e2fc09e..76f4de6 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -322,7 +322,7 @@ const char *kallsyms_lookup(unsigned long addr,
if (is_ksym_addr(addr)) {
unsigned long pos;
 
-   addr = dereference_kernel_function_descriptor(addr);
+   addr = dereference_kernel_function_descriptor((void *) addr);
pos = get_symbol_pos(addr, symbolsize, offset);
/* Grab name */
kallsyms_expand_symbol(get_symbol_offset(pos),


I tried your testcases too.

"echo 1 > /proc/sys/vm/drop_caches" gave correct output:
 printk#1 schedule_timeout+0x0/0x4a8
 printk#2 schedule_timeout+0x0/0x4a8
 printk#3 proc_sys_call_handler+0x120/0x180
 printk#4 proc_sys_call_handler+0x120/0x180
 printk#5 proc_sys_call_handler+0x120/0x180
 printk#6 proc_sys_call_handler+0x120/0x180

and here is "modprobe zram":
 printk#7 __UNIQUE_ID_vermagic8+0xb9a4/0xbd04 [zram]
 printk#8 __UNIQUE_ID_vermagic8+0xb9a4/0xbd04 [zram]
 printk#9 do_one_initcall+0x194/0x290
 printk#10 do_one_initcall+0x194/0x290
 printk#11 do_one_initcall+0x194/0x290
 printk#12 do_one_initcall+0x194/0x290
 printk#13 zram_init+0x22c/0x2a0 [zram]
 printk#14 zram_init+0x22c/0x2a0 [zram]
 printk#15 zram_init+0x22c/0x2a0 [zram]
 printk#16 zram_init+0x22c/0x2a0 [zram]

I wonder why printk#7 and printk#8 don't show "zram_init"...


Regarding your patches:

In arch/parisc/kernel/process.c:
+void *dereference_kernel_function_descriptor(void *ptr)
+{
+   if (ptr < (void *)__start_opd || (void *)__end_opd < ptr)

This needs to be (__end_opd is outside):
+   if (ptr < (void *)__start_opd || (void *)__end_opd <= ptr)

The same is true for the checks in the other arches.


I'd suggest moving the various
extern char __start_opd[], __end_opd[];
out of arch//include/asm/sections.h and into 


I'll continue to test.

Helge


[PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug

2017-09-18 Thread Michael Bringmann
powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
it may occur that the new resources are to be inserted into nodes
that were not used for memory resources at bootup.  Many different
configurations of PowerPC resources may need to be supported depending
upon the environment.  This patch fixes some problems encountered at
runtime with configurations that support memory-less nodes, but which
allow CPUs to be added at and after boot.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/mm/numa.c |   17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b385cd0..e811dd1 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -551,7 +551,7 @@ static int numa_setup_cpu(unsigned long lcpu)
nid = of_node_to_nid_single(cpu);
 
 out_present:
-   if (nid < 0 || !node_online(nid))
+   if (nid < 0 || !node_possible(nid))
nid = first_online_node;
 
map_cpu_to_node(lcpu, nid);
@@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
return rc;
 }
 
+static int verify_node_preparation(int nid)
+{
+   if ((NODE_DATA(nid) == NULL) ||
+   (NODE_DATA(nid)->node_spanned_pages == 0)) {
+   if (try_online_node(nid))
+   return first_online_node;
+   }
+
+   return nid;
+}
+
 /*
  * Update the CPU maps and sysfs entries for a single CPU when its NUMA
  * characteristics change. This function doesn't perform any locking and is
@@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
/* Use associativity from first thread for all siblings */
vphn_get_associativity(cpu, associativity);
new_nid = associativity_to_nid(associativity);
-   if (new_nid < 0 || !node_online(new_nid))
+   if (new_nid < 0 || !node_possible(new_nid))
new_nid = first_online_node;
 
+   new_nid = verify_node_preparation(new_nid);
+
if (new_nid == numa_cpu_lookup_table[cpu]) {
cpumask_andnot(&cpu_associativity_changes_mask,
		&cpu_associativity_changes_mask,



[PATCH 1/2] powerpc/nodes: Ensure enough nodes avail for operations

2017-09-18 Thread Michael Bringmann
powerpc/nodes: On systems like PowerPC which allow 'hot-add' of CPU
or memory resources, it may occur that the new resources are to be
inserted into nodes that were not used for these resources at bootup.
In the kernel, any node that is used must be defined and initialized
at boot.

This patch extracts the value of the lowest domain level (number of
allocable resources) from the "rtas" device tree property
"ibm,current-associativity-domains" or the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.  This new setting will
override the instruction,

nodes_and(node_possible_map, node_possible_map, node_online_map);

presently seen in the function arch/powerpc/mm/numa.c:initmem_init().

If the property is not present at boot, no operation will be performed
to define or enable additional nodes.

Signed-off-by: Michael Bringmann 
---
 arch/powerpc/mm/numa.c |   47 +++
 1 file changed, 47 insertions(+)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index ec098b3..b385cd0 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -892,6 +892,51 @@ static void __init setup_node_data(int nid, u64 start_pfn, 
u64 end_pfn)
NODE_DATA(nid)->node_spanned_pages = spanned_pages;
 }
 
+static void __init node_associativity_setup(void)
+{
+   struct device_node *rtas;
+
+   rtas = of_find_node_by_path("/rtas");
+   if (rtas) {
+   const __be32 *prop;
+   u32 len, entries, numnodes, i;
+
+   prop = of_get_property(rtas,
+   "ibm,current-associativity-domains", 
&len);
+   if (!prop || len < sizeof(unsigned int)) {
+   prop = of_get_property(rtas,
+   "ibm,max-associativity-domains", );
+   goto endit;
+   }
+
+   entries = of_read_number(prop++, 1);
+
+   if (len < (entries * sizeof(unsigned int)))
+   goto endit;
+
+   if ((0 <= min_common_depth) && (min_common_depth <= 
(entries-1)))
+   entries = min_common_depth;
+   else
+   entries -= 1;
+
+   numnodes = of_read_number(&prop[entries], 1);
+
+   printk(KERN_INFO "numa: Nodes = %d (mcd = %d)\n", numnodes,
+   min_common_depth);
+
+   for (i = 0; i < numnodes; i++) {
+   if (!node_possible(i)) {
+   setup_node_data(i, 0, 0);
+   node_set(i, node_possible_map);
+   }
+   }
+   }
+
+endit:
+   if (rtas)
+   of_node_put(rtas);
+}
+
 void __init initmem_init(void)
 {
int nid, cpu;
@@ -911,6 +956,8 @@ void __init initmem_init(void)
 */
nodes_and(node_possible_map, node_possible_map, node_online_map);
 
+   node_associativity_setup();
+
for_each_online_node(nid) {
unsigned long start_pfn, end_pfn;
 



[PATCH 0/2] powerpc/nodes/hotplug: Fix problem with memoryless nodes

2017-09-18 Thread Michael Bringmann
powerpc/nodes: Ensure enough nodes avail for operations.
On systems like PowerPC which allow 'hot-add' of CPU or memory
resources, it may occur that the new resources are to be inserted
into nodes that were not used for these resources at bootup.  In
the kernel, any node that is used must be defined and initialized
at boot.  This patch extracts the value of the lowest domain level
(number of allocable resources) from the "rtas" device tree property
"ibm,current-associativity-domains" or the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system.

powerpc/hotplug: Fix CPU-only node bringup bug
On systems like PowerPC which allow 'hot-add' of CPU, it may occur
that the new resources are to be inserted into nodes that were not
used for memory resources at bootup.  Many different configurations
of PowerPC resources may need to be supported depending upon the
environment.  This patch fixes some problems encountered at runtime
with configurations that support memory-less nodes, but which allow
CPUs to be added at and after boot.

Signed-off-by: Michael Bringmann 

Michael Bringmann (2):
  powerpc/vphn: Ensure enough nodes avail for operations
  powerpc/hotplug: Fix CPU-only node bringup bug



Re: [PATCH 0/5] [RFC] printk/ia64/ppc64/parisc64: let's deprecate %pF/%pf printk specifiers

2017-09-18 Thread Luck, Tony
On Sat, Sep 16, 2017 at 12:53:42PM +0900, Sergey Senozhatsky wrote:
>   Hello
> 
>   RFC
> 
>   On some arches C function pointers are indirect and point to
> a function descriptor, which contains the actual pointer to the code.
> This mostly doesn't matter, except for cases when people want to print
> out function pointers in symbolic format, because the usual '%pS/%ps'
> does not work on those arches as expected. That's the reason why we
> have '%pF/%pf', but since it's here because of a subtle ABI detail
> specific to some arches (ppc64/ia64/parisc64) it's easy to misuse
> '%pF/%pf' and '%pS/%ps' (see [1], for example).

A few new warnings when building on ia64:

arch/ia64/kernel/module.c:931: warning: passing argument 1 of 
'dereference_function_descriptor' makes pointer from integer without a cast
arch/ia64/kernel/module.c:931: warning: return makes integer from pointer 
without a cast
kernel/kallsyms.c:325: warning: assignment makes integer from pointer without a 
cast
kernel/kallsyms.c:325: warning: passing argument 1 of 
'dereference_kernel_function_descriptor' makes pointer from integer without a 
cast

Tried out the module case with a simple Hello-world test case.
This code:

char buf[1];

int init_module(void)
{
printk(KERN_INFO "Hello world 1.\n");

printk("using %%p  my init_module is at %p\n", init_module);
printk("using %%pF my init_module is at %pF\n", init_module);
printk("using %%pS my init_module is at %pS\n", init_module);

printk("using %%p  my buf is at %p\n", buf);
printk("using %%pF my buf is at %pF\n", buf);
printk("using %%pS my buf is at %pS\n", buf);

return 0;
}

Gave this console output:

Hello world 1.
using %p  my init_module is at a00203bf0328
using %pF my init_module is at init_module+0x0/0x140 [hello_1]
using %pS my init_module is at init_module+0x0/0x140 [hello_1]
using %p  my buf is at a00203bf0648
using %pF my buf is at buf+0x0/0xfb58 [hello_1]
using %pS my buf is at buf+0x0/0xfb58 [hello_1]


Which looks like what you wanted. People unaware of the vagaries
of ppc64/ia64/parisc64 can use the wrong %p[SF] variant, but still
get the right output.

-Tony


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Al Viro
On Mon, Sep 18, 2017 at 05:39:47PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> > If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> > the time running xfstests as well.
> 
> Dave insisted on it to discourage users/applications from mixing
> mmap and direct I/O.
> 
> In many ways a tracepoint might be the better way to diagnose these.

sysctl suppressing those two, perhaps?


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Jens Axboe
On 09/18/2017 09:43 AM, Al Viro wrote:
> On Mon, Sep 18, 2017 at 05:39:47PM +0200, Christoph Hellwig wrote:
>> On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
>>> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
>>> the time running xfstests as well.
>>
>> Dave insisted on it to discourage users/applications from mixing
>> mmap and direct I/O.
>>
>> In many ways a tracepoint might be the better way to diagnose these.
> 
> sysctl suppressing those two, perhaps?

I'd rather just make it a trace point, but don't care too much.

The code doesn't even have a comment as to why that WARN_ON() is
there or expected. Seems pretty sloppy to me, not a great way
to "discourage" users to mix mmap/dio.

-- 
Jens Axboe



Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Christoph Hellwig
On Mon, Sep 18, 2017 at 09:28:55AM -0600, Jens Axboe wrote:
> If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
> the time running xfstests as well.

Dave insisted on it to discourage users/applications from mixing
mmap and direct I/O.

In many ways a tracepoint might be the better way to diagnose these.


Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Jens Axboe
On 09/18/2017 09:27 AM, Christoph Hellwig wrote:
> On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
>> Hi,
>>
>> A warning is triggered from:
>>
>> file fs/iomap.c in function iomap_dio_rw
>>
>> if (ret)
>> goto out_free_dio;
>>
>> ret = invalidate_inode_pages2_range(mapping,
>> start >> PAGE_SHIFT, end >> PAGE_SHIFT);
  WARN_ON_ONCE(ret);
>> ret = 0;
>>
>> inode_dio_begin(inode);
> 
> This is expected and an indication of a problematic workload - which
> may be triggered by a fuzzer.

If it's expected, why don't we kill the WARN_ON_ONCE()? I get it all
the time running xfstests as well.

-- 
Jens Axboe



Re: [linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Christoph Hellwig
On Mon, Sep 18, 2017 at 08:26:05PM +0530, Abdul Haleem wrote:
> Hi,
> 
> A warning is triggered from:
> 
> file fs/iomap.c in function iomap_dio_rw
> 
> if (ret)
> goto out_free_dio;
> 
> ret = invalidate_inode_pages2_range(mapping,
> start >> PAGE_SHIFT, end >> PAGE_SHIFT);
> >>  WARN_ON_ONCE(ret);
> ret = 0;
> 
> inode_dio_begin(inode);

This is expected and an indication of a problematic workload - which
may be triggered by a fuzzer.


[linux-next][XFS][trinity] WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993

2017-09-18 Thread Abdul Haleem
Hi,

A warning is triggered from:

file fs/iomap.c in function iomap_dio_rw

if (ret)
goto out_free_dio;

ret = invalidate_inode_pages2_range(mapping,
start >> PAGE_SHIFT, end >> PAGE_SHIFT);
>>  WARN_ON_ONCE(ret);
ret = 0;

inode_dio_begin(inode);

Machine Type: Power 7 PowerVM LPAR
Kernel : 4.13.0-next-20170915
gcc: 4.8.5
Test: trinity fuzzer


dmesg:
[ cut here ]
WARNING: CPU: 32 PID: 31369 at fs/iomap.c:993 .iomap_dio_rw+0x470/0x480
Modules linked in: dlci(E) 8021q(E) garp(E) mrp(E) af_key(E)
ieee802154_socket(E) ieee802154(E) rpcrdma(E) ib_isert(E)
iscsi_target_mod(E) ib_iser(E) libiscsi(E) ib_srpt(E) target_core_mod(E)
ib_srp(E) hidp(E) ib_ipoib(E) cmtp(E) kernelcapi(E) rdma_ucm(E)
ib_ucm(E) bnep(E) ib_uverbs(E) rfcomm(E) bluetooth(E) ib_umad(E)
rdma_cm(E) ecdh_generic(E) rfkill(E) ib_cm(E) iw_cm(E) pptp(E) gre(E)
l2tp_ppp(E) l2tp_netlink(E) l2tp_core(E) ip6_udp_tunnel(E) udp_tunnel(E)
pppoe(E) pppox(E) ppp_generic(E) slhc(E) crypto_user(E) ib_core(E)
nfnetlink(E) scsi_transport_iscsi(E) atm(E) sctp(E) dccp_ipv4(E)
netlink_diag(E) dccp_diag(E) ip6table_filter(E) af_packet_diag(E)
unix_diag(E) tcp_diag(E) udp_diag(E) ebtable_filter(E) bridge(E) sg(E)
ibmveth(E) rpadlpar_io(E) loop(E) xt_CHECKSUM(E) iptable_mangle(E)
 ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E)
nf_nat_ipv4(E) nf_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E)
xt_conntrack(E) nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) tun(E)
stp(E) llc(E) rpaphp(E) kvm_pr(E) kvm(E) ebtables(E) ip6_tables(E)
dccp(E) inet_diag(E) iptable_filter(E) nfsd(E) auth_rpcgss(E) nfs_acl(E)
lockd(E) grace(E) sunrpc(E) binfmt_misc(E) ip_tables(E) ext4(E)
mbcache(E) jbd2(E) sd_mod(E) ibmvscsi(E) scsi_transport_srp(E) [last
unloaded: netlink_diag]
CPU: 32 PID: 31369 Comm: trinity-c133 Tainted: GE
4.13.0-next-20170915-autotest #1
task: c009f4149d80 task.stack: c0054692
NIP:  c03aac40 LR: c03aa9e8 CTR: 
REGS: c00546923630 TRAP: 0700   Tainted: GE
(4.13.0-next-20170915-autotest)
MSR:  8282b032   CR: 28004428
XER: 2000  
CFAR: c03aa9f0 SOFTE: 1 
GPR00: c03aa9e8 c005469238b0 c1453b00
fff0 
GPR04:   
 
GPR08:   
 
GPR12: 28004422 ce945000 
10030a00 
GPR16: 10030bc8  
 
GPR20:   c013ebcc16a0
0010 
GPR24: c0099ec0c2d0 c0a49168 c0099ec0c158
c00546923b70 
GPR28: c00546923d40 0014 
c013ebcc1680 
NIP [c03aac40] .iomap_dio_rw+0x470/0x480
LR [c03aa9e8] .iomap_dio_rw+0x218/0x480
Call Trace:
[c005469238b0] [c03aa9e8] .iomap_dio_rw+0x218/0x480
(unreliable)
[c005469239d0] [c04639c8] .xfs_file_dio_aio_read+0x88/0x160
[c00546923a70] [c0463f44] .xfs_file_read_iter+0x104/0x120
[c00546923b00] [c03248f0] .do_iter_readv_writev+0x190/0x1c0
[c00546923bb0] [c0325d90] .do_iter_read+0xf0/0x280
[c00546923c50] [c032858c] .vfs_readv+0x6c/0xa0
[c00546923d90] [c03287b8] .do_preadv+0xd8/0x120
[c00546923e30] [c000b184] system_call+0x58/0x6c
Instruction dump:
7c0af000 40c20010 7c60492d 40c2fff0 7c0004ac 4bfffe90 6000 6000 
3be0fdef 4bfffc0c 3be0fff4 4bfffc04 <0fe0> 4bfffdb0 
 
---[ end trace bd674540a2bf235b ]---


-- 
Regard's

Abdul Haleem
IBM Linux Technology Centre


#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.10.0-rc5 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
CONFIG_GENERIC_CPU=y
# CONFIG_CELL_CPU is not set
# CONFIG_POWER4_CPU is not set
# CONFIG_POWER5_CPU is not set
# CONFIG_POWER6_CPU is not set
# CONFIG_POWER7_CPU is not set
# CONFIG_POWER8_CPU is not set
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_VSX=y
CONFIG_PPC_ICSWX=y
# CONFIG_PPC_ICSWX_PID is not set
# CONFIG_PPC_ICSWX_USE_SIGILL is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_64=y
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2048
CONFIG_PPC_DOORBELL=y
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
# CONFIG_CPU_LITTLE_ENDIAN is not set
CONFIG_64BIT=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_MMU=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_ARCH_HAS_ILOG2_U64=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y

[PATCH] qe/ic: make qe_ic_irq_chip const and __initconst

2017-09-18 Thread Bhumika Goyal
Make this const as it is only used as the source of a copy operation.
This usage is inside an __init function, so make it __initconst too.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal 
---
Cross-compiled for powerpc.

 drivers/soc/fsl/qe/qe_ic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qe/qe_ic.c b/drivers/soc/fsl/qe/qe_ic.c
index ec2ca86..781e0ce 100644
--- a/drivers/soc/fsl/qe/qe_ic.c
+++ b/drivers/soc/fsl/qe/qe_ic.c
@@ -238,7 +238,7 @@ static void qe_ic_mask_irq(struct irq_data *d)
raw_spin_unlock_irqrestore(&qe_ic_lock, flags);
 }
 
-static struct irq_chip qe_ic_irq_chip = {
+static const struct irq_chip qe_ic_irq_chip __initconst = {
.name = "QEIC",
.irq_unmask = qe_ic_unmask_irq,
.irq_mask = qe_ic_mask_irq,
-- 
1.9.1
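
For context, the "copy operation" referred to above is the template-copy
pattern used at init time; a rough sketch, with qe_ic_sketch standing in for
the driver's per-instance structure:

struct qe_ic_sketch {
        struct irq_chip hc_irq;         /* per-instance copy of the template */
        /* ... */
};

static void __init sketch_init(struct qe_ic_sketch *qe_ic)
{
        /*
         * The template is copied whole into per-instance data during the
         * __init path and never referenced again afterwards, which is what
         * makes const + __initconst safe here.
         */
        qe_ic->hc_irq = qe_ic_irq_chip;
}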



[PATCH] powerpc/xmon: Add option to show uptime information

2017-09-18 Thread Guilherme G. Piccoli
It might be useful to quickly get the uptime of a running
system in xmon, without needing to grab data from memory and
do math on struct addresses.

For example, it'd be useful to check how long a system has been
sitting in the xmon shell after a crash, or whether some test was
started after the first test crashed (and this second test also
crashed into xmon).

This small patch adds the 'U' command, to accomplish this.

Suggested-by: Murilo Fossa Vicentini 
Signed-off-by: Guilherme G. Piccoli 
---

Patch written against mpe's powerpc/next branch.

 arch/powerpc/xmon/xmon.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 33351c6704b1..a12f89f4916b 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -278,6 +278,7 @@ Commands:\n\
 #elif defined(CONFIG_44x) || defined(CONFIG_PPC_BOOK3E)
 "  u   dump TLB\n"
 #endif
+"  U   show uptime information\n"
 "  ?   help\n"
 "  # n limit output to n lines per page (for dp, dpa, dl)\n"
 "  zr  reboot\n\
@@ -896,6 +897,24 @@ static void remove_cpu_bpts(void)
write_ciabr(0);
 }
 
+/* Based on uptime_proc_show(). */
+static void
+show_uptime(void)
+{
+   struct timespec uptime;
+
+   if (setjmp(bus_error_jmp) == 0) {
+   catch_memory_errors = 1;
+   sync();
+
+   get_monotonic_boottime(&uptime);
+   printf("Uptime: %lu.%.2lu\n", (unsigned long) uptime.tv_sec,
+   ((unsigned long) uptime.tv_nsec / (NSEC_PER_SEC/100)));
+
+   }
+   catch_memory_errors = 0;
+}
+
 static void set_lpp_cmd(void)
 {
unsigned long lpp;
@@ -1031,6 +1050,9 @@ cmds(struct pt_regs *excp)
dump_tlb_book3e();
break;
 #endif
+   case 'U':
+   show_uptime();
+   break;
default:
printf("Unrecognized command: ");
do {
-- 
2.14.1



[PATCH] powerpc: make irq_chip const, __initdata and __initconst

2017-09-18 Thread Bhumika Goyal
Make ehv_pic_irq_chip, mpic_ipi_chip and mpic_irq_ht_chip const as they
are only used as the source of a copy operation. This usage is during
init, so make them __initconst too.
Make mpic_tm_chip __initdata as it is only modified during the init
phase and there is no reference to it anywhere after init.

Signed-off-by: Bhumika Goyal 
---
 arch/powerpc/sysdev/ehv_pic.c | 2 +-
 arch/powerpc/sysdev/mpic.c| 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/sysdev/ehv_pic.c b/arch/powerpc/sysdev/ehv_pic.c
index 48866e6..12a54f3 100644
--- a/arch/powerpc/sysdev/ehv_pic.c
+++ b/arch/powerpc/sysdev/ehv_pic.c
@@ -141,7 +141,7 @@ int ehv_pic_set_irq_type(struct irq_data *d, unsigned int 
flow_type)
return IRQ_SET_MASK_OK_NOCOPY;
 }
 
-static struct irq_chip ehv_pic_irq_chip = {
+static const struct irq_chip ehv_pic_irq_chip __initconst = {
.irq_mask   = ehv_pic_mask_irq,
.irq_unmask = ehv_pic_unmask_irq,
.irq_eoi= ehv_pic_end_irq,
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index ead3e25..6a435c0 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -964,21 +964,21 @@ static void mpic_set_destination(unsigned int virq, 
unsigned int cpuid)
 };
 
 #ifdef CONFIG_SMP
-static struct irq_chip mpic_ipi_chip = {
+static const struct irq_chip mpic_ipi_chip __initconst = {
.irq_mask   = mpic_mask_ipi,
.irq_unmask = mpic_unmask_ipi,
.irq_eoi= mpic_end_ipi,
 };
 #endif /* CONFIG_SMP */
 
-static struct irq_chip mpic_tm_chip = {
+static struct irq_chip mpic_tm_chip __initdata = {
.irq_mask   = mpic_mask_tm,
.irq_unmask = mpic_unmask_tm,
.irq_eoi= mpic_end_irq,
 };
 
 #ifdef CONFIG_MPIC_U3_HT_IRQS
-static struct irq_chip mpic_irq_ht_chip = {
+static const struct irq_chip mpic_irq_ht_chip __initconst = {
.irq_startup= mpic_startup_ht_irq,
.irq_shutdown   = mpic_shutdown_ht_irq,
.irq_mask   = mpic_mask_irq,
-- 
1.9.1



Re: [linux-next][DLPAR CPU][Oops] Bad kernel stack pointer

2017-09-18 Thread Rob Herring
On Mon, Sep 18, 2017 at 5:08 AM, Abdul Haleem
 wrote:
> Hi,
>
> Dynamic CPU remove operation resulted in Kernel Panic on today's
> next-20170915 kernel.
>
> Machine Type: Power 7 PowerVM LPAR
> Kernel : 4.13.0-next-20170915

I assume this is not something new to 9/15 -next nor only in -next
because you also reported that 4.13.0 broke. Can you provide some
details on what version worked? 4.12?

Rob


Re: [PATCH] Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"

2017-09-18 Thread David Hildenbrand
On 16.09.2017 22:12, Jan H. Schönherr wrote:
> This reverts commit 36ae3c0a36b7456432fedce38ae2f7bd3e01a563.
> 
> The commit broke compilation on !CONFIG_HAVE_KVM_IRQ_ROUTING. Also,
> there may be cases with CONFIG_HAVE_KVM_IRQ_ROUTING, where larger
> gsi values make sense.
> 
> As the commit was meant as an early indicator to user space that
> something is wrong, reverting just restores the previous behavior
> where overly large values are ignored when encountered (without
> any direct feedback).
> 
> Reported-by: Abdul Haleem 
> Signed-off-by: Jan H. Schönherr 
> ---
>  virt/kvm/eventfd.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index c608ab4..f2ac53a 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -565,8 +565,6 @@ kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
>  {
>   if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_RESAMPLE))
>   return -EINVAL;
> - if (args->gsi >= KVM_MAX_IRQ_ROUTES)
> - return -EINVAL;
>  
>   if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
>   return kvm_irqfd_deassign(kvm, args);
> 

Makes sense and shouldn't do any harm as you also mentioned.

Reviewed-by: David Hildenbrand 

-- 

Thanks,

David


[linux-next][DLPAR CPU][Oops] Bad kernel stack pointer

2017-09-18 Thread Abdul Haleem
Hi,

Dynamic CPU remove operation resulted in Kernel Panic on today's
next-20170915 kernel.

Machine Type: Power 7 PowerVM LPAR
Kernel : 4.13.0-next-20170915
config : attached
test: DLPAR CPU remove


dmesg logs:
--
cpu 37 (hwid 37) Ready to die...
cpu 38 (hwid 38) Ready to die...
cpu 39 (hwid 39)
*** RTAS CReady to die...
ALL BUFFER CORRUPTION ***
[  673.435910] Bad kernel stack pointer eec51c8 365: rtas32_callat
480010897c601_buff_ptr=
b78
 001E  0001  0002  0027 [...']
[  673.435938] Oops: Bad kernel stack pointer, sig: 6 [#1]
     0005  0001 []
[  673.435942] BE SMP NR_CPUS=20C000  0048 NUMA pSeries
01 AF3C  000
0 0001 1248 [...<...H]
C000  0032 25D0     [.2%.]
0001     0004  0100 []
C000  0150 0AA0 C000 0013  A210 [.P..]
 0800       [.P..]
[  673.435976] Dumping ftrace buffer:
366: rtas64_map_buff_ptr=
        []
       D8F1 []
        []
        []
        []
        []
        []
   (ftrace buffer empty)
Kernel panic - not syncing: Alas, I survived.

Modules linked in: xt_CHECKSUM(E) iptable_mangle(E) ipt_MASQUERADE(E)
nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E)
nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E) nf_conntrack(E)
ipt_REJECT(E) nf_reject_ipv4(E) tun(E) bridge(E) stp(E) llc(E) kvm_pr(E)
kvm(E) rpadlpar_io(E) rpaphp(E) ebtable_filter(E) ebtables(E)
ip6table_filter(E) ip6_tables(E) dccp_diag(E) dccp(E) tcp_diag(E)
udp_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E)
iptable_filter(E) sg(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E)
sunrpc(E) grace(E) binfmt_misc(E) ip_tables(E) ext4(E) mbcache(E)
jbd2(E) sd_mod(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E)
Dumping ftrace buffer:
   (ftrace buffer empty)
CPU: 0 PID: 8633 Comm: drmgr Tainted: GE
4.13.0-next-20170915-autotest #1
task: c000fd49c200 task.stack: c000fb824000
NIP:  480010897c601b78 LR: 480010897c601b78 CTR: 
REGS: cee6fd40 TRAP: 0400   Tainted: GE
(4.13.0-next-20170915-autotest)
MSR:  800042801000   CR: 2200  XER: 0020
CFAR: 0ee97a20 SOFTE: -1152921504565094016
GPR00: 480010897c601b78 0eec51c8 0eea3680
0fc7b5c0
GPR04:  00e0 b9fc
001e
GPR08: 0fa3b000 0fc7b5c0 0fa378f0

GPR12: 01500a90 ce93 

GPR16:   c0c8c7a0
1024
GPR20: c0e44f74 c0130670 
c0e44f74
GPR24: c0e44f70 c000fb8276d0 001e
0002
GPR28: 0001 0001 0002
900b4bfe8545
NIP [480010897c601b78] 0x480010897c601b78
LR [480010897c601b78] 0x480010897c601b78
Call Trace:
Instruction dump:
       
       
---[ end trace d504e921bec4201a ]---
-- 

Regard's

Abdul Haleem
IBM Linux Technology Centre


#
# Automatically generated file; DO NOT EDIT.
# Linux/powerpc 4.10.0-rc5 Kernel Configuration
#
CONFIG_PPC64=y

#
# Processor support
#
CONFIG_PPC_BOOK3S_64=y
# CONFIG_PPC_BOOK3E_64 is not set
CONFIG_GENERIC_CPU=y
# CONFIG_CELL_CPU is not set
# CONFIG_POWER4_CPU is not set
# CONFIG_POWER5_CPU is not set
# CONFIG_POWER6_CPU is not set
# CONFIG_POWER7_CPU is not set
# CONFIG_POWER8_CPU is not set
CONFIG_PPC_BOOK3S=y
CONFIG_PPC_FPU=y
CONFIG_ALTIVEC=y
CONFIG_VSX=y
CONFIG_PPC_ICSWX=y
# CONFIG_PPC_ICSWX_PID is not set
# CONFIG_PPC_ICSWX_USE_SIGILL is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_64=y
CONFIG_PPC_RADIX_MMU=y
CONFIG_PPC_MM_SLICES=y
CONFIG_PPC_HAVE_PMU_SUPPORT=y
CONFIG_PPC_PERF_CTRS=y
CONFIG_SMP=y
CONFIG_NR_CPUS=2048
CONFIG_PPC_DOORBELL=y
CONFIG_VDSO32=y
CONFIG_CPU_BIG_ENDIAN=y
# CONFIG_CPU_LITTLE_ENDIAN is not set
CONFIG_64BIT=y
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_MMU=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NR_IRQS=512
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_ARCH_HAS_ILOG2_U64=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_HAS_DMA_SET_COHERENT_MASK=y
CONFIG_PPC=y
# CONFIG_GENERIC_CSUM is not set

[PATCH 2/2] powerpc/vdso64: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE

2017-09-18 Thread Santosh Sivaraj
The current vDSO64 implementation does not support the coarse clocks
(CLOCK_MONOTONIC_COARSE, CLOCK_REALTIME_COARSE) and falls back to the
system call for them, which increases the response time; handling them
in the vDSO reduces the cycle time. Below is a benchmark of the
difference in execution time with and without vDSO support.

(Non-coarse clocks are also included just for completeness.)

Without vDSO support:

clock-gettime-realtime: syscall: 172 nsec/call
clock-gettime-realtime:libc: 26 nsec/call
clock-gettime-realtime:vdso: 21 nsec/call
clock-gettime-monotonic: syscall: 170 nsec/call
clock-gettime-monotonic:libc: 30 nsec/call
clock-gettime-monotonic:vdso: 24 nsec/call
clock-gettime-realtime-coarse: syscall: 153 nsec/call
clock-gettime-realtime-coarse:libc: 15 nsec/call
clock-gettime-realtime-coarse:vdso: 9 nsec/call
clock-gettime-monotonic-coarse: syscall: 167 nsec/call
clock-gettime-monotonic-coarse:libc: 15 nsec/call
clock-gettime-monotonic-coarse:vdso: 11 nsec/call
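
For reference, a minimal user-space timing loop in the spirit of the numbers
above; this is only an illustrative sketch, not the benchmark tool that
produced them:

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

/* Rough per-call cost of clock_gettime() for a given clock id. */
static void bench(const char *name, clockid_t clk)
{
        struct timespec start, end, ts;
        const long iters = 10 * 1000 * 1000;
        long i;
        long long ns;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (i = 0; i < iters; i++)
                clock_gettime(clk, &ts);
        clock_gettime(CLOCK_MONOTONIC, &end);

        ns = (end.tv_sec - start.tv_sec) * 1000000000LL +
             (end.tv_nsec - start.tv_nsec);
        printf("%s: %lld nsec/call\n", name, ns / iters);
}

int main(void)
{
        bench("clock-gettime-monotonic", CLOCK_MONOTONIC);
        bench("clock-gettime-monotonic-coarse", CLOCK_MONOTONIC_COARSE);
        bench("clock-gettime-realtime-coarse", CLOCK_REALTIME_COARSE);
        return 0;
}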

CC: Benjamin Herrenschmidt 
Signed-off-by: Santosh Sivaraj 
---
 arch/powerpc/kernel/asm-offsets.c |  2 ++
 arch/powerpc/kernel/vdso64/gettimeofday.S | 56 +++
 2 files changed, 58 insertions(+)

diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index 8cfb20e38cfe..b55c68c54dc1 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -396,6 +396,8 @@ int main(void)
/* Other bits used by the vdso */
DEFINE(CLOCK_REALTIME, CLOCK_REALTIME);
DEFINE(CLOCK_MONOTONIC, CLOCK_MONOTONIC);
+   DEFINE(CLOCK_REALTIME_COARSE, CLOCK_REALTIME_COARSE);
+   DEFINE(CLOCK_MONOTONIC_COARSE, CLOCK_MONOTONIC_COARSE);
DEFINE(NSEC_PER_SEC, NSEC_PER_SEC);
DEFINE(CLOCK_REALTIME_RES, MONOTONIC_RES_NSEC);
 
diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S 
b/arch/powerpc/kernel/vdso64/gettimeofday.S
index a0b4943811db..bae197a81add 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -71,6 +71,11 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
crorcr0*4+eq,cr0*4+eq,cr1*4+eq
beq cr0,49f
 
+   cmpwi   cr0,r3,CLOCK_REALTIME_COARSE
+   cmpwi   cr1,r3,CLOCK_MONOTONIC_COARSE
+   crorcr0*4+eq,cr0*4+eq,cr1*4+eq
+   beq cr0,65f
+
b   99f /* Fallback to syscall */
   .cfi_register lr,r12
 49:bl  V_LOCAL_FUNC(__get_datapage)/* get data page */
@@ -112,6 +117,57 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
 1: bge cr1,80f
addir4,r4,-1
add r5,r5,r7
+   b   80f
+
+   /*
+* For coarse clocks we get data directly from the vdso data page, so
+* we don't need to call __do_get_tspec, but we still need to do the
+* counter trick.
+*/
+65:bl  V_LOCAL_FUNC(__get_datapage)/* get data page */
+70:ld  r8,CFG_TB_UPDATE_COUNT(r3)
+   andi.   r0,r8,1 /* pending update ? loop */
+   bne-70b
+   xor r0,r8,r8/* create dependency */
+   add r3,r3,r0
+
+   /*
+* CLOCK_REALTIME_COARSE, below values are needed for MONOTONIC_COARSE
+* too
+*/
+   ld  r4,STAMP_XTIME+TSPC64_TV_SEC(r3)
+   ld  r5,STAMP_XTIME+TSPC64_TV_NSEC(r3)
+   bne cr1,78f
+
+   /* CLOCK_MONOTONIC_COARSE */
+   lwa r6,WTOM_CLOCK_SEC(r3)
+   lwa r9,WTOM_CLOCK_NSEC(r3)
+
+   /* check if counter has updated */
+78:or  r0,r6,r9
+   xor r0,r0,r0
+   add r3,r3,r0
+   ld  r0,CFG_TB_UPDATE_COUNT(r3)
+   cmpld   cr0,r0,r8   /* check if updated */
+   bne-70b
+
+   /* Counter has not updated, so continue calculating proper values for
+* sec and nsec if monotonic coarse, or just return with the proper
+* values for realtime.
+*/
+   bne cr1,80f
+
+   /* Add wall->monotonic offset and check for overflow or underflow */
+   add r4,r4,r6
+   add r5,r5,r9
+   cmpdcr0,r5,r7
+   cmpdi   cr1,r5,0
+   blt 79f
+   subfr5,r7,r5
+   addir4,r4,1
+79:bge cr1,80f
+   addir4,r4,-1
+   add r5,r5,r7
 
 80:std r4,TSPC64_TV_SEC(r11)
std r5,TSPC64_TV_NSEC(r11)
-- 
2.13.5
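
For readers who do not speak the assembler above: the "counter trick" mentioned
in the comment is the usual vDSO sequence-count read. A C sketch of the
equivalent logic follows; the structure and field names are illustrative, not
the exact vdso_data layout:

struct vdso_data_sketch {
        u64 tb_update_count;    /* odd while the kernel is updating the page */
        u64 stamp_sec;          /* STAMP_XTIME seconds */
        u64 stamp_nsec;         /* STAMP_XTIME nanoseconds */
};

static void read_coarse(const struct vdso_data_sketch *vd, u64 *sec, u64 *nsec)
{
        u64 count;

        for (;;) {
                count = READ_ONCE(vd->tb_update_count);
                if (count & 1)
                        continue;       /* update in progress, retry */
                smp_rmb();
                *sec  = vd->stamp_sec;
                *nsec = vd->stamp_nsec;
                smp_rmb();
                if (READ_ONCE(vd->tb_update_count) == count)
                        break;          /* no update raced with us, values are consistent */
        }
}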



[PATCH 1/2] powerpc/vdso64: Coarse timer support preparatory patch

2017-09-18 Thread Santosh Sivaraj
Reorganize code to make it easy to introduce CLOCK_REALTIME_COARSE and
CLOCK_MONOTONIC_COARSE timer support.

Signed-off-by: Santosh Sivaraj 
---
 arch/powerpc/kernel/vdso64/gettimeofday.S | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/vdso64/gettimeofday.S 
b/arch/powerpc/kernel/vdso64/gettimeofday.S
index 382021324883..a0b4943811db 100644
--- a/arch/powerpc/kernel/vdso64/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso64/gettimeofday.S
@@ -60,18 +60,20 @@ V_FUNCTION_END(__kernel_gettimeofday)
  */
 V_FUNCTION_BEGIN(__kernel_clock_gettime)
   .cfi_startproc
+   mr  r11,r4  /* r11 saves tp */
+   mflrr12 /* r12 saves lr */
+   lis r7,NSEC_PER_SEC@h   /* want nanoseconds */
+   ori r7,r7,NSEC_PER_SEC@l
+
/* Check for supported clock IDs */
cmpwi   cr0,r3,CLOCK_REALTIME
cmpwi   cr1,r3,CLOCK_MONOTONIC
crorcr0*4+eq,cr0*4+eq,cr1*4+eq
-   bne cr0,99f
+   beq cr0,49f
 
-   mflrr12 /* r12 saves lr */
+   b   99f /* Fallback to syscall */
   .cfi_register lr,r12
-   mr  r11,r4  /* r11 saves tp */
-   bl  V_LOCAL_FUNC(__get_datapage)/* get data page */
-   lis r7,NSEC_PER_SEC@h   /* want nanoseconds */
-   ori r7,r7,NSEC_PER_SEC@l
+49:bl  V_LOCAL_FUNC(__get_datapage)/* get data page */
 50:bl  V_LOCAL_FUNC(__do_get_tspec)/* get time from tb & kernel */
bne cr1,80f /* if not monotonic, all done */
 
-- 
2.13.5



[PATCH 5/5] powerpc/powernv: implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET

2017-09-18 Thread Nicholas Piggin
This allows MSR[EE]=0 lockups to be detected on an OPAL (bare metal)
system similarly to the hcall NMI IPI on pseries guests, when the
platform/firmware supports it.

This is an example of CPU10 spinning with interrupts hard disabled:

Watchdog CPU:32 detected Hard LOCKUP other CPUS:10
Watchdog CPU:10 Hard LOCKUP
CPU: 10 PID: 4410 Comm: bash Not tainted 4.13.0-rc7-00074-ge89ce1f89f62-dirty 
#34
task: c003a82b4400 task.stack: c003af55c000
NIP: c00a7b38 LR: c0659044 CTR: c00a7b00
REGS: cfd23d80 TRAP: 0100   Not tainted  
(4.13.0-rc7-00074-ge89ce1f89f62-dirty)
MSR: 900c1033 
CR: 2842  XER: 2000
CFAR: c00a7b38 SOFTE: 0
GPR00: c0659044 c003af55fbb0 c1072a00 0078
GPR04: c003c81b5c80 c003c81cc7e8 90009033 
GPR08:  c00a7b00 0001 90001003
GPR12: c00a7b00 cfd83200 10180df8 10189e60
GPR16: 10189ed8 10151270 1018bd88 1018de78
GPR20: 370a0668 0001 101645e0 10163c10
GPR24: 7fffd14d6294 7fffd14d6290 c0fba6f0 0004
GPR28: c0f351d8 0078 c0f4095c 
NIP [c00a7b38] sysrq_handle_xmon+0x38/0x40
LR [c0659044] __handle_sysrq+0xe4/0x270
Call Trace:
[c003af55fbd0] [c0659044] __handle_sysrq+0xe4/0x270
[c003af55fc70] [c0659810] write_sysrq_trigger+0x70/0xa0
[c003af55fca0] [c03da650] proc_reg_write+0xb0/0x110
[c003af55fcf0] [c03423bc] __vfs_write+0x6c/0x1b0
[c003af55fd90] [c0344398] vfs_write+0xd8/0x240
[c003af55fde0] [c034632c] SyS_write+0x6c/0x110
[c003af55fe30] [c000b220] system_call+0x58/0x6c

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/include/asm/opal-api.h|  1 +
 arch/powerpc/include/asm/opal.h|  2 ++
 arch/powerpc/kernel/irq.c  | 43 +++---
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/powernv.h   |  1 +
 arch/powerpc/platforms/powernv/setup.c |  3 ++
 arch/powerpc/platforms/powernv/smp.c   | 50 ++
 7 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index 450a60b81d2a..9d191ebea706 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -188,6 +188,7 @@
 #define OPAL_XIVE_DUMP 142
 #define OPAL_XIVE_RESERVED3143
 #define OPAL_XIVE_RESERVED4144
+#define OPAL_SIGNAL_SYSTEM_RESET   145
 #define OPAL_NPU_INIT_CONTEXT  146
 #define OPAL_NPU_DESTROY_CONTEXT   147
 #define OPAL_NPU_MAP_LPAR  148
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 726c23304a57..7d7613c49f2b 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -281,6 +281,8 @@ int opal_get_power_shift_ratio(u32 handle, int token, u32 
*psr);
 int opal_set_power_shift_ratio(u32 handle, int token, u32 psr);
 int opal_sensor_group_clear(u32 group_hndl, int token);
 
+int64_t opal_signal_system_reset(int32_t cpu);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 4e65bf82f5e0..472d294d0df5 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -394,11 +394,21 @@ bool prep_irq_for_idle_irqsoff(void)
 /*
  * Take the SRR1 wakeup reason, index into this table to find the
  * appropriate irq_happened bit.
+ *
+ * System reset exceptions taken in idle state also come through here,
+ * but they are NMI interrupts so do not need to wait for IRQs to be
+ * restored, and should be taken as early as practical. These are marked
+ * with 0xff in the table. The Power ISA specifies 0100b as the system
+ * reset interrupt reason, but POWER9 DD1 can set 0010b.
  */
+#define IRQ_SYSTEM_RESET   0xff
+
 static const u8 srr1_to_lazyirq[0x10] = {
-   0, 0, 0,
-   PACA_IRQ_DBELL,
0,
+   0,
+   IRQ_SYSTEM_RESET,
+   PACA_IRQ_DBELL,
+   IRQ_SYSTEM_RESET,
PACA_IRQ_DBELL,
PACA_IRQ_DEC,
0,
@@ -407,15 +417,40 @@ static const u8 srr1_to_lazyirq[0x10] = {
PACA_IRQ_HMI,
0, 0, 0, 0, 0 };
 
+static noinline void replay_system_reset(void)
+{
+   struct pt_regs regs;
+
+   ppc_save_regs(&regs);
+   regs.trap = 0x100;
+   get_paca()->in_nmi = 1;
+   system_reset_exception();
+   get_paca()->in_nmi = 0;
+}
+
 void irq_set_pending_from_srr1(unsigned long srr1)
 {
unsigned 

[PATCH 4/5] powerpc/xmon: avoid tripping SMP hardlockup watchdog

2017-09-18 Thread Nicholas Piggin
The SMP hardlockup watchdog cross-checks other CPUs for lockups,
which causes xmon headaches because xmon assumes that having
interrupts hard disabled means no watchdog troubles. Try to improve
that by calling touch_nmi_watchdog() in the obvious places where
secondaries are spinning.

Also annotate these spin loops with spin_begin/end calls.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/xmon/xmon.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 33351c6704b1..d9a12102b111 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -530,14 +530,19 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
 
  waiting:
secondary = 1;
+   spin_begin();
while (secondary && !xmon_gate) {
if (in_xmon == 0) {
-   if (fromipi)
+   if (fromipi) {
+   spin_end();
goto leave;
+   }
secondary = test_and_set_bit(0, &in_xmon);
}
-   barrier();
+   spin_cpu_relax();
+   touch_nmi_watchdog();
}
+   spin_end();
 
if (!secondary && !xmon_gate) {
/* we are the first cpu to come in */
@@ -568,21 +573,25 @@ static int xmon_core(struct pt_regs *regs, int fromipi)
mb();
xmon_gate = 1;
barrier();
+   touch_nmi_watchdog();
}
 
  cmdloop:
while (in_xmon) {
if (secondary) {
+   spin_begin();
if (cpu == xmon_owner) {
if (!test_and_set_bit(0, &xmon_taken)) {
secondary = 0;
+   spin_end();
continue;
}
/* missed it */
while (cpu == xmon_owner)
-   barrier();
+   spin_cpu_relax();
}
-   barrier();
+   spin_cpu_relax();
+   touch_nmi_watchdog();
} else {
cmd = cmds(regs);
if (cmd != 0) {
-- 
2.13.3



[PATCH 3/5] powerpc/watchdog: do not trigger SMP crash from touch_nmi_watchdog

2017-09-18 Thread Nicholas Piggin
In xmon, touch_nmi_watchdog() is not expected to check that
other CPUs have not touched the watchdog, so the code will just
call touch_nmi_watchdog() once before re-enabling hard interrupts.

Just update our CPU's state, and ignore apparently stuck SMP threads.

Arguably touch_nmi_watchdog should check for SMP lockups, and
callers should be fixed, but that's not trivial for the input code
of xmon.
---
 arch/powerpc/kernel/watchdog.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 920e61c79f47..1fb9379dc683 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -277,9 +277,12 @@ void arch_touch_nmi_watchdog(void)
 {
unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000;
int cpu = smp_processor_id();
+   u64 tb = get_tb();
 
-   if (get_tb() - per_cpu(wd_timer_tb, cpu) >= ticks)
-   watchdog_timer_interrupt(cpu);
+   if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) {
+   per_cpu(wd_timer_tb, cpu) = tb;
+   wd_smp_clear_cpu_pending(cpu, tb);
+   }
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);
 
-- 
2.13.3



[PATCH 2/5] powerpc/watchdog: do not backtrace locked CPUs twice if allcpus backtrace is enabled

2017-09-18 Thread Nicholas Piggin
If sysctl_hardlockup_all_cpu_backtrace is enabled, there is no need to
IPI stuck CPUs for backtrace before trigger_allbutself_cpu_backtrace(),
which does the same thing again.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/watchdog.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 532a1adbe89b..920e61c79f47 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -133,15 +133,18 @@ static void watchdog_smp_panic(int cpu, u64 tb)
pr_emerg("Watchdog CPU:%d detected Hard LOCKUP other CPUS:%*pbl\n",
cpu, cpumask_pr_args(&wd_smp_cpus_pending));
 
-   /*
-* Try to trigger the stuck CPUs.
-*/
-   for_each_cpu(c, &wd_smp_cpus_pending) {
-   if (c == cpu)
-   continue;
-   smp_send_nmi_ipi(c, wd_lockup_ipi, 100);
+   if (!sysctl_hardlockup_all_cpu_backtrace) {
+   /*
+* Try to trigger the stuck CPUs, unless we are going to
+* get a backtrace on all of them anyway.
+*/
+   for_each_cpu(c, &wd_smp_cpus_pending) {
+   if (c == cpu)
+   continue;
+   smp_send_nmi_ipi(c, wd_lockup_ipi, 100);
+   }
+   smp_flush_nmi_ipi(100);
}
-   smp_flush_nmi_ipi(100);
 
/* Take the stuck CPUs out of the watch group */
set_cpumask_stuck(&wd_smp_cpus_pending, tb);
-- 
2.13.3



[PATCH 1/5] powerpc/watchdog: do not panic from locked CPU's IPI handler

2017-09-18 Thread Nicholas Piggin
The SMP watchdog will detect locked CPUs and IPI them to print a
backtrace and registers. If panic on hard lockup is enabled, do
not panic from this handler, because that can cause recursion into
the IPI layer during the panic.

The caller already panics in this case.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/watchdog.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 2f6eadd9408d..532a1adbe89b 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -97,8 +97,7 @@ static void wd_lockup_ipi(struct pt_regs *regs)
else
dump_stack();
 
-   if (hardlockup_panic)
-   nmi_panic(regs, "Hard LOCKUP");
+   /* Do not panic from here because that can recurse into NMI IPI layer */
 }
 
 static void set_cpumask_stuck(const struct cpumask *cpumask, u64 tb)
-- 
2.13.3



[PATCH 0/5] More NMI IPI enablement work

2017-09-18 Thread Nicholas Piggin
Hi,

Here is the NMI IPI enablement for powernv, which requires
supported firmware and hardware:

https://lists.ozlabs.org/pipermail/skiboot/2017-September/009111.html

And some preceding patches that fix issues I ran into when
testing it.

Thanks,
Nick

Nicholas Piggin (5):
  powerpc/watchdog: do not panic from locked CPU's IPI handler
  powerpc/watchdog: do not backtrace locked CPUs twice if allcpus
backtrace is enabled
  powerpc/watchdog: do not trigger SMP crash from touch_nmi_watchdog
  powerpc/xmon: avoid tripping SMP hardlockup watchdog
  powerpc/powernv: implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET

 arch/powerpc/include/asm/opal-api.h|  1 +
 arch/powerpc/include/asm/opal.h|  2 ++
 arch/powerpc/kernel/irq.c  | 43 +++---
 arch/powerpc/kernel/watchdog.c | 29 ---
 arch/powerpc/platforms/powernv/opal-wrappers.S |  1 +
 arch/powerpc/platforms/powernv/powernv.h   |  1 +
 arch/powerpc/platforms/powernv/setup.c |  3 ++
 arch/powerpc/platforms/powernv/smp.c   | 50 ++
 arch/powerpc/xmon/xmon.c   | 17 ++---
 9 files changed, 127 insertions(+), 20 deletions(-)

-- 
2.13.3



Re: [PATCH] Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"

2017-09-18 Thread Paolo Bonzini
On 18/09/2017 08:44, Michael Ellerman wrote:
> "Jan H. Schönherr"  writes:
> 
>> This reverts commit 36ae3c0a36b7456432fedce38ae2f7bd3e01a563.
>>
>> The commit broke compilation on !CONFIG_HAVE_KVM_IRQ_ROUTING. Also,
>> there may be cases with CONFIG_HAVE_KVM_IRQ_ROUTING, where larger
>> gsi values make sense.
>>
>> As the commit was meant as an early indicator to user space that
>> something is wrong, reverting just restores the previous behavior
>> where overly large values are ignored when encountered (without
>> any direct feedback).
>>
>> Reported-by: Abdul Haleem 
>> Signed-off-by: Jan H. Schönherr 
>> ---
>>  virt/kvm/eventfd.c | 2 --
>>  1 file changed, 2 deletions(-)
> 
> Can someone merge this or preferably just send it straight to Linus, the
> original patch has broken the powerpc build in mainline.

I can send it later today.

Paolo


Re: [PATCH v3 00/20] Speculative page faults

2017-09-18 Thread Laurent Dufour
Despite the unprovable lockdep warning raised by Sergey, I didn't get any
feedback on this series.

Is there a chance to get it moved upstream?

Thanks,
Laurent.

On 08/09/2017 20:06, Laurent Dufour wrote:
> This is a port on kernel 4.13 of the work done by Peter Zijlstra to
> handle page fault without holding the mm semaphore [1].
> 
> The idea is to try to handle user space page faults without holding the
> mmap_sem. This should allow better concurrency for massively threaded
> processes since the page fault handler will not wait for other threads'
> memory layout changes to be done, assuming that those changes are done in
> another part of the process's memory space. This type of page fault is
> named a speculative page fault. If the speculative page fault fails because
> concurrency is detected or because the underlying PMD or PTE tables are
> not yet allocated, it aborts and a classic page fault is then tried.
> 
> The speculative page fault (SPF) has to look for the VMA matching the fault
> address without holding the mmap_sem, so the VMA list is now managed using
> SRCU allowing lockless walking. The only impact would be the deferred file
> dereferencing in the case of a file mapping, since the file pointer is
> released once the SRCU cleaning is done.  This patch relies on the change
> done recently by Paul McKenney in SRCU which now runs a callback per CPU
> instead of per SRCU structure [1].
> 
> The VMA's attributes checked during the speculative page fault processing
> have to be protected against parallel changes. This is done by using a per
> VMA sequence lock. This sequence lock allows the speculative page fault
> handler to fast check for parallel changes in progress and to abort the
> speculative page fault in that case.
> 
> Once the VMA is found, the speculative page fault handler would check for
> the VMA's attributes to verify that the page fault has to be handled
> correctly or not. Thus the VMA is protected through a sequence lock which
> allows fast detection of concurrent VMA changes. If such a change is
> detected, the speculative page fault is aborted and a *classic* page fault
> is tried.  VMA sequence locks are added when VMA attributes which are
> checked during the page fault are modified.
> 
> When the PTE is fetched, the VMA is checked to see if it has been changed,
> so once the page table is locked, the VMA is valid, so any other changes
> leading to touching this PTE will need to lock the page table, so no
> parallel change is possible at this time.
> 
> Compared to Peter's initial work, this series introduces a spin_trylock
> when dealing with the speculative page fault. This is required to avoid a
> deadlock when handling a page fault while a TLB invalidate is requested by
> another CPU holding the PTE. Another change is due to a lock dependency
> issue with mapping->i_mmap_rwsem.
> 
> In addition, some VMA field values which are used once the PTE is unlocked
> at the end of the page fault path are saved into the vm_fault structure in
> order to use the values matching the VMA at the time the PTE was locked.
> 
> This series only supports VMAs with no vm_ops defined, so huge pages and
> mapped files are not handled by the speculative path. In addition,
> transparent huge pages are not supported. Once this series is accepted
> upstream I'll extend the support to mapped files and transparent huge pages.
> 
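
To make the above a little more concrete, here is a heavily simplified C sketch
of the speculative path: an SRCU-protected VMA lookup plus the per-VMA sequence
count check. vma_srcu, find_vma_srcu(), handle_speculative_pte_fault() and the
vm_sequence field are illustrative names, not necessarily the exact API
introduced by the series.

static int do_fault_speculative(struct mm_struct *mm, unsigned long addr)
{
        struct vm_area_struct *vma;
        unsigned int seq;
        int idx, ret;

        idx = srcu_read_lock(&vma_srcu);        /* lockless VMA walk, no mmap_sem */
        vma = find_vma_srcu(mm, addr);
        if (!vma)
                goto abort;

        seq = raw_read_seqcount(&vma->vm_sequence);     /* snapshot the VMA state */
        if (seq & 1)
                goto abort;                     /* VMA is being modified right now */

        /* ... check vma flags, walk the page tables using spin_trylock() ... */
        ret = handle_speculative_pte_fault(vma, addr);

        if (read_seqcount_retry(&vma->vm_sequence, seq))
                goto abort;                     /* VMA changed under us, result unusable */

        srcu_read_unlock(&vma_srcu, idx);
        return ret;

abort:
        srcu_read_unlock(&vma_srcu, idx);
        return VM_FAULT_RETRY;          /* caller falls back to the classic mmap_sem path */
}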
> This series builds on top of v4.13.9-mm1 and is functional on x86 and
> PowerPC.
> 
> Tests have been made using a large commercial in-memory database on a
> PowerPC system with 752 CPUs, using RFC v5, a previous version of this
> series. The results are very encouraging since the loading of the 2TB
> database was 14% faster with the speculative page fault.
> 
> Using the ebizzy test [3], which spawns a lot of threads, the results are
> good when running on both a large and a small system. When using kernbench,
> the results are quite similar, which is expected as not many multithreaded
> processes are involved. But there is no performance degradation either,
> which is good.
> 
> --
> Benchmarks results
> 
> Note these test have been made on top of 4.13.0-mm1.
> 
> Ebizzy:
> ---
> The test counts the number of records per second it can manage; the
> higher the better. I run it like this: 'ebizzy -mTRp'. To get consistent
> results I repeated the test 100 times and measured the average result, mean
> deviation, max and min.
> 
> - 16 CPUs x86 VM
> Records/s 4.13.0-mm1  4.13.0-mm1-spf  delta
> Average   13217.9065765.94+397.55%
> Mean deviation690.37  2609.36 +277.97%
> Max   16726   77675   +364.40%
> Min   12194   616340  +405.45%
>   
> - 80 CPUs Power 8 node:
> Records/s 4.13.0-mm1  4.13.0-mm1-spf  delta
> Average   38175.4067635.5577.17% 
> Mean deviation600.09  2349.66 

Re: [PATCH] Revert "KVM: Don't accept obviously wrong gsi values via KVM_IRQFD"

2017-09-18 Thread Michael Ellerman
"Jan H. Schönherr"  writes:

> This reverts commit 36ae3c0a36b7456432fedce38ae2f7bd3e01a563.
>
> The commit broke compilation on !CONFIG_HAVE_KVM_IRQ_ROUTING. Also,
> there may be cases with CONFIG_HAVE_KVM_IRQ_ROUTING, where larger
> gsi values make sense.
>
> As the commit was meant as an early indicator to user space that
> something is wrong, reverting just restores the previous behavior
> where overly large values are ignored when encountered (without
> any direct feedback).
>
> Reported-by: Abdul Haleem 
> Signed-off-by: Jan H. Schönherr 
> ---
>  virt/kvm/eventfd.c | 2 --
>  1 file changed, 2 deletions(-)

Can someone merge this or preferably just send it straight to Linus, the
original patch has broken the powerpc build in mainline.

cheers


Re: [PATCH] KVM: PPC: fix oops when checking KVM_CAP_PPC_HTM

2017-09-18 Thread Thomas Huth
On 15.09.2017 10:59, David Gibson wrote:
> On Fri, Sep 15, 2017 at 07:52:49AM +0200, Greg Kurz wrote:
>> Dang! The mail relay at OVH has blacklisted Paul's address :-\
>>
>> : host smtp.samba.org[144.76.82.148] said: 550-blacklisted 
>> at
>> zen.spamhaus.org 550 https://www.spamhaus.org/sbl/query/SBL370982 (in 
>> reply
>> to RCPT TO command)
>>
>> Cc'ing Paul at ozlabs.org
>>
>> On Fri, 15 Sep 2017 10:48:39 +1000
>> David Gibson  wrote:
>>
>>> On Thu, Sep 14, 2017 at 11:56:25PM +0200, Greg Kurz wrote:
 The following program causes a kernel oops:

 #include 
 #include 
 #include 
 #include 
 #include 

 main()
 {
 int fd = open("/dev/kvm", O_RDWR);
 ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HTM);
 }

 This happens because when using the global KVM fd with
 KVM_CHECK_EXTENSION, kvm_vm_ioctl_check_extension() gets
 called with a NULL kvm argument, which gets dereferenced
 in is_kvmppc_hv_enabled(). Spotted while reading the code.

 Let's use the hv_enabled fallback variable, like everywhere
 else in this function.

 Fixes: 23528bb21ee2 ("KVM: PPC: Introduce KVM_CAP_PPC_HTM")
 Cc: sta...@vger.kernel.org # v4.7+
 Signed-off-by: Greg Kurz   
>>>
>>> I don't think this is right.  I'm pretty sure you want to fall back to
>>> hv_enabled *only when* kvm is NULL.  Otherwise if you have a PR guest
>>> on an HV capable machine, this will give the wrong answer, when called
>>> for that specific VM.
>>>
>>
>> Hmmm... this is what we get with this patch applied:
>>
>> open("/dev/kvm", O_RDWR)= 3
>> ioctl(3, KVM_CHECK_EXTENSION, 0x84) = 1 <== if HV is present
>> ioctl(3, KVM_CREATE_VM, 0x1)= 4 <== HV
>> ioctl(4, KVM_CHECK_EXTENSION, 0x84) = 1
>> ioctl(3, KVM_CREATE_VM, 0x2)= 5 <== PR
>> ioctl(5, KVM_CHECK_EXTENSION, 0x84) = 0
>>
>> The hv_enabled variable is set as follows:
>>
>>  /* Assume we're using HV mode when the HV module is loaded */
>>  int hv_enabled = kvmppc_hv_ops ? 1 : 0;
>>
>>  if (kvm) {
>>  /*
>>   * Hooray - we know which VM type we're running on. Depend on
>>   * that rather than the guess above.
>>   */
>>  hv_enabled = is_kvmppc_hv_enabled(kvm);
>>  }
>>
>> so we're good. :)
> 
> Oh, sorry, missed that bit.  In that case.
> 
> Reviewed-by: David Gibson 

LGTM, too:

Reviewed-by: Thomas Huth 


