[PATCH tip/locking/core v4 4/6] powerpc: atomic: Implement atomic{, 64}_*_return_* variants

2015-10-14 Thread Boqun Feng
On powerpc, acquire and release semantics can be achieved with
lightweight barriers("lwsync" and "ctrl+isync"), which can be used to
implement __atomic_op_{acquire,release}.

For release semantics, we only need to ensure that all memory accesses
issued before the atomic take effect before the -store- part of the
atomic, so "lwsync" is sufficient. On platforms without "lwsync", "sync"
should be used instead. Therefore, smp_lwsync() is used here.

For acquire semantics, "lwsync" is what we only need for the similar
reason.  However on the platform without "lwsync", we can use "isync"
rather than "sync" as an acquire barrier. Therefore in
__atomic_op_acquire() we use PPC_ACQUIRE_BARRIER, which is barrier() on
UP, "lwsync" if available and "isync" otherwise.

__atomic_op_fence is defined as smp_lwsync() + _relaxed +
smp_mb__after_atomic() to guarantee a full barrier.

Implement atomic{,64}_{add,sub,inc,dec}_return_relaxed, and build other
variants with these helpers.
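
As an illustration (a sketch only, not code from this patch), this is
roughly what an acquire variant boils down to once the relaxed
implementation and the helper below are in place; the example_* name is
made up for the example:

	/* sketch: how an acquire variant is composed on powerpc */
	static inline int example_add_return_acquire(int a, atomic_t *v)
	{
		int t = atomic_add_return_relaxed(a, v); /* bare ll/sc loop */
		/* PPC_ACQUIRE_BARRIER: lwsync, or isync where lwsync is absent */
		__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");
		return t;
	}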

Signed-off-by: Boqun Feng 
---
 arch/powerpc/include/asm/atomic.h | 116 --
 1 file changed, 74 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h 
b/arch/powerpc/include/asm/atomic.h
index 55f106e..ab76461 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -12,6 +12,33 @@
 
 #define ATOMIC_INIT(i) { (i) }
 
+/*
+ * Since *_return_relaxed and {cmp,}xchg_relaxed are implemented with
+ * a "bne-" instruction at the end, an isync is enough as an acquire barrier
+ * on platforms without lwsync.
+ */
+#define __atomic_op_acquire(op, args...)   \
+({ \
+   typeof(op##_relaxed(args)) __ret  = op##_relaxed(args); \
+   __asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory");\
+   __ret;  \
+})
+
+#define __atomic_op_release(op, args...)   \
+({ \
+   smp_lwsync();   \
+   op##_relaxed(args); \
+})
+
+#define __atomic_op_fence(op, args...) \
+({ \
+   typeof(op##_relaxed(args)) __ret;   \
+   smp_lwsync();   \
+   __ret = op##_relaxed(args); \
+   smp_mb__after_atomic(); \
+   __ret;  \
+})
+
 static __inline__ int atomic_read(const atomic_t *v)
 {
int t;
@@ -42,27 +69,27 @@ static __inline__ void atomic_##op(int a, atomic_t *v)  
\
: "cc");\
 }  \
 
-#define ATOMIC_OP_RETURN(op, asm_op)   \
-static __inline__ int atomic_##op##_return(int a, atomic_t *v) \
+#define ATOMIC_OP_RETURN_RELAXED(op, asm_op)   \
+static inline int atomic_##op##_return_relaxed(int a, atomic_t *v) \
 {  \
int t;  \
\
__asm__ __volatile__(   \
-   PPC_ATOMIC_ENTRY_BARRIER\
-"1:lwarx   %0,0,%2 # atomic_" #op "_return\n"  \
-   #asm_op " %0,%1,%0\n"   \
-   PPC405_ERR77(0,%2)  \
-"  stwcx.  %0,0,%2 \n" \
+"1:lwarx   %0,0,%3 # atomic_" #op "_return_relaxed\n"  \
+   #asm_op " %0,%2,%0\n"   \
+   PPC405_ERR77(0, %3) \
+"  stwcx.  %0,0,%3\n"  \
 "  bne-1b\n"   \
-   PPC_ATOMIC_EXIT_BARRIER \
-   : "=" (t) \
+   : "=" (t), "+m" (v->counter)  \
: "r" (a), "r" (>counter)\
-   : "cc", "memory");  \
+   : "cc");\
\
return t;

[PATCH tip/locking/core v4 3/6] atomics: Allow architectures to define their own __atomic_op_* helpers

2015-10-14 Thread Boqun Feng
Some architectures have special barriers for acquire, release and fence
semantics, for which the general memory barriers (smp_mb__*_atomic()) in
the default __atomic_op_*() helpers may be too strong. Allow architectures
to define their own helpers, which override the default ones.
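
For context, this is roughly how the generic layer then uses these helpers
to build, e.g., atomic_add_return_acquire() from the relaxed variant
(a sketch of the existing pattern in include/linux/atomic.h, shown here
only to illustrate the override mechanism):

	#ifndef atomic_add_return_acquire
	#define atomic_add_return_acquire(...)				\
		__atomic_op_acquire(atomic_add_return, __VA_ARGS__)
	#endif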

Signed-off-by: Boqun Feng 
---
 include/linux/atomic.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 27e580d..947c1dc 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -43,20 +43,29 @@ static inline int atomic_read_ctrl(const atomic_t *v)
  * The idea here is to build acquire/release variants by adding explicit
  * barriers on top of the relaxed variant. In the case where the relaxed
  * variant is already fully ordered, no additional barriers are needed.
+ *
+ * Besides, if an arch has special barriers for acquire/release, it can
+ * implement its own __atomic_op_* helpers and use the same framework to
+ * build the variants.
  */
+#ifndef __atomic_op_acquire
 #define __atomic_op_acquire(op, args...)   \
 ({ \
typeof(op##_relaxed(args)) __ret  = op##_relaxed(args); \
smp_mb__after_atomic(); \
__ret;  \
 })
+#endif
 
+#ifndef __atomic_op_release
 #define __atomic_op_release(op, args...)   \
 ({ \
smp_mb__before_atomic();\
op##_relaxed(args); \
 })
+#endif
 
+#ifndef __atomic_op_fence
 #define __atomic_op_fence(op, args...) \
 ({ \
typeof(op##_relaxed(args)) __ret;   \
@@ -65,6 +74,7 @@ static inline int atomic_read_ctrl(const atomic_t *v)
smp_mb__after_atomic(); \
__ret;  \
 })
+#endif
 
 /* atomic_add_return_relaxed */
 #ifndef atomic_add_return_relaxed
-- 
2.5.3


Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-14 Thread Nishanth Aravamudan
Hi Christoph,

On 12.10.2015 [14:06:51 -0700], Nishanth Aravamudan wrote:
> On 06.10.2015 [02:51:36 -0700], Christoph Hellwig wrote:
> > Do we need a function here or can we just have a IOMMU_PAGE_SHIFT define
> > with an #ifndef in common code?
> 
> On Power, since it's technically variable, we'd need a function. So are
> you suggesting define'ing it to a function just on Power and leaving it
> a constant elsewhere?
> 
> I noticed that sparc has a IOMMU_PAGE_SHIFT already, fwiw.

Sorry, I should have been more specific -- I'm ready to spin out a v3,
with a sparc-specific function.

Are you OK with leaving it a function for now (the only caller is in
NVMe, obviously)?

-Nish


[PATCH tip/locking/core v4 5/6] powerpc: atomic: Implement xchg_* and atomic{, 64}_xchg_* variants

2015-10-14 Thread Boqun Feng
Implement xchg_relaxed and atomic{,64}_xchg_relaxed; based on these
_relaxed variants, the release/acquire variants and fully ordered versions
can be built.

Note that xchg_relaxed and atomic{,64}_xchg_relaxed are not compiler
barriers.
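
To illustrate that last point (a sketch only, not part of the patch): the
_relaxed asm has no "memory" clobber, so the compiler is free to move
plain accesses across it, and callers that need any ordering have to add
it themselves:

	void example(int *data, atomic_t *flag)
	{
		*data = 1;
		/* no CPU *or* compiler ordering is implied here */
		(void)atomic_xchg_relaxed(flag, 1);
		/* add barrier()/smp_* primitives if ordering is required */
	}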

Signed-off-by: Boqun Feng 
---
 arch/powerpc/include/asm/atomic.h  |  2 ++
 arch/powerpc/include/asm/cmpxchg.h | 69 +-
 2 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h 
b/arch/powerpc/include/asm/atomic.h
index ab76461..1e9d526 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -186,6 +186,7 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t *v)
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
+#define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
 /**
  * __atomic_add_unless - add unless the number is a given value
@@ -453,6 +454,7 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t 
*v)
 
 #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
+#define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
 /**
  * atomic64_add_unless - add unless the number is a given value
diff --git a/arch/powerpc/include/asm/cmpxchg.h 
b/arch/powerpc/include/asm/cmpxchg.h
index d1a8d93..17c7e14 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -9,21 +9,20 @@
 /*
  * Atomic exchange
  *
- * Changes the memory location '*ptr' to be val and returns
+ * Changes the memory location '*p' to be val and returns
  * the previous value stored there.
  */
+
 static __always_inline unsigned long
-__xchg_u32(volatile void *p, unsigned long val)
+__xchg_u32_local(volatile void *p, unsigned long val)
 {
unsigned long prev;
 
__asm__ __volatile__(
-   PPC_ATOMIC_ENTRY_BARRIER
 "1:lwarx   %0,0,%2 \n"
PPC405_ERR77(0,%2)
 "  stwcx.  %3,0,%2 \n\
bne-1b"
-   PPC_ATOMIC_EXIT_BARRIER
: "=" (prev), "+m" (*(volatile unsigned int *)p)
: "r" (p), "r" (val)
: "cc", "memory");
@@ -31,42 +30,34 @@ __xchg_u32(volatile void *p, unsigned long val)
return prev;
 }
 
-/*
- * Atomic exchange
- *
- * Changes the memory location '*ptr' to be val and returns
- * the previous value stored there.
- */
 static __always_inline unsigned long
-__xchg_u32_local(volatile void *p, unsigned long val)
+__xchg_u32_relaxed(u32 *p, unsigned long val)
 {
unsigned long prev;
 
__asm__ __volatile__(
-"1:lwarx   %0,0,%2 \n"
-   PPC405_ERR77(0,%2)
-"  stwcx.  %3,0,%2 \n\
-   bne-1b"
-   : "=" (prev), "+m" (*(volatile unsigned int *)p)
+"1:lwarx   %0,0,%2\n"
+   PPC405_ERR77(0, %2)
+"  stwcx.  %3,0,%2\n"
+"  bne-1b"
+   : "=" (prev), "+m" (*p)
: "r" (p), "r" (val)
-   : "cc", "memory");
+   : "cc");
 
return prev;
 }
 
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
-__xchg_u64(volatile void *p, unsigned long val)
+__xchg_u64_local(volatile void *p, unsigned long val)
 {
unsigned long prev;
 
__asm__ __volatile__(
-   PPC_ATOMIC_ENTRY_BARRIER
 "1:ldarx   %0,0,%2 \n"
PPC405_ERR77(0,%2)
 "  stdcx.  %3,0,%2 \n\
bne-1b"
-   PPC_ATOMIC_EXIT_BARRIER
: "=" (prev), "+m" (*(volatile unsigned long *)p)
: "r" (p), "r" (val)
: "cc", "memory");
@@ -75,18 +66,18 @@ __xchg_u64(volatile void *p, unsigned long val)
 }
 
 static __always_inline unsigned long
-__xchg_u64_local(volatile void *p, unsigned long val)
+__xchg_u64_relaxed(u64 *p, unsigned long val)
 {
unsigned long prev;
 
__asm__ __volatile__(
-"1:ldarx   %0,0,%2 \n"
-   PPC405_ERR77(0,%2)
-"  stdcx.  %3,0,%2 \n\
-   bne-1b"
-   : "=" (prev), "+m" (*(volatile unsigned long *)p)
+"1:ldarx   %0,0,%2\n"
+   PPC405_ERR77(0, %2)
+"  stdcx.  %3,0,%2\n"
+"  bne-1b"
+   : "=" (prev), "+m" (*p)
: "r" (p), "r" (val)
-   : "cc", "memory");
+   : "cc");
 
return prev;
 }
@@ -99,14 +90,14 @@ __xchg_u64_local(volatile void *p, unsigned long val)
 extern void __xchg_called_with_bad_pointer(void);
 
 static __always_inline unsigned long
-__xchg(volatile void *ptr, unsigned long x, unsigned int size)
+__xchg_local(volatile void *ptr, unsigned long x, unsigned int size)
 {
switch (size) {
case 4:
-   return __xchg_u32(ptr, x);
+   return __xchg_u32_local(ptr, x);
 #ifdef CONFIG_PPC64
case 8:
-   return __xchg_u64(ptr, x);
+   return __xchg_u64_local(ptr, x);
 #endif
}
__xchg_called_with_bad_pointer();
@@ -114,25 +105,19 @@ __xchg(volatile void *ptr, unsigned long x, unsigned int 
size)
 }
 
 static 

[PATCH tip/locking/core v4 2/6] atomics: Add test for atomic operations with _relaxed variants

2015-10-14 Thread Boqun Feng
Some atomic operations now have _relaxed/_acquire/_release variants, so
this patch adds some trivial tests for two purposes:

1.  test the behavior of these new operations in a single-CPU
environment.

2.  get their code generated before we actually use them anywhere,
so that we can examine their assembly code.
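
For example (an illustrative expansion, not part of the patch), the plain
member of RETURN_FAMILY_TEST(, add_return, +=, onestwos) added below
effectively performs the following inside test_atomic(), where v, r, v0
and onestwos are already defined:

	atomic_set(&v, v0);
	r = v0;
	r += onestwos;
	BUG_ON(atomic_add_return(onestwos, &v) != r);
	BUG_ON(atomic_read(&v) != r);
	/* ...and likewise for the _acquire, _release and _relaxed members */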

Signed-off-by: Boqun Feng 
---
 lib/atomic64_test.c | 120 ++--
 1 file changed, 79 insertions(+), 41 deletions(-)

diff --git a/lib/atomic64_test.c b/lib/atomic64_test.c
index 83c33a5b..18e422b 100644
--- a/lib/atomic64_test.c
+++ b/lib/atomic64_test.c
@@ -27,6 +27,65 @@ do { 
\
(unsigned long long)r); \
 } while (0)
 
+/*
+ * Test for an atomic operation family;
+ * @test should be a macro accepting parameters (bit, op, ...)
+ */
+
+#define FAMILY_TEST(test, bit, op, args...)\
+do {   \
+   test(bit, op, ##args);  \
+   test(bit, op##_acquire, ##args);\
+   test(bit, op##_release, ##args);\
+   test(bit, op##_relaxed, ##args);\
+} while (0)
+
+#define TEST_RETURN(bit, op, c_op, val)\
+do {   \
+   atomic##bit##_set(&v, v0);  \
+   r = v0; \
+   r c_op val; \
+   BUG_ON(atomic##bit##_##op(val, &v) != r);   \
+   BUG_ON(atomic##bit##_read(&v) != r);\
+} while (0)
+
+#define RETURN_FAMILY_TEST(bit, op, c_op, val) \
+do {   \
+   FAMILY_TEST(TEST_RETURN, bit, op, c_op, val);   \
+} while (0)
+
+#define TEST_ARGS(bit, op, init, ret, expect, args...) \
+do {   \
+   atomic##bit##_set(&v, init);\
+   BUG_ON(atomic##bit##_##op(&v, ##args) != ret);  \
+   BUG_ON(atomic##bit##_read(&v) != expect);   \
+} while (0)
+
+#define XCHG_FAMILY_TEST(bit, init, new)   \
+do {   \
+   FAMILY_TEST(TEST_ARGS, bit, xchg, init, init, new, new);\
+} while (0)
+
+#define CMPXCHG_FAMILY_TEST(bit, init, new, wrong) \
+do {   \
+   FAMILY_TEST(TEST_ARGS, bit, cmpxchg,\
+   init, init, new, init, new);\
+   FAMILY_TEST(TEST_ARGS, bit, cmpxchg,\
+   init, init, init, wrong, new);  \
+} while (0)
+
+#define INC_RETURN_FAMILY_TEST(bit, i) \
+do {   \
+   FAMILY_TEST(TEST_ARGS, bit, inc_return, \
+   i, (i) + one, (i) + one);   \
+} while (0)
+
+#define DEC_RETURN_FAMILY_TEST(bit, i) \
+do {   \
+   FAMILY_TEST(TEST_ARGS, bit, dec_return, \
+   i, (i) - one, (i) - one);   \
+} while (0)
+
 static __init void test_atomic(void)
 {
int v0 = 0xaaa31337;
@@ -45,6 +104,18 @@ static __init void test_atomic(void)
TEST(, and, &=, v1);
TEST(, xor, ^=, v1);
TEST(, andnot, &= ~, v1);
+
+   RETURN_FAMILY_TEST(, add_return, +=, onestwos);
+   RETURN_FAMILY_TEST(, add_return, +=, -one);
+   RETURN_FAMILY_TEST(, sub_return, -=, onestwos);
+   RETURN_FAMILY_TEST(, sub_return, -=, -one);
+
+   INC_RETURN_FAMILY_TEST(, v0);
+   DEC_RETURN_FAMILY_TEST(, v0);
+
+   XCHG_FAMILY_TEST(, v0, v1);
+   CMPXCHG_FAMILY_TEST(, v0, v1, onestwos);
+
 }
 
 #define INIT(c) do { atomic64_set(&v, c); r = c; } while (0)
@@ -74,25 +145,10 @@ static __init void test_atomic64(void)
TEST(64, xor, ^=, v1);
TEST(64, andnot, &= ~, v1);
 
-   INIT(v0);
-   r += onestwos;
-   BUG_ON(atomic64_add_return(onestwos, &v) != r);
-   BUG_ON(v.counter != r);
-
-   INIT(v0);
-   r += -one;
-   BUG_ON(atomic64_add_return(-one, &v) != r);
-   BUG_ON(v.counter != r);
-
-   INIT(v0);
-   r -= onestwos;
-   BUG_ON(atomic64_sub_return(onestwos, &v) != r);
-   BUG_ON(v.counter != r);
-
-   INIT(v0);
-   r -= -one;
-   BUG_ON(atomic64_sub_return(-one, &v) != r);
-   BUG_ON(v.counter != r);
+   RETURN_FAMILY_TEST(64, add_return, +=, onestwos);
+   RETURN_FAMILY_TEST(64, add_return, +=, -one);
+   RETURN_FAMILY_TEST(64, sub_return, -=, onestwos);
+   RETURN_FAMILY_TEST(64, sub_return, -=, -one);
 

[PATCH tip/locking/core v4 6/6] powerpc: atomic: Implement cmpxchg{, 64}_* and atomic{, 64}_cmpxchg_* variants

2015-10-14 Thread Boqun Feng
Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
which _release variants can be built.

To avoid superfluous barriers in _acquire variants, we implement these
operations with assembly code rather than using __atomic_op_acquire() to
build them automatically.

For the same reason, we keep the assembly implementation of fully
ordered cmpxchg operations.

However, we don't do the same for _release, because that would require
putting barriers in the middle of ll/sc loops, which is probably a bad
idea.

Note cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed are not
compiler barriers.
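
As an illustration of where the _acquire variant is useful (a sketch, not
part of the patch; example_trylock is a made-up name), a trylock-style
construct only pays the acquire barrier on the successful,
store-performing path:

	static inline bool example_trylock(atomic_t *lock)
	{
		/* ACQUIRE orders the critical section after a successful cmpxchg */
		return atomic_cmpxchg_acquire(lock, 0, 1) == 0;
	}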

Signed-off-by: Boqun Feng 
---
 arch/powerpc/include/asm/atomic.h  |  10 +++
 arch/powerpc/include/asm/cmpxchg.h | 149 -
 2 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/atomic.h 
b/arch/powerpc/include/asm/atomic.h
index 1e9d526..e58188d 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -185,6 +185,11 @@ static __inline__ int atomic_dec_return_relaxed(atomic_t 
*v)
 #define atomic_dec_return_relaxed atomic_dec_return_relaxed
 
 #define atomic_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic_cmpxchg_relaxed(v, o, n) \
+   cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic_cmpxchg_acquire(v, o, n) \
+   cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
@@ -453,6 +458,11 @@ static __inline__ long atomic64_dec_if_positive(atomic64_t 
*v)
 }
 
 #define atomic64_cmpxchg(v, o, n) (cmpxchg(&((v)->counter), (o), (n)))
+#define atomic64_cmpxchg_relaxed(v, o, n) \
+   cmpxchg_relaxed(&((v)->counter), (o), (n))
+#define atomic64_cmpxchg_acquire(v, o, n) \
+   cmpxchg_acquire(&((v)->counter), (o), (n))
+
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 #define atomic64_xchg_relaxed(v, new) xchg_relaxed(&((v)->counter), (new))
 
diff --git a/arch/powerpc/include/asm/cmpxchg.h 
b/arch/powerpc/include/asm/cmpxchg.h
index 17c7e14..cae4fa8 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -181,6 +181,56 @@ __cmpxchg_u32_local(volatile unsigned int *p, unsigned 
long old,
return prev;
 }
 
+static __always_inline unsigned long
+__cmpxchg_u32_relaxed(u32 *p, unsigned long old, unsigned long new)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__ (
+"1:lwarx   %0,0,%2 # __cmpxchg_u32_relaxed\n"
+"  cmpw0,%0,%3\n"
+"  bne-2f\n"
+   PPC405_ERR77(0, %2)
+"  stwcx.  %4,0,%2\n"
+"  bne-1b\n"
+"2:"
+   : "=" (prev), "+m" (*p)
+   : "r" (p), "r" (old), "r" (new)
+   : "cc");
+
+   return prev;
+}
+
+/*
+ * The cmpxchg family has no ordering guarantee if the cmp part fails,
+ * therefore we can avoid superfluous barriers by implementing cmpxchg()
+ * and cmpxchg_acquire() in assembly. However, we don't do the same for
+ * cmpxchg_release(), because that would result in putting a barrier in
+ * the middle of an ll/sc loop, which is probably a bad idea. For example,
+ * it might make the conditional store more likely to fail.
+ */
+static __always_inline unsigned long
+__cmpxchg_u32_acquire(u32 *p, unsigned long old, unsigned long new)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__ (
+"1:lwarx   %0,0,%2 # __cmpxchg_u32_acquire\n"
+"  cmpw0,%0,%3\n"
+"  bne-2f\n"
+   PPC405_ERR77(0, %2)
+"  stwcx.  %4,0,%2\n"
+"  bne-1b\n"
+   PPC_ACQUIRE_BARRIER
+   "\n"
+"2:"
+   : "=" (prev), "+m" (*p)
+   : "r" (p), "r" (old), "r" (new)
+   : "cc", "memory");
+
+   return prev;
+}
+
 #ifdef CONFIG_PPC64
 static __always_inline unsigned long
 __cmpxchg_u64(volatile unsigned long *p, unsigned long old, unsigned long new)
@@ -224,6 +274,46 @@ __cmpxchg_u64_local(volatile unsigned long *p, unsigned 
long old,
 
return prev;
 }
+
+static __always_inline unsigned long
+__cmpxchg_u64_relaxed(u64 *p, unsigned long old, unsigned long new)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__ (
+"1:ldarx   %0,0,%2 # __cmpxchg_u64_relaxed\n"
+"  cmpd0,%0,%3\n"
+"  bne-2f\n"
+"  stdcx.  %4,0,%2\n"
+"  bne-1b\n"
+"2:"
+   : "=" (prev), "+m" (*p)
+   : "r" (p), "r" (old), "r" (new)
+   : "cc");
+
+   return prev;
+}
+
+static __always_inline unsigned long
+__cmpxchg_u64_acquire(u64 *p, unsigned long old, unsigned long new)
+{
+   unsigned long prev;
+
+   __asm__ __volatile__ (
+"1:ldarx   %0,0,%2 # __cmpxchg_u64_acquire\n"
+"  cmpd0,%0,%3\n"
+"  bne-2f\n"
+"  stdcx.  %4,0,%2\n"
+"  bne-1b\n"
+   PPC_ACQUIRE_BARRIER
+   "\n"
+"2:"
+   : "=" (prev), "+m" (*p)
+   : "r" (p), "r" (old), "r" (new)
+   : "cc", 

[PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Boqun Feng
According to memory-barriers.txt, xchg, cmpxchg and their atomic{,64}_
versions all need to imply a full barrier; however, they are currently
only RELEASE+ACQUIRE, which is not a full barrier.

So replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
__{cmp,}xchg_{u32,u64} respectively to guarantee full-barrier
semantics for atomic{,64}_{cmp,}xchg() and {cmp,}xchg().

This patch is a complement of commit b97021f85517 ("powerpc: Fix
atomic_xxx_return barrier semantics").

Acked-by: Michael Ellerman 
Cc:  # 3.4+
Signed-off-by: Boqun Feng 
---
 arch/powerpc/include/asm/cmpxchg.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/cmpxchg.h 
b/arch/powerpc/include/asm/cmpxchg.h
index ad6263c..d1a8d93 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -18,12 +18,12 @@ __xchg_u32(volatile void *p, unsigned long val)
unsigned long prev;
 
__asm__ __volatile__(
-   PPC_RELEASE_BARRIER
+   PPC_ATOMIC_ENTRY_BARRIER
 "1:lwarx   %0,0,%2 \n"
PPC405_ERR77(0,%2)
 "  stwcx.  %3,0,%2 \n\
bne-1b"
-   PPC_ACQUIRE_BARRIER
+   PPC_ATOMIC_EXIT_BARRIER
: "=" (prev), "+m" (*(volatile unsigned int *)p)
: "r" (p), "r" (val)
: "cc", "memory");
@@ -61,12 +61,12 @@ __xchg_u64(volatile void *p, unsigned long val)
unsigned long prev;
 
__asm__ __volatile__(
-   PPC_RELEASE_BARRIER
+   PPC_ATOMIC_ENTRY_BARRIER
 "1:ldarx   %0,0,%2 \n"
PPC405_ERR77(0,%2)
 "  stdcx.  %3,0,%2 \n\
bne-1b"
-   PPC_ACQUIRE_BARRIER
+   PPC_ATOMIC_EXIT_BARRIER
: "=" (prev), "+m" (*(volatile unsigned long *)p)
: "r" (p), "r" (val)
: "cc", "memory");
@@ -151,14 +151,14 @@ __cmpxchg_u32(volatile unsigned int *p, unsigned long 
old, unsigned long new)
unsigned int prev;
 
__asm__ __volatile__ (
-   PPC_RELEASE_BARRIER
+   PPC_ATOMIC_ENTRY_BARRIER
 "1:lwarx   %0,0,%2 # __cmpxchg_u32\n\
cmpw0,%0,%3\n\
bne-2f\n"
PPC405_ERR77(0,%2)
 "  stwcx.  %4,0,%2\n\
bne-1b"
-   PPC_ACQUIRE_BARRIER
+   PPC_ATOMIC_EXIT_BARRIER
"\n\
 2:"
: "=" (prev), "+m" (*p)
@@ -197,13 +197,13 @@ __cmpxchg_u64(volatile unsigned long *p, unsigned long 
old, unsigned long new)
unsigned long prev;
 
__asm__ __volatile__ (
-   PPC_RELEASE_BARRIER
+   PPC_ATOMIC_ENTRY_BARRIER
 "1:ldarx   %0,0,%2 # __cmpxchg_u64\n\
cmpd0,%0,%3\n\
bne-2f\n\
stdcx.  %4,0,%2\n\
bne-1b"
-   PPC_ACQUIRE_BARRIER
+   PPC_ATOMIC_EXIT_BARRIER
"\n\
 2:"
: "=" (prev), "+m" (*p)
-- 
2.5.3


Re: [PATCH 1/5 v2] dma-mapping: add generic dma_get_page_shift API

2015-10-14 Thread Christoph Hellwig
Hi Nishanth,

sorry for the late reply.

> > On Power, since it's technically variable, we'd need a function. So are
> > you suggesting define'ing it to a function just on Power and leaving it
> > a constant elsewhere?
> > 
> > I noticed that sparc has a IOMMU_PAGE_SHIFT already, fwiw.
> 
> Sorry, I should have been more specific -- I'm ready to spin out a v3,
> with a sparc-specific function.
> 
> Are you ok with leaving it a function for now (the only caller is in
> NVMe obviously).


I guess we do indeed need a function then.  I'll take a look at your
patch, but as long as you found a way to avoid adding too much boilerplate
code it should be fine.

[PATCH tip/locking/core v4 0/6] atomics: powerpc: Implement relaxed/acquire/release variants of some atomics

2015-10-14 Thread Boqun Feng
Hi all,

This is v4 of the series.

Link for v1: https://lkml.org/lkml/2015/8/27/798
Link for v2: https://lkml.org/lkml/2015/9/16/527
Link for v3: https://lkml.org/lkml/2015/10/12/368

Changes since v3:

*   avoid to introduce smp_acquire_barrier__after_atomic()
(Will Deacon)

*   explain a little bit why we don't implement cmpxchg_release
with assembly code (Will Deacon)


Relaxed/acquire/release variants of atomic operations {add,sub}_return
and {cmp,}xchg are introduced by commit:

"atomics: add acquire/release/relaxed variants of some atomic operations"

and {inc,dec}_return has been introduced by commit:

"locking/asm-generic: Add _{relaxed|acquire|release}() variants for
inc/dec atomics"

Both of these are in the current locking/core branch of the tip tree.

By default, the generic code implements a relaxed variant as a fully
ordered atomic operation, and the release/acquire variants as the relaxed
variant plus the necessary general barrier before or after it.

On powerpc, which has a weak memory model, a relaxed variant can be
implemented in a more lightweight way than a fully ordered one.
Furthermore, the release and acquire variants can be implemented with
arch-specific lightweight barriers.

Besides, cmpxchg, xchg and their atomic_ versions are only RELEASE+ACQUIRE
rather than full barriers in the current PPC implementation, which is
incorrect according to memory-barriers.txt.

Therefore this patchset fixes the ordering guarantee of cmpxchg, xchg and
their atomic_ versions and implements the relaxed/acquire/release
variants based on the powerpc memory model and PPC-specific barriers.
Some trivial tests for these new variants are also included in this
series; because some of these variants are not yet used in the kernel,
I think it is a good idea to at least generate the code for these
variants somewhere.

The patchset consists of 6 parts:

1.  Make xchg, cmpxchg and their atomic_ versions a full barrier

2.  Add trivial tests for the new variants in lib/atomic64_test.c

3.  Allow architectures to define their own __atomic_op_*() helpers
to build other variants based on relaxed.

4.  Implement atomic{,64}_{add,sub,inc,dec}_return_* variants

5.  Implement xchg_* and atomic{,64}_xchg_* variants

6.  Implement cmpxchg_* and atomic{,64}_cmpxchg_* variants


This patchset is based on the current locking/core branch of the tip tree;
all patches are build- and boot-tested for little-endian pseries, and
also tested by 0day.


Looking forward to any suggestion, question and comment ;-)

Regards,
Boqun

Re: [RFC, 1/2] scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target

2015-10-14 Thread Olof Johansson
On Tue, Oct 13, 2015 at 4:43 PM, Michael Ellerman  wrote:
> On Tue, 2015-10-13 at 14:02 -0700, Olof Johansson wrote:
>> On Fri, Oct 2, 2015 at 12:47 AM, Michael Ellerman  
>> wrote:
>> > On Wed, 2015-23-09 at 05:40:34 UTC, Michael Ellerman wrote:
>> >> Arch Makefiles can set KBUILD_DEFCONFIG to tell kbuild the name of the
>> >> defconfig that should be built by default.
>> >>
>> >> However currently there is an assumption that KBUILD_DEFCONFIG points to
>> >> a file at arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG).
>> >>
>> >> We would like to use a target, using merge_config, as our defconfig, so
>> >> adapt the logic in scripts/kconfig/Makefile to allow that.
>> >>
>> >> To minimise the chance of breaking anything, we first check if
>> >> KBUILD_DEFCONFIG is a file, and if so we do the old logic. If it's not a
>> >> file, then we call the top-level Makefile with KBUILD_DEFCONFIG as the
>> >> target.
>> >>
>> >> Signed-off-by: Michael Ellerman 
>> >> Acked-by: Michal Marek 
>> >
>> > Applied to powerpc next.
>> >
>> > https://git.kernel.org/powerpc/c/d2036f30cfe1daa19e63ce75
>>
>> This breaks arm64 defconfig for me:
>>
>> mkdir obj-tmp
>> make -f Makefile O=obj-tmp ARCH=arm64 defconfig
>> ... watch loop of:
>> *** Default configuration is based on target 'defconfig'
>>   GEN ./Makefile
>
> Crap, sorry. I knew I shouldn't have touched that code!
>
> Does this fix it for you?

Yes, it does, however:

>
> diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
> index b2b9c87..3043d6b 100644
> --- a/scripts/kconfig/Makefile
> +++ b/scripts/kconfig/Makefile
> @@ -96,7 +96,7 @@ savedefconfig: $(obj)/conf
>  defconfig: $(obj)/conf
>  ifeq ($(KBUILD_DEFCONFIG),)
> $< $(silent) --defconfig $(Kconfig)
> -else ifneq ($(wildcard arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
> +else ifneq ($(wildcard 
> $(srctree)/arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
> @$(kecho) "*** Default configuration is based on 
> '$(KBUILD_DEFCONFIG)'"
> $(Q)$< $(silent) 
> --defconfig=arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG) $(Kconfig)

Do you need a $(srctree) prefix here too? I'm not entirely sure what I
would do to reproduce a run that goes down this path so I can't
confirm.


-Olof

Re: [v2,1/1] powerpc: Individual System V IPC system calls

2015-10-14 Thread Sam Bobroff
On Tue, Oct 13, 2015 at 08:38:42PM +1100, Michael Ellerman wrote:
> On Tue, 2015-13-10 at 01:49:28 UTC, Sam bobroff wrote:
> > This patch provides individual system call numbers for the following
> > System V IPC system calls, on PowerPC, so that they do not need to be
> > multiplexed:
> > * semop, semget, semctl, semtimedop
> > * msgsnd, msgrcv, msgget, msgctl
> > * shmat, shmdt, shmget, shmctl
> 
> You tested this right? :)  Tell me about it.

Why yes I did:

I have written a (fairly) trivial test program that calls each function in a
way that doesn't fail (but that doesn't necessarily attempt to exercise the
full functionality of it; my intent was primarily to validate the parameter
passing part as that is where most of the code change is (on the glibc side)).

I patched a local copy of glibc with the new kernel header and various tweaks
to correctly format the parameter lists for the new calls (there is actually
quite a lot of code in glibc around the IPC calls due to various compatibility
issues). I could then build a full tool chain that supported the new calls.

(This was a lot more extensive than the kernel patch but should be fairly close
to what needs to go into glibc.)

I used that tool chain to build a complete host system (using buildroot). Then
I could run the following tests:

* glibc: stock
  Host kernel: stock
  Result: success
  Notes: As expected, base case.

* glibc: stock
  Host kernel: patched
  Result: success
  Notes: As expected, the old ipc() call still exists in the patched host.

* glibc: patched
  Host kernel: stock
  Result: failure
  Notes: As expected, the test was run with a glibc that requires a patched
  kernel on an unpatched one so the syscalls are unknown.

* glibc: patched
  Host kernel: patched
  Result: success
  Notes: As expected. (Also, a bit of debug in glibc shows the new system call
  paths being followed.)

(I also re-ran the tests both for little-endian and big-endian hosts.)

It would obviously be good to have someone else test this, but I can't see a
way to make it easy to do. They would presumably have to go through all of the
above, which seems too much to ask given how trivial the kernel side of the
patch is. Still, it bothers me a bit so if there is any way please let me know.
(I thought about writing some assembly to directly test the syscall numbers but
all it would do is verify that the numbers are valid, which really isn't much
of a test.)

> Also we could make these available to SPU programs, but I don't think there's
> any point, no one's going to do a libc update for that.
> 
> cheers

Cheers,
Sam.


[PATCH v12 4/6] QE/CPM: move muram management functions to qe_common

2015-10-14 Thread Zhao Qiang
QE and CPM have the same muram and use the same management
functions. Now that QE supports both ARM and PowerPC, it is necessary
to move QE to "drivers/soc", so move the muram management functions
from cpm_common to qe_common in preparation for moving the QE code to
"drivers/soc".
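
For reference, a minimal usage sketch of the interface being moved (the
cpm_muram_* names are the existing functions; the example_ wrapper is
made up for illustration):

	static int example_use_muram(void)
	{
		unsigned long offset;
		void __iomem *vaddr;

		offset = cpm_muram_alloc(64, 8);	/* 64 bytes, 8-byte aligned */
		if (IS_ERR_VALUE(offset))
			return -ENOMEM;

		vaddr = cpm_muram_addr(offset);		/* CPU view of the block */
		/* ... hand cpm_muram_dma(vaddr) to the QE/CPM hardware ... */

		cpm_muram_free(offset);
		return 0;
	}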

Signed-off-by: Zhao Qiang 
---
Changes for v2:
- no changes
Changes for v3:
- no changes
Changes for v4:
- no changes
Changes for v5:
- no changes
Changes for v6:
- use genalloc instead of rheap to manage QE MURAM
- remove qe_reset from the platform file, using
  subsys_initcall to call the qe_init function.
Changes for v7:
- move this patch from 3/3 to 2/3
- convert cpm with genalloc
- check for gen_pool allocation failure
Changes for v8:
- rebase
- move BD_SC_* macro instead of copy
Changes for v9:
- doesn't modify CPM, add a new patch to modify.
- rebase
Changes for v10:
- rebase
Changes for v11:
- remove renaming
- delete removing qe_reset and delete adding qe_init.
Changes for v12:
- SPI_FSL_CPM depends on QE-MURAM, select QUICC_ENGINE for it. 

 arch/powerpc/include/asm/cpm.h |  44 -
 arch/powerpc/include/asm/qe.h  |  16 ++
 arch/powerpc/sysdev/cpm_common.c   | 210 +
 arch/powerpc/sysdev/qe_lib/Makefile|   2 +-
 .../sysdev/{cpm_common.c => qe_lib/qe_common.c}| 188 +-
 drivers/spi/Kconfig|   1 +
 6 files changed, 28 insertions(+), 433 deletions(-)
 copy arch/powerpc/sysdev/{cpm_common.c => qe_lib/qe_common.c} (54%)

diff --git a/arch/powerpc/include/asm/cpm.h b/arch/powerpc/include/asm/cpm.h
index 0e1ac3f..05a1c15 100644
--- a/arch/powerpc/include/asm/cpm.h
+++ b/arch/powerpc/include/asm/cpm.h
@@ -155,50 +155,6 @@ typedef struct cpm_buf_desc {
  */
 #define BD_I2C_START   (0x0400)
 
-int cpm_muram_init(void);
-
-#if defined(CONFIG_CPM) || defined(CONFIG_QUICC_ENGINE)
-unsigned long cpm_muram_alloc(unsigned long size, unsigned long align);
-int cpm_muram_free(unsigned long offset);
-unsigned long cpm_muram_alloc_fixed(unsigned long offset, unsigned long size);
-unsigned long cpm_muram_alloc_common(unsigned long size, void *data);
-void __iomem *cpm_muram_addr(unsigned long offset);
-unsigned long cpm_muram_offset(void __iomem *addr);
-dma_addr_t cpm_muram_dma(void __iomem *addr);
-#else
-static inline unsigned long cpm_muram_alloc(unsigned long size,
-   unsigned long align)
-{
-   return -ENOSYS;
-}
-
-static inline int cpm_muram_free(unsigned long offset)
-{
-   return -ENOSYS;
-}
-
-static inline unsigned long cpm_muram_alloc_fixed(unsigned long offset,
- unsigned long size)
-{
-   return -ENOSYS;
-}
-
-static inline void __iomem *cpm_muram_addr(unsigned long offset)
-{
-   return NULL;
-}
-
-static inline unsigned long cpm_muram_offset(void __iomem *addr)
-{
-   return -ENOSYS;
-}
-
-static inline dma_addr_t cpm_muram_dma(void __iomem *addr)
-{
-   return 0;
-}
-#endif /* defined(CONFIG_CPM) || defined(CONFIG_QUICC_ENGINE) */
-
 #ifdef CONFIG_CPM
 int cpm_command(u32 command, u8 opcode);
 #else
diff --git a/arch/powerpc/include/asm/qe.h b/arch/powerpc/include/asm/qe.h
index 32b9bfa..c2dd8e6 100644
--- a/arch/powerpc/include/asm/qe.h
+++ b/arch/powerpc/include/asm/qe.h
@@ -16,11 +16,15 @@
 #define _ASM_POWERPC_QE_H
 #ifdef __KERNEL__
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #define QE_NUM_OF_SNUM 256 /* There are 256 serial number in QE */
 #define QE_NUM_OF_BRGS 16
@@ -92,6 +96,18 @@ extern void qe_reset(void);
 static inline void qe_reset(void) {}
 #endif
 
+int cpm_muram_init(void);
+
+#if defined(CONFIG_CPM) || defined(CONFIG_QUICC_ENGINE)
+unsigned long cpm_muram_alloc(unsigned long size, unsigned long align);
+int cpm_muram_free(unsigned long offset);
+unsigned long cpm_muram_alloc_fixed(unsigned long offset, unsigned long size);
+unsigned long cpm_muram_alloc_common(unsigned long size, void *data);
+void __iomem *cpm_muram_addr(unsigned long offset);
+unsigned long cpm_muram_offset(void __iomem *addr);
+dma_addr_t cpm_muram_dma(void __iomem *addr);
+#endif /* defined(CONFIG_CPM) || defined(CONFIG_QUICC_ENGINE) */
+
 /* QE PIO */
 #define QE_PIO_PINS 32
 
diff --git a/arch/powerpc/sysdev/cpm_common.c b/arch/powerpc/sysdev/cpm_common.c
index ff47072..6993aa8 100644
--- a/arch/powerpc/sysdev/cpm_common.c
+++ b/arch/powerpc/sysdev/cpm_common.c
@@ -17,7 +17,6 @@
  * published by the Free Software Foundation.
  */
 
-#include 
 #include 
 #include 
 #include 
@@ -29,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -65,214 +65,6 @@ void __init udbg_init_cpm(void)
 }
 #endif
 
-static struct gen_pool *muram_pool;
-static 

Re: [PATCH RESEND v3 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Peter Zijlstra
On Wed, Oct 14, 2015 at 08:51:34AM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 11:10:00AM +1100, Michael Ellerman wrote:

> > Thanks for fixing this. In future you should send a patch like this as a
> > separate patch. I've not been paying attention to it because I assumed it 
> > was
> 
> Got it. However, here is the thing, in previous version, this fix
> depends on some of other patches in this patchset. So to make this fix
> applied cleanly, I reorder my patchset to put this patch first, and the
> result is that some of other patches in this patchset depends on
> this(they need to remove code modified by this patch).
> 
> So I guess I'd better to stop Cc stable for this one, and wait until
> this patchset merged and send a separate patch for -stable tree. Does
> that work for you? I think this is what Peter want to suggests me to do
> when he asked me about this, right, Peter?

I don't think I had explicit thoughts about any of that, just that it
might make sense to have this patch not depend on the rest such that it
could indeed be stuffed into stable.

I'll leave the details up to Michael since he's PPC maintainer.

[PATCH v12 3/6] CPM/QE: use genalloc to manage CPM/QE muram

2015-10-14 Thread Zhao Qiang
Use genalloc to manage CPM/QE muram instead of rheap.
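
For context, a minimal genalloc sketch (illustrative only, with made-up
addresses) showing the calls that cpm_muram_init()/cpm_muram_alloc()/
cpm_muram_free() are built on after this change:

	static int example_pool(void)
	{
		struct gen_pool *pool;
		unsigned long addr;

		pool = gen_pool_create(0, -1);	/* 1-byte granularity, any node */
		if (!pool)
			return -ENOMEM;
		if (gen_pool_add(pool, 0x1000, 0x4000, -1)) {	/* manage [0x1000, 0x5000) */
			gen_pool_destroy(pool);
			return -ENOMEM;
		}

		addr = gen_pool_alloc(pool, 128);	/* returns 0 on failure */
		if (addr)
			gen_pool_free(pool, addr, 128);
		gen_pool_destroy(pool);
		return 0;
	}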

Signed-off-by: Zhao Qiang 
---
Changes for v9:
- split from patch 3/5; modify cpm muram management functions.
Changes for v10:
- modify cpm muram first, then move to qe_common
- modify commit.
Changes for v11:
- factor out the common alloc code
- modify min_alloc_order to zero for cpm_muram_alloc_fixed.
Changes for v12:
- Nil 

 arch/powerpc/include/asm/cpm.h   |   1 +
 arch/powerpc/platforms/Kconfig   |   2 +-
 arch/powerpc/sysdev/cpm_common.c | 129 +++
 3 files changed, 93 insertions(+), 39 deletions(-)

diff --git a/arch/powerpc/include/asm/cpm.h b/arch/powerpc/include/asm/cpm.h
index 4398a6c..0e1ac3f 100644
--- a/arch/powerpc/include/asm/cpm.h
+++ b/arch/powerpc/include/asm/cpm.h
@@ -161,6 +161,7 @@ int cpm_muram_init(void);
 unsigned long cpm_muram_alloc(unsigned long size, unsigned long align);
 int cpm_muram_free(unsigned long offset);
 unsigned long cpm_muram_alloc_fixed(unsigned long offset, unsigned long size);
+unsigned long cpm_muram_alloc_common(unsigned long size, void *data);
 void __iomem *cpm_muram_addr(unsigned long offset);
 unsigned long cpm_muram_offset(void __iomem *addr);
 dma_addr_t cpm_muram_dma(void __iomem *addr);
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index b7f9c40..01626be7 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -275,7 +275,7 @@ config TAU_AVERAGE
 config QUICC_ENGINE
bool "Freescale QUICC Engine (QE) Support"
depends on FSL_SOC && PPC32
-   select PPC_LIB_RHEAP
+   select GENERIC_ALLOCATOR
select CRC32
help
  The QUICC Engine (QE) is a new generation of communications
diff --git a/arch/powerpc/sysdev/cpm_common.c b/arch/powerpc/sysdev/cpm_common.c
index 4f78695..ff47072 100644
--- a/arch/powerpc/sysdev/cpm_common.c
+++ b/arch/powerpc/sysdev/cpm_common.c
@@ -17,6 +17,7 @@
  * published by the Free Software Foundation.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -27,7 +28,6 @@
 
 #include 
 #include 
-#include 
 #include 
 
 #include 
@@ -65,14 +65,22 @@ void __init udbg_init_cpm(void)
 }
 #endif
 
+static struct gen_pool *muram_pool;
 static spinlock_t cpm_muram_lock;
-static rh_block_t cpm_boot_muram_rh_block[16];
-static rh_info_t cpm_muram_info;
 static u8 __iomem *muram_vbase;
 static phys_addr_t muram_pbase;
 
-/* Max address size we deal with */
+struct muram_block {
+   struct list_head head;
+   unsigned long start;
+   int size;
+};
+
+static LIST_HEAD(muram_block_list);
+
+/* max address size we deal with */
 #define OF_MAX_ADDR_CELLS  4
+#define GENPOOL_OFFSET (4096 * 8)
 
 int cpm_muram_init(void)
 {
@@ -87,50 +95,52 @@ int cpm_muram_init(void)
return 0;
 
	spin_lock_init(&cpm_muram_lock);
-   /* initialize the info header */
-   rh_init(&cpm_muram_info, 1,
-   sizeof(cpm_boot_muram_rh_block) /
-   sizeof(cpm_boot_muram_rh_block[0]),
-   cpm_boot_muram_rh_block);
-
np = of_find_compatible_node(NULL, NULL, "fsl,cpm-muram-data");
if (!np) {
/* try legacy bindings */
np = of_find_node_by_name(NULL, "data-only");
if (!np) {
-   printk(KERN_ERR "Cannot find CPM muram data node");
+   pr_err("Cannot find CPM muram data node");
ret = -ENODEV;
-   goto out;
+   goto out_muram;
}
}
 
+   muram_pool = gen_pool_create(0, -1);
muram_pbase = of_translate_address(np, zero);
if (muram_pbase == (phys_addr_t)OF_BAD_ADDR) {
-   printk(KERN_ERR "Cannot translate zero through CPM muram node");
+   pr_err("Cannot translate zero through CPM muram node");
ret = -ENODEV;
-   goto out;
+   goto out_pool;
}
 
	while (of_address_to_resource(np, i++, &r) == 0) {
if (r.end > max)
max = r.end;
+   ret = gen_pool_add(muram_pool, r.start - muram_pbase +
+  GENPOOL_OFFSET, resource_size(&r), -1);
+   if (ret) {
+   pr_err("QE: couldn't add muram to pool!\n");
+   goto out_pool;
+   }
 
-   rh_attach_region(&cpm_muram_info, r.start - muram_pbase,
-resource_size(&r));
}
 
muram_vbase = ioremap(muram_pbase, max - muram_pbase + 1);
if (!muram_vbase) {
-   printk(KERN_ERR "Cannot map CPM muram");
+   pr_err("Cannot map QE muram");
ret = -ENOMEM;
+   goto out_pool;
}
-
-out:
+   goto out_muram;
+out_pool:
+   gen_pool_destroy(muram_pool);

[RFC] powerpc/numa: Use VPHN based node ID information on shared processor LPARs

2015-10-14 Thread Anshuman Khandual
On shared processor LPARs, the H_HOME_NODE_ASSOCIATIVITY hcall provides the
dynamic virtual-to-physical mapping for any given processor. Currently we
use the VPHN node ID information only after receiving either a PRRN or a
VPHN event. But at boot time, inside the function numa_setup_cpu, we still
query the OF device tree for the node ID value, which might differ from
what the H_HOME_NODE_ASSOCIATIVITY hcall would return. In a scenario where
no PRRN or VPHN event arrives after boot, every node-CPU mapping remains
incorrect thereafter.

With this proposed change, numa_setup_cpu overrides the node ID fetched
from the OF device tree with the node ID fetched via the
H_HOME_NODE_ASSOCIATIVITY hcall. Right now the shared-processor property
of the LPAR cannot be queried, because VPA initialization happens after
numa_setup_cpu during boot. So the initmem_init call has been moved after
ppc_md.setup_arch inside setup_arch.

Signed-off-by: Anshuman Khandual 
---
Before the change:
# numactl -H
available: 2 nodes (0,3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 0 size: 0 MB
node 0 free: 0 MB
node 3 cpus:
node 3 size: 16315 MB
node 3 free: 15716 MB
node distances:
node   0   3 
  0:  10  20 
  3:  20  10 
 
After the change:
# numactl -H
available: 2 nodes (0,3)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 3 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
25 26 27 28 29 30 31
node 3 size: 16315 MB
node 3 free: 15537 MB
node distances:
node   0   3 
  0:  10  20 
  3:  20  10

 arch/powerpc/kernel/setup_64.c |  2 +-
 arch/powerpc/mm/numa.c | 27 ---
 2 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index bdcbb71..56026b7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -694,7 +694,6 @@ void __init setup_arch(char **cmdline_p)
exc_lvl_early_init();
emergency_stack_init();
 
-   initmem_init();
 
 #ifdef CONFIG_DUMMY_CONSOLE
	conswitchp = &dummy_con;
@@ -703,6 +702,7 @@ void __init setup_arch(char **cmdline_p)
if (ppc_md.setup_arch)
ppc_md.setup_arch();
 
+   initmem_init();
paging_init();
 
/* Initialize the MMU context management stuff */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8b9502a..e404d05 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -41,6 +41,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_PPC_SPLPAR
+static int vphn_get_node(unsigned int cpu);
+#endif
+
 static int numa_enabled = 1;
 
 static char *cmdline __initdata;
@@ -553,6 +557,17 @@ static int numa_setup_cpu(unsigned long lcpu)
 
nid = of_node_to_nid_single(cpu);
 
+   /*
+* Override the OF device tree fetched node number
+* with VPHN based node number in case of a shared
+* processor LPAR on PHYP platform.
+*/
+#ifdef CONFIG_PPC_SPLPAR
+   if (lppaca_shared_proc(get_lppaca())) {
+   nid = vphn_get_node(lcpu);
+   }
+#endif
+
 out_present:
if (nid < 0 || !node_online(nid))
nid = first_online_node;
@@ -1364,6 +1379,14 @@ static int update_lookup_table(void *data)
return 0;
 }
 
+static int vphn_get_node(unsigned int cpu)
+{
+   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+
+   vphn_get_associativity(cpu, associativity);
+   return associativity_to_nid(associativity);
+}
+
 /*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
@@ -1372,7 +1395,6 @@ int arch_update_cpu_topology(void)
 {
unsigned int cpu, sibling, changed = 0;
struct topology_update_data *updates, *ud;
-   __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
cpumask_t updated_cpus;
struct device *dev;
int weight, new_nid, i = 0;
@@ -1408,8 +1430,7 @@ int arch_update_cpu_topology(void)
}
 
/* Use associativity from first thread for all siblings */
-   vphn_get_associativity(cpu, associativity);
-   new_nid = associativity_to_nid(associativity);
+   new_nid = vphn_get_node(cpu);
if (new_nid < 0 || !node_online(new_nid))
new_nid = first_online_node;
 
-- 
2.1.0


Re: [v2,1/1] powerpc: Individual System V IPC system calls

2015-10-14 Thread Michael Ellerman
On Wed, 2015-10-14 at 18:00 +1100, Sam Bobroff wrote:
> On Tue, Oct 13, 2015 at 08:38:42PM +1100, Michael Ellerman wrote:
> > On Tue, 2015-13-10 at 01:49:28 UTC, Sam bobroff wrote:
> > > This patch provides individual system call numbers for the following
> > > System V IPC system calls, on PowerPC, so that they do not need to be
> > > multiplexed:
> > > * semop, semget, semctl, semtimedop
> > > * msgsnd, msgrcv, msgget, msgctl
> > > * shmat, shmdt, shmget, shmctl
> > 
> > You tested this right? :)  Tell me about it.
> 
> Why yes I did:

...

> (I also re-ran the tests both for little-endian and big-endian hosts.)

Did you test on 32-bit at all?

> It would obviously be good to have someone else test this, but I can't see a
> way to make it easy to do. They would presumably have to go through all of the
> above, which seems too much to ask given how trivial the kernel side of the
> patch is. Still, it bothers me a bit so if there is any way please let me 
> know.
> (I thought about writing some assembly to directly test the syscall numbers 
> but
> all it would do is verify that the numbers are valid, which really isn't much
> of a test.)

Actually that is still a useful test; it at least tells you whether the kernel
you're running on implements the syscalls. Obviously if you're on mainline
that's easy enough to work out from the git history, but if/when these get
backported to distro kernels, it's often harder to work out what's in the
source than just testing it directly.

So I wrote a quick dirty test for that, it seems to work for me:

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 847adf6e8d16..b120dc11aebe 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -12,7 +12,7 @@ CFLAGS := -Wall -O2 -flto -Wall -Werror 
-DGIT_VERSION='"$(GIT_VERSION)"' -I$(CUR
 
 export CFLAGS
 
-SUB_DIRS = pmu copyloops mm tm primitives stringloops vphn switch_endian dscr 
benchmarks
+SUB_DIRS = pmu copyloops mm tm primitives stringloops vphn switch_endian dscr 
benchmarks syscalls
 
 endif
 
diff --git a/tools/testing/selftests/powerpc/syscalls/Makefile 
b/tools/testing/selftests/powerpc/syscalls/Makefile
new file mode 100644
index ..b35c7945bec5
--- /dev/null
+++ b/tools/testing/selftests/powerpc/syscalls/Makefile
@@ -0,0 +1,12 @@
+TEST_PROGS := ipc_unmuxed
+
+CFLAGS += -I../../../../../usr/include
+
+all: $(TEST_PROGS)
+
+$(TEST_PROGS): ../harness.c
+
+include ../../lib.mk
+
+clean:
+   rm -f $(TEST_PROGS) *.o
diff --git a/tools/testing/selftests/powerpc/syscalls/ipc.h 
b/tools/testing/selftests/powerpc/syscalls/ipc.h
new file mode 100644
index ..fbebc022edf6
--- /dev/null
+++ b/tools/testing/selftests/powerpc/syscalls/ipc.h
@@ -0,0 +1,47 @@
+#ifdef __NR_semop
+DO_TEST(semop, __NR_semop)
+#endif
+
+#ifdef __NR_semget
+DO_TEST(semget, __NR_semget)
+#endif
+
+#ifdef __NR_semctl
+DO_TEST(semctl, __NR_semctl)
+#endif
+
+#ifdef __NR_semtimedop
+DO_TEST(semtimedop, __NR_semtimedop)
+#endif
+
+#ifdef __NR_msgsnd
+DO_TEST(msgsnd, __NR_msgsnd)
+#endif
+
+#ifdef __NR_msgrcv
+DO_TEST(msgrcv, __NR_msgrcv)
+#endif
+
+#ifdef __NR_msgget
+DO_TEST(msgget, __NR_msgget)
+#endif
+
+#ifdef __NR_msgctl
+DO_TEST(msgctl, __NR_msgctl)
+#endif
+
+#ifdef __NR_shmat
+DO_TEST(shmat, __NR_shmat)
+#endif
+
+#ifdef __NR_shmdt
+DO_TEST(shmdt, __NR_shmdt)
+#endif
+
+#ifdef __NR_shmget
+DO_TEST(shmget, __NR_shmget)
+#endif
+
+#ifdef __NR_shmctl
+DO_TEST(shmctl, __NR_shmctl)
+#endif
diff --git a/tools/testing/selftests/powerpc/syscalls/ipc_unmuxed.c 
b/tools/testing/selftests/powerpc/syscalls/ipc_unmuxed.c
new file mode 100644
index ..2ac02706f8c8
--- /dev/null
+++ b/tools/testing/selftests/powerpc/syscalls/ipc_unmuxed.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright 2015, Michael Ellerman, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * This test simply tests that certain syscalls are implemented. It doesn't
+ * actually exercise their logic in any way.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+
+
+#define DO_TEST(_name, _num)   \
+static int test_##_name(void)  \
+{  \
+   int rc; \
+   printf("Testing " #_name);  \
+   errno = 0;  \
+   rc = syscall(_num, -1, 0, 0, 0, 0, 0);  \
+   printf("\treturned %d, errno %d\n", rc, errno); \
+   return errno == ENOSYS; \
+}
+
+#include "ipc.h"
+#undef DO_TEST
+
+static int ipc_unmuxed(void)
+{
+   int tests_done = 0;
+
+#define DO_TEST(_name, _num)   \
+   FAIL_IF(test_##_name());\
+   tests_done++;
+

[PATCH v12 6/6] QE: Move QE from arch/powerpc to drivers/soc

2015-10-14 Thread Zhao Qiang
LS1 has a QE and LS1 has an ARM CPU. Move QE from arch/powerpc to
drivers/soc/fsl so that it can be used by both PowerPC and ARM.

Signed-off-by: Zhao Qiang 
---
Changes for v2:
- move code to driver/soc
Changes for v3:
- change drivers/soc/qe to drivers/soc/fsl-qe
Changes for v4:
- move drivers/soc/fsl-qe to drivers/soc/fsl/qe
- move head files for qe from include/linux/fsl to include/soc/fsl
- move qe_ic.c to drivers/irqchip/
Changes for v5:
- update MAINTAINERS
Changes for v6:
- rebase
Changes for v7:
- move this patch from 2/3 to 3/3
Changes for v8:
- Nil 
Changes for v9:
- Nil 
Changes for v10:
- Nil 
Changes for v11:
- rebase
Changes for v12:
- Nil

 MAINTAINERS|  5 +++--
 arch/powerpc/platforms/83xx/km83xx.c   |  4 ++--
 arch/powerpc/platforms/83xx/misc.c |  2 +-
 arch/powerpc/platforms/83xx/mpc832x_mds.c  |  4 ++--
 arch/powerpc/platforms/83xx/mpc832x_rdb.c  |  4 ++--
 arch/powerpc/platforms/83xx/mpc836x_mds.c  |  4 ++--
 arch/powerpc/platforms/83xx/mpc836x_rdk.c  |  4 ++--
 arch/powerpc/platforms/85xx/common.c   |  2 +-
 arch/powerpc/platforms/85xx/corenet_generic.c  |  2 +-
 arch/powerpc/platforms/85xx/mpc85xx_mds.c  |  4 ++--
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c  |  4 ++--
 arch/powerpc/platforms/85xx/twr_p102x.c|  4 ++--
 arch/powerpc/platforms/Kconfig | 19 ---
 arch/powerpc/sysdev/cpm_common.c   |  2 +-
 arch/powerpc/sysdev/qe_lib/Kconfig | 22 ++
 arch/powerpc/sysdev/qe_lib/Makefile|  6 +-
 arch/powerpc/sysdev/qe_lib/gpio.c  |  2 +-
 arch/powerpc/sysdev/qe_lib/qe_io.c |  2 +-
 arch/powerpc/sysdev/qe_lib/usb.c   |  4 ++--
 drivers/irqchip/Makefile   |  1 +
 .../sysdev/qe_lib => drivers/irqchip}/qe_ic.c  |  2 +-
 .../sysdev/qe_lib => drivers/irqchip}/qe_ic.h  |  4 ++--
 drivers/net/ethernet/freescale/fsl_pq_mdio.c   |  2 +-
 drivers/net/ethernet/freescale/ucc_geth.c  |  8 
 drivers/net/ethernet/freescale/ucc_geth.h  |  8 
 drivers/soc/Kconfig|  1 +
 drivers/soc/Makefile   |  1 +
 drivers/soc/fsl/Makefile   |  5 +
 .../sysdev/qe_lib => drivers/soc/fsl/qe}/Kconfig   | 17 +++--
 drivers/soc/fsl/qe/Makefile|  9 +
 .../sysdev/qe_lib => drivers/soc/fsl/qe}/qe.c  |  4 ++--
 .../qe_lib => drivers/soc/fsl/qe}/qe_common.c  |  2 +-
 .../sysdev/qe_lib => drivers/soc/fsl/qe}/ucc.c |  6 +++---
 .../qe_lib => drivers/soc/fsl/qe}/ucc_fast.c   |  8 
 .../qe_lib => drivers/soc/fsl/qe}/ucc_slow.c   |  8 
 drivers/spi/spi-fsl-cpm.c  |  2 +-
 drivers/tty/serial/ucc_uart.c  |  2 +-
 drivers/usb/gadget/udc/fsl_qe_udc.c|  2 +-
 drivers/usb/host/fhci-hcd.c|  2 +-
 drivers/usb/host/fhci-hub.c|  2 +-
 drivers/usb/host/fhci-sched.c  |  2 +-
 drivers/usb/host/fhci.h|  4 ++--
 .../include/asm => include/linux/fsl/qe}/qe_ic.h   |  0
 .../include/asm => include/soc/fsl/qe}/immap_qe.h  |  0
 .../include/asm => include/soc/fsl/qe}/qe.h|  2 +-
 .../include/asm => include/soc/fsl/qe}/ucc.h   |  4 ++--
 .../include/asm => include/soc/fsl/qe}/ucc_fast.h  |  6 +++---
 .../include/asm => include/soc/fsl/qe}/ucc_slow.h  |  6 +++---
 48 files changed, 105 insertions(+), 115 deletions(-)
 rename {arch/powerpc/sysdev/qe_lib => drivers/irqchip}/qe_ic.c (99%)
 rename {arch/powerpc/sysdev/qe_lib => drivers/irqchip}/qe_ic.h (97%)
 create mode 100644 drivers/soc/fsl/Makefile
 copy {arch/powerpc/sysdev/qe_lib => drivers/soc/fsl/qe}/Kconfig (50%)
 create mode 100644 drivers/soc/fsl/qe/Makefile
 rename {arch/powerpc/sysdev/qe_lib => drivers/soc/fsl/qe}/qe.c (99%)
 rename {arch/powerpc/sysdev/qe_lib => drivers/soc/fsl/qe}/qe_common.c (99%)
 rename {arch/powerpc/sysdev/qe_lib => drivers/soc/fsl/qe}/ucc.c (98%)
 rename {arch/powerpc/sysdev/qe_lib => drivers/soc/fsl/qe}/ucc_fast.c (98%)
 rename {arch/powerpc/sysdev/qe_lib => drivers/soc/fsl/qe}/ucc_slow.c (98%)
 rename {arch/powerpc/include/asm => include/linux/fsl/qe}/qe_ic.h (100%)
 rename {arch/powerpc/include/asm => include/soc/fsl/qe}/immap_qe.h (100%)
 rename {arch/powerpc/include/asm => include/soc/fsl/qe}/qe.h (99%)
 rename {arch/powerpc/include/asm => include/soc/fsl/qe}/ucc.h (96%)
 rename {arch/powerpc/include/asm => include/soc/fsl/qe}/ucc_fast.h (98%)
 rename {arch/powerpc/include/asm => include/soc/fsl/qe}/ucc_slow.h (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 562ae4e..c688e61 

[PATCH v12 5/6] QE: use subsys_initcall to init qe

2015-10-14 Thread Zhao Qiang
Use subsys_initcall to initialize the QE so that it also works on the ARM
architecture. Remove qe_reset from the PowerPC platform files.

Signed-off-by: Zhao Qiang 
---
Changes for v12:
- Nil

 arch/powerpc/platforms/83xx/km83xx.c  |  2 --
 arch/powerpc/platforms/83xx/mpc832x_mds.c |  2 --
 arch/powerpc/platforms/83xx/mpc832x_rdb.c |  2 --
 arch/powerpc/platforms/83xx/mpc836x_mds.c |  2 --
 arch/powerpc/platforms/83xx/mpc836x_rdk.c |  3 ---
 arch/powerpc/platforms/85xx/common.c  |  1 -
 arch/powerpc/sysdev/qe_lib/qe.c   | 15 +++
 7 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/83xx/km83xx.c 
b/arch/powerpc/platforms/83xx/km83xx.c
index bf4c447..ae111581 100644
--- a/arch/powerpc/platforms/83xx/km83xx.c
+++ b/arch/powerpc/platforms/83xx/km83xx.c
@@ -136,8 +136,6 @@ static void __init mpc83xx_km_setup_arch(void)
mpc83xx_setup_pci();
 
 #ifdef CONFIG_QUICC_ENGINE
-   qe_reset();
-
np = of_find_node_by_name(NULL, "par_io");
if (np != NULL) {
par_io_init(np);
diff --git a/arch/powerpc/platforms/83xx/mpc832x_mds.c 
b/arch/powerpc/platforms/83xx/mpc832x_mds.c
index 8d76220..aacc43f 100644
--- a/arch/powerpc/platforms/83xx/mpc832x_mds.c
+++ b/arch/powerpc/platforms/83xx/mpc832x_mds.c
@@ -74,8 +74,6 @@ static void __init mpc832x_sys_setup_arch(void)
mpc83xx_setup_pci();
 
 #ifdef CONFIG_QUICC_ENGINE
-   qe_reset();
-
if ((np = of_find_node_by_name(NULL, "par_io")) != NULL) {
par_io_init(np);
of_node_put(np);
diff --git a/arch/powerpc/platforms/83xx/mpc832x_rdb.c 
b/arch/powerpc/platforms/83xx/mpc832x_rdb.c
index eff5baa..0c7a43e 100644
--- a/arch/powerpc/platforms/83xx/mpc832x_rdb.c
+++ b/arch/powerpc/platforms/83xx/mpc832x_rdb.c
@@ -203,8 +203,6 @@ static void __init mpc832x_rdb_setup_arch(void)
mpc83xx_setup_pci();
 
 #ifdef CONFIG_QUICC_ENGINE
-   qe_reset();
-
if ((np = of_find_node_by_name(NULL, "par_io")) != NULL) {
par_io_init(np);
of_node_put(np);
diff --git a/arch/powerpc/platforms/83xx/mpc836x_mds.c 
b/arch/powerpc/platforms/83xx/mpc836x_mds.c
index 1a26d2f..eb24abd 100644
--- a/arch/powerpc/platforms/83xx/mpc836x_mds.c
+++ b/arch/powerpc/platforms/83xx/mpc836x_mds.c
@@ -82,8 +82,6 @@ static void __init mpc836x_mds_setup_arch(void)
mpc83xx_setup_pci();
 
 #ifdef CONFIG_QUICC_ENGINE
-   qe_reset();
-
if ((np = of_find_node_by_name(NULL, "par_io")) != NULL) {
par_io_init(np);
of_node_put(np);
diff --git a/arch/powerpc/platforms/83xx/mpc836x_rdk.c 
b/arch/powerpc/platforms/83xx/mpc836x_rdk.c
index b63b42d..823e370 100644
--- a/arch/powerpc/platforms/83xx/mpc836x_rdk.c
+++ b/arch/powerpc/platforms/83xx/mpc836x_rdk.c
@@ -35,9 +35,6 @@ static void __init mpc836x_rdk_setup_arch(void)
ppc_md.progress("mpc836x_rdk_setup_arch()", 0);
 
mpc83xx_setup_pci();
-#ifdef CONFIG_QUICC_ENGINE
-   qe_reset();
-#endif
 }
 
 /*
diff --git a/arch/powerpc/platforms/85xx/common.c 
b/arch/powerpc/platforms/85xx/common.c
index 7bfb9b1..0f91edc 100644
--- a/arch/powerpc/platforms/85xx/common.c
+++ b/arch/powerpc/platforms/85xx/common.c
@@ -105,7 +105,6 @@ void __init mpc85xx_qe_init(void)
return;
}
 
-   qe_reset();
of_node_put(np);
 
 }
diff --git a/arch/powerpc/sysdev/qe_lib/qe.c b/arch/powerpc/sysdev/qe_lib/qe.c
index c2518cd..3f9f596 100644
--- a/arch/powerpc/sysdev/qe_lib/qe.c
+++ b/arch/powerpc/sysdev/qe_lib/qe.c
@@ -671,6 +671,21 @@ unsigned int qe_get_num_of_snums(void)
 }
 EXPORT_SYMBOL(qe_get_num_of_snums);
 
+static int __init qe_init(void)
+{
+   struct device_node *np;
+
+   np = of_find_compatible_node(NULL, NULL, "fsl,qe");
+   if (!np) {
+   pr_err("%s: Could not find Quicc Engine node\n", __func__);
+   return -ENODEV;
+   }
+   qe_reset();
+   of_node_put(np);
+   return 0;
+}
+subsys_initcall(qe_init);
+
 #if defined(CONFIG_SUSPEND) && defined(CONFIG_PPC_85xx)
 static int qe_resume(struct platform_device *ofdev)
 {
-- 
2.1.0.27.g96db324


Re: [PATCH v3 6/6] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants

2015-10-14 Thread Will Deacon
On Wed, Oct 14, 2015 at 09:47:35AM +0800, Boqun Feng wrote:
> On Tue, Oct 13, 2015 at 04:04:27PM +0100, Will Deacon wrote:
> > On Tue, Oct 13, 2015 at 10:58:30PM +0800, Boqun Feng wrote:
> > > On Tue, Oct 13, 2015 at 03:43:33PM +0100, Will Deacon wrote:
> > > > Putting a barrier in the middle of that critical section is probably a
> > > > terrible idea, and that's why I thought you were avoiding it (hence my
> > > 
> > > The fact is that I haven't thought of that way to implement
> > > cmpxchg_release before you ask that question ;-) And I'm not going to do
> > > that for now and probably not in the future.
> > > 
> > > > original question). Perhaps just add a comment to that effect, since I
> > > 
> > > Are you suggesting if I put a barrier in the middle I'd better to add a
> > > comment, right? So if I don't do that, it's OK to let this patch as it.
> > 
> > No, I mean put a comment in your file to explain the reason why you
> > override _relaxed and _acquire, but not _release (because overriding
> 
> You mean overriding _acquire and fully order version, right?

Yes, my mistake. Sounds like you get my drift, though.

Will

[PATCH v12 1/6] genalloc:support memory-allocation with bytes-alignment to genalloc

2015-10-14 Thread Zhao Qiang
Byte alignment is required to manage some special RAM,
so add gen_pool_first_fit_align to genalloc.
Also add gen_pool_alloc_data to pass data to
gen_pool_first_fit_align, and make gen_pool_alloc a wrapper around it.

Signed-off-by: Zhao Qiang 
---
Changes for v6:
- patch set v6 includes a new patch because genalloc is now
  used to manage QE MURAM; patch 0001 is the new patch,
  adding byte alignment support for allocations.
Changes for v7:
- CPM MURAM also needs to be managed with genalloc; it has
  a function to reserve a specific region of MURAM, so add
  an offset to genpool_data for the start address to be allocated.
Changes for v8:
- remove support for reserving a specific region from this patch and
  add a new patch to support it.
Changes for v9:
- Nil 
Changes for v10:
- Nil
Changes for v11:
- Nil
Changes for v12:
- Nil

 include/linux/genalloc.h | 24 
 lib/genalloc.c   | 58 +++-
 2 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 1ccaab4..aaf3dc2 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -30,10 +30,12 @@
 #ifndef __GENALLOC_H__
 #define __GENALLOC_H__
 
+#include 
 #include 
 
 struct device;
 struct device_node;
+struct gen_pool;
 
 /**
  * Allocation callback function type definition
@@ -47,7 +49,7 @@ typedef unsigned long (*genpool_algo_t)(unsigned long *map,
unsigned long size,
unsigned long start,
unsigned int nr,
-   void *data);
+   void *data, struct gen_pool *pool);
 
 /*
  *  General purpose special memory pool descriptor.
@@ -73,6 +75,13 @@ struct gen_pool_chunk {
unsigned long bits[0];  /* bitmap for allocating memory chunk */
 };
 
+/*
+ *  gen_pool data descriptor for gen_pool_first_fit_align.
+ */
+struct genpool_data_align {
+   int align;  /* alignment by bytes for starting address */
+};
+
 extern struct gen_pool *gen_pool_create(int, int);
 extern phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long);
 extern int gen_pool_add_virt(struct gen_pool *, unsigned long, phys_addr_t,
@@ -96,6 +105,7 @@ static inline int gen_pool_add(struct gen_pool *pool, 
unsigned long addr,
 }
 extern void gen_pool_destroy(struct gen_pool *);
 extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
+extern unsigned long gen_pool_alloc_data(struct gen_pool *, size_t, void 
*data);
 extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
 extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
@@ -108,14 +118,20 @@ extern void gen_pool_set_algo(struct gen_pool *pool, 
genpool_algo_t algo,
void *data);
 
 extern unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
-   unsigned long start, unsigned int nr, void *data);
+   unsigned long start, unsigned int nr, void *data,
+   struct gen_pool *pool);
+
+extern unsigned long gen_pool_first_fit_align(unsigned long *map,
+   unsigned long size, unsigned long start, unsigned int nr,
+   void *data, struct gen_pool *pool);
 
 extern unsigned long gen_pool_first_fit_order_align(unsigned long *map,
unsigned long size, unsigned long start, unsigned int nr,
-   void *data);
+   void *data, struct gen_pool *pool);
 
 extern unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
-   unsigned long start, unsigned int nr, void *data);
+   unsigned long start, unsigned int nr, void *data,
+   struct gen_pool *pool);
 
 extern struct gen_pool *devm_gen_pool_create(struct device *dev,
int min_alloc_order, int nid);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index d214866..b8762b1 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -269,6 +269,24 @@ EXPORT_SYMBOL(gen_pool_destroy);
  */
 unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
 {
+   return gen_pool_alloc_data(pool, size, pool->data);
+}
+EXPORT_SYMBOL(gen_pool_alloc);
+
+/**
+ * gen_pool_alloc_data - allocate special memory from the pool
+ * @pool: pool to allocate from
+ * @size: number of bytes to allocate from the pool
+ * @data: data passed to algorithm
+ *
+ * Allocate the requested number of bytes from the specified pool.
+ * Uses the pool allocation function (with first-fit algorithm by default).
+ * Can not be used in NMI handler on architectures without
+ * NMI-safe cmpxchg implementation.
+ */
+unsigned long gen_pool_alloc_data(struct gen_pool *pool, size_t size,
+   void *data)
+{
struct gen_pool_chunk *chunk;
unsigned long addr = 0;
int order = 
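
For illustration only (not part of the patch), a caller of the new interface
might look something like the sketch below; the pool, the 64-byte alignment
and the function name are made up:

#include <linux/genalloc.h>

/*
 * Hypothetical caller: allocate "size" bytes from "pool" aligned to 64 bytes.
 * Assumes the pool's algorithm was set with
 * gen_pool_set_algo(pool, gen_pool_first_fit_align, NULL) beforehand.
 */
static unsigned long alloc_aligned_64(struct gen_pool *pool, size_t size)
{
	struct genpool_data_align align_data = { .align = 64 };

	/* returns 0 if no suitably aligned region is free */
	return gen_pool_alloc_data(pool, size, &align_data);
}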

[PATCH v12 2/6] genalloc:support allocating specific region

2015-10-14 Thread Zhao Qiang
Add a new algorithm to genalloc that reserves a specific region of
memory matching the size requirement (no alignment constraint).

Signed-off-by: Zhao Qiang 
---
Changes for v9:
- reserve a specific region; if the returned region
  is not within the requested region, return failure.
Changes for v10:
- rename gen_pool_fixed_fit to gen_pool_fixed_alloc
Changes for v11:
- rename gen_pool_fixed_fit to gen_pool_fixed_alloc
Changes for v12:
- Nil

 include/linux/genalloc.h | 11 +++
 lib/genalloc.c   | 30 ++
 2 files changed, 41 insertions(+)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index aaf3dc2..56d5d96 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -82,6 +82,13 @@ struct genpool_data_align {
int align;  /* alignment by bytes for starting address */
 };
 
+/*
+ *  gen_pool data descriptor for gen_pool_fixed_alloc.
+ */
+struct genpool_data_fixed {
+   unsigned long offset;   /* The offset of the specific region */
+};
+
 extern struct gen_pool *gen_pool_create(int, int);
 extern phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long);
 extern int gen_pool_add_virt(struct gen_pool *, unsigned long, phys_addr_t,
@@ -121,6 +128,10 @@ extern unsigned long gen_pool_first_fit(unsigned long 
*map, unsigned long size,
unsigned long start, unsigned int nr, void *data,
struct gen_pool *pool);
 
+extern unsigned long gen_pool_fixed_alloc(unsigned long *map,
+   unsigned long size, unsigned long start, unsigned int nr,
+   void *data, struct gen_pool *pool);
+
 extern unsigned long gen_pool_first_fit_align(unsigned long *map,
unsigned long size, unsigned long start, unsigned int nr,
void *data, struct gen_pool *pool);
diff --git a/lib/genalloc.c b/lib/genalloc.c
index b8762b1..1e6fde8 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -554,6 +554,36 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, 
unsigned long size,
 EXPORT_SYMBOL(gen_pool_first_fit_align);
 
 /**
+ * gen_pool_fixed_alloc - reserve a specific region of memory
+ * matching the size requirement (no alignment constraint)
+ * @map: The address to base the search on
+ * @size: The bitmap size in bits
+ * @start: The bitnumber to start searching at
+ * @nr: The number of zeroed bits we're looking for
+ * @data: data specifying the fixed region offset (struct genpool_data_fixed)
+ * @pool: pool to get order from
+ */
+unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
+   unsigned long start, unsigned int nr, void *data,
+   struct gen_pool *pool)
+{
+   struct genpool_data_fixed *fixed_data;
+   int order;
+   unsigned long offset_bit;
+   unsigned long start_bit;
+
+   fixed_data = data;
+   order = pool->min_alloc_order;
+   offset_bit = fixed_data->offset >> order;
+   start_bit = bitmap_find_next_zero_area(map, size,
+   start + offset_bit, nr, 0);
+   if (start_bit != offset_bit)
+   start_bit = size;
+   return start_bit;
+}
+EXPORT_SYMBOL(gen_pool_fixed_alloc);
+
+/**
  * gen_pool_first_fit_order_align - find the first available region
  * of memory matching the size requirement. The region will be aligned
  * to the order of the size specified.
-- 
2.1.0.27.g96db324
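
Purely as a usage sketch (the names and offset below are hypothetical, not
from the patch), reserving a region at a fixed offset within a pool would
combine this algorithm with gen_pool_alloc_data() from patch 1/6:

#include <linux/genalloc.h>

/*
 * Hypothetical caller: reserve "size" bytes starting at byte "offset" within
 * the pool.  Assumes gen_pool_set_algo(pool, gen_pool_fixed_alloc, NULL)
 * was called when the pool was set up.
 */
static unsigned long reserve_at_offset(struct gen_pool *pool, size_t size,
				       unsigned long offset)
{
	struct genpool_data_fixed fixed_data = { .offset = offset };

	/* returns 0 if the requested region is already in use */
	return gen_pool_alloc_data(pool, size, &fixed_data);
}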


Re: [PATCH RESEND v3 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Boqun Feng
On Wed, Oct 14, 2015 at 10:06:13AM +0200, Peter Zijlstra wrote:
> On Wed, Oct 14, 2015 at 08:51:34AM +0800, Boqun Feng wrote:
> > On Wed, Oct 14, 2015 at 11:10:00AM +1100, Michael Ellerman wrote:
> 
> > > Thanks for fixing this. In future you should send a patch like this as a
> > > separate patch. I've not been paying attention to it because I assumed it 
> > > was
> > 
> > Got it. However, here is the thing, in previous version, this fix
> > depends on some of other patches in this patchset. So to make this fix
> > applied cleanly, I reorder my patchset to put this patch first, and the
> > result is that some of other patches in this patchset depends on
> > this(they need to remove code modified by this patch).
> > 
> > So I guess I'd better to stop Cc stable for this one, and wait until
> > this patchset merged and send a separate patch for -stable tree. Does
> > that work for you? I think this is what Peter want to suggests me to do
> > when he asked me about this, right, Peter?
> 
> I don't think I had explicit thoughts about any of that, just that it
> might make sense to have this patch not depend on the rest such that it
> could indeed be stuffed into stable.
> 

Got that. Sorry for misunderstanding you...

> I'll leave the details up to Michael since he's PPC maintainer.

Michael and Peter, rest of this patchset depends on commits which are
currently in the locking/core branch of the tip, so I would like it as a
whole queued there. Besides, I will keep this patch Cc'ed to stable in
future versions, that works for you both?

Regards,
Boqun



Re: [PATCH RESEND v3 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Peter Zijlstra
On Wed, Oct 14, 2015 at 05:26:53PM +0800, Boqun Feng wrote:
> Michael and Peter, rest of this patchset depends on commits which are
> currently in the locking/core branch of the tip, so I would like it as a
> whole queued there. Besides, I will keep this patch Cc'ed to stable in
> future versions, that works for you both?

From my POV having the Cc stable in there is fine if Michael actually
wants them to go there. GregKH will vacuum them up once they hit Linus'
tree and we don't need to think about it anymore.

Alternatively, Michael could put the patch in a separate branch and we
could both merge that.

Or even, seeing how it's a single patch and git mostly does the right
thing, we could just merge it independently in both trees and let git
sort it out at merge time.

Re: [PATCH RESEND v3 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Michael Ellerman
On Wed, 2015-10-14 at 11:33 +0200, Peter Zijlstra wrote:
> On Wed, Oct 14, 2015 at 05:26:53PM +0800, Boqun Feng wrote:
> > Michael and Peter, rest of this patchset depends on commits which are
> > currently in the locking/core branch of the tip, so I would like it as a
> > whole queued there. Besides, I will keep this patch Cc'ed to stable in
> > future versions, that works for you both?
> 
> From my POV having the Cc stable in there is fine if Michael actually
> wants them to go there. GregKH will vacuum them up once they hit Linus'
> tree and we don't need to think about it anymore.

Yeah that's fine by me. Here's an Ack if you want one:

Acked-by: Michael Ellerman 

> Alternatively, Michael could put the patch in a separate branch and we
> could both merge that.
> 
> Or even, seeing how it's a single patch and git mostly does the right
> thing, we could just merge it independently in both trees and let git
> sort it out at merge time.

That probably would work, but I don't think it's necessary.

My tree doesn't get much (or any) more testing than linux-next, so as long as
locking/core is in linux-next then it will be tested just fine that way.

cheers



Re: [RFC] powerpc/numa: Use VPHN based node ID information on shared processor LPARs

2015-10-14 Thread Michael Ellerman
On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> dynamic virtual-physical mapping for any given processor. Currently we
> use VPHN node ID information only after getting either a PRRN or a VPHN
> event. But during boot time inside the function numa_setup_cpu, we still
> query the OF device tree for the node ID value, which might be different
> from what can be fetched via the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> scenario where there are no PRRN or VPHN events after boot, all node-cpu
> mappings will remain incorrect thereafter.
> 
> With this proposed change, numa_setup_cpu will try to override the node ID
> fetched from the OF device tree with the node ID fetched via the
> H_HOME_NODE_ASSOCIATIVITY hcall. Right now the shared processor property of
> the LPAR cannot be queried, as VPA initialization happens after numa_setup_cpu
> during boot. So the initmem_init function has been moved after
> ppc_md.setup_arch inside setup_arch during boot.

I would be *very* reluctant to change the order of initmem_init() vs
setup_arch().

At a minimum you'd need to go through every setup_arch() implementation and
carefully determine if the ordering of what it does matters vs initmem_init().
And then you'd need to test on every affected platform.

So I suggest you think of a different way to do it if at all possible.

cheers



Re: [PATCH v3 1/2] powerpc/xmon: Paged output for paca display

2015-10-14 Thread Michael Ellerman
On Thu, 2015-10-08 at 11:50 +1100, Sam Bobroff wrote:
> The paca display is already more than 24 lines, which can be problematic
> if you have an old school 80x24 terminal, or more likely you are on a
> virtual terminal which does not scroll for whatever reason.
> 
> This patch adds a new command ".", which takes a single (hex) numeric
> argument: lines per page. It will cause the output of "dp" and "dpa"
> to be broken into pages, if necessary.
> 
> Sample output:
> 
> 0:mon> .10

So what about making it "#" rather than "." ?

cheers



Re: [PATCH 1/2] powerpc/fsl: Add PCI node in device tree of bsc9132qds

2015-10-14 Thread Scott Wood
On Tue, 2015-10-13 at 19:29 +0800, Zhiqiang Hou wrote:
> From: Harninder Rai 
> 
> Signed-off-by: Harninder Rai 
> Signed-off-by: Minghuan Lian 
> Change-Id: I4355add4a92d1fcf514843aea5ecadd2e2517969
> Reviewed-on: http://git.am.freescale.net:8181/2454
> Reviewed-by: Zang Tiefei-R61911 
> Reviewed-by: Kushwaha Prabhakar-B32579 
> Reviewed-by: Fleming Andrew-AFLEMING 
> Tested-by: Fleming Andrew-AFLEMING 

Get rid of the gerrit stuff.  And where is your signoff?

> diff --git a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi 
> b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> index c723071..78c8f1c 100644
> --- a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> +++ b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> @@ -40,6 +40,35 @@
>   interrupts = <16 2 0 0 20 2 0 0>;
>  };
>  
> +/* controller at 0xa000 */
> + {
> + compatible = "fsl,bsc9132-pcie", "fsl,qoriq-pcie-v2.2";
> + device_type = "pci";
> + #size-cells = <2>;
> + #address-cells = <3>;
> + bus-range = <0 255>;
> + clock-frequency = <>;
> + interrupts = <16 2 0 0>;

This clock-frequency is not correct for PCIe.  Just remove it.

-Scott


devicetree and IRQ7 mapping for T1042(mpic)

2015-10-14 Thread Joakim Tjernlund
I am trying to figure out how to describe/map external IRQ7 in the devicetree.

Basically I want either IRQ7 to be left alone by Linux (because u-boot already set it up)
or IRQ7 mapped to sie 0 (MPIC_EILR7=0xf0) with prio=0xf (MPIC_EIVPR7=0x4f).

There is no need for a SW handler because IRQ7 will be routed to the DDR
controller and cause an automatic Self Refresh just before CPU reset.

I cannot figure out how to do this. Any ideas?

If not possible from devicetree, then can one do it from board code?

 Jocke

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Paul E. McKenney
On Wed, Oct 14, 2015 at 11:55:56PM +0800, Boqun Feng wrote:
> According to memory-barriers.txt, xchg, cmpxchg and their atomic{,64}_
> versions all need to imply a full barrier, however they are now just
> RELEASE+ACQUIRE, which is not a full barrier.
> 
> So replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with
> PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in
> __{cmp,}xchg_{u32,u64} respectively to guarantee a full barrier
> semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg().
> 
> This patch is a complement of commit b97021f85517 ("powerpc: Fix
> atomic_xxx_return barrier semantics").
> 
> Acked-by: Michael Ellerman 
> Cc:  # 3.4+
> Signed-off-by: Boqun Feng 
> ---
>  arch/powerpc/include/asm/cmpxchg.h | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cmpxchg.h 
> b/arch/powerpc/include/asm/cmpxchg.h
> index ad6263c..d1a8d93 100644
> --- a/arch/powerpc/include/asm/cmpxchg.h
> +++ b/arch/powerpc/include/asm/cmpxchg.h
> @@ -18,12 +18,12 @@ __xchg_u32(volatile void *p, unsigned long val)
>   unsigned long prev;
> 
>   __asm__ __volatile__(
> - PPC_RELEASE_BARRIER
> + PPC_ATOMIC_ENTRY_BARRIER

This looks to be the lwsync instruction.

>  "1:  lwarx   %0,0,%2 \n"
>   PPC405_ERR77(0,%2)
>  "stwcx.  %3,0,%2 \n\
>   bne-1b"
> - PPC_ACQUIRE_BARRIER
> + PPC_ATOMIC_EXIT_BARRIER

And this looks to be the sync instruction.

>   : "=" (prev), "+m" (*(volatile unsigned int *)p)
>   : "r" (p), "r" (val)
>   : "cc", "memory");

Hmmm...

Suppose we have something like the following, where "a" and "x" are both
initially zero:

CPU 0   CPU 1
-   -

WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
r3 = xchg(, 1);   smp_mb();
r3 = READ_ONCE(x);

If xchg() is fully ordered, we should never observe both CPUs'
r3 values being zero, correct?

And wouldn't this be represented by the following litmus test?

PPC SB+lwsync-RMW2-lwsync+st-sync-leading
""
{
0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
}
 P0 | P1 ;
 stw r1,0(r2)   | stw r1,0(r12)  ;
 lwsync | sync   ;
 lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
 stwcx. r1,r10,r12  | ;
 bne Fail0  | ;
 mr r3,r11  | ;
 Fail0: | ;
exists
(0:r3=0 /\ a=2 /\ 1:r3=0)

I left off P0's trailing sync because there is nothing for it to order
against in this particular litmus test.  I tried adding it and verified
that it has no effect.

Am I missing something here?  If not, it seems to me that you need
the leading lwsync to instead be a sync.

Of course, if I am not missing something, then this applies also to the
value-returning RMW atomic operations that you pulled this pattern from.
If so, it would seem that I didn't think through all the possibilities
back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
that I worried about the RMW atomic operation acting as a barrier,
but not as the load/store itself.  :-/

Thanx, Paul

> @@ -61,12 +61,12 @@ __xchg_u64(volatile void *p, unsigned long val)
>   unsigned long prev;
> 
>   __asm__ __volatile__(
> - PPC_RELEASE_BARRIER
> + PPC_ATOMIC_ENTRY_BARRIER
>  "1:  ldarx   %0,0,%2 \n"
>   PPC405_ERR77(0,%2)
>  "stdcx.  %3,0,%2 \n\
>   bne-1b"
> - PPC_ACQUIRE_BARRIER
> + PPC_ATOMIC_EXIT_BARRIER
>   : "=" (prev), "+m" (*(volatile unsigned long *)p)
>   : "r" (p), "r" (val)
>   : "cc", "memory");
> @@ -151,14 +151,14 @@ __cmpxchg_u32(volatile unsigned int *p, unsigned long 
> old, unsigned long new)
>   unsigned int prev;
> 
>   __asm__ __volatile__ (
> - PPC_RELEASE_BARRIER
> + PPC_ATOMIC_ENTRY_BARRIER
>  "1:  lwarx   %0,0,%2 # __cmpxchg_u32\n\
>   cmpw0,%0,%3\n\
>   bne-2f\n"
>   PPC405_ERR77(0,%2)
>  "stwcx.  %4,0,%2\n\
>   bne-1b"
> - PPC_ACQUIRE_BARRIER
> + PPC_ATOMIC_EXIT_BARRIER
>   "\n\
>  2:"
>   : "=" (prev), "+m" (*p)
> @@ -197,13 +197,13 @@ __cmpxchg_u64(volatile unsigned long *p, unsigned long 
> old, unsigned long new)
>   unsigned long prev;
> 
>   __asm__ __volatile__ (
> - PPC_RELEASE_BARRIER
> + PPC_ATOMIC_ENTRY_BARRIER
>  "1:  ldarx   %0,0,%2 # __cmpxchg_u64\n\
>   cmpd0,%0,%3\n\
>   bne-2f\n\
>   stdcx.  %4,0,%2\n\
>   bne-1b"
> - PPC_ACQUIRE_BARRIER
> + PPC_ATOMIC_EXIT_BARRIER
>   "\n\
>  2:"
>   : "=" (prev), "+m" (*p)
> -- 
> 2.5.3
> 


Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Peter Zijlstra
On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> Suppose we have something like the following, where "a" and "x" are both
> initially zero:
> 
>   CPU 0   CPU 1
>   -   -
> 
>   WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
>   r3 = xchg(, 1);   smp_mb();
>   r3 = READ_ONCE(x);
> 
> If xchg() is fully ordered, we should never observe both CPUs'
> r3 values being zero, correct?
> 
> And wouldn't this be represented by the following litmus test?
> 
>   PPC SB+lwsync-RMW2-lwsync+st-sync-leading
>   ""
>   {
>   0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
>   1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
>   }
>P0 | P1 ;
>stw r1,0(r2)   | stw r1,0(r12)  ;
>lwsync | sync   ;
>lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
>stwcx. r1,r10,r12  | ;
>bne Fail0  | ;
>mr r3,r11  | ;
>Fail0: | ;
>   exists
>   (0:r3=0 /\ a=2 /\ 1:r3=0)
> 
> I left off P0's trailing sync because there is nothing for it to order
> against in this particular litmus test.  I tried adding it and verified
> that it has no effect.
> 
> Am I missing something here?  If not, it seems to me that you need
> the leading lwsync to instead be a sync.

So the scenario that would fail would be this one, right?

a = x = 0

CPU0CPU1

r3 = load_locked ();
a = 2;
sync();
r3 = x;
x = 1;
lwsync();
if (!store_cond(, 1))
goto again


Where we hoist the load way up because lwsync allows this.

I always thought this would fail because CPU1's store to @a would fail
the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
load and now seeing the new value (2).

> Of course, if I am not missing something, then this applies also to the
> value-returning RMW atomic operations that you pulled this pattern from.
> If so, it would seem that I didn't think through all the possibilities
> back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
> that I worried about the RMW atomic operation acting as a barrier,
> but not as the load/store itself.  :-/

AARGH64 does something very similar; it does something like:

ll
...
sc-release

mb

Which I assumed worked for the same reason, any change to the variable
would fail the sc, and we go for round 2, now observing the new value.

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Paul E. McKenney
On Wed, Oct 14, 2015 at 11:04:19PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > Suppose we have something like the following, where "a" and "x" are both
> > initially zero:
> > 
> > CPU 0   CPU 1
> > -   -
> > 
> > WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > r3 = xchg(, 1);   smp_mb();
> > r3 = READ_ONCE(x);
> > 
> > If xchg() is fully ordered, we should never observe both CPUs'
> > r3 values being zero, correct?
> > 
> > And wouldn't this be represented by the following litmus test?
> > 
> > PPC SB+lwsync-RMW2-lwsync+st-sync-leading
> > ""
> > {
> > 0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
> > 1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
> > }
> >  P0 | P1 ;
> >  stw r1,0(r2)   | stw r1,0(r12)  ;
> >  lwsync | sync   ;
> >  lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
> >  stwcx. r1,r10,r12  | ;
> >  bne Fail0  | ;
> >  mr r3,r11  | ;
> >  Fail0: | ;
> > exists
> > (0:r3=0 /\ a=2 /\ 1:r3=0)
> > 
> > I left off P0's trailing sync because there is nothing for it to order
> > against in this particular litmus test.  I tried adding it and verified
> > that it has no effect.
> > 
> > Am I missing something here?  If not, it seems to me that you need
> > the leading lwsync to instead be a sync.
> 
> So the scenario that would fail would be this one, right?
> 
> a = x = 0
> 
>   CPU0CPU1
> 
>   r3 = load_locked ();
>   a = 2;
>   sync();
>   r3 = x;
>   x = 1;
>   lwsync();
>   if (!store_cond(, 1))
>   goto again
> 
> 
> Where we hoist the load way up because lwsync allows this.

That scenario would end up with a==1 rather than a==2.

> I always thought this would fail because CPU1's store to @a would fail
> the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> load and now seeing the new value (2).

The stwcx. failure was one thing that prevented a number of other
misordering cases.  The problem is that we have to let go of the notion
of an implicit global clock.

To that end, the herd tool can make a diagram of what it thought
happened, and I have attached it.  I used this diagram to try and force
this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
and succeeded.  Here is the sequence of events:

o   Commit P0's write.  The model offers to propagate this write
to the coherence point and to P1, but don't do so yet.

o   Commit P1's write.  Similar offers, but don't take them up yet.

o   Commit P0's lwsync.

o   Execute P0's lwarx, which reads a=0.  Then commit it.

o   Commit P0's stwcx. as successful.  This stores a=1.

o   Commit P0's branch (not taken).

o   Commit P0's final register-to-register move.

o   Commit P1's sync instruction.

o   There is now nothing that can happen in either processor.
P0 is done, and P1 is waiting for its sync.  Therefore,
propagate P1's a=2 write to the coherence point and to
the other thread.

o   There is still nothing that can happen in either processor.
So pick the barrier propagate, then the acknowledge sync.

o   P1 can now execute its read from x.  Because P0's write to
x is still waiting to propagate to P1, this still reads
x=0.  Execute and commit, and we now have both r3 registers
equal to zero and the final value a=2.

o   Clean up by propagating the write to x everywhere, and
propagating the lwsync.

And the "exists" clause really does trigger: 0:r3=0; 1:r3=0; [a]=2;

I am still not 100% confident of my litmus test.  It is quite possible
that I lost something in translation, but that is looking less likely.

> > Of course, if I am not missing something, then this applies also to the
> > value-returning RMW atomic operations that you pulled this pattern from.
> > If so, it would seem that I didn't think through all the possibilities
> > back when PPC_ATOMIC_EXIT_BARRIER moved to sync...  In fact, I believe
> > that I worried about the RMW atomic operation acting as a barrier,
> > but not as the load/store itself.  :-/
> 
> AARGH64 does something very similar; it does something like:
> 
>   ll
>   ...
>   sc-release
> 
>   mb
> 
> Which I assumed worked for the same reason, any change to the variable
> would fail the sc, and we go for round 2, now observing the new value.

I have to defer to Will on this one.  You are right that ARM and PowerPC
do have similar memory models, but there are some differences.

Thanx, Paul



Re: [PATCH v3 1/2] powerpc/xmon: Paged output for paca display

2015-10-14 Thread Sam Bobroff
On Wed, Oct 14, 2015 at 08:39:09PM +1100, Michael Ellerman wrote:
> On Thu, 2015-10-08 at 11:50 +1100, Sam Bobroff wrote:
> > The paca display is already more than 24 lines, which can be problematic
> > if you have an old school 80x24 terminal, or more likely you are on a
> > virtual terminal which does not scroll for whatever reason.
> > 
> > This patch adds a new command ".", which takes a single (hex) numeric
> > argument: lines per page. It will cause the output of "dp" and "dpa"
> > to be broken into pages, if necessary.
> > 
> > Sample output:
> > 
> > 0:mon> .10
> 
> So what about making it "#" rather than "." ?
> 
> cheers

Sure, although we'll have to do a better job than the other commands in the 
help text ;-)
(They use "#" to indicate a hex number and "##" is just going to be confusing.)

Do you want me to respin? (I'm happy for you to just adjust the patch.)

Cheers,
Sam.


Re: [v2,1/1] powerpc: Individual System V IPC system calls

2015-10-14 Thread Sam Bobroff
On Wed, Oct 14, 2015 at 08:38:15PM +1100, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 18:00 +1100, Sam Bobroff wrote:
> > On Tue, Oct 13, 2015 at 08:38:42PM +1100, Michael Ellerman wrote:
> > > On Tue, 2015-13-10 at 01:49:28 UTC, Sam bobroff wrote:
> > > > This patch provides individual system call numbers for the following
> > > > System V IPC system calls, on PowerPC, so that they do not need to be
> > > > multiplexed:
> > > > * semop, semget, semctl, semtimedop
> > > > * msgsnd, msgrcv, msgget, msgctl
> > > > * shmat, shmdt, shmget, shmctl
> > > 
> > > You tested this right? :)  Tell me about it.
> > 
> > Why yes I did:
> 
> ...
> 
> > (I also re-ran the tests both for little-endian and big-endian hosts.)
> 
> Did you test on 32-bit at all?

I ran the test program, compiled for 32 and 64 bit, on a biarch power7 machine
(using -m32 and -m64 to the compiler) but only to verify that the fully patched
system succeeded. Is that sufficient?

> > It would obviously be good to have someone else test this, but I can't see a
> > way to make it easy to do. They would presumably have to go through all of 
> > the
> > above, which seems too much to ask given how trivial the kernel side of the
> > patch is. Still, it bothers me a bit so if there is any way please let me 
> > know.
> > (I thought about writing some assembly to directly test the syscall numbers 
> > but
> > all it would do is verify that the numbers are valid, which really isn't 
> > much
> > of a test.)
> 
> Actually that is still a useful test, it at least tells you if the kernel
> you're running on implements the syscalls. Obviously if you're on mainline
> that's easy enough to work out from the git history, but if/when these get
> backported to distro kernels, it's often harder to work out what's in the
> source than just testing it directly.

Oh, fair enough then.

> So I wrote a quick dirty test for that, it seems to work for me:

[snip]

Thanks :-)

> Which gives:
> 
> test: ipc_unmuxed
> tags: git_version:v4.3-rc3-44-g10053fa531a8-dirty
> Testing semop returned -1, errno 22
> Testing semget returned -1, errno 2
> Testing semctl returned -1, errno 22
> Testing semtimedop returned -1, errno 22
> Testing msgsnd returned -1, errno 14
> Testing msgrcv returned -1, errno 22
> Testing msgget returned -1, errno 2
> Testing msgctl returned -1, errno 22
> Testing shmat returned -1, errno 22
> Testing shmdt returned -1, errno 22
> Testing shmget returned -1, errno 2
> Testing shmctl returned -1, errno 22
> success: ipc_unmuxed
> 
> 
> And on an unpatched system:
> 
> test: ipc_unmuxed
> tags: git_version:v4.3-rc3-44-g10053fa531a8-dirty
> Testing semop returned -1, errno 38
> [FAIL] Test FAILED on line 2
> failure: ipc_unmuxed
> 
> 
> Look OK?

Yep! And 38 (ENOSYS) is the code we'd expect in the failure case.

> cheers

Cheers,
Sam.
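
For what it's worth, a rough standalone sketch of that idea (this is not the
test quoted above; __NR_semget is only visible if the installed headers
already carry the new numbers):

#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/syscall.h>

int main(void)
{
#ifdef __NR_semget
	/* invoke the new syscall number directly; ENOSYS means the running
	 * kernel does not implement it */
	long rc = syscall(__NR_semget, IPC_PRIVATE, 1, IPC_CREAT | 0600);

	if (rc == -1 && errno == ENOSYS) {
		printf("semget: syscall number not implemented (ENOSYS)\n");
		return 1;
	}
	printf("semget: implemented (rc=%ld)\n", rc);
	return 0;
#else
	printf("__NR_semget not defined by the installed headers\n");
	return 1;
#endif
}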


Re: [RFC] powerpc/numa: Use VPHN based node ID information on shared processor LPARs

2015-10-14 Thread Anshuman Khandual
On 10/14/2015 02:49 PM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>> dynamic virtual-physical mapping for any given processor. Currently we
>> use VPHN node ID information only after getting either a PRRN or a VPHN
>> event. But during boot time inside the function numa_setup_cpu, we still
>> query the OF device tree for the node ID value, which might be different
>> from what can be fetched via the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>> scenario where there are no PRRN or VPHN events after boot, all node-cpu
>> mappings will remain incorrect thereafter.
>>
>> With this proposed change, numa_setup_cpu will try to override the node ID
>> fetched from the OF device tree with the node ID fetched via the
>> H_HOME_NODE_ASSOCIATIVITY hcall. Right now the shared processor property of
>> the LPAR cannot be queried, as VPA initialization happens after numa_setup_cpu
>> during boot. So the initmem_init function has been moved after
>> ppc_md.setup_arch inside setup_arch during boot.
> 
> I would be *very* reluctant to change the order of initmem_init() vs
> setup_arch().
> 
> At a minimum you'd need to go through every setup_arch() implementation and
> carefully determine if the ordering of what it does matters vs initmem_init().
> And then you'd need to test on every affected platform.
> 
> So I suggest you think of a different way to do it if at all possible.

vpa_init() is being called inside pSeries_setup_arch, which is the
platform's ppc_md.setup_arch. It's called directly for the boot cpu
and through smp_init_pseries_xics for the other cpus on the system. Not
sure what the reason is for calling vpa_init() from the XICS init
though.

If we can move all these vpa_init() calls from pSeries_setup_arch
to initmem_init just before calling numa_setup_cpu, the VPA area
would be initialized when we need it during boot. Will look in
this direction.


Re: [RFC v6 03/25] m68k/atari: Replace nvram_{read,write}_byte with arch_nvram_ops

2015-10-14 Thread Finn Thain

James, would you please review and ack this patch, and patch 01/25 also?

On Sun, 23 Aug 2015, Finn Thain wrote:

> By implementing an arch_nvram_ops struct, any platform can re-use the
> drivers/char/nvram module without needing any arch-specific code
> in that module. Atari does so here.
> 
> Atari has one user of nvram_check_checksum() whereas the other platforms
> (i.e. x86 and ARM platforms) have none at all. Replace this
> validate-checksum-and-read-byte sequence with the equivalent
> rtc_nvram_ops.read() call and remove the now unused functions.
> 
> Signed-off-by: Finn Thain 
> Tested-by: Christian T. Steigies 
> Acked-by: Geert Uytterhoeven 
> 
> ---
> 
> The advantage of the new ops struct over the old global nvram_* functions
> is that the misc device module can be shared by different platforms
> without requiring every platform to implement every nvram_* function.
> E.g. only RTC "CMOS" NVRAMs have a checksum for the entire NVRAM
> and only PowerPC platforms have a "sync" ioctl.
> 
> ---
>  arch/m68k/atari/nvram.c   |   89 
> --
>  drivers/scsi/atari_scsi.c |8 ++--
>  include/linux/nvram.h |9 
>  3 files changed, 70 insertions(+), 36 deletions(-)
> 
> Index: linux/arch/m68k/atari/nvram.c
> ===
> --- linux.orig/arch/m68k/atari/nvram.c2015-08-23 20:40:55.0 
> +1000
> +++ linux/arch/m68k/atari/nvram.c 2015-08-23 20:40:57.0 +1000
> @@ -38,33 +38,12 @@ unsigned char __nvram_read_byte(int i)
>   return CMOS_READ(NVRAM_FIRST_BYTE + i);
>  }
>  
> -unsigned char nvram_read_byte(int i)
> -{
> - unsigned long flags;
> - unsigned char c;
> -
> - spin_lock_irqsave(_lock, flags);
> - c = __nvram_read_byte(i);
> - spin_unlock_irqrestore(_lock, flags);
> - return c;
> -}
> -EXPORT_SYMBOL(nvram_read_byte);
> -
>  /* This races nicely with trying to read with checksum checking */
>  void __nvram_write_byte(unsigned char c, int i)
>  {
>   CMOS_WRITE(c, NVRAM_FIRST_BYTE + i);
>  }
>  
> -void nvram_write_byte(unsigned char c, int i)
> -{
> - unsigned long flags;
> -
> - spin_lock_irqsave(_lock, flags);
> - __nvram_write_byte(c, i);
> - spin_unlock_irqrestore(_lock, flags);
> -}
> -
>  /* On Ataris, the checksum is over all bytes except the checksum bytes
>   * themselves; these are at the very end.
>   */
> @@ -83,18 +62,6 @@ int __nvram_check_checksum(void)
>  (__nvram_read_byte(ATARI_CKS_LOC + 1) == (sum & 0xff));
>  }
>  
> -int nvram_check_checksum(void)
> -{
> - unsigned long flags;
> - int rv;
> -
> - spin_lock_irqsave(_lock, flags);
> - rv = __nvram_check_checksum();
> - spin_unlock_irqrestore(_lock, flags);
> - return rv;
> -}
> -EXPORT_SYMBOL(nvram_check_checksum);
> -
>  static void __nvram_set_checksum(void)
>  {
>   int i;
> @@ -106,6 +73,62 @@ static void __nvram_set_checksum(void)
>   __nvram_write_byte(sum, ATARI_CKS_LOC + 1);
>  }
>  
> +static ssize_t nvram_read(char *buf, size_t count, loff_t *ppos)
> +{
> + char *p = buf;
> + loff_t i;
> +
> + spin_lock_irq(_lock);
> +
> + if (!__nvram_check_checksum()) {
> + spin_unlock_irq(_lock);
> + return -EIO;
> + }
> +
> + for (i = *ppos; count > 0 && i < NVRAM_BYTES; --count, ++i, ++p)
> + *p = __nvram_read_byte(i);
> +
> + spin_unlock_irq(_lock);
> +
> + *ppos = i;
> + return p - buf;
> +}
> +
> +static ssize_t nvram_write(char *buf, size_t count, loff_t *ppos)
> +{
> + char *p = buf;
> + loff_t i;
> +
> + spin_lock_irq(_lock);
> +
> + if (!__nvram_check_checksum()) {
> + spin_unlock_irq(_lock);
> + return -EIO;
> + }
> +
> + for (i = *ppos; count > 0 && i < NVRAM_BYTES; --count, ++i, ++p)
> + __nvram_write_byte(*p, i);
> +
> + __nvram_set_checksum();
> +
> + spin_unlock_irq(_lock);
> +
> + *ppos = i;
> + return p - buf;
> +}
> +
> +static ssize_t nvram_get_size(void)
> +{
> + return NVRAM_BYTES;
> +}
> +
> +const struct nvram_ops arch_nvram_ops = {
> + .read   = nvram_read,
> + .write  = nvram_write,
> + .get_size   = nvram_get_size,
> +};
> +EXPORT_SYMBOL(arch_nvram_ops);
> +
>  #ifdef CONFIG_PROC_FS
>  static struct {
>   unsigned char val;
> Index: linux/drivers/scsi/atari_scsi.c
> ===
> --- linux.orig/drivers/scsi/atari_scsi.c  2015-08-23 20:40:53.0 
> +1000
> +++ linux/drivers/scsi/atari_scsi.c   2015-08-23 20:40:57.0 +1000
> @@ -880,13 +880,15 @@ static int __init atari_scsi_probe(struc
>  #ifdef CONFIG_NVRAM
>   else
>   /* Test if a host id is set in the NVRam */
> - if (ATARIHW_PRESENT(TT_CLK) && nvram_check_checksum()) {
> -

[PATCH] selftests/powerpc: Allow the tm-syscall test to build with old headers

2015-10-14 Thread Michael Ellerman
When building against older kernel headers, currently the tm-syscall
test fails to build because PPC_FEATURE2_HTM_NOSC is not defined.

Tweak the test so that if PPC_FEATURE2_HTM_NOSC is not defined it still
builds, but prints a warning at run time and marks the test as skipped.

Signed-off-by: Michael Ellerman 
---
 tools/testing/selftests/powerpc/tm/tm-syscall.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-syscall.c 
b/tools/testing/selftests/powerpc/tm/tm-syscall.c
index 1276e23da63b..e835bf7ec7ae 100644
--- a/tools/testing/selftests/powerpc/tm/tm-syscall.c
+++ b/tools/testing/selftests/powerpc/tm/tm-syscall.c
@@ -77,13 +77,23 @@ pid_t getppid_tm(bool suspend)
exit(-1);
 }
 
+static inline bool have_htm_nosc(void)
+{
+#ifdef PPC_FEATURE2_HTM_NOSC
+   return ((long)get_auxv_entry(AT_HWCAP2) & PPC_FEATURE2_HTM_NOSC);
+#else
+   printf("PPC_FEATURE2_HTM_NOSC not defined, can't check AT_HWCAP2\n");
+   return false;
+#endif
+}
+
 int tm_syscall(void)
 {
unsigned count = 0;
struct timeval end, now;
 
-   SKIP_IF(!((long)get_auxv_entry(AT_HWCAP2)
- & PPC_FEATURE2_HTM_NOSC));
+   SKIP_IF(!have_htm_nosc());
+
setbuf(stdout, NULL);
 
printf("Testing transactional syscalls for %d seconds...\n", 
TEST_DURATION);
-- 
2.1.4


Re: devicetree and IRQ7 mapping for T1042(mpic)

2015-10-14 Thread Scott Wood
On Wed, 2015-10-14 at 19:37 +, Joakim Tjernlund wrote:
> I am trying to figure out how to describe/map external IRQ7 in the 
> devicetree.
> 
> Basically I want either IRQ7 to be left alone by Linux (because u-boot already set 
> it up)
> or IRQ7 mapped to sie 0 (MPIC_EILR7=0xf0) with prio=0xf (MPIC_EIVPR7=0x4f)
> 
> There is no need for a SW handler because IRQ7 will be routed to the DDR 
> controller
> and cause an automatic Self Refresh just before CPU reset.
> 
> I cannot figure out how to do this. Any ideas?
> 
> If not possible from devicetree, then can one do it from board code?

The device tree describes the hardware.  Priority is configuration, and thus 
doesn't belong there.  You can call mpic_irq_set_priority() from board code.

Likewise, the fact that you want to route irq7 to sie0 is configuration, not 
hardware description.  At most, the device tree should describe is what is 
connected to each sie output.  There's no current Linux support for routing 
an interrupt to sie or anything other than "int".

-Scott
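
A minimal sketch of the board-code route (it assumes the board code already
has a Linux virq for external IRQ7; how that virq is obtained is
board-specific and not shown):

#include <linux/init.h>
#include <asm/mpic.h>

/*
 * Sketch only: set the priority of the already-mapped external IRQ7.
 * 0xf mirrors the MPIC_EIVPR7 value mentioned above; routing to sie0
 * (MPIC_EILR7) has no generic Linux helper, so that part would either be
 * left to u-boot or need direct register setup in board code.
 */
static void __init board_set_irq7_priority(unsigned int irq7_virq)
{
	mpic_irq_set_priority(irq7_virq, 0xf);
}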


Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Boqun Feng
On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 14, 2015 at 11:04:19PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > > > Suppose we have something like the following, where "a" and "x" are both
> > > > initially zero:
> > > > 
> > > > CPU 0   CPU 1
> > > > -   -
> > > > 
> > > > WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > > > r3 = xchg(, 1);   smp_mb();
> > > > r3 = READ_ONCE(x);
> > > > 
> > > > If xchg() is fully ordered, we should never observe both CPUs'
> > > > r3 values being zero, correct?
> > > > 
> > > > And wouldn't this be represented by the following litmus test?
> > > > 
> > > > PPC SB+lwsync-RMW2-lwsync+st-sync-leading
> > > > ""
> > > > {
> > > > 0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
> > > > 1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
> > > > }
> > > >  P0 | P1 ;
> > > >  stw r1,0(r2)   | stw r1,0(r12)  ;
> > > >  lwsync | sync   ;
> > > >  lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
> > > >  stwcx. r1,r10,r12  | ;
> > > >  bne Fail0  | ;
> > > >  mr r3,r11  | ;
> > > >  Fail0: | ;
> > > > exists
> > > > (0:r3=0 /\ a=2 /\ 1:r3=0)
> > > > 
> > > > I left off P0's trailing sync because there is nothing for it to order
> > > > against in this particular litmus test.  I tried adding it and verified
> > > > that it has no effect.
> > > > 
> > > > Am I missing something here?  If not, it seems to me that you need
> > > > the leading lwsync to instead be a sync.
> 
> I'm afraid it's more than that; the above litmus test also shows that
> 

I mean there will be more things we need to fix, perhaps even smp_wmb()
needs to be sync then...

Regards,
Boqun

>   CPU 0   CPU 1
>   -   -
> 
>   WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
>   r3 = xchg_release(, 1);   smp_mb();
>   r3 = READ_ONCE(x);
> 
>   (0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibited
> 
> in the implementation of this patchset, which should be disallowed by
> the semantics of RELEASE, right?
> 
> And even:
> 
>   CPU 0   CPU 1
>   -   -
> 
>   WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
>   smp_store_release(, 1);   smp_mb();
>   r3 = READ_ONCE(x);
> 
>   (1:r3 == 0 && a == 2) is not prohibited
> 
> shows by:
> 
>   PPC weird-lwsync
>   ""
>   {
>   0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
>   1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
>   }
>P0 | P1 ;
>stw r1,0(r2)   | stw r1,0(r12)  ;
>lwsync | sync   ;
>stw  r1,0(r12) | lwz r3,0(r2)   ;
>   exists
>   (a=2 /\ 1:r3=0)
> 
> 
> Please find something I'm (or the tool is) missing, maybe we can't use
> (a == 2) as an indication that STORE on CPU 1 happens after STORE on CPU
> 0?
> 
> And there is really something I find strange, see below.
> 
> > > 
> > > So the scenario that would fail would be this one, right?
> > > 
> > > a = x = 0
> > > 
> > >   CPU0CPU1
> > > 
> > >   r3 = load_locked ();
> > >   a = 2;
> > >   sync();
> > >   r3 = x;
> > >   x = 1;
> > >   lwsync();
> > >   if (!store_cond(, 1))
> > >   goto again
> > > 
> > > 
> > > Where we hoist the load way up because lwsync allows this.
> > 
> > That scenario would end up with a==1 rather than a==2.
> > 
> > > I always thought this would fail because CPU1's store to @a would fail
> > > the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> > > load and now seeing the new value (2).
> > 
> > The stwcx. failure was one thing that prevented a number of other
> > misordering cases.  The problem is that we have to let go of the notion
> > of an implicit global clock.
> > 
> > To that end, the herd tool can make a diagram of what it thought
> > happened, and I have attached it.  I used this diagram to try and force
> > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > and succeeded.  Here is the sequence of events:
> > 
> > o   Commit P0's write.  The model offers to propagate this write
> > to the coherence point and to P1, but don't do so yet.
> > 
> > o   Commit P1's write.  Similar offers, but don't take them up yet.
> > 
> > o   Commit P0's lwsync.
> > 
> > o   Execute 

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Boqun Feng
On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 14, 2015 at 11:04:19PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > > Suppose we have something like the following, where "a" and "x" are both
> > > initially zero:
> > > 
> > >   CPU 0   CPU 1
> > >   -   -
> > > 
> > >   WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > >   r3 = xchg(, 1);   smp_mb();
> > >   r3 = READ_ONCE(x);
> > > 
> > > If xchg() is fully ordered, we should never observe both CPUs'
> > > r3 values being zero, correct?
> > > 
> > > And wouldn't this be represented by the following litmus test?
> > > 
> > >   PPC SB+lwsync-RMW2-lwsync+st-sync-leading
> > >   ""
> > >   {
> > >   0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
> > >   1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
> > >   }
> > >P0 | P1 ;
> > >stw r1,0(r2)   | stw r1,0(r12)  ;
> > >lwsync | sync   ;
> > >lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
> > >stwcx. r1,r10,r12  | ;
> > >bne Fail0  | ;
> > >mr r3,r11  | ;
> > >Fail0: | ;
> > >   exists
> > >   (0:r3=0 /\ a=2 /\ 1:r3=0)
> > > 
> > > I left off P0's trailing sync because there is nothing for it to order
> > > against in this particular litmus test.  I tried adding it and verified
> > > that it has no effect.
> > > 
> > > Am I missing something here?  If not, it seems to me that you need
> > > the leading lwsync to instead be a sync.

I'm afraid it's more than that; the above litmus test also shows that

CPU 0   CPU 1
-   -

WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
r3 = xchg_release(, 1);   smp_mb();
r3 = READ_ONCE(x);

	(0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibited

in the implementation of this patchset, which should be disallowed by
the semantics of RELEASE, right?

And even:

CPU 0   CPU 1
-   -

WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
smp_store_release(, 1);   smp_mb();
r3 = READ_ONCE(x);

	(1:r3 == 0 && a == 2) is not prohibited

shows by:

PPC weird-lwsync
""
{
0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
}
 P0 | P1 ;
 stw r1,0(r2)   | stw r1,0(r12)  ;
 lwsync | sync   ;
 stw  r1,0(r12) | lwz r3,0(r2)   ;
exists
(a=2 /\ 1:r3=0)


Please find something I'm (or the tool is) missing, maybe we can't use
(a == 2) as an indication that STORE on CPU 1 happens after STORE on CPU
0?

And there is really something I find strange, see below.

> > 
> > So the scenario that would fail would be this one, right?
> > 
> > a = x = 0
> > 
> > CPU0CPU1
> > 
> > r3 = load_locked ();
> > a = 2;
> > sync();
> > r3 = x;
> > x = 1;
> > lwsync();
> > if (!store_cond(, 1))
> > goto again
> > 
> > 
> > Where we hoist the load way up because lwsync allows this.
> 
> That scenario would end up with a==1 rather than a==2.
> 
> > I always thought this would fail because CPU1's store to @a would fail
> > the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> > load and now seeing the new value (2).
> 
> The stwcx. failure was one thing that prevented a number of other
> misordering cases.  The problem is that we have to let go of the notion
> of an implicit global clock.
> 
> To that end, the herd tool can make a diagram of what it thought
> happened, and I have attached it.  I used this diagram to try and force
> this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> and succeeded.  Here is the sequence of events:
> 
> o Commit P0's write.  The model offers to propagate this write
>   to the coherence point and to P1, but don't do so yet.
> 
> o Commit P1's write.  Similar offers, but don't take them up yet.
> 
> o Commit P0's lwsync.
> 
> o Execute P0's lwarx, which reads a=0.  Then commit it.
> 
> o Commit P0's stwcx. as successful.  This stores a=1.
> 
> o Commit P0's branch (not taken).
> 

So at this point, P0's write to 'a' has propagated to P1, right? But
P0's write to 'x' hasn't, even there is a lwsync between them, right?
Doesn't the lwsync prevent this from happening?

If at this point P0's write to 'a' hasn't propagated then when?

Regards,
Boqun

> o Commit P0's final register-to-register move.
> 
> o  

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Paul E. McKenney
On Thu, Oct 15, 2015 at 11:11:01AM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> > On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> [snip]
> > > To that end, the herd tool can make a diagram of what it thought
> > > happened, and I have attached it.  I used this diagram to try and force
> > > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > > and succeeded.  Here is the sequence of events:
> > > 
> > > o Commit P0's write.  The model offers to propagate this write
> > >   to the coherence point and to P1, but don't do so yet.
> > > 
> > > o Commit P1's write.  Similar offers, but don't take them up yet.
> > > 
> > > o Commit P0's lwsync.
> > > 
> > > o Execute P0's lwarx, which reads a=0.  Then commit it.
> > > 
> > > o Commit P0's stwcx. as successful.  This stores a=1.
> > > 
> > > o Commit P0's branch (not taken).
> > > 
> > 
> > So at this point, P0's write to 'a' has propagated to P1, right? But
> > P0's write to 'x' hasn't, even there is a lwsync between them, right?
> > Doesn't the lwsync prevent this from happening?
> > 
> > If at this point P0's write to 'a' hasn't propagated then when?
> 
> Hmm.. I played around ppcmem, and figured out what happens to
> propagation of P0's write to 'a':
> 
> At this point, or at some point after storing 1 to 'a' and before the sync on
> P1 finishes, the writes to 'a' reach a coherence point at which 'a' is 2, so
> P0's write to 'a' "fails" and will not propagate.
> 
> I probably misunderstood the word "propagate", which actually means an
> already coherent write gets seen by another CPU, right?

It is quite possible for a given write to take a position in the coherence
order that guarantees that no one will see it, as is the case here.
But yes, all readers will see an order of values for a given memory
location that is consistent with the coherence order.

> So my question should be:
> 
> As lwsync can order P0's write to 'a' to happen after P0's write to 'x',
> why isn't P0's write to 'x' seen by P1 after P1's write to 'a' overrides
> P0's?

There is no global clock for PPC's memory model.

> But ppcmem gave me the answer ;-) lwsync won't wait until P0's write to
> 'x' gets propagated, and if P0's write to 'a' "wins" in write coherence,
> lwsync will guarantee the propagation of 'x' happens before that of 'a', but
> if P0's write to 'a' "fails", there will be no propagation of 'a' from
> P0. So lwsync can't do anything here.

I believe that this is consistent, but the corners can get tricky.

Thanx, Paul

> Regards,
> Boqun
> 
> > 
> > > o Commit P0's final register-to-register move.
> > > 
> > > o Commit P1's sync instruction.
> > > 
> > > o There is now nothing that can happen in either processor.
> > >   P0 is done, and P1 is waiting for its sync.  Therefore,
> > >   propagate P1's a=2 write to the coherence point and to
> > >   the other thread.
> > > 
> > > o There is still nothing that can happen in either processor.
> > >   So pick the barrier propagate, then the acknowledge sync.
> > > 
> > > o P1 can now execute its read from x.  Because P0's write to
> > >   x is still waiting to propagate to P1, this still reads
> > >   x=0.  Execute and commit, and we now have both r3 registers
> > >   equal to zero and the final value a=2.
> > > 
> > > o Clean up by propagating the write to x everywhere, and
> > >   propagating the lwsync.
> > > 
> > > And the "exists" clause really does trigger: 0:r3=0; 1:r3=0; [a]=2;
> > > 
> > > I am still not 100% confident of my litmus test.  It is quite possible
> > > that I lost something in translation, but that is looking less likely.
> > > 



[PATCH] powerpc/eeh: Fix recursive fenced PHB on Broadcom shiner adapter

2015-10-14 Thread Gavin Shan
Similar to commit b6541db ("powerpc/eeh: Block PCI config access
upon frozen PE"), this blocks the PCI config space of the Broadcom
Shiner adapter until the PE reset is completed, to avoid a recursive
fenced PHB when dumping PCI config registers during the period
of error recovery.

   ~# lspci -ns 0003:03:00.0
   0003:03:00.0 0200: 14e4:168a (rev 10)
   ~# lspci -s 0003:03:00.0
   0003:03:00.0 Ethernet controller: Broadcom Corporation \
NetXtreme II BCM57800 1/10 Gigabit Ethernet (rev 10)

Signed-off-by: Gavin Shan 
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 3bb6acb..9819e34 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -443,11 +443,14 @@ static void *pnv_eeh_probe(struct pci_dn *pdn, void *data)
 * that PE to block its config space.
 *
 * Broadcom Austin 4-ports NICs (14e4:1657)
+* Broadcom Shiner 4-ports 1G NICs (14e4:168a)
 * Broadcom Shiner 2-ports 10G NICs (14e4:168e)
 */
if ((pdn->vendor_id == PCI_VENDOR_ID_BROADCOM &&
 pdn->device_id == 0x1657) ||
(pdn->vendor_id == PCI_VENDOR_ID_BROADCOM &&
+pdn->device_id == 0x168a) ||
+   (pdn->vendor_id == PCI_VENDOR_ID_BROADCOM &&
 pdn->device_id == 0x168e))
edev->pe->state |= EEH_PE_CFG_RESTRICTED;
 
-- 
2.1.0


Re: [PATCH v3 1/2] powerpc/xmon: Paged output for paca display

2015-10-14 Thread Michael Ellerman
On Thu, 2015-10-15 at 09:24 +1100, Sam Bobroff wrote:
> On Wed, Oct 14, 2015 at 08:39:09PM +1100, Michael Ellerman wrote:
> > On Thu, 2015-10-08 at 11:50 +1100, Sam Bobroff wrote:
> > > The paca display is already more than 24 lines, which can be problematic
> > > if you have an old school 80x24 terminal, or more likely you are on a
> > > virtual terminal which does not scroll for whatever reason.
> > > 
> > > This patch adds a new command ".", which takes a single (hex) numeric
> > > argument: lines per page. It will cause the output of "dp" and "dpa"
> > > to be broken into pages, if necessary.
> > > 
> > > Sample output:
> > > 
> > > 0:mon> .10
> > 
> > So what about making it "#" rather than "." ?
> > 
> > cheers
> 
> Sure, although we'll have to do a better job than the other commands in the 
> help text ;-)
> (They use "#" to indicate a hex number and "##" is just going to be 
> confusing.)
> 
> Do you want me to respin? (I'm happy for you to just adjust the patch.)

No, that's fine, I'll fix it up here.

I also converted some things to bool that could be, final patch is below.

cheers
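For illustration, a session using the renamed command might look roughly like
this (hypothetical transcript, the actual paca contents are elided):

	0:mon> # 10
	0:mon> dp
	  ... first 0x10 lines of the paca display ...
	[Hit a key (a:all, q:truncate, any:next page)]
	  ... next page, or nothing further if 'q' was pressed ...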


diff --git a/arch/powerpc/xmon/nonstdio.c b/arch/powerpc/xmon/nonstdio.c
index c98748617896..d00123421e00 100644
--- a/arch/powerpc/xmon/nonstdio.c
+++ b/arch/powerpc/xmon/nonstdio.c
@@ -11,10 +11,25 @@
 #include 
 #include "nonstdio.h"
 
+static bool paginating, paginate_skipping;
+static unsigned long paginate_lpp; /* Lines Per Page */
+static unsigned long paginate_pos;
 
-static int xmon_write(const void *ptr, int nb)
+void xmon_start_pagination(void)
 {
-   return udbg_write(ptr, nb);
+   paginating = true;
+   paginate_skipping = false;
+   paginate_pos = 0;
+}
+
+void xmon_end_pagination(void)
+{
+   paginating = false;
+}
+
+void xmon_set_pagination_lpp(unsigned long lpp)
+{
+   paginate_lpp = lpp;
 }
 
 static int xmon_readchar(void)
@@ -24,6 +39,51 @@ static int xmon_readchar(void)
return -1;
 }
 
+static int xmon_write(const char *ptr, int nb)
+{
+   int rv = 0;
+   const char *p = ptr, *q;
+   const char msg[] = "[Hit a key (a:all, q:truncate, any:next page)]";
+
+   if (nb <= 0)
+   return rv;
+
+   if (paginating && paginate_skipping)
+   return nb;
+
+   if (paginate_lpp) {
+   while (paginating && (q = strchr(p, '\n'))) {
+   rv += udbg_write(p, q - p + 1);
+   p = q + 1;
+   paginate_pos++;
+
+   if (paginate_pos >= paginate_lpp) {
+   udbg_write(msg, strlen(msg));
+
+   switch (xmon_readchar()) {
+   case 'a':
+   paginating = false;
+   break;
+   case 'q':
+   paginate_skipping = true;
+   break;
+   default:
+   /* nothing */
+   break;
+   }
+
+   paginate_pos = 0;
+   udbg_write("\r\n", 2);
+
+   if (paginate_skipping)
+   return nb;
+   }
+   }
+   }
+
+   return rv + udbg_write(p, nb - (p - ptr));
+}
+
 int xmon_putchar(int c)
 {
char ch = c;
diff --git a/arch/powerpc/xmon/nonstdio.h b/arch/powerpc/xmon/nonstdio.h
index 18a51ded4ffd..f8653365667e 100644
--- a/arch/powerpc/xmon/nonstdio.h
+++ b/arch/powerpc/xmon/nonstdio.h
@@ -3,6 +3,9 @@
 #define printf xmon_printf
 #define putchar xmon_putchar
 
+extern void xmon_set_pagination_lpp(unsigned long lpp);
+extern void xmon_start_pagination(void);
+extern void xmon_end_pagination(void);
 extern int xmon_putchar(int c);
 extern void xmon_puts(const char *);
 extern char *xmon_gets(char *, int);
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 6ef1231c6e9c..f829baf45fd7 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -242,6 +242,9 @@ Commands:\n\
 "  u   dump TLB\n"
 #endif
 "  ?   help\n"
+#ifdef CONFIG_PPC64
+"  # n limit output to n lines per page (dump paca only)\n"
+#endif
 "  zr  reboot\n\
   zh   halt\n"
 ;
@@ -833,6 +836,16 @@ static void remove_cpu_bpts(void)
write_ciabr(0);
 }
 
+static void set_lpp_cmd(void)
+{
+   unsigned long lpp;
+
+   if (!scanhex()) {
+   printf("Invalid number.\n");
+   lpp = 0;
+   }
+   xmon_set_pagination_lpp(lpp);
+}
 /* Command interpreting routine */
 static char *last_cmd;
 
@@ -924,6 +937,9 @@ cmds(struct pt_regs *excp)
case '?':
xmon_puts(help_string);
break;
+   case '#':
+   set_lpp_cmd();
+  

RE: [PATCH 1/2] powerpc/fsl: Add PCI node in device tree of bsc9132qds

2015-10-14 Thread Hou Zhiqiang


> -Original Message-
> From: Wood Scott-B07421
> Sent: 2015年10月15日 4:36
> To: Hou Zhiqiang-B48286
> Cc: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org;
> b...@kernel.crashing.org; pau...@samba.org; m...@ellerman.id.au;
> devicet...@vger.kernel.org; robh...@kernel.org; pawel.m...@arm.com;
> mark.rutl...@arm.com; ijc+devicet...@hellion.org.uk; Rai Harninder-B01044;
> Lian Minghuan-B31939; Hu Mingkai-B21284
> Subject: Re: [PATCH 1/2] powerpc/fsl: Add PCI node in device tree of
> bsc9132qds
> 
> On Tue, 2015-10-13 at 19:29 +0800, Zhiqiang Hou wrote:
> > From: Harninder Rai 
> >
> > Signed-off-by: Harninder Rai 
> > Signed-off-by: Minghuan Lian 
> > Change-Id: I4355add4a92d1fcf514843aea5ecadd2e2517969
> > Reviewed-on: http://git.am.freescale.net:8181/2454
> > Reviewed-by: Zang Tiefei-R61911 
> > Reviewed-by: Kushwaha Prabhakar-B32579 
> > Reviewed-by: Fleming Andrew-AFLEMING 
> > Tested-by: Fleming Andrew-AFLEMING 
> 
> Get rid of the gerrit stuff.  And where is your signoff?
> 
> > diff --git a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> > b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> > index c723071..78c8f1c 100644
> > --- a/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> > +++ b/arch/powerpc/boot/dts/fsl/bsc9132si-post.dtsi
> > @@ -40,6 +40,35 @@
> >   interrupts = <16 2 0 0 20 2 0 0>;  };
> >
> > +/* controller at 0xa000 */
> > + {
> > + compatible = "fsl,bsc9132-pcie", "fsl,qoriq-pcie-v2.2";
> > + device_type = "pci";
> > + #size-cells = <2>;
> > + #address-cells = <3>;
> > + bus-range = <0 255>;
> > + clock-frequency = <>;
> > + interrupts = <16 2 0 0>;
> 
> This clock-frequency is not correct for PCIe.  Just remove it.
> 

Ok, thanks for your correction.
 

Thanks,
Zhiqiang


Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Paul E. McKenney
On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 14, 2015 at 11:04:19PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > > > Suppose we have something like the following, where "a" and "x" are both
> > > > initially zero:
> > > > 
> > > > CPU 0   CPU 1
> > > > -   -
> > > > 
> > > > WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > > > r3 = xchg(&a, 1);   smp_mb();
> > > > r3 = READ_ONCE(x);
> > > > 
> > > > If xchg() is fully ordered, we should never observe both CPUs'
> > > > r3 values being zero, correct?
> > > > 
> > > > And wouldn't this be represented by the following litmus test?
> > > > 
> > > > PPC SB+lwsync-RMW2-lwsync+st-sync-leading
> > > > ""
> > > > {
> > > > 0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
> > > > 1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
> > > > }
> > > >  P0 | P1 ;
> > > >  stw r1,0(r2)   | stw r1,0(r12)  ;
> > > >  lwsync | sync   ;
> > > >  lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
> > > >  stwcx. r1,r10,r12  | ;
> > > >  bne Fail0  | ;
> > > >  mr r3,r11  | ;
> > > >  Fail0: | ;
> > > > exists
> > > > (0:r3=0 /\ a=2 /\ 1:r3=0)
> > > > 
> > > > I left off P0's trailing sync because there is nothing for it to order
> > > > against in this particular litmus test.  I tried adding it and verified
> > > > that it has no effect.
> > > > 
> > > > Am I missing something here?  If not, it seems to me that you need
> > > > the leading lwsync to instead be a sync.
> 
> I'm afraid more than that, the above litmus also shows that
> 
>   CPU 0   CPU 1
>   -   -
> 
>   WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
>   r3 = xchg_release(&a, 1);   smp_mb();
>   r3 = READ_ONCE(x);
> 
>   (0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibited
> 
> in the implementation of this patchset, which should be disallowed by
> the semantics of RELEASE, right?

Not necessarily.  If you had the read first on CPU 1, and you had a
similar problem, I would be more worried.

> And even:
> 
>   CPU 0   CPU 1
>   -   -
> 
>   WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
>   smp_store_release(&a, 1);   smp_mb();
>   r3 = READ_ONCE(x);
> 
>   (1:r3 == 0 && a == 2) is not prohibited
> 
> shows by:
> 
>   PPC weird-lwsync
>   ""
>   {
>   0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
>   1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
>   }
>P0 | P1 ;
>stw r1,0(r2)   | stw r1,0(r12)  ;
>lwsync | sync   ;
>stw  r1,0(r12) | lwz r3,0(r2)   ;
>   exists
>   (a=2 /\ 1:r3=0)
> 
> Please find something I'm (or the tool is) missing, maybe we can't use
> (a == 2) as an indication that STORE on CPU 1 happens after STORE on CPU
> 0?

Again, if you were pairing the smp_store_release() with an smp_load_acquire()
or even a READ_ONCE() followed by a barrier, I would be quite concerned.
I am not at all worried about the above two litmus tests.
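For contrast, the pairing that RELEASE is really about is the message-passing
pattern below (a sketch in the same pseudo-code style, not taken from the
patchset); here the outcome r1 == 1 && r2 == 0 must be forbidden:

	CPU 0				CPU 1
	-----				-----
	WRITE_ONCE(x, 1);		r1 = smp_load_acquire(&a);
	smp_store_release(&a, 1);	r2 = READ_ONCE(x);

	forbidden: (r1 == 1 && r2 == 0)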

> And there is really something I find strange, see below.
> 
> > > 
> > > So the scenario that would fail would be this one, right?
> > > 
> > > a = x = 0
> > > 
> > >   CPU0CPU1
> > > 
> > >   r3 = load_locked ();
> > >   a = 2;
> > >   sync();
> > >   r3 = x;
> > >   x = 1;
> > >   lwsync();
> > >   if (!store_cond(&a, 1))
> > >   goto again
> > > 
> > > 
> > > Where we hoist the load way up because lwsync allows this.
> > 
> > That scenario would end up with a==1 rather than a==2.
> > 
> > > I always thought this would fail because CPU1's store to @a would fail
> > > the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> > > load and now seeing the new value (2).
> > 
> > The stwcx. failure was one thing that prevented a number of other
> > misordering cases.  The problem is that we have to let go of the notion
> > of an implicit global clock.
> > 
> > To that end, the herd tool can make a diagram of what it thought
> > happened, and I have attached it.  I used this diagram to try and force
> > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > and succeeded.  Here is the sequence of events:
> > 
> > o   Commit P0's write.  The model offers to propagate this 

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Paul E. McKenney
On Thu, Oct 15, 2015 at 09:22:26AM +0800, Boqun Feng wrote:
> On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> > On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
> > > On Wed, Oct 14, 2015 at 11:04:19PM +0200, Peter Zijlstra wrote:
> > > > On Wed, Oct 14, 2015 at 01:19:17PM -0700, Paul E. McKenney wrote:
> > > > > Suppose we have something like the following, where "a" and "x" are 
> > > > > both
> > > > > initially zero:
> > > > > 
> > > > >   CPU 0   CPU 1
> > > > >   -   -
> > > > > 
> > > > >   WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > > > >   r3 = xchg(&a, 1);   smp_mb();
> > > > >   r3 = READ_ONCE(x);
> > > > > 
> > > > > If xchg() is fully ordered, we should never observe both CPUs'
> > > > > r3 values being zero, correct?
> > > > > 
> > > > > And wouldn't this be represented by the following litmus test?
> > > > > 
> > > > >   PPC SB+lwsync-RMW2-lwsync+st-sync-leading
> > > > >   ""
> > > > >   {
> > > > >   0:r1=1; 0:r2=x; 0:r3=3; 0:r10=0 ; 0:r11=0; 0:r12=a;
> > > > >   1:r1=2; 1:r2=x; 1:r3=3; 1:r10=0 ; 1:r11=0; 1:r12=a;
> > > > >   }
> > > > >P0 | P1 ;
> > > > >stw r1,0(r2)   | stw r1,0(r12)  ;
> > > > >lwsync | sync   ;
> > > > >lwarx  r11,r10,r12 | lwz r3,0(r2)   ;
> > > > >stwcx. r1,r10,r12  | ;
> > > > >bne Fail0  | ;
> > > > >mr r3,r11  | ;
> > > > >Fail0: | ;
> > > > >   exists
> > > > >   (0:r3=0 /\ a=2 /\ 1:r3=0)
> > > > > 
> > > > > I left off P0's trailing sync because there is nothing for it to order
> > > > > against in this particular litmus test.  I tried adding it and 
> > > > > verified
> > > > > that it has no effect.
> > > > > 
> > > > > Am I missing something here?  If not, it seems to me that you need
> > > > > the leading lwsync to instead be a sync.
> > 
> > I'm afraid more than that, the above litmus also shows that
> 
> I mean there will be more things we need to fix, perhaps even smp_wmb()
> needs to be sync then..

That should not be necessary.  For smp_wmb(), lwsync should be just fine.
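The guarantee smp_wmb() has to provide (and that lwsync does provide) is just
the usual store-to-store ordering against a paired smp_rmb(), for example
(sketch in the same pseudo-code style):

	CPU 0				CPU 1
	-----				-----
	WRITE_ONCE(data, 1);		r1 = READ_ONCE(flag);
	smp_wmb();			smp_rmb();
	WRITE_ONCE(flag, 1);		r2 = READ_ONCE(data);

	forbidden: (r1 == 1 && r2 == 0)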

Thanx, Paul

> Regards,
> Boqun
> 
> > CPU 0   CPU 1
> > -   -
> > 
> > WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > r3 = xchg_release(&a, 1);   smp_mb();
> > r3 = READ_ONCE(x);
> > 
> > (0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibited
> > 
> > in the implementation of this patchset, which should be disallowed by
> > the semantics of RELEASE, right?
> > 
> > And even:
> > 
> > CPU 0   CPU 1
> > -   -
> > 
> > WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > smp_store_release(&a, 1);   smp_mb();
> > r3 = READ_ONCE(x);
> > 
> > (1:r3 == 0 && a == 2) is not prohibited
> > 
> > shows by:
> > 
> > PPC weird-lwsync
> > ""
> > {
> > 0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
> > 1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
> > }
> >  P0 | P1 ;
> >  stw r1,0(r2)   | stw r1,0(r12)  ;
> >  lwsync | sync   ;
> >  stw  r1,0(r12) | lwz r3,0(r2)   ;
> > exists
> > (a=2 /\ 1:r3=0)
> > 
> > 
> > Please find something I'm (or the tool is) missing, maybe we can't use
> > (a == 2) as an indication that STORE on CPU 1 happens after STORE on CPU
> > 0?
> > 
> > And there is really something I find strange, see below.
> > 
> > > > 
> > > > So the scenario that would fail would be this one, right?
> > > > 
> > > > a = x = 0
> > > > 
> > > > CPU0CPU1
> > > > 
> > > > r3 = load_locked ();
> > > > a = 2;
> > > > sync();
> > > > r3 = x;
> > > > x = 1;
> > > > lwsync();
> > > > if (!store_cond(&a, 1))
> > > > goto again
> > > > 
> > > > 
> > > > Where we hoist the load way up because lwsync allows this.
> > > 
> > > That scenario would end up with a==1 rather than a==2.
> > > 
> > > > I always thought this would fail because CPU1's store to @a would fail
> > > > the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> > > > load and now seeing the new value (2).
> > > 
> > > The stwcx. failure was one thing that prevented a number of other
> > > misordering cases.  The problem is that we have to let go of the notion
> > > of an implicit global clock.
> > > 
> > > To that end, the herd tool can make a diagram of what it thought
> > > happened, and I have 

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Boqun Feng
Hi Paul,

On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
> On Wed, Oct 14, 2015 at 02:44:53PM -0700, Paul E. McKenney wrote:
[snip]
> > To that end, the herd tool can make a diagram of what it thought
> > happened, and I have attached it.  I used this diagram to try and force
> > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > and succeeded.  Here is the sequence of events:
> > 
> > o   Commit P0's write.  The model offers to propagate this write
> > to the coherence point and to P1, but don't do so yet.
> > 
> > o   Commit P1's write.  Similar offers, but don't take them up yet.
> > 
> > o   Commit P0's lwsync.
> > 
> > o   Execute P0's lwarx, which reads a=0.  Then commit it.
> > 
> > o   Commit P0's stwcx. as successful.  This stores a=1.
> > 
> > o   Commit P0's branch (not taken).
> > 
> 
> So at this point, P0's write to 'a' has propagated to P1, right? But
> P0's write to 'x' hasn't, even though there is a lwsync between them, right?
> Doesn't the lwsync prevent this from happening?
> 
> If at this point P0's write to 'a' hasn't propagated then when?
> 

Hmm.. I played around with ppcmem, and figured out what happens to
propagation of P0's write to 'a':

At this point, or at some point after the store of 'a' to 1 and before the
sync on P1 finishes, the writes to 'a' reach a coherence point at which 'a'
is 2, so P0's write to 'a' "fails" and will not propagate.


I probably misunderstood the word "propagate", which actually means an
already coherent write gets seen by another CPU, right?

So my question should be:

As lwsync can order P0's write to 'a' after P0's write to 'x', why isn't
P0's write to 'x' seen by P1 after P1's write to 'a' overrides P0's?

But ppcmem gave me the answer ;-) lwsync won't wait until P0's write to
'x' gets propagated, and if P0's write to 'a' "wins" in write coherence,
lwsync will guarantee the propagation of 'x' happens before that of 'a', but
if P0's write to 'a' "fails", there will be no propagation of 'a' from
P0. So lwsync can't do anything here.

Regards,
Boqun

> 
> > o   Commit P0's final register-to-register move.
> > 
> > o   Commit P1's sync instruction.
> > 
> > o   There is now nothing that can happen in either processor.
> > P0 is done, and P1 is waiting for its sync.  Therefore,
> > propagate P1's a=2 write to the coherence point and to
> > the other thread.
> > 
> > o   There is still nothing that can happen in either processor.
> > So pick the barrier propagate, then the acknowledge sync.
> > 
> > o   P1 can now execute its read from x.  Because P0's write to
> > x is still waiting to propagate to P1, this still reads
> > x=0.  Execute and commit, and we now have both r3 registers
> > equal to zero and the final value a=2.
> > 
> > o   Clean up by propagating the write to x everywhere, and
> > propagating the lwsync.
> > 
> > And the "exists" clause really does trigger: 0:r3=0; 1:r3=0; [a]=2;
> > 
> > I am still not 100% confident of my litmus test.  It is quite possible
> > that I lost something in translation, but that is looking less likely.
> > 



Re: [RFC, 1/2] scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target

2015-10-14 Thread Michael Ellerman
On Wed, 2015-10-14 at 09:54 -0700, Olof Johansson wrote:
> On Tue, Oct 13, 2015 at 4:43 PM, Michael Ellerman  wrote:
> > On Tue, 2015-10-13 at 14:02 -0700, Olof Johansson wrote:
> >> On Fri, Oct 2, 2015 at 12:47 AM, Michael Ellerman  
> >> wrote:
> >> > On Wed, 2015-23-09 at 05:40:34 UTC, Michael Ellerman wrote:
> >> >> Arch Makefiles can set KBUILD_DEFCONFIG to tell kbuild the name of the
> >> >> defconfig that should be built by default.
> >> >>
> >> >> However currently there is an assumption that KBUILD_DEFCONFIG points to
> >> >> a file at arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG).
> >> >>
> >> >> We would like to use a target, using merge_config, as our defconfig, so
> >> >> adapt the logic in scripts/kconfig/Makefile to allow that.
> >> >>
> >> >> To minimise the chance of breaking anything, we first check if
> >> >> KBUILD_DEFCONFIG is a file, and if so we do the old logic. If it's not a
> >> >> file, then we call the top-level Makefile with KBUILD_DEFCONFIG as the
> >> >> target.
> >> >>
> >> >> Signed-off-by: Michael Ellerman 
> >> >> Acked-by: Michal Marek 
> >> >
> >> > Applied to powerpc next.
> >> >
> >> > https://git.kernel.org/powerpc/c/d2036f30cfe1daa19e63ce75
> >>
> >> This breaks arm64 defconfig for me:
> >>
> >> mkdir obj-tmp
> >> make -f Makefile O=obj-tmp ARCH=arm64 defconfig
> >> ... watch loop of:
> >> *** Default configuration is based on target 'defconfig'
> >>   GEN ./Makefile
> >
> > Crap, sorry. I knew I shouldn't have touched that code!
> >
> > Does this fix it for you?
> 
> Yes, it does, however:
> 
> > diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
> > index b2b9c87..3043d6b 100644
> > --- a/scripts/kconfig/Makefile
> > +++ b/scripts/kconfig/Makefile
> > @@ -96,7 +96,7 @@ savedefconfig: $(obj)/conf
> >  defconfig: $(obj)/conf
> >  ifeq ($(KBUILD_DEFCONFIG),)
> > $< $(silent) --defconfig $(Kconfig)
> > -else ifneq ($(wildcard arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
> > +else ifneq ($(wildcard 
> > $(srctree)/arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
> > @$(kecho) "*** Default configuration is based on 
> > '$(KBUILD_DEFCONFIG)'"
> > $(Q)$< $(silent) 
> > --defconfig=arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG) $(Kconfig)
> 
> Do you need a $(srctree) prefix here too? I'm not entirely sure what I
> would do to reproduce a run that goes down this path so I can't
> confirm.

That is the path you're going down, now that it's fixed. That's the path where
KBUILD_DEFCONFIG is a real file, ie. the old behaviour.

I'm not sure why it doesn't have a $(srctree) there, but it's never had one.

It looks like it eventually boils down to zconf_fopen() which looks for the
file in both .  and $(srctree).

So I think we could add a $(srctree) there, it would be more obvious and not
rely on the zconf_fopen() behaviour, but I'd rather leave it as is and let
Michal do that as a cleanup later.
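For reference, with the original patch plus the fix above, the defconfig rule
in scripts/kconfig/Makefile ends up with roughly the following shape (a
condensed sketch, not the literal rule):

defconfig: $(obj)/conf
ifeq ($(KBUILD_DEFCONFIG),)
	$< $(silent) --defconfig $(Kconfig)
else ifneq ($(wildcard $(srctree)/arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)),)
	@$(kecho) "*** Default configuration is based on '$(KBUILD_DEFCONFIG)'"
	$(Q)$< $(silent) --defconfig=arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG) $(Kconfig)
else
	@$(kecho) "*** Default configuration is based on target '$(KBUILD_DEFCONFIG)'"
	$(Q)$(MAKE) -f $(srctree)/Makefile $(KBUILD_DEFCONFIG)
endif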

cheers



Re: devicetree and IRQ7 mapping for T1042(mpic)

2015-10-14 Thread Scott Wood
On Wed, 2015-10-14 at 19:11 -0500, Scott Wood wrote:
> On Wed, 2015-10-14 at 19:37 +, Joakim Tjernlund wrote:
> > I am trying to figure out how to describe/map external IRQ7 in the 
> > devicetree.
> > 
> > Basically, either IRQ7 is to be left alone by Linux (because u-boot already
> > set it up), or IRQ7 is mapped to sie 0 (MPIC_EILR7=0xf0) with prio=0xf
> > (MPIC_EIVPR7=0x4f).
> > 
> > There is no need for a SW handler because IRQ7 will be routed to the DDR
> > controller and cause an automatic Self Refresh just before CPU reset.
> > 
> > I cannot figure out how to do this. Any ideas?
> > 
> > If not possible from devicetree, then can one do it from board code?
> 
> The device tree describes the hardware.  Priority is configuration, and 
> thus 
> doesn't belong there.  You can call mpic_irq_set_priority() from board code.
> 
> Likewise, the fact that you want to route irq7 to sie0 is configuration, 
> not 
> hardware description.  At most, the device tree should describe what is
> connected to each sie output.  There's no current Linux support for routing 
> an interrupt to sie or anything other than "int".

BTW, priority is meaningless for interrupts routed to sie.
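For the normal "int"-routed case, the priority tweak alone can be done from
board code along these lines (illustrative sketch only; the function name,
the node used, and the way the virq is obtained are assumptions, and only
mpic_irq_set_priority() is the helper referred to above):

	#include <linux/init.h>
	#include <linux/of.h>
	#include <linux/of_irq.h>
	#include <asm/mpic.h>

	/* Bump external IRQ7 to the highest MPIC priority, after MPIC init.
	 * "node" is whatever node carries the interrupt specifier for IRQ7. */
	static void __init board_setup_irq7(struct device_node *node)
	{
		unsigned int virq = irq_of_parse_and_map(node, 0);

		if (virq)
			mpic_irq_set_priority(virq, 0xf);
	}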

-Scott


Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

2015-10-14 Thread Boqun Feng
On Wed, Oct 14, 2015 at 08:07:05PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
[snip]
> > 
> > I'm afraid more than that, the above litmus also shows that
> > 
> > CPU 0   CPU 1
> > -   -
> > 
> > WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > r3 = xchg_release(&a, 1);   smp_mb();
> > r3 = READ_ONCE(x);
> > 
> > (0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibited
> > 
> > in the implementation of this patchset, which should be disallowed by
> > the semantics of RELEASE, right?
> 
> Not necessarily.  If you had the read first on CPU 1, and you had a
> similar problem, I would be more worried.
> 

Sometimes I think maybe we should say that a single unpaired ACQUIRE or
RELEASE doesn't have any order guarantee because of the above case.

But it seems that's not a normal or even an existing case, my bad ;-(

> > And even:
> > 
> > CPU 0   CPU 1
> > -   -
> > 
> > WRITE_ONCE(x, 1);   WRITE_ONCE(a, 2);
> > smp_store_release(&a, 1);   smp_mb();
> > r3 = READ_ONCE(x);
> > 
> > (1:r3 == 0 && a == 2) is not prohibited
> > 
> > shows by:
> > 
> > PPC weird-lwsync
> > ""
> > {
> > 0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
> > 1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
> > }
> >  P0 | P1 ;
> >  stw r1,0(r2)   | stw r1,0(r12)  ;
> >  lwsync | sync   ;
> >  stw  r1,0(r12) | lwz r3,0(r2)   ;
> > exists
> > (a=2 /\ 1:r3=0)
> > 
> > Please find something I'm (or the tool is) missing, maybe we can't use
> > (a == 2) as an indication that STORE on CPU 1 happens after STORE on CPU
> > 0?
> 
> Again, if you were pairing the smp_store_release() with an smp_load_acquire()
> or even a READ_ONCE() followed by a barrier, I would be quite concerned.
> I am not at all worried about the above two litmus tests.
> 

Understood, thank you for thinking through that ;-)

> > And there is really something I find strange, see below.
> > 
> > > > 
> > > > So the scenario that would fail would be this one, right?
> > > > 
> > > > a = x = 0
> > > > 
> > > > CPU0CPU1
> > > > 
> > > > r3 = load_locked ();
> > > > a = 2;
> > > > sync();
> > > > r3 = x;
> > > > x = 1;
> > > > lwsync();
> > > > if (!store_cond(&a, 1))
> > > > goto again
> > > > 
> > > > 
> > > > Where we hoist the load way up because lwsync allows this.
> > > 
> > > That scenario would end up with a==1 rather than a==2.
> > > 
> > > > I always thought this would fail because CPU1's store to @a would fail
> > > > the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
> > > > load and now seeing the new value (2).
> > > 
> > > The stwcx. failure was one thing that prevented a number of other
> > > misordering cases.  The problem is that we have to let go of the notion
> > > of an implicit global clock.
> > > 
> > > To that end, the herd tool can make a diagram of what it thought
> > > happened, and I have attached it.  I used this diagram to try and force
> > > this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
> > > and succeeded.  Here is the sequence of events:
> > > 
> > > o Commit P0's write.  The model offers to propagate this write
> > >   to the coherence point and to P1, but don't do so yet.
> > > 
> > > o Commit P1's write.  Similar offers, but don't take them up yet.
> > > 
> > > o Commit P0's lwsync.
> > > 
> > > o Execute P0's lwarx, which reads a=0.  Then commit it.
> > > 
> > > o Commit P0's stwcx. as successful.  This stores a=1.
> > > 
> > > o Commit P0's branch (not taken).
> > 
> > So at this point, P0's write to 'a' has propagated to P1, right? But
> > P0's write to 'x' hasn't, even though there is a lwsync between them, right?
> > Doesn't the lwsync prevent this from happening?
> 
> No, because lwsync is quite a bit weaker than sync aside from just
> the store-load ordering.
> 

Understood, I've tried ppcmem, it's much clearer now ;-)

> > If at this point P0's write to 'a' hasn't propagated then when?
> 
> Later.  At the very end of the test, in this case.  ;-)
> 

Hmm.. I tried exactly this sequence in ppcmem, and it seems propagation of
P0's write to 'a' is never an option...

> Why not try creating a longer litmus test that requires P0's write to
> "a" to propagate to P1 before both processes complete?
> 

I will try to write one, but to be clear, you mean we still observe 

0:r3 == 0 && a == 2 && 1:r3 == 0 

at the end, right? Because I understand that if P1's write to 'a'
doesn't override P0's, P0's write to 'a' will propagate.

Regards,
Boqun



Re: [PATCH] powerpc/e6500: add TMCFG0 register definition

2015-10-14 Thread Paul Mackerras
On Wed, Sep 23, 2015 at 06:06:22PM +0300, Laurentiu Tudor wrote:
> The register is not currently used in the base kernel
> but will be in a forthcoming kvm patch.
> 
> Signed-off-by: Laurentiu Tudor 

Thanks, applied to my kvm-ppc-next branch.

Paul.

Re: [PATCH 15/19] KVM: PPC: e500: fix handling local_sid_lookup result

2015-10-14 Thread Paul Mackerras
On Thu, Sep 24, 2015 at 04:00:23PM +0200, Andrzej Hajda wrote:
> The function can return a negative value.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
> 
> Signed-off-by: Andrzej Hajda 

Thanks, applied to my kvm-ppc-next branch.

Paul.

[PATCH v2] powerpc/mpc5xxx: Avoid dereferencing potentially freed memory

2015-10-14 Thread Christophe JAILLET
Use 'of_property_read_u32()' instead of 'of_get_property()'+pointer
dereference in order to avoid access to potentially freed memory.

Use 'of_get_next_parent()' to simplify the while() loop and avoid the
need for a temp variable.

Signed-off-by: Christophe JAILLET 
---
v2: Use of_property_read_u32 instead of of_get_property+pointer dereference
*** Untested ***
---
 arch/powerpc/sysdev/mpc5xxx_clocks.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/sysdev/mpc5xxx_clocks.c 
b/arch/powerpc/sysdev/mpc5xxx_clocks.c
index f4f0301..92fbcf8 100644
--- a/arch/powerpc/sysdev/mpc5xxx_clocks.c
+++ b/arch/powerpc/sysdev/mpc5xxx_clocks.c
@@ -13,21 +13,17 @@
 
 unsigned long mpc5xxx_get_bus_frequency(struct device_node *node)
 {
-   struct device_node *np;
-   const unsigned int *p_bus_freq = NULL;
+   u32 bus_freq = 0;
 
of_node_get(node);
while (node) {
-   p_bus_freq = of_get_property(node, "bus-frequency", NULL);
-   if (p_bus_freq)
+   if (!of_property_read_u32(node, "bus-frequency", &bus_freq))
break;
 
-   np = of_get_parent(node);
-   of_node_put(node);
-   node = np;
+   node = of_get_next_parent(node);
}
of_node_put(node);
 
-   return p_bus_freq ? *p_bus_freq : 0;
+   return bus_freq;
 }
 EXPORT_SYMBOL(mpc5xxx_get_bus_frequency);
-- 
2.1.4
