[PATCH] docs: submitting-patches - change non-ascii character to ascii

2017-07-20 Thread frowand . list
From: Frank Rowand 

Documentation/process/submitting-patches.rst contains a non-ascii
character.  Change it to the ascii equivalent.

Signed-off-by: Frank Rowand 
---
 Documentation/process/submitting-patches.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index 3e10719fee35..733478ade91b 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -413,7 +413,7 @@ e-mail discussions.
 
 
 
-11) Sign your work — the Developer's Certificate of Origin
+11) Sign your work - the Developer's Certificate of Origin
 ----------------------------------------------------------
 
 To improve tracking of who did what, especially with patches that can
-- 
Frank Rowand 

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Boqun Feng
On Thu, Jul 20, 2017 at 04:07:14PM -0700, Paul E. McKenney wrote:
[...]
> > 
> > So if I respin the patch with the extern, would you still feel reluctant?
> 
> Yes, because I am not seeing how this change helps.  What is this telling
> the reader that the original did not, and how does it help the reader
> generate good concurrent code?
> 

One thing I think we probably should do is to make the READ_ONCE()
semantics clearer, i.e. call out that in our conceptual model for kernel
programming we always rely on the compiler to be serious about the
return value of READ_ONCE(). I didn't find the comment before
READ_ONCE() or memory-barriers.txt saying anything similar.

Or am I the only one with this kind of semantic guarantee in mind?

Regards,
Boqun

>   Thanx, Paul
> 


Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Paul E. McKenney
On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
> > On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:
> >> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
> >>> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
>  On 2017/07/20 14:47, Paul E. McKenney wrote:
> > On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
> >> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
> >>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
>  On 2017/07/20 2:43, Paul E. McKenney wrote:
> > On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
> >> From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
> >> From: Akira Yokosawa 
> >> Date: Mon, 17 Jul 2017 16:25:33 +0900
> >> Subject: [PATCH] documentation: Fix two-CPU control-dependency 
> >> example
> >>
> >> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
> >> no-transitivity example"), the operator in "if" statement of
> >> the two-CPU example was modified from ">=" to ">".
> >> Now the example misses the point because there is no party
> >> who will modify "x" nor "y". So each CPU performs only the
> >> READ_ONCE().
> >>
> >> The point of this example is to use control dependency for 
> >> ordering,
> >> and the WRITE_ONCE() should always be executed.
> >>
> >> So it was correct prior to the above mentioned commit. Partial
> >> revert of the commit (with context adjustments regarding other
> >> changes thereafter) restores the point.
> >>
> >> Note that the three-CPU example demonstrating the lack of
> >> transitivity stands regardless of this partial revert.
> >>
> >> Signed-off-by: Akira Yokosawa 
> >
> > Hello, Akira,
> >
> > You are quite right that many compilers would generate 
> > straightforward
> > code for the code fragment below, and in that case, the assertion 
> > could
> > never trigger due to either TSO or control dependencies, depending 
> > on
> > the architecture this was running on.
> >
> > However, if the compiler was too smart for our good, it could figure
> > out that "x" and "y" only take on the values zero and one, so that
> > the "if" would always be taken.  At that point, the compiler could
> > simply ignore the "if" with the result shown below.
> >
> >> ---
> >>  Documentation/memory-barriers.txt | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/memory-barriers.txt 
> >> b/Documentation/memory-barriers.txt
> >> index c4ddfcd..c1ebe99 100644
> >> --- a/Documentation/memory-barriers.txt
> >> +++ b/Documentation/memory-barriers.txt
> >> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the 
> >> initial values of
> >>CPU 0 CPU 1
> >>===   ===
> >>r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> >> -  if (r1 > 0)   if (r2 > 0)
> >> +  if (r1 >= 0)  if (r2 >= 0)
> >>  WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>
> >>assert(!(r1 == 1 && r2 == 1));
> >
> > Original program:
> >
> > CPU 0 CPU 1
> > ===   ===
> > r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> > if (r1 >= 0)  if (r2 >= 0)
> >   WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >
> > assert(!(r1 == 1 && r2 == 1));
> >
> > Enthusiastically optimized program:
> >
> > CPU 0 CPU 1
> > ===   ===
> > r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> > WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >
> > assert(!(r1 == 1 && r2 == 1));
> >
> > This optimized version could trigger the assertion.
> >
> > So we do need to stick with the ">" comparison.
> 
>  Well but,
> 
>  Current example:
> 
>   CPU 0 CPU 1
>   ===   ===
>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>   if (r1 > 0)   if (r2 > 0)
> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> 
>   

Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Akira Yokosawa
On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
> On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:
>> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
>>> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
 On 2017/07/20 14:47, Paul E. McKenney wrote:
> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
 On 2017/07/20 2:43, Paul E. McKenney wrote:
> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
>> From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
>> From: Akira Yokosawa 
>> Date: Mon, 17 Jul 2017 16:25:33 +0900
>> Subject: [PATCH] documentation: Fix two-CPU control-dependency 
>> example
>>
>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
>> no-transitivity example"), the operator in "if" statement of
>> the two-CPU example was modified from ">=" to ">".
>> Now the example misses the point because there is no party
>> who will modify "x" nor "y". So each CPU performs only the
>> READ_ONCE().
>>
>> The point of this example is to use control dependency for ordering,
>> and the WRITE_ONCE() should always be executed.
>>
>> So it was correct prior to the above mentioned commit. Partial
>> revert of the commit (with context adjustments regarding other
>> changes thereafter) restores the point.
>>
>> Note that the three-CPU example demonstrating the lack of
>> transitivity stands regardless of this partial revert.
>>
>> Signed-off-by: Akira Yokosawa 
>
> Hello, Akira,
>
> You are quite right that many compilers would generate straightforward
> code for the code fragment below, and in that case, the assertion 
> could
> never trigger due to either TSO or control dependencies, depending on
> the architecture this was running on.
>
> However, if the compiler was too smart for our good, it could figure
> out that "x" and "y" only take on the values zero and one, so that
> the "if" would always be taken.  At that point, the compiler could
> simply ignore the "if" with the result shown below.
>
>> ---
>>  Documentation/memory-barriers.txt | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/Documentation/memory-barriers.txt 
>> b/Documentation/memory-barriers.txt
>> index c4ddfcd..c1ebe99 100644
>> --- a/Documentation/memory-barriers.txt
>> +++ b/Documentation/memory-barriers.txt
>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the 
>> initial values of
>>  CPU 0 CPU 1
>>  ===   ===
>>  r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>> -if (r1 > 0)   if (r2 > 0)
>> +if (r1 >= 0)  if (r2 >= 0)
>>WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>>
>>  assert(!(r1 == 1 && r2 == 1));
>
> Original program:
>
>   CPU 0 CPU 1
>   ===   ===
>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>   if (r1 >= 0)  if (r2 >= 0)
> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>
>   assert(!(r1 == 1 && r2 == 1));
>
> Enthusiastically optimized program:
>
>   CPU 0 CPU 1
>   ===   ===
>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>   WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>
>   assert(!(r1 == 1 && r2 == 1));
>
> This optimized version could trigger the assertion.
>
> So we do need to stick with the ">" comparison.

 Well but,

 Current example:

CPU 0 CPU 1
===   ===
r1 = READ_ONCE(x);r2 = READ_ONCE(y);
if (r1 > 0)   if (r2 > 0)
  WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);

assert(!(r1 == 1 && r2 == 1));

 Such a clever compiler might be able to prove that "x" and "y"
 are never modified and end up in the following:

>>
>> Hi Akira,
>>
>> I wouldn't call that compiler a clever one, it's a 

Re: [RFC v6 27/62] powerpc: helper to validate key-access permissions of a pte

2017-07-20 Thread Ram Pai
On Thu, Jul 20, 2017 at 12:12:47PM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > helper function that checks if the read/write/execute is allowed
> > on the pte.
> >
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/book3s/64/pgtable.h |4 +++
> >  arch/powerpc/include/asm/pkeys.h |   12 +
> >  arch/powerpc/mm/pkeys.c  |   33 
> > ++
> >  3 files changed, 49 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> > b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > index 30d7f55..0056e58 100644
> > --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> > +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> > @@ -472,6 +472,10 @@ static inline void write_uamor(u64 value)
> > mtspr(SPRN_UAMOR, value);
> >  }
> >
> > +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> > +extern bool arch_pte_access_permitted(u64 pte, bool write, bool execute);
> > +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> > +
> >  #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
> >  static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
> >unsigned long addr, pte_t *ptep)
> > diff --git a/arch/powerpc/include/asm/pkeys.h 
> > b/arch/powerpc/include/asm/pkeys.h
> > index bbb5d85..7a9aade 100644
> > --- a/arch/powerpc/include/asm/pkeys.h
> > +++ b/arch/powerpc/include/asm/pkeys.h
> > @@ -53,6 +53,18 @@ static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
> > ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL));
> >  }
> >
> > +static inline u16 pte_to_pkey_bits(u64 pteflags)
> > +{
> > +   if (!pkey_inited)
> > +   return 0x0UL;
> 
> Do we really need that above check? We should always find
> pkey_inited to be set.

Yes. There are cases where pkey_inited is not set:
a) if the MMU is radix.
b) if the PAGE size is 4k.
c) if the device tree says the feature is not available.
d) if the CPU is of an older generation, P6 and older.

RP



Re: [RFC v6 11/62] powerpc: initial pkey plumbing

2017-07-20 Thread Ram Pai
On Thu, Jul 20, 2017 at 11:34:10AM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > basic setup to initialize the pkey system. Only 64K kernel in HPT
> > mode, enables the pkey system.
> >
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/Kconfig   |   16 ++
> >  arch/powerpc/include/asm/mmu_context.h |5 +++
> >  arch/powerpc/include/asm/pkeys.h   |   51 
> > 
> >  arch/powerpc/kernel/setup_64.c |4 ++
> >  arch/powerpc/mm/Makefile   |1 +
> >  arch/powerpc/mm/hash_utils_64.c|1 +
> >  arch/powerpc/mm/pkeys.c|   18 +++
> >  7 files changed, 96 insertions(+), 0 deletions(-)
> >  create mode 100644 arch/powerpc/include/asm/pkeys.h
> >  create mode 100644 arch/powerpc/mm/pkeys.c
> >
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index bf4391d..5c60fd6 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -855,6 +855,22 @@ config SECCOMP
> >
> >   If unsure, say Y. Only embedded should say N here.
> >
> > +config PPC64_MEMORY_PROTECTION_KEYS
> > +   prompt "PowerPC Memory Protection Keys"
> > +   def_bool y
> > +   # Note: only available in 64-bit mode
> > +   depends on PPC64 && PPC_64K_PAGES
> > +   select ARCH_USES_HIGH_VMA_FLAGS
> > +   select ARCH_HAS_PKEYS
> > +   ---help---
> > + Memory Protection Keys provides a mechanism for enforcing
> > + page-based protections, but without requiring modification of the
> > + page tables when an application changes protection domains.
> > +
> > + For details, see Documentation/vm/protection-keys.txt
> > +
> > + If unsure, say y.
> > +
> >  endmenu
> 
> 
> Shouldn't this come as the last patch ? Or can we enable this config by
> this patch ? If so what does it do ? Did you test boot each of this
> patch to make sure we don't break git bisect ?

It partially enables the key subsystem. The code is there, but it does
not do much; essentially the behavior is the same as without the code.

Yes, it is test-booted, though not extensively on all
platforms/configurations.

However, this code blindly enables the pkey subsystem without consulting
the device tree. I have fixed the patch series by adding an additional
patch: all of the code is now put in place without enabling the
subsystem, and the last patch actually fires it up, depending on the
availability of the feature as reported by the device tree.
RP



Re: [RFC v6 01/62] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-07-20 Thread Ram Pai
On Thu, Jul 20, 2017 at 11:21:51AM +0530, Aneesh Kumar K.V wrote:
> 
> .
> 
> > /*
> > @@ -116,8 +104,8 @@ int __hash_page_4K(unsigned long ea, unsigned long 
> > access, unsigned long vsid,
> >  * On hash insert failure we use old pte value and we don't
> >  * want slot information there if we have a insert failure.
> >  */
> > -   old_pte &= ~(H_PAGE_HASHPTE | H_PAGE_F_GIX | H_PAGE_F_SECOND);
> > -   new_pte &= ~(H_PAGE_HASHPTE | H_PAGE_F_GIX | H_PAGE_F_SECOND);
> > +   old_pte &= ~(H_PAGE_HASHPTE);
> > +   new_pte &= ~(H_PAGE_HASHPTE);
> > goto htab_insert_hpte;
> > }
> 
> With the current patch order, the above hunk will break bisect, I guess.
> With the above, when
> we convert a 64k hpte to a 4k hpte, since this is the first patch, we
> should clear H_PAGE_F_GIX and H_PAGE_F_SECOND. We still use them
> for 64k. I guess you should move this hunk to the second patch.

true. it should move to the next patch. Will fix it.
RP



Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Paul E. McKenney
On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
> >> On 2017/07/20 14:47, Paul E. McKenney wrote:
> >>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
>  On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
> >> On 2017/07/20 2:43, Paul E. McKenney wrote:
> >>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
>  From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
>  From: Akira Yokosawa 
>  Date: Mon, 17 Jul 2017 16:25:33 +0900
>  Subject: [PATCH] documentation: Fix two-CPU control-dependency 
>  example
> 
>  In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
>  no-transitivity example"), the operator in "if" statement of
>  the two-CPU example was modified from ">=" to ">".
>  Now the example misses the point because there is no party
>  who will modify "x" nor "y". So each CPU performs only the
>  READ_ONCE().
> 
>  The point of this example is to use control dependency for ordering,
>  and the WRITE_ONCE() should always be executed.
> 
>  So it was correct prior to the above mentioned commit. Partial
>  revert of the commit (with context adjustments regarding other
>  changes thereafter) restores the point.
> 
>  Note that the three-CPU example demonstrating the lack of
>  transitivity stands regardless of this partial revert.
> 
>  Signed-off-by: Akira Yokosawa 
> >>>
> >>> Hello, Akira,
> >>>
> >>> You are quite right that many compilers would generate straightforward
> >>> code for the code fragment below, and in that case, the assertion 
> >>> could
> >>> never trigger due to either TSO or control dependencies, depending on
> >>> the architecture this was running on.
> >>>
> >>> However, if the compiler was too smart for our good, it could figure
> >>> out that "x" and "y" only take on the values zero and one, so that
> >>> the "if" would always be taken.  At that point, the compiler could
> >>> simply ignore the "if" with the result shown below.
> >>>
>  ---
>   Documentation/memory-barriers.txt | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
>  diff --git a/Documentation/memory-barriers.txt 
>  b/Documentation/memory-barriers.txt
>  index c4ddfcd..c1ebe99 100644
>  --- a/Documentation/memory-barriers.txt
>  +++ b/Documentation/memory-barriers.txt
>  @@ -851,7 +851,7 @@ demonstrated by two related examples, with the 
>  initial values of
>   CPU 0 CPU 1
>   ===   ===
>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>  -if (r1 > 0)   if (r2 > 0)
>  +if (r1 >= 0)  if (r2 >= 0)
> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> 
>   assert(!(r1 == 1 && r2 == 1));
> >>>
> >>> Original program:
> >>>
> >>>   CPU 0 CPU 1
> >>>   ===   ===
> >>>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> >>>   if (r1 >= 0)  if (r2 >= 0)
> >>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>>
> >>>   assert(!(r1 == 1 && r2 == 1));
> >>>
> >>> Enthusiastically optimized program:
> >>>
> >>>   CPU 0 CPU 1
> >>>   ===   ===
> >>>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> >>>   WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>>
> >>>   assert(!(r1 == 1 && r2 == 1));
> >>>
> >>> This optimized version could trigger the assertion.
> >>>
> >>> So we do need to stick with the ">" comparison.
> >>
> >> Well but,
> >>
> >> Current example:
> >>
> >>CPU 0 CPU 1
> >>===   ===
> >>r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> >>if (r1 > 0)   if (r2 > 0)
> >>  WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>
> >>assert(!(r1 == 1 && r2 == 1));
> >>
> >> Such a clever compiler might be able to prove that "x" and "y"
> >> are never modified and end up in the following:
> >>
> 
>  Hi Akira,
> 
>  I wouldn't call that compiler a clever one, it's a broken one ;-)
> 
>  So here is the thing: 

Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Akira Yokosawa
On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
>> On 2017/07/20 14:47, Paul E. McKenney wrote:
>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
 On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
>> On 2017/07/20 2:43, Paul E. McKenney wrote:
>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
 From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
 From: Akira Yokosawa 
 Date: Mon, 17 Jul 2017 16:25:33 +0900
 Subject: [PATCH] documentation: Fix two-CPU control-dependency example

 In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
 no-transitivity example"), the operator in "if" statement of
 the two-CPU example was modified from ">=" to ">".
 Now the example misses the point because there is no party
 who will modify "x" nor "y". So each CPU performs only the
 READ_ONCE().

 The point of this example is to use control dependency for ordering,
 and the WRITE_ONCE() should always be executed.

 So it was correct prior to the above mentioned commit. Partial
 revert of the commit (with context adjustments regarding other
 changes thereafter) restores the point.

 Note that the three-CPU example demonstrating the lack of
 transitivity stands regardless of this partial revert.

 Signed-off-by: Akira Yokosawa 
>>>
>>> Hello, Akira,
>>>
>>> You are quite right that many compilers would generate straightforward
>>> code for the code fragment below, and in that case, the assertion could
>>> never trigger due to either TSO or control dependencies, depending on
>>> the architecture this was running on.
>>>
>>> However, if the compiler was too smart for our good, it could figure
>>> out that "x" and "y" only take on the values zero and one, so that
>>> the "if" would always be taken.  At that point, the compiler could
>>> simply ignore the "if" with the result shown below.
>>>
 ---
  Documentation/memory-barriers.txt | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/Documentation/memory-barriers.txt 
 b/Documentation/memory-barriers.txt
 index c4ddfcd..c1ebe99 100644
 --- a/Documentation/memory-barriers.txt
 +++ b/Documentation/memory-barriers.txt
 @@ -851,7 +851,7 @@ demonstrated by two related examples, with the 
 initial values of
CPU 0 CPU 1
===   ===
r1 = READ_ONCE(x);r2 = READ_ONCE(y);
 -  if (r1 > 0)   if (r2 > 0)
 +  if (r1 >= 0)  if (r2 >= 0)
  WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);

assert(!(r1 == 1 && r2 == 1));
>>>
>>> Original program:
>>>
>>> CPU 0 CPU 1
>>> ===   ===
>>> r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>>> if (r1 >= 0)  if (r2 >= 0)
>>>   WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>>>
>>> assert(!(r1 == 1 && r2 == 1));
>>>
>>> Enthusiastically optimized program:
>>>
>>> CPU 0 CPU 1
>>> ===   ===
>>> r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>>> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>>>
>>> assert(!(r1 == 1 && r2 == 1));
>>>
>>> This optimized version could trigger the assertion.
>>>
>>> So we do need to stick with the ">" comparison.
>>
>> Well but,
>>
>> Current example:
>>
>>  CPU 0 CPU 1
>>  ===   ===
>>  r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>>  if (r1 > 0)   if (r2 > 0)
>>WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>>
>>  assert(!(r1 == 1 && r2 == 1));
>>
>> Such a clever compiler might be able to prove that "x" and "y"
>> are never modified and end up in the following:
>>

 Hi Akira,

 I wouldn't call that compiler a clever one, it's a broken one ;-)

 So here is the thing: READ_ONCE() is a *volatile* load, which means the
 compiler has to generate code that actually does a load, so the values
 of r1 and r2 depend on the loads, therefore, a sane compiler will not 
 optimize the if()s out because the volatile semantics of READ_ONCE().
 Otherwise, I 

Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Paul E. McKenney
On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
> On 2017/07/20 14:47, Paul E. McKenney wrote:
> > On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
> >> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
> >>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
>  On 2017/07/20 2:43, Paul E. McKenney wrote:
> > On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
> >> From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
> >> From: Akira Yokosawa 
> >> Date: Mon, 17 Jul 2017 16:25:33 +0900
> >> Subject: [PATCH] documentation: Fix two-CPU control-dependency example
> >>
> >> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
> >> no-transitivity example"), the operator in "if" statement of
> >> the two-CPU example was modified from ">=" to ">".
> >> Now the example misses the point because there is no party
> >> who will modify "x" nor "y". So each CPU performs only the
> >> READ_ONCE().
> >>
> >> The point of this example is to use control dependency for ordering,
> >> and the WRITE_ONCE() should always be executed.
> >>
> >> So it was correct prior to the above mentioned commit. Partial
> >> revert of the commit (with context adjustments regarding other
> >> changes thereafter) restores the point.
> >>
> >> Note that the three-CPU example demonstrating the lack of
> >> transitivity stands regardless of this partial revert.
> >>
> >> Signed-off-by: Akira Yokosawa 
> >
> > Hello, Akira,
> >
> > You are quite right that many compilers would generate straightforward
> > code for the code fragment below, and in that case, the assertion could
> > never trigger due to either TSO or control dependencies, depending on
> > the architecture this was running on.
> >
> > However, if the compiler was too smart for our good, it could figure
> > out that "x" and "y" only take on the values zero and one, so that
> > the "if" would always be taken.  At that point, the compiler could
> > simply ignore the "if" with the result shown below.
> >
> >> ---
> >>  Documentation/memory-barriers.txt | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/memory-barriers.txt 
> >> b/Documentation/memory-barriers.txt
> >> index c4ddfcd..c1ebe99 100644
> >> --- a/Documentation/memory-barriers.txt
> >> +++ b/Documentation/memory-barriers.txt
> >> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the 
> >> initial values of
> >>CPU 0 CPU 1
> >>===   ===
> >>r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> >> -  if (r1 > 0)   if (r2 > 0)
> >> +  if (r1 >= 0)  if (r2 >= 0)
> >>  WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >>
> >>assert(!(r1 == 1 && r2 == 1));
> >
> > Original program:
> >
> > CPU 0 CPU 1
> > ===   ===
> > r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> > if (r1 >= 0)  if (r2 >= 0)
> >   WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >
> > assert(!(r1 == 1 && r2 == 1));
> >
> > Enthusiastically optimized program:
> >
> > CPU 0 CPU 1
> > ===   ===
> > r1 = READ_ONCE(x);r2 = READ_ONCE(y);
> > WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> >
> > assert(!(r1 == 1 && r2 == 1));
> >
> > This optimized version could trigger the assertion.
> >
> > So we do need to stick with the ">" comparison.
> 
>  Well but,
> 
>  Current example:
> 
>   CPU 0 CPU 1
>   ===   ===
>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>   if (r1 > 0)   if (r2 > 0)
> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
> 
>   assert(!(r1 == 1 && r2 == 1));
> 
>  Such a clever compiler might be able to prove that "x" and "y"
>  are never modified and end up in the following:
> 
> >>
> >> Hi Akira,
> >>
> >> I wouldn't call that compiler a clever one, it's a broken one ;-)
> >>
> >> So here is the thing: READ_ONCE() is a *volatile* load, which means the
> >> compiler has to generate code that actually does a load, so the values
> >> of r1 and r2 depend on the loads, therefore, a sane compiler will not 
> >> optimize the if()s out because the volatile semantics of READ_ONCE().
> >> Otherwise, I think we have a lot more to worry about than this case.
> 

Re: [PATCH v3 1/5] mm: add vm_insert_mixed_mkwrite()

2017-07-20 Thread Ross Zwisler
On Thu, Jul 20, 2017 at 11:26:16AM -0400, Vivek Goyal wrote:
> On Wed, Jun 28, 2017 at 04:01:48PM -0600, Ross Zwisler wrote:
> > To be able to use the common 4k zero page in DAX we need to have our PTE
> > fault path look more like our PMD fault path where a PTE entry can be
> > marked as dirty and writeable as it is first inserted, rather than waiting
> > for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> > 
> > Right now we can rely on having a dax_pfn_mkwrite() call because we can
> > distinguish between these two cases in do_wp_page():
> > 
> > case 1: 4k zero page => writable DAX storage
> > case 2: read-only DAX storage => writeable DAX storage
> > 
> > This distinction is made by via vm_normal_page().  vm_normal_page() returns
> > false for the common 4k zero page, though, just as it does for DAX ptes.
> > Instead of special casing the DAX + 4k zero page case, we will simplify our
> > DAX PTE page fault sequence so that it matches our DAX PMD sequence, and
> > get rid of dax_pfn_mkwrite() completely.
> > 
> > This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> > and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set insert_pfn()
> > will do the work that was previously done by wp_page_reuse() as part of the
> > dax_pfn_mkwrite() call path.
> > 
> > Signed-off-by: Ross Zwisler 
> > ---
> >  include/linux/mm.h |  2 ++
> >  mm/memory.c| 57 
> > +-
> >  2 files changed, 54 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 6f543a4..096052f 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2293,6 +2293,8 @@ int vm_insert_pfn_prot(struct vm_area_struct *vma, 
> > unsigned long addr,
> > unsigned long pfn, pgprot_t pgprot);
> >  int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
> > pfn_t pfn);
> > +int vm_insert_mixed_mkwrite(struct vm_area_struct *vma, unsigned long addr,
> > +   pfn_t pfn);
> >  int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, 
> > unsigned long len);
> >  
> >  
> > diff --git a/mm/memory.c b/mm/memory.c
> > index bb11c47..de4aa71 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1646,7 +1646,7 @@ int vm_insert_page(struct vm_area_struct *vma, 
> > unsigned long addr,
> >  EXPORT_SYMBOL(vm_insert_page);
> >  
> >  static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> > -   pfn_t pfn, pgprot_t prot)
> > +   pfn_t pfn, pgprot_t prot, bool mkwrite)
> >  {
> > struct mm_struct *mm = vma->vm_mm;
> > int retval;
> > @@ -1658,14 +1658,26 @@ static int insert_pfn(struct vm_area_struct *vma, 
> > unsigned long addr,
> > if (!pte)
> > goto out;
> > retval = -EBUSY;
> > -   if (!pte_none(*pte))
> > -   goto out_unlock;
> > +   if (!pte_none(*pte)) {
> > +   if (mkwrite) {
> > +   entry = *pte;
> > +   goto out_mkwrite;
> > +   } else
> > +   goto out_unlock;
> > +   }
> >  
> > /* Ok, finally just insert the thing.. */
> > if (pfn_t_devmap(pfn))
> > entry = pte_mkdevmap(pfn_t_pte(pfn, prot));
> > else
> > entry = pte_mkspecial(pfn_t_pte(pfn, prot));
> > +
> > +out_mkwrite:
> > +   if (mkwrite) {
> > +   entry = pte_mkyoung(entry);
> > +   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > +   }
> > +
> > set_pte_at(mm, addr, pte, entry);
> > update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */
> >  
> > @@ -1736,7 +1748,8 @@ int vm_insert_pfn_prot(struct vm_area_struct *vma, 
> > unsigned long addr,
> >  
> > track_pfn_insert(vma, , __pfn_to_pfn_t(pfn, PFN_DEV));
> >  
> > -   ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot);
> > +   ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot,
> > +   false);
> >  
> > return ret;
> >  }
> > @@ -1772,10 +1785,44 @@ int vm_insert_mixed(struct vm_area_struct *vma, 
> > unsigned long addr,
> > page = pfn_to_page(pfn_t_to_pfn(pfn));
> > return insert_page(vma, addr, page, pgprot);
> > }
> > -   return insert_pfn(vma, addr, pfn, pgprot);
> > +   return insert_pfn(vma, addr, pfn, pgprot, false);
> >  }
> >  EXPORT_SYMBOL(vm_insert_mixed);
> >  
> > +int vm_insert_mixed_mkwrite(struct vm_area_struct *vma, unsigned long addr,
> > +   pfn_t pfn)
> > +{
> > +   pgprot_t pgprot = vma->vm_page_prot;
> > +
> > +   BUG_ON(!(vma->vm_flags & VM_MIXEDMAP));
> > +
> > +   if (addr < vma->vm_start || addr >= vma->vm_end)
> > +   return -EFAULT;
> > +
> > +   track_pfn_insert(vma, &pgprot, pfn);
> > +
> > +   /*
> > +* If we don't have pte special, then we have to use the pfn_valid()
> > +* based VM_MIXEDMAP scheme (see 

Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries

2017-07-20 Thread Waiman Long
On 07/20/2017 11:08 AM, Miklos Szeredi wrote:
> On Thu, Jul 20, 2017 at 4:21 PM, Waiman Long  wrote:
>> On 07/20/2017 03:20 AM, Miklos Szeredi wrote:
>>> On Wed, Jul 19, 2017 at 10:42 PM, Waiman Long  wrote:
>> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry 
>> *dentry)
>>
>> if (!IS_ROOT(dentry)) {
>> parent = dentry->d_parent;
>> -   if (unlikely(!spin_trylock(&parent->d_lock))) {
>> +   /*
>> +* Force the killing of this negative dentry when
>> +* DCACHE_KILL_NEGATIVE flag is set.
>> +*/
>> +   if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
>> +   spin_lock(&parent->d_lock);
> This looks like d_lock ordering problem (should be parent first, child
> second).  Why is this needed, anyway?
>
 Yes, that is a bug. I should have used lock_parent() instead.
>>> lock_parent() can release dentry->d_lock, which means it's perfectly
>>> useless for this.
>> As the reference count is kept at 1 in dentry_kill(), the dentry won't
>> go away even if the dentry lock is temporarily released.
> It won't go away, but anything else might happen to it (ref grabbed by
> somebody else, instantiated, etc).  Don't see how it's going to be
> better than the existing trylock.
>
> Thanks,
> Miklos

In the unlikely event that the reference count or the d_flags changes,
we can abort the killing.

Cheers,
Longman


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/5] mm: add vm_insert_mixed_mkwrite()

2017-07-20 Thread Vivek Goyal
On Wed, Jun 28, 2017 at 04:01:48PM -0600, Ross Zwisler wrote:
> To be able to use the common 4k zero page in DAX we need to have our PTE
> fault path look more like our PMD fault path where a PTE entry can be
> marked as dirty and writeable as it is first inserted, rather than waiting
> for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() call.
> 
> Right now we can rely on having a dax_pfn_mkwrite() call because we can
> distinguish between these two cases in do_wp_page():
> 
>   case 1: 4k zero page => writable DAX storage
>   case 2: read-only DAX storage => writeable DAX storage
> 
> This distinction is made by via vm_normal_page().  vm_normal_page() returns
> false for the common 4k zero page, though, just as it does for DAX ptes.
> Instead of special casing the DAX + 4k zero page case, we will simplify our
> DAX PTE page fault sequence so that it matches our DAX PMD sequence, and
> get rid of dax_pfn_mkwrite() completely.
> 
> This means that insert_pfn() needs to follow the lead of insert_pfn_pmd()
> and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set insert_pfn()
> will do the work that was previously done by wp_page_reuse() as part of the
> dax_pfn_mkwrite() call path.
> 
> Signed-off-by: Ross Zwisler 
> ---
>  include/linux/mm.h |  2 ++
>  mm/memory.c| 57 
> +-
>  2 files changed, 54 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6f543a4..096052f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2293,6 +2293,8 @@ int vm_insert_pfn_prot(struct vm_area_struct *vma, 
> unsigned long addr,
>   unsigned long pfn, pgprot_t pgprot);
>  int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
>   pfn_t pfn);
> +int vm_insert_mixed_mkwrite(struct vm_area_struct *vma, unsigned long addr,
> + pfn_t pfn);
>  int vm_iomap_memory(struct vm_area_struct *vma, phys_addr_t start, unsigned 
> long len);
>  
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index bb11c47..de4aa71 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1646,7 +1646,7 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned 
> long addr,
>  EXPORT_SYMBOL(vm_insert_page);
>  
>  static int insert_pfn(struct vm_area_struct *vma, unsigned long addr,
> - pfn_t pfn, pgprot_t prot)
> + pfn_t pfn, pgprot_t prot, bool mkwrite)
>  {
>   struct mm_struct *mm = vma->vm_mm;
>   int retval;
> @@ -1658,14 +1658,26 @@ static int insert_pfn(struct vm_area_struct *vma, 
> unsigned long addr,
>   if (!pte)
>   goto out;
>   retval = -EBUSY;
> - if (!pte_none(*pte))
> - goto out_unlock;
> + if (!pte_none(*pte)) {
> + if (mkwrite) {
> + entry = *pte;
> + goto out_mkwrite;
> + } else
> + goto out_unlock;
> + }
>  
>   /* Ok, finally just insert the thing.. */
>   if (pfn_t_devmap(pfn))
>   entry = pte_mkdevmap(pfn_t_pte(pfn, prot));
>   else
>   entry = pte_mkspecial(pfn_t_pte(pfn, prot));
> +
> +out_mkwrite:
> + if (mkwrite) {
> + entry = pte_mkyoung(entry);
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> + }
> +
>   set_pte_at(mm, addr, pte, entry);
>   update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */
>  
> @@ -1736,7 +1748,8 @@ int vm_insert_pfn_prot(struct vm_area_struct *vma, 
> unsigned long addr,
>  
>   track_pfn_insert(vma, &pgprot, __pfn_to_pfn_t(pfn, PFN_DEV));
>  
> - ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot);
> + ret = insert_pfn(vma, addr, __pfn_to_pfn_t(pfn, PFN_DEV), pgprot,
> + false);
>  
>   return ret;
>  }
> @@ -1772,10 +1785,44 @@ int vm_insert_mixed(struct vm_area_struct *vma, 
> unsigned long addr,
>   page = pfn_to_page(pfn_t_to_pfn(pfn));
>   return insert_page(vma, addr, page, pgprot);
>   }
> - return insert_pfn(vma, addr, pfn, pgprot);
> + return insert_pfn(vma, addr, pfn, pgprot, false);
>  }
>  EXPORT_SYMBOL(vm_insert_mixed);
>  
> +int vm_insert_mixed_mkwrite(struct vm_area_struct *vma, unsigned long addr,
> + pfn_t pfn)
> +{
> + pgprot_t pgprot = vma->vm_page_prot;
> +
> + BUG_ON(!(vma->vm_flags & VM_MIXEDMAP));
> +
> + if (addr < vma->vm_start || addr >= vma->vm_end)
> + return -EFAULT;
> +
> > + track_pfn_insert(vma, &pgprot, pfn);
> +
> + /*
> +  * If we don't have pte special, then we have to use the pfn_valid()
> +  * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must*
> +  * refcount the page if pfn_valid is true (hence insert_page rather
> +  * than insert_pfn).  If a zero_pfn were inserted into a VM_MIXEDMAP
> +  * without pte 

Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries

2017-07-20 Thread Miklos Szeredi
On Thu, Jul 20, 2017 at 4:21 PM, Waiman Long  wrote:
> On 07/20/2017 03:20 AM, Miklos Szeredi wrote:
>> On Wed, Jul 19, 2017 at 10:42 PM, Waiman Long  wrote:
>>>
> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry 
> *dentry)
>
> if (!IS_ROOT(dentry)) {
> parent = dentry->d_parent;
> -   if (unlikely(!spin_trylock(&parent->d_lock))) {
> +   /*
> +* Force the killing of this negative dentry when
> +* DCACHE_KILL_NEGATIVE flag is set.
> +*/
> +   if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
> +   spin_lock(&parent->d_lock);
 This looks like d_lock ordering problem (should be parent first, child
 second).  Why is this needed, anyway?

>>> Yes, that is a bug. I should have used lock_parent() instead.
>> lock_parent() can release dentry->d_lock, which means it's perfectly
>> useless for this.
>
> As the reference count is kept at 1 in dentry_kill(), the dentry won't
> go away even if the dentry lock is temporarily released.

It won't go away, but anything else might happen to it (ref grabbed by
somebody else, instantiated, etc).  Don't see how it's going to be
better than the existing trylock.

Thanks,
Miklos


Re: [PATCH v3 3/5] dax: use common 4k zero page for dax mmap reads

2017-07-20 Thread Ross Zwisler
On Thu, Jul 20, 2017 at 12:27:23PM +0200, Jan Kara wrote:
> On Wed 19-07-17 10:26:45, Ross Zwisler wrote:
> > On Wed, Jul 19, 2017 at 05:33:14PM +0200, Jan Kara wrote:
> > > On Wed 28-06-17 16:01:50, Ross Zwisler wrote:
> > > > Another major change is that we remove dax_pfn_mkwrite() from our fault
> > > > flow, and instead rely on the page fault itself to make the PTE dirty 
> > > > and
> > > > writeable.  The following description from the patch adding the
> > > > vm_insert_mixed_mkwrite() call explains this a little more:
> > > > 
> > > > ***
> > > >   To be able to use the common 4k zero page in DAX we need to have our 
> > > > PTE
> > > >   fault path look more like our PMD fault path where a PTE entry can be
> > > >   marked as dirty and writeable as it is first inserted, rather than
> > > >   waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() 
> > > > call.
> > > > 
> > > >   Right now we can rely on having a dax_pfn_mkwrite() call because we 
> > > > can
> > > >   distinguish between these two cases in do_wp_page():
> > > > 
> > > > case 1: 4k zero page => writable DAX storage
> > > > case 2: read-only DAX storage => writeable DAX storage
> > > > 
> > > >   This distinction is made by via vm_normal_page().  vm_normal_page()
> > > >   returns false for the common 4k zero page, though, just as it does for
> > > >   DAX ptes.  Instead of special casing the DAX + 4k zero page case, we 
> > > > will
> > > >   simplify our DAX PTE page fault sequence so that it matches our DAX 
> > > > PMD
> > > >   sequence, and get rid of dax_pfn_mkwrite() completely.
> > > > 
> > > >   This means that insert_pfn() needs to follow the lead of 
> > > > insert_pfn_pmd()
> > > >   and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set
> > > >   insert_pfn() will do the work that was previously done by 
> > > > wp_page_reuse()
> > > >   as part of the dax_pfn_mkwrite() call path.
> > > > ***
> > > 
> > > Hum, thinking about this in context of this patch... So what if we have
> > > allocated storage, a process faults it read-only, we map it to page tables
> > > writeprotected. Then the process writes through mmap to the area - the 
> > > code
> > > in handle_pte_fault() ends up in do_wp_page() if I'm reading it right.
> > 
> > Yep.
> > 
> > > Then, since we are missing ->pfn_mkwrite() handlers, the PTE will be 
> > > marked
> > > writeable but radix tree entry stays clean - bug. Am I missing something?
> > 
> > I don't think we ever end up with a writeable PTE but with a clean radix 
> > tree
> > entry.  When we get the write fault we do a full fault through
> > dax_iomap_pte_fault() and dax_insert_mapping().
> >
> > dax_insert_mapping() sets up the dirty radix tree entry via
> > dax_insert_mapping_entry() before it does anything with the page tables via
> > vm_insert_mixed_mkwrite().
> > 
> > So, this mkwrite fault path is exactly the path we would have taken if the
> > initial read to real storage hadn't happened, and we end up in the same end
> > state - with a dirty DAX radix tree entry and a writeable PTE.
> 
> Ah sorry, I have missed that it is not that you would not have
> ->pfn_mkwrite() handler - you still have it but it is the same as standard
> fault handler now. So maybe can you rephrase the changelog a bit saying
> that: "We get rid of dax_pfn_mkwrite() helper and use dax_iomap_fault() to
> handle write-protection faults instead." Thanks!

Ah, sure, will do.


Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries

2017-07-20 Thread Waiman Long
On 07/20/2017 03:20 AM, Miklos Szeredi wrote:
> On Wed, Jul 19, 2017 at 10:42 PM, Waiman Long  wrote:
>>
 @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry 
 *dentry)

 if (!IS_ROOT(dentry)) {
 parent = dentry->d_parent;
 -   if (unlikely(!spin_trylock(&parent->d_lock))) {
 +   /*
 +* Force the killing of this negative dentry when
 +* DCACHE_KILL_NEGATIVE flag is set.
 +*/
 +   if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
 +   spin_lock(&parent->d_lock);
>>> This looks like d_lock ordering problem (should be parent first, child
>>> second).  Why is this needed, anyway?
>>>
>> Yes, that is a bug. I should have used lock_parent() instead.
> lock_parent() can release dentry->d_lock, which means it's perfectly
> useless for this.

As the reference count is kept at 1 in dentry_kill(), the dentry won't
go away even if the dentry lock is temporarily released.

> I still feel forcing  free is wrong here.  Why not just block until
> the number of negatives goes below the limit (start reclaim if not
> already doing so, etc...)?

Force freeing is the simplest. Any other ways will require adding more
code and increasing code complexity.

One reason why I prefer this is to avoid adding unpredictable latency to
the regular directory lookup and other dentry related operations. We can
always change the code later on if there is a better way of doing it.

Cheers,
Longman



Re: [PATCH v3 2/6] perf: hisi: Add support for HiSilicon SoC uncore PMU driver

2017-07-20 Thread Zhangshaokun

Hi Jonathan

On 2017/7/20 21:49, Jonathan Cameron wrote:
> On Thu, 20 Jul 2017 21:03:19 +0800
> Zhangshaokun  wrote:
> 
>> Hi Jonathan
>>
>> On 2017/7/19 17:19, Jonathan Cameron wrote:
>>> On Tue, 18 Jul 2017 15:59:55 +0800
>>> Shaokun Zhang  wrote:
>>>   
 This patch adds support for the HiSilicon SoC uncore PMU driver framework and
 interfaces.

 Signed-off-by: Shaokun Zhang 
 Signed-off-by: Anurup M 
>>> A couple of minor things inline.
>>>
> 
 +/* Generic pmu struct for different pmu types */
 +struct hisi_pmu {
 +  const char *name;
 +  struct pmu pmu;
 +  const struct hisi_uncore_ops *ops;
 +  struct hisi_pmu_hwevents pmu_events;
 +  cpumask_t cpus;
 +  struct device *dev;
 +  struct hlist_node node;
 +  u32 scl_id;
 +  u32 ccl_id;
 +  /* Hardware information for different pmu types */
 +  void __iomem *base;
 +  union {
 +  u32 ddrc_chn_id;
 +  u32 l3c_tag_id;
 +  u32 hha_uid;
 +  };
 +  int num_counters;
 +  int num_events;
 +  int counter_bits;
 +};
 +
 +int hisi_uncore_pmu_counter_valid(struct hisi_pmu *hisi_pmu, int idx);
 +int hisi_uncore_pmu_get_event_idx(struct perf_event *event);
 +void hisi_uncore_pmu_clear_event_idx(struct hisi_pmu *hisi_pmu, int idx); 

>>> The above is only used in hisi_uncore_pmu.c so doesn't need to be here
>>> and can be static.
>>>   
>>
>> These functions would be called in L3C/HHA/DDR PMU driver. We want to give a
>> uncore perf framework in hisi_uncore_pmu.c for hisilicon uncore PMUs.
>>
> For all but the one function above that is true.  I couldn't find this one 
> being
> used anywhere in those drivers.
> 

My apologies, it is as you said, thanks.

> 
>> Thanks.
>> Shaokun
>>
 +void hisi_uncore_pmu_read(struct perf_event *event);
 +int hisi_uncore_pmu_add(struct perf_event *event, int flags);
 +void hisi_uncore_pmu_del(struct perf_event *event, int flags);
 +void hisi_uncore_pmu_start(struct perf_event *event, int flags);
 +void hisi_uncore_pmu_stop(struct perf_event *event, int flags);
 +void hisi_uncore_pmu_set_event_period(struct perf_event *event);
 +u64 hisi_uncore_pmu_event_update(struct perf_event *event);
 +int hisi_uncore_pmu_event_init(struct perf_event *event);
 +int hisi_uncore_pmu_setup(struct hisi_pmu *hisi_pmu, const char 
 *pmu_name);
 +void hisi_uncore_pmu_enable(struct pmu *pmu);
 +void hisi_uncore_pmu_disable(struct pmu *pmu);
 +struct hisi_pmu *hisi_pmu_alloc(struct device *dev, u32 num_cntrs);
 +ssize_t hisi_event_sysfs_show(struct device *dev,
 +struct device_attribute *attr, char *buf);
 +ssize_t hisi_format_sysfs_show(struct device *dev,
 + struct device_attribute *attr, char *buf);
 +ssize_t hisi_cpumask_sysfs_show(struct device *dev,
 +  struct device_attribute *attr, char *buf);
 +void hisi_read_scl_and_ccl_id(u32 *scl_id, u32 *ccl_id);
 +#endif /* __HISI_UNCORE_PMU_H__ */
>>>
>>>
>>> .
>>>   
>>
> 
> 
> 
> .
> 



Re: [PATCH v3 3/6] perf: hisi: Add support for HiSilicon SoC L3C PMU driver

2017-07-20 Thread Zhangshaokun
Hi Jonathan,

On 2017/7/19 17:28, Jonathan Cameron wrote:
> On Tue, 18 Jul 2017 15:59:56 +0800
> Shaokun Zhang  wrote:
> 
>> This patch adds support for the L3C PMU driver in the HiSilicon SoC chip. Each
>> L3C has its own control, counter and interrupt registers and is a separate
>> PMU. Each L3C PMU has 8 programmable counters and supports 0x60
>> events; the event code is 8 bits and every counter is free-running.
>> Interrupt is supported to handle counter (48-bit) overflow.
>>
>> Signed-off-by: Shaokun Zhang 
>> Signed-off-by: Anurup M   
> Hi Shaokun,
> 
> A few minor points inline.
> 
> Thanks,
> 
> Jonathan
>> ---
>>  drivers/perf/hisilicon/Makefile  |   2 +-
>>  drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c | 556 
>> +++
>>  include/linux/cpuhotplug.h   |   1 +
>>  3 files changed, 558 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
>>
>> diff --git a/drivers/perf/hisilicon/Makefile 
>> b/drivers/perf/hisilicon/Makefile
>> index 2783bb3..4a3d3e6 100644
>> --- a/drivers/perf/hisilicon/Makefile
>> +++ b/drivers/perf/hisilicon/Makefile
>> @@ -1 +1 @@
>> -obj-$(CONFIG_HISI_PMU) += hisi_uncore_pmu.o
>> +obj-$(CONFIG_HISI_PMU) += hisi_uncore_pmu.o hisi_uncore_l3c_pmu.o
>> diff --git a/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c 
>> b/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
>> new file mode 100644
>> index 000..d430508
>> --- /dev/null
>> +++ b/drivers/perf/hisilicon/hisi_uncore_l3c_pmu.c
>> @@ -0,0 +1,556 @@
>> +/*
>> + * HiSilicon SoC L3C uncore Hardware event counters support
>> + *
>> + * Copyright (C) 2017 Hisilicon Limited
>> + * Author: Anurup M 
>> + * Shaokun Zhang 
>> + *
>> + * This code is based on the uncore PMUs like arm-cci and arm-ccn.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> Missed this on previous but putting full gpl text in is not normally done any
> more. 
>> + */
>> +#include 
>> +#include   
> Not immediately seeing any bitmaps in use in here.
> (I may well have missed something though!)
> 

Ok, shall remove it.

>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include "hisi_uncore_pmu.h"
> 
>> +
>> +/* Check if the CPU belongs to the SCCL and CCL of PMU */
>> +static bool hisi_l3c_is_cpu_in_ccl(struct hisi_pmu *l3c_pmu)
>> +{
>> +u32 ccl_id, scl_id;
>> +
>> +/* Read Super CPU cluster ID from CPU MPIDR_EL1 */
>> +hisi_read_scl_and_ccl_id(&scl_id, &ccl_id);
>> +
>> +/* Check if the CPU match with the SCCL and CCL */
>> +if (scl_id != l3c_pmu->scl_id)
>> +return false;
>> +if (ccl_id != l3c_pmu->ccl_id)
>> +return false;
>> +
>> +/* Return true if matched */  
> I think the name of the function makes this clear enough to
> not need a comment here, but up to you.

Shall remove some redundant comments

>> +return true;
>> +}
>> +
>> +static int hisi_l3c_pmu_online_cpu(unsigned int cpu, struct hlist_node 
>> *node)
>> +{
>> +struct hisi_pmu *l3c_pmu;
>> +
>> +l3c_pmu = hlist_entry_safe(node, struct hisi_pmu, node);
>> +
>> +/* Proceed only when CPU belongs to the same SCCL and CCL */
>> +if (!hisi_l3c_is_cpu_in_ccl(l3c_pmu))
>> +return 0;
>> +
>> +/* If another CPU is already managing the CPU cluster, simply return */
>> +if (!cpumask_empty(&l3c_pmu->cpus))
>> +return 0;
>> +
>> +/* Use this CPU for event counting in the CCL */
>> +cpumask_set_cpu(cpu, &l3c_pmu->cpus);
>> +
>> +return 0;
>> +}
>> +
>> +static int hisi_l3c_pmu_offline_cpu(unsigned int cpu, struct hlist_node 
>> *node)
>> +{
>> +struct hisi_pmu *l3c_pmu;
>> +cpumask_t ccl_online_cpus;
>> +unsigned int target;
>> +
>> +l3c_pmu = hlist_entry_safe(node, struct hisi_pmu, node);
>> +
>> +/* Proceed only when CPU belongs to the same SCCL and CCL */
>> +if (!hisi_l3c_is_cpu_in_ccl(l3c_pmu))
>> +return 0;
>> +
>> +/* Proceed if this CPU is used for event counting */
>> +if (!cpumask_test_cpu(cpu, &l3c_pmu->cpus))
>> +return 0;
>> +
>> +/* Give up ownership of CCL */
>> +cpumask_test_and_clear_cpu(cpu, &l3c_pmu->cpus);
>> +
>> +/* Any other CPU for this CCL which is still online */
>> +cpumask_and(&ccl_online_cpus, cpu_coregroup_mask(cpu),
>> + 

Re: [PATCH v3 2/6] perf: hisi: Add support for HiSilicon SoC uncore PMU driver

2017-07-20 Thread Jonathan Cameron
On Thu, 20 Jul 2017 21:03:19 +0800
Zhangshaokun  wrote:

> Hi Jonathan
> 
> On 2017/7/19 17:19, Jonathan Cameron wrote:
> > On Tue, 18 Jul 2017 15:59:55 +0800
> > Shaokun Zhang  wrote:
> >   
> >> This patch adds support for the HiSilicon SoC uncore PMU driver framework and
> >> interfaces.
> >>
> >> Signed-off-by: Shaokun Zhang 
> >> Signed-off-by: Anurup M 
> > A couple of minor things inline.
> > 

> >> +/* Generic pmu struct for different pmu types */
> >> +struct hisi_pmu {
> >> +  const char *name;
> >> +  struct pmu pmu;
> >> +  const struct hisi_uncore_ops *ops;
> >> +  struct hisi_pmu_hwevents pmu_events;
> >> +  cpumask_t cpus;
> >> +  struct device *dev;
> >> +  struct hlist_node node;
> >> +  u32 scl_id;
> >> +  u32 ccl_id;
> >> +  /* Hardware information for different pmu types */
> >> +  void __iomem *base;
> >> +  union {
> >> +  u32 ddrc_chn_id;
> >> +  u32 l3c_tag_id;
> >> +  u32 hha_uid;
> >> +  };
> >> +  int num_counters;
> >> +  int num_events;
> >> +  int counter_bits;
> >> +};
> >> +
> >> +int hisi_uncore_pmu_counter_valid(struct hisi_pmu *hisi_pmu, int idx);
> >> +int hisi_uncore_pmu_get_event_idx(struct perf_event *event);
> >> +void hisi_uncore_pmu_clear_event_idx(struct hisi_pmu *hisi_pmu, int idx); 
> >>
> > The above is only used in hisi_uncore_pmu.c so doesn't need to be here
> > and can be static.
> >   
> 
> These functions would be called in L3C/HHA/DDR PMU driver. We want to give a
> uncore perf framework in hisi_uncore_pmu.c for hisilicon uncore PMUs.
> 
For all but the one function above that is true.  I couldn't find this one being
used anywhere in those drivers.


> Thanks.
> Shaokun
> 
> >> +void hisi_uncore_pmu_read(struct perf_event *event);
> >> +int hisi_uncore_pmu_add(struct perf_event *event, int flags);
> >> +void hisi_uncore_pmu_del(struct perf_event *event, int flags);
> >> +void hisi_uncore_pmu_start(struct perf_event *event, int flags);
> >> +void hisi_uncore_pmu_stop(struct perf_event *event, int flags);
> >> +void hisi_uncore_pmu_set_event_period(struct perf_event *event);
> >> +u64 hisi_uncore_pmu_event_update(struct perf_event *event);
> >> +int hisi_uncore_pmu_event_init(struct perf_event *event);
> >> +int hisi_uncore_pmu_setup(struct hisi_pmu *hisi_pmu, const char 
> >> *pmu_name);
> >> +void hisi_uncore_pmu_enable(struct pmu *pmu);
> >> +void hisi_uncore_pmu_disable(struct pmu *pmu);
> >> +struct hisi_pmu *hisi_pmu_alloc(struct device *dev, u32 num_cntrs);
> >> +ssize_t hisi_event_sysfs_show(struct device *dev,
> >> +struct device_attribute *attr, char *buf);
> >> +ssize_t hisi_format_sysfs_show(struct device *dev,
> >> + struct device_attribute *attr, char *buf);
> >> +ssize_t hisi_cpumask_sysfs_show(struct device *dev,
> >> +  struct device_attribute *attr, char *buf);
> >> +void hisi_read_scl_and_ccl_id(u32 *scl_id, u32 *ccl_id);
> >> +#endif /* __HISI_UNCORE_PMU_H__ */
> > 
> > 
> > .
> >   
> 




Re: [PATCH v3 1/6] Documentation: perf: hisi: Documentation for HiSilicon SoC PMU driver

2017-07-20 Thread Will Deacon
On Thu, Jul 20, 2017 at 02:08:47PM +0100, Will Deacon wrote:
> On Thu, Jul 20, 2017 at 08:54:36PM +0800, Zhangshaokun wrote:
> > On 2017/7/19 17:17, Jonathan Cameron wrote:
> > >> +Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has two 
> > >> HHAs
> > >> +(0 - 1) and four DDRCs (0 - 3), respectively.
> > >> +
> > >> +HiSilicon SoC uncore PMU driver
> > >> +---
> > >> +Each device PMU has separate registers for event counting, control and
> > >> +interrupt, and the PMU driver shall register perf PMU drivers like L3C,
> > >> +HHA and DDRC etc. The available events and configuration options shall
> > >> +be described in the sysfs, see /sys/devices/hisi_*.  
> > > Is there not a subsystem directory that would make more sense to
> > > refer to than the full device list?
> > > 
> > 
> > For uncore devices, it is more reasonable to list in /sys/devices/***.
> 
> The usual place for these things is /sys/bus/event_source/devices/.

... which are symlinks to directories under /sys/devices/! I didn't realise
that, so I suspect you're ok. Still worth getting the format of things
correct though, so perf can read/parse them with minimal effort.

Will


Re: [PATCH v3 2/6] perf: hisi: Add support for HiSilicon SoC uncore PMU driver

2017-07-20 Thread Zhangshaokun
Hi Jonathan

On 2017/7/19 17:19, Jonathan Cameron wrote:
> On Tue, 18 Jul 2017 15:59:55 +0800
> Shaokun Zhang  wrote:
> 
>> This patch adds support for the HiSilicon SoC uncore PMU driver framework and
>> interfaces.
>>
>> Signed-off-by: Shaokun Zhang 
>> Signed-off-by: Anurup M   
> A couple of minor things inline.
> 
> Jonathan
>> ---
>>  drivers/perf/Kconfig |   7 +
>>  drivers/perf/Makefile|   1 +
>>  drivers/perf/hisilicon/Makefile  |   1 +
>>  drivers/perf/hisilicon/hisi_uncore_pmu.c | 411 
>> +++
>>  drivers/perf/hisilicon/hisi_uncore_pmu.h | 116 +
>>  5 files changed, 536 insertions(+)
>>  create mode 100644 drivers/perf/hisilicon/Makefile
>>  create mode 100644 drivers/perf/hisilicon/hisi_uncore_pmu.c
>>  create mode 100644 drivers/perf/hisilicon/hisi_uncore_pmu.h
>>
>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>> index e5197ff..78fc4bc 100644
>> --- a/drivers/perf/Kconfig
>> +++ b/drivers/perf/Kconfig
>> @@ -17,6 +17,13 @@ config ARM_PMU_ACPI
>>  depends on ARM_PMU && ACPI
>>  def_bool y
>>  
>> +config HISI_PMU
>> +bool "HiSilicon SoC PMU"
>> +depends on ARM64 && ACPI
>> +help
>> +  Support for HiSilicon SoC uncore performance monitoring
>> +  unit (PMU), such as: L3C, HHA and DDRC.
>> +
>>  config QCOM_L2_PMU
>>  bool "Qualcomm Technologies L2-cache PMU"
>>  depends on ARCH_QCOM && ARM64 && ACPI
>> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
>> index 6420bd4..41d3342 100644
>> --- a/drivers/perf/Makefile
>> +++ b/drivers/perf/Makefile
>> @@ -1,5 +1,6 @@
>>  obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
>>  obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
>> +obj-$(CONFIG_HISI_PMU) += hisilicon/
>>  obj-$(CONFIG_QCOM_L2_PMU)   += qcom_l2_pmu.o
>>  obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
>>  obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
>> diff --git a/drivers/perf/hisilicon/Makefile 
>> b/drivers/perf/hisilicon/Makefile
>> new file mode 100644
>> index 000..2783bb3
>> --- /dev/null
>> +++ b/drivers/perf/hisilicon/Makefile
>> @@ -0,0 +1 @@
>> +obj-$(CONFIG_HISI_PMU) += hisi_uncore_pmu.o
>> diff --git a/drivers/perf/hisilicon/hisi_uncore_pmu.c 
>> b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> new file mode 100644
>> index 000..4eb04e1
>> --- /dev/null
>> +++ b/drivers/perf/hisilicon/hisi_uncore_pmu.c
>> @@ -0,0 +1,411 @@
>> +/*
>> + * HiSilicon SoC Hardware event counters support
>> + *
>> + * Copyright (C) 2017 Hisilicon Limited
>> + * Author: Anurup M 
>> + * Shaokun Zhang 
>> + *
>> + * This code is based on the uncore PMUs like arm-cci and arm-ccn.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include "hisi_uncore_pmu.h"
>> +
>> +/*
>> + * PMU format attributes
>> + */
>> +ssize_t hisi_format_sysfs_show(struct device *dev,
>> +   struct device_attribute *attr, char *buf)
>> +{
>> +struct dev_ext_attribute *eattr;
>> +
>> +eattr = container_of(attr, struct dev_ext_attribute, attr);
>> +
>> +return sprintf(buf, "%s\n", (char *)eattr->var);
>> +}
>> +
>> +/*
>> + * PMU event attributes
>> + */
>> +ssize_t hisi_event_sysfs_show(struct device *dev,
>> +  struct device_attribute *attr, char *page)
>> +{
>> +struct dev_ext_attribute *eattr;
>> +
>> +eattr = container_of(attr, struct dev_ext_attribute, attr);
>> +
>> +return sprintf(page, "config=0x%lx\n", (unsigned long)eattr->var);
>> +}
>> +
>> +/*
>> + * sysfs cpumask attributes
>> + */
>> +ssize_t hisi_cpumask_sysfs_show(struct device *dev,
>> +struct device_attribute *attr, char *buf)
>> +{
>> +struct pmu *pmu = dev_get_drvdata(dev);
>> +struct hisi_pmu *hisi_pmu = to_hisi_pmu(pmu);
>> +
>> +return cpumap_print_to_pagebuf(true, buf, &hisi_pmu->cpus);
>> +}
>> +
>> +/* Read Super CPU cluster and CPU cluster ID from MPIDR_EL1 */
>> +void hisi_read_scl_and_ccl_id(u32 *scl_id, u32 *ccl_id)
>> +{
>> +u64 mpidr;
>> +
>> +mpidr = read_cpuid_mpidr();
>> +if (mpidr & MPIDR_MT_BITMASK) {
>> +if (scl_id)
>> +*scl_id = MPIDR_AFFINITY_LEVEL(mpidr, 3);
>> +if 

Re: [PATCH v3 1/6] Documentation: perf: hisi: Documentation for HiSilicon SoC PMU driver

2017-07-20 Thread Will Deacon
On Thu, Jul 20, 2017 at 08:54:36PM +0800, Zhangshaokun wrote:
> On 2017/7/19 17:17, Jonathan Cameron wrote:
> >> +Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has two HHAs
> >> +(0 - 1) and four DDRCs (0 - 3), respectively.
> >> +
> >> +HiSilicon SoC uncore PMU driver
> >> +---
> >> +Each device PMU has separate registers for event counting, control and
> >> +interrupt, and the PMU driver shall register perf PMU drivers like L3C,
> >> +HHA and DDRC etc. The available events and configuration options shall
> >> +be described in the sysfs, see /sys/devices/hisi_*.  
> > Is there not a subsystem directory that would make more sense to
> > refer to than the full device list?
> > 
> 
> For uncore devices, it is more reasonable to list in /sys/devices/***.

The usual place for these things is /sys/bus/event_source/devices/.

The events are described as files under the events directory (one file per
event, which describes the encoding) and you can describe other things such
as formatting and/or capabilities under other directories too.

Take a look at some other PMU drivers to get an idea of how it works.

Will


Re: [PATCH v3 1/6] Documentation: perf: hisi: Documentation for HiSilicon SoC PMU driver

2017-07-20 Thread Zhangshaokun
Hi Jonathan,

Thanks for your comments firstly.

On 2017/7/19 17:17, Jonathan Cameron wrote:
> On Tue, 18 Jul 2017 15:59:54 +0800
> Shaokun Zhang  wrote:
> 
>> This patch adds documentation for the uncore PMUs on HiSilicon SoC.
>>
>> Signed-off-by: Shaokun Zhang 
>> Signed-off-by: Anurup M   
> Hi Shaokun,
> 
> Sorry for the late reply on this (only recently joined Huawei)
> 
> This is a fairly generic review of the code rather than going into
> the actual userspace ABI choices as this is an area I'm only just
> starting to become familiar with.
> 
> Thanks,
> 
> Jonathan
>> ---
>>  Documentation/perf/hisi-pmu.txt | 51 
>> +
>>  1 file changed, 51 insertions(+)
>>  create mode 100644 Documentation/perf/hisi-pmu.txt
>>
>> diff --git a/Documentation/perf/hisi-pmu.txt 
>> b/Documentation/perf/hisi-pmu.txt
>> new file mode 100644
>> index 000..5fa0b1a
>> --- /dev/null
>> +++ b/Documentation/perf/hisi-pmu.txt
>> @@ -0,0 +1,51 @@
>> +HiSilicon SoC uncore Performance Monitoring Unit (PMU)
>> +==
>> +The HiSilicon SoC chip comprehends various independent system device PMUs
>> +such as L3 cache (L3C), Hydra Home Agent (HHA) and DDRC. These PMUs are
>> +independent and have hardware logic to gather statistics and performance
>> +information.
>> +
>> +HiSilicon SoC encapsulates multiple CPU and IO dies. Each CPU cluster (CC
>> +L) is made up of 4 cpu cores sharing one L3 cache; Each CPU die is called  
> nitpick, I'd not have a line break mid acronym. 
> 

Ok, shall keep CCL in the same line.

>> +Super CPU cluster (SCCL) and is made up of 6 CCLs. Each SCCL has two HHAs
>> +(0 - 1) and four DDRCs (0 - 3), respectively.
>> +
>> +HiSilicon SoC uncore PMU driver
>> +---
>> +Each device PMU has separate registers for event counting, control and
>> +interrupt, and the PMU driver shall register perf PMU drivers like L3C,
>> +HHA and DDRC etc. The available events and configuration options shall
>> +be described in the sysfs, see /sys/devices/hisi_*.  
> Is there not a subsystem directory that would make more sense to
> refer to than the full device list?
> 

For uncore devices, it is more reasonable to list in /sys/devices/***.

Thanks.
Shaokun

>> +The "perf list" command shall list the available events from sysfs.
>> +
>> +Each L3C, HHA and DDRC in one SCCL are registered as a separate PMU with 
>> perf.
>> +The PMU name will appear in event listing as hisi_<module><index-id>_<sccl-id>.
>> +where "index-id" is the index of module and "sccl-id" is the identifier of
>> +the SCCL.
>> +e.g. hisi_l3c0_1/rd_hit_cpipe is READ_HIT_CPIPE event of L3C index #0 and 
>> SCCL
>> +ID #1.
>> +e.g. hisi_hha0_1/rx_operations is RX_OPERATIONS event of HHA index #0 and 
>> SCCL
>> +ID #1.
>> +
>> +The driver also provides a "cpumask" sysfs attribute, which shows the CPU 
>> core
>> +ID used to count the uncore PMU event.
>> +
>> +Example usage of perf:
>> +$# perf list
>> +hisi_l3c0_3/rd_hit_cpipe/ [kernel PMU event]
>> +--
>> +hisi_l3c0_3/wr_hit_cpipe/ [kernel PMU event]
>> +--
>> +hisi_l3c0_1/rd_hit_cpipe/ [kernel PMU event]
>> +--
>> +hisi_l3c0_1/wr_hit_cpipe/ [kernel PMU event]
>> +--
>> +
>> +$# perf stat -a -e hisi_l3c0_1/rd_hit_cpipe/ sleep 5
>> +$# perf stat -a -e hisi_l3c0_1/config=0x02/ sleep 5
>> +
>> +The current driver does not support sampling. So "perf record" is 
>> unsupported.
>> +Also attach to a task is unsupported as the events are all uncore.
>> +
>> +Note: Please contact the maintainer for a complete list of events supported 
>> for
>> +the PMU devices in the SoC and its information if needed.  
> 
> 
> 



Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Akira Yokosawa
On 2017/07/20 14:47, Paul E. McKenney wrote:
> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
 On 2017/07/20 2:43, Paul E. McKenney wrote:
> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
>> From: Akira Yokosawa 
>> Date: Mon, 17 Jul 2017 16:25:33 +0900
>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example
>>
>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
>> no-transitivity example"), the operator in "if" statement of
>> the two-CPU example was modified from ">=" to ">".
>> Now the example misses the point because there is no party
>> who will modify "x" nor "y". So each CPU performs only the
>> READ_ONCE().
>>
>> The point of this example is to use control dependency for ordering,
>> and the WRITE_ONCE() should always be executed.
>>
>> So it was correct prior to the above mentioned commit. Partial
>> revert of the commit (with context adjustments regarding other
>> changes thereafter) restores the point.
>>
>> Note that the three-CPU example demonstrating the lack of
>> transitivity stands regardless of this partial revert.
>>
>> Signed-off-by: Akira Yokosawa 
>
> Hello, Akira,
>
> You are quite right that many compilers would generate straightforward
> code for the code fragment below, and in that case, the assertion could
> never trigger due to either TSO or control dependencies, depending on
> the architecture this was running on.
>
> However, if the compiler was too smart for our good, it could figure
> out that "x" and "y" only take on the values zero and one, so that
> the "if" would always be taken.  At that point, the compiler could
> simply ignore the "if" with the result shown below.
>
>> ---
>>  Documentation/memory-barriers.txt | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/Documentation/memory-barriers.txt 
>> b/Documentation/memory-barriers.txt
>> index c4ddfcd..c1ebe99 100644
>> --- a/Documentation/memory-barriers.txt
>> +++ b/Documentation/memory-barriers.txt
>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the 
>> initial values of
>>  CPU 0 CPU 1
>>  ===   ===
>>  r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>> -if (r1 > 0)   if (r2 > 0)
>> +if (r1 >= 0)  if (r2 >= 0)
>>WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>>
>>  assert(!(r1 == 1 && r2 == 1));
>
> Original program:
>
>   CPU 0 CPU 1
>   ===   ===
>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>   if (r1 >= 0)  if (r2 >= 0)
> WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>
>   assert(!(r1 == 1 && r2 == 1));
>
> Enthusiastically optimized program:
>
>   CPU 0 CPU 1
>   ===   ===
>   r1 = READ_ONCE(x);r2 = READ_ONCE(y);
>   WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);
>
>   assert(!(r1 == 1 && r2 == 1));
>
> This optimized version could trigger the assertion.
>
> So we do need to stick with the ">" comparison.

 Well but,

 Current example:

CPU 0 CPU 1
===   ===
r1 = READ_ONCE(x);r2 = READ_ONCE(y);
if (r1 > 0)   if (r2 > 0)
  WRITE_ONCE(y, 1); WRITE_ONCE(x, 1);

assert(!(r1 == 1 && r2 == 1));

 Such a clever compiler might be able to prove that "x" and "y"
 are never modified and end up in the following:

>>
>> Hi Akira,
>>
>> I wouldn't call that compiler a clever one, it's a broken one ;-)
>>
>> So here is the thing: READ_ONCE() is a *volatile* load, which means the
>> compiler has to generate code that actually does a load, so the values
>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 
>> optimize the if()s out because the volatile semantics of READ_ONCE().
>> Otherwise, I think we have a lot more to worry about than this case.
>>
CPU 0 CPU 1
===   ===
r1 = READ_ONCE(x);r2 = READ_ONCE(y);

assert(!(r1 == 1 && r2 == 1));

 This means it is impossible to describe this example in C,
 doesn't it?
>>>
>>> That is a matter of some debate, but it has gotten increasingly 

Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Paul E. McKenney
On Thu, Jul 20, 2017 at 02:14:34PM +0800, Boqun Feng wrote:
> On Wed, Jul 19, 2017 at 10:47:04PM -0700, Paul E. McKenney wrote:
> [...]
> > > Hi Paul,
> > > 
> > > I know the compiler could optimize atomics in very interesting ways, but
> > > this case is about volatile, so I guess our case is still fine? ;-)
> > 
> > Hello, Boqun,
> > 
> > When I asked that question, the answer I got was "the compiler must
> > emit the load instruction, but is under no obligation to actually use the
> > value loaded".
> > 
> > I don't happen to like that answer, by the way.  ;-)
> > 
> 
> Me neither, seems to me the kernel happens to work well at
> compiler-optimization's mercy ;-/
> 
> With claim like that, compiler could do optimization as turning:
> 
>   struct task_struct *owner;
> 
>   for (;;) {
>   owner = READ_ONCE(lock->owner);
>   if (owner && !mutex_spin_on_owner(lock, owner))
>   break;
>   /* ... */
> 
> into:
> 
>   struct task_struct *owner;
> 
>   owner = READ_ONCE(lock->owner);
> 
>   for (;;READ_ONCE(lock->owner)) {
>   if (owner && !mutex_spin_on_owner(lock, owner))
>   break;
>   /* ... */
> 
> Because the load executed in every loop, and they just happen to choose
> not to use the values. And this is within their rights!

Well, this is one reason that I attend standards-committee meetings.
As does Will Deacon.  That way, there is someone there to protest when
people argue that the above behavior is just fine.  ;-)

Thanx, Paul

> Regards,
> Boqun
> 
> > Thanx, Paul
> > 
> > > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html
> > > > 
> > > 
> > > Great material to wake up mind in the morning! Thanks.
> > > 
> > > Regards,
> > > Boqun
> > > 
> > > > What are your thoughts on this?
> > > > 
> > > > Thanx, Paul
> > > > 
> > > > > Thanks, Akira
> > > > > 
> > > > > > That said, I very much welcome critical reviews of 
> > > > > > memory-barriers.txt,
> > > > > > so please do feel free to continue doing that!
> > > > > > 
> > > > > > Thanx, Paul
> > > > > > 
> > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 



Re: [PATCH v3 3/5] dax: use common 4k zero page for dax mmap reads

2017-07-20 Thread Jan Kara
On Wed 19-07-17 10:26:45, Ross Zwisler wrote:
> On Wed, Jul 19, 2017 at 05:33:14PM +0200, Jan Kara wrote:
> > On Wed 28-06-17 16:01:50, Ross Zwisler wrote:
> > > Another major change is that we remove dax_pfn_mkwrite() from our fault
> > > flow, and instead rely on the page fault itself to make the PTE dirty and
> > > writeable.  The following description from the patch adding the
> > > vm_insert_mixed_mkwrite() call explains this a little more:
> > > 
> > > ***
> > >   To be able to use the common 4k zero page in DAX we need to have our PTE
> > >   fault path look more like our PMD fault path where a PTE entry can be
> > >   marked as dirty and writeable as it is first inserted, rather than
> > >   waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault() 
> > > call.
> > > 
> > >   Right now we can rely on having a dax_pfn_mkwrite() call because we can
> > >   distinguish between these two cases in do_wp_page():
> > > 
> > >   case 1: 4k zero page => writable DAX storage
> > >   case 2: read-only DAX storage => writeable DAX storage
> > > 
> > >   This distinction is made by via vm_normal_page().  vm_normal_page()
> > >   returns false for the common 4k zero page, though, just as it does for
> > >   DAX ptes.  Instead of special casing the DAX + 4k zero page case, we 
> > > will
> > >   simplify our DAX PTE page fault sequence so that it matches our DAX PMD
> > >   sequence, and get rid of dax_pfn_mkwrite() completely.
> > > 
> > >   This means that insert_pfn() needs to follow the lead of 
> > > insert_pfn_pmd()
> > >   and allow us to pass in a 'mkwrite' flag.  If 'mkwrite' is set
> > >   insert_pfn() will do the work that was previously done by 
> > > wp_page_reuse()
> > >   as part of the dax_pfn_mkwrite() call path.
> > > ***
> > 
> > Hum, thinking about this in context of this patch... So what if we have
> > allocated storage, a process faults it read-only, we map it to page tables
> > writeprotected. Then the process writes through mmap to the area - the code
> > in handle_pte_fault() ends up in do_wp_page() if I'm reading it right.
> 
> Yep.
> 
> > Then, since we are missing ->pfn_mkwrite() handlers, the PTE will be marked
> > writeable but radix tree entry stays clean - bug. Am I missing something?
> 
> I don't think we ever end up with a writeable PTE but with a clean radix tree
> entry.  When we get the write fault we do a full fault through
> dax_iomap_pte_fault() and dax_insert_mapping().
>
> dax_insert_mapping() sets up the dirty radix tree entry via
> dax_insert_mapping_entry() before it does anything with the page tables via
> vm_insert_mixed_mkwrite().
> 
> So, this mkwrite fault path is exactly the path we would have taken if the
> initial read to real storage hadn't happened, and we end up in the same end
> state - with a dirty DAX radix tree entry and a writeable PTE.

Ah sorry, I have missed that it is not that you would not have
->pfn_mkwrite() handler - you still have it but it is the same as standard
fault handler now. So maybe can you rephrase the changelog a bit saying
that: "We get rid of dax_pfn_mkwrite() helper and use dax_iomap_fault() to
handle write-protection faults instead." Thanks!

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH 1/4] fs/dcache: Limit numbers of negative dentries

2017-07-20 Thread Miklos Szeredi
On Wed, Jul 19, 2017 at 10:42 PM, Waiman Long  wrote:
> On 07/19/2017 04:24 PM, Miklos Szeredi wrote:
>> On Mon, Jul 17, 2017 at 3:39 PM, Waiman Long  wrote:
>>> The number of positive dentries is limited by the number of files
>>> in the filesystems. The number of negative dentries, however,
>>> has no limit other than the total amount of memory available in
>>> the system. So a rogue application that generates a lot of negative
>>> dentries can potentially exhaust most of the memory available in the
>>> system impacting performance on other running applications.
>>>
>>> To prevent this from happening, the dcache code is now updated to limit
>>> the amount of the negative dentries in the LRU lists that can be kept
>>> as a percentage of total available system memory. The default is 5%
>>> and can be changed by specifying the "neg_dentry_pc=" kernel command
>>> line option.
>>>
>>> Signed-off-by: Waiman Long 
>>> ---
>> [...]
>>
>>> @@ -603,7 +698,13 @@ static struct dentry *dentry_kill(struct dentry 
>>> *dentry)
>>>
>>> if (!IS_ROOT(dentry)) {
>>> parent = dentry->d_parent;
>>> -   if (unlikely(!spin_trylock(&parent->d_lock))) {
>>> +   /*
>>> +* Force the killing of this negative dentry when
>>> +* DCACHE_KILL_NEGATIVE flag is set.
>>> +*/
>>> +   if (unlikely(dentry->d_flags & DCACHE_KILL_NEGATIVE)) {
>>> +   spin_lock(&parent->d_lock);
>> This looks like d_lock ordering problem (should be parent first, child
>> second).  Why is this needed, anyway?
>>
>
> Yes, that is a bug. I should have used lock_parent() instead.

lock_parent() can release dentry->d_lock, which means it's perfectly
useless for this.

I still feel forcing  free is wrong here.  Why not just block until
the number of negatives goes below the limit (start reclaim if not
already doing so, etc...)?

Thanks,
Miklos


Re: [PATCH 101/102] Documentation: devres: add explicit exclusive/shared reset control request calls

2017-07-20 Thread Lee Jones
On Wed, 19 Jul 2017, Philipp Zabel wrote:

> Commit a53e35db70d1 ("reset: Ensure drivers are explicit when requesting
> reset lines") started to transition the reset control request API calls
> to explicitly state whether the driver needs exclusive or shared reset
> control behavior. Add the explicit API calls to the devres list.
> 
> Cc: Jonathan Corbet 
> Cc: Lee Jones 
> Cc: linux-doc@vger.kernel.org
> Signed-off-by: Philipp Zabel 
> ---
>  Documentation/driver-model/devres.txt | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Acked-by: Lee Jones 

> diff --git a/Documentation/driver-model/devres.txt 
> b/Documentation/driver-model/devres.txt
> index 30e04f7a690dd..fd157078dd558 100644
> --- a/Documentation/driver-model/devres.txt
> +++ b/Documentation/driver-model/devres.txt
> @@ -379,7 +379,12 @@ REGULATOR
>devm_regulator_register()
>  
>  RESET
> -  devm_reset_control_get()
> +  devm_reset_control_get_exclusive()
> +  devm_reset_control_get_shared()
> +  devm_reset_control_get_optional_exclusive()
> +  devm_reset_control_get_optional_shared()
> +  devm_reset_control_get_exclusive_by_index()
> +  devm_reset_control_get_shared_by_index()
>devm_reset_controller_register()
>  
>  SLAVE DMA ENGINE

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [RFC v6 27/62] powerpc: helper to validate key-access permissions of a pte

2017-07-20 Thread Aneesh Kumar K.V
Ram Pai  writes:

> helper function that checks if the read/write/execute is allowed
> on the pte.
>
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/pgtable.h |4 +++
>  arch/powerpc/include/asm/pkeys.h |   12 +
>  arch/powerpc/mm/pkeys.c  |   33 
> ++
>  3 files changed, 49 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h 
> b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 30d7f55..0056e58 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -472,6 +472,10 @@ static inline void write_uamor(u64 value)
>   mtspr(SPRN_UAMOR, value);
>  }
>
> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> +extern bool arch_pte_access_permitted(u64 pte, bool write, bool execute);
> +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> +
>  #define __HAVE_ARCH_PTEP_GET_AND_CLEAR
>  static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
>  unsigned long addr, pte_t *ptep)
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index bbb5d85..7a9aade 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -53,6 +53,18 @@ static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
>   ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL));
>  }
>
> +static inline u16 pte_to_pkey_bits(u64 pteflags)
> +{
> + if (!pkey_inited)
> + return 0x0UL;

Do we really need that above check ? We should always find
pkey_inited to be set.

> +
> + return (((pteflags & H_PAGE_PKEY_BIT0) ? 0x10 : 0x0UL) |
> + ((pteflags & H_PAGE_PKEY_BIT1) ? 0x8 : 0x0UL) |
> + ((pteflags & H_PAGE_PKEY_BIT2) ? 0x4 : 0x0UL) |
> + ((pteflags & H_PAGE_PKEY_BIT3) ? 0x2 : 0x0UL) |
> + ((pteflags & H_PAGE_PKEY_BIT4) ? 0x1 : 0x0UL));
> +}
> +


-aneesh



Re: [RFC v6 26/62] powerpc: Program HPTE key protection bits

2017-07-20 Thread Aneesh Kumar K.V
Ram Pai  writes:

> Map the PTE protection key bits to the HPTE key protection bits,
> while creating HPTE  entries.
>
Reviewed-by: Aneesh Kumar K.V 

> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h |5 +
>  arch/powerpc/include/asm/pkeys.h  |   12 
>  arch/powerpc/mm/hash_utils_64.c   |4 
>  3 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
> b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 6981a52..f7a6ed3 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -90,6 +90,8 @@
>  #define HPTE_R_PP0   ASM_CONST(0x8000)
>  #define HPTE_R_TSASM_CONST(0x4000)
>  #define HPTE_R_KEY_HIASM_CONST(0x3000)
> +#define HPTE_R_KEY_BIT0  ASM_CONST(0x2000)
> +#define HPTE_R_KEY_BIT1  ASM_CONST(0x1000)
>  #define HPTE_R_RPN_SHIFT 12
>  #define HPTE_R_RPN   ASM_CONST(0x0000)
>  #define HPTE_R_RPN_3_0   ASM_CONST(0x01fff000)
> @@ -104,6 +106,9 @@
>  #define HPTE_R_C ASM_CONST(0x0080)
>  #define HPTE_R_R ASM_CONST(0x0100)
>  #define HPTE_R_KEY_LOASM_CONST(0x0e00)
> +#define HPTE_R_KEY_BIT2  ASM_CONST(0x0800)
> +#define HPTE_R_KEY_BIT3  ASM_CONST(0x0400)
> +#define HPTE_R_KEY_BIT4  ASM_CONST(0x0200)
>
>  #define HPTE_V_1TB_SEG   ASM_CONST(0x4000)
>  #define HPTE_V_VRMA_MASK ASM_CONST(0x4001ff00)
> diff --git a/arch/powerpc/include/asm/pkeys.h 
> b/arch/powerpc/include/asm/pkeys.h
> index ad39db0..bbb5d85 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -41,6 +41,18 @@ static inline u64 vmflag_to_page_pkey_bits(u64 vm_flags)
>   ((vm_flags & VM_PKEY_BIT4) ? H_PAGE_PKEY_BIT0 : 0x0UL));
>  }
>
> +static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
> +{
> + if (!pkey_inited)
> + return 0x0UL;
> +
> + return (((pteflags & H_PAGE_PKEY_BIT0) ? HPTE_R_KEY_BIT0 : 0x0UL) |
> + ((pteflags & H_PAGE_PKEY_BIT1) ? HPTE_R_KEY_BIT1 : 0x0UL) |
> + ((pteflags & H_PAGE_PKEY_BIT2) ? HPTE_R_KEY_BIT2 : 0x0UL) |
> + ((pteflags & H_PAGE_PKEY_BIT3) ? HPTE_R_KEY_BIT3 : 0x0UL) |
> + ((pteflags & H_PAGE_PKEY_BIT4) ? HPTE_R_KEY_BIT4 : 0x0UL));
> +}
> +
>  static inline int vma_pkey(struct vm_area_struct *vma)
>  {
>   if (!pkey_inited)
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index f88423b..1e74529 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -231,6 +231,10 @@ unsigned long htab_convert_pte_flags(unsigned long 
> pteflags)
>*/
>   rflags |= HPTE_R_M;
>
> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> + rflags |= pte_to_hpte_pkey_bits(pteflags);
> +#endif
> +
>   return rflags;
>  }
>
> -- 
> 1.7.1



Re: [PATCH] documentation: Fix two-CPU control-dependency example

2017-07-20 Thread Boqun Feng
On Wed, Jul 19, 2017 at 10:47:04PM -0700, Paul E. McKenney wrote:
[...]
> > Hi Paul,
> > 
> > I know the compiler could optimize atomics in very interesting ways, but
> > this case is about volatile, so I guess our case is still fine? ;-)
> 
> Hello, Boqun,
> 
> When I asked that question, the answer I got was "the compiler must
> emit the load instruction, but is under no obligation to actually use the
> value loaded".
> 
> I don't happen to like that answer, by the way.  ;-)
> 

Me neither, seems to me the kernel happens to work well at
compiler-optimization's mercy ;-/

With claim like that, compiler could do optimization as turning:

struct task_struct *owner;

for (;;) {
owner = READ_ONCE(lock->owner);
if (owner && !mutex_spin_on_owner(lock, owner))
break;
/* ... */

into:

struct task_struct *owner;

owner = READ_ONCE(lock->owner);

for (;;READ_ONCE(lock->owner)) {
if (owner && !mutex_spin_on_owner(lock, owner))
break;
/* ... */

Because the load executed in every loop, and they just happen to choose
not to use the values. And this is within their rights!

Regards,
Boqun

>   Thanx, Paul
> 
> > > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html
> > > 
> > 
> > Great material to wake up mind in the morning! Thanks.
> > 
> > Regards,
> > Boqun
> > 
> > > What are your thoughts on this?
> > > 
> > >   Thanx, Paul
> > > 
> > > > Thanks, Akira
> > > > 
> > > > > That said, I very much welcome critical reviews of 
> > > > > memory-barriers.txt,
> > > > > so please do feel free to continue doing that!
> > > > > 
> > > > >   Thanx, Paul
> > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 


Re: [RFC v6 11/62] powerpc: initial pkey plumbing

2017-07-20 Thread Aneesh Kumar K.V
Ram Pai  writes:

> basic setup to initialize the pkey system. Only 64K kernel in HPT
> mode, enables the pkey system.
>
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/Kconfig   |   16 ++
>  arch/powerpc/include/asm/mmu_context.h |5 +++
>  arch/powerpc/include/asm/pkeys.h   |   51 
> 
>  arch/powerpc/kernel/setup_64.c |4 ++
>  arch/powerpc/mm/Makefile   |1 +
>  arch/powerpc/mm/hash_utils_64.c|1 +
>  arch/powerpc/mm/pkeys.c|   18 +++
>  7 files changed, 96 insertions(+), 0 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/pkeys.h
>  create mode 100644 arch/powerpc/mm/pkeys.c
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index bf4391d..5c60fd6 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -855,6 +855,22 @@ config SECCOMP
>
> If unsure, say Y. Only embedded should say N here.
>
> +config PPC64_MEMORY_PROTECTION_KEYS
> + prompt "PowerPC Memory Protection Keys"
> + def_bool y
> + # Note: only available in 64-bit mode
> + depends on PPC64 && PPC_64K_PAGES
> + select ARCH_USES_HIGH_VMA_FLAGS
> + select ARCH_HAS_PKEYS
> + ---help---
> +   Memory Protection Keys provides a mechanism for enforcing
> +   page-based protections, but without requiring modification of the
> +   page tables when an application changes protection domains.
> +
> +   For details, see Documentation/vm/protection-keys.txt
> +
> +   If unsure, say y.
> +
>  endmenu


Shouldn't this come as the last patch ? Or can we enable this config by
this patch ? If so what does it do ? Did you boot test each of these
patches to make sure we don't break git bisect ?

>
>  config ISA_DMA_API
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index da7e943..4b93547 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -181,5 +181,10 @@ static inline bool arch_vma_access_permitted(struct 
> vm_area_struct *vma,
>   /* by default, allow everything */


-aneesh
