Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-04 Thread Segher Boessenkool
On Wed, Sep 04, 2019 at 01:23:36PM +1000, Alastair D'Silva wrote:
> > Maybe also add "msr" in the clobbers.
> > 
> Ok.

There is no known register "msr" in GCC.


Segher


RE: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 22:11 +0200, Gabriel Paubert wrote:
> On Tue, Sep 03, 2019 at 01:31:57PM -0500, Segher Boessenkool wrote:
> > On Tue, Sep 03, 2019 at 07:05:19PM +0200, Christophe Leroy wrote:
> > > On 03/09/2019 at 18:04, Segher Boessenkool wrote:
> > > > (Why are they separate though?  It could just be one loop var).
> > > 
> > > Yes it could just be a single loop var, but in that case it would have
> > > to be reset at the start of the second loop, which means we would have
> > > to pass 'addr' for resetting the loop anyway,
> > 
> > Right, I noticed that after hitting send, as usual.
> > 
> > > so I opted to do it
> > > outside the inline asm by using two separate loop vars set to their
> > > starting value outside the inline asm.
> > 
> > The thing is, the way it is written now, it will get separate registers
> > for each loop (with proper earlyclobbers added).  Not that that really
> > matters of course, it just feels wrong :-)
> 
> After "mtmsr %3", it is always possible to copy %0 to %3 and use it as
> an address register for the second loop. One register less to allocate
> for the compiler. Constraints of course have to be adjusted.
> 
> 

Given that we're dealing with registers holding data that has been
named outside the assembler, this feels dirty. We'd be using the
register passed in as 'msr' to hold the address instead.

Since we're not short on registers, I don't see this as a good change.

-- 
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819



RE: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 11:04 -0500, Segher Boessenkool wrote:
> On Tue, Sep 03, 2019 at 04:28:09PM +0200, Christophe Leroy wrote:
> > On 03/09/2019 at 15:04, Segher Boessenkool wrote:
> > > On Tue, Sep 03, 2019 at 03:23:57PM +1000, Alastair D'Silva wrote:
> > > > +   asm volatile(
> > > > +   "   mtctr %2;"
> > > > +   "   mtmsr %3;"
> > > > +   "   isync;"
> > > > +   "0: dcbst   0, %0;"
> > > > +   "   addi    %0, %0, %4;"
> > > > +   "   bdnz    0b;"
> > > > +   "   sync;"
> > > > +   "   mtctr %2;"
> > > > +   "1: icbi    0, %1;"
> > > > +   "   addi    %1, %1, %4;"
> > > > +   "   bdnz    1b;"
> > > > +   "   sync;"
> > > > +   "   mtmsr %5;"
> > > > +   "   isync;"
> > > > +   : "+r" (loop1), "+r" (loop2)
> > > > +   : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
> > > > +   : "ctr", "memory");
> > > 
> > > This outputs as one huge assembler statement, all on one line.  That's
> > > going to be fun to read or debug.
> > 
> > Do you mean \n has to be added after the ; ?
> 
> Something like that.  There is no really satisfying way for doing huge
> inline asm, and maybe that is a good thing ;-)
> 
> Often people write \n\t at the end of each line of inline asm.  This works
> pretty well (but then there are labels, oh joy).
> 
> > > loop1 and/or loop2 can be assigned the same register as msr0 or nb.  They
> > > need to be made earlyclobbers.  (msr is fine, all of its reads are before
> > > any writes to loop1 or loop2; and bytes is fine, it's not a register).
> > 
> > Can you elaborate, please? Doesn't '+r' mean that they are input and
> > output at the same time?
> 
> That is what + means, yes -- that this output is an input as well.  It is
> the same to write
> 
>   asm("mov %1,%0 ; mov %0,42" : "+r"(x), "=r"(y));
> or to write
>   asm("mov %1,%0 ; mov %0,42" : "=r"(x), "=r"(y) : "0"(x));
> 
> (So not "at the same time" as in "in the same machine instruction", but
> more loosely, as in "in the same inline asm statement").
> 
> > "to be made earlyclobbers", what does this mean exactly? How do I do that?
> 
> You write &, like "+&r" in this case.  It means the machine code writes
> to this register before it has consumed all asm inputs (remember, GCC
> does not understand (or even parse!) the assembler string).
> 
> So just
> 
>   : "+&r" (loop1), "+&r" (loop2)
> 
> will do.  (Why are they separate though?  It could just be one loop var).
> 
> 

Thanks, I've updated these.

-- 
Alastair D'Silva



RE: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Alastair D'Silva
On Tue, 2019-09-03 at 08:08 +0200, Christophe Leroy wrote:
> 
> On 03/09/2019 at 07:23, Alastair D'Silva wrote:
> > From: Alastair D'Silva 
> > 
> > Similar to commit 22e9c88d486a
> > ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
> > this patch converts the following ASM symbols to C:
> >  flush_icache_range()
> >  __flush_dcache_icache()
> >  __flush_dcache_icache_phys()
> > 
> > This was done as we discovered a long-standing bug where the length of the
> > range was truncated due to using a 32 bit shift instead of a 64 bit one.
> > 
> > By converting these functions to C, it becomes easier to maintain.
> > 
> > flush_dcache_icache_phys() retains a critical assembler section as we must
> > ensure there are no memory accesses while the data MMU is disabled
> > (authored by Christophe Leroy). Since this has no external callers, it has
> > also been made static, allowing the compiler to inline it within
> > flush_dcache_icache_page().
> > 
> > Signed-off-by: Alastair D'Silva 
> > Signed-off-by: Christophe Leroy 
> > ---
> >   arch/powerpc/include/asm/cache.h  |  26 ++---
> >   arch/powerpc/include/asm/cacheflush.h |  24 ++--
> >   arch/powerpc/kernel/misc_32.S | 117 
> >   arch/powerpc/kernel/misc_64.S | 102 -
> >   arch/powerpc/mm/mem.c | 152 +-
> >   5 files changed, 173 insertions(+), 248 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/cache.h
> > b/arch/powerpc/include/asm/cache.h
> > index f852d5cd746c..91c808c6738b 100644
> > --- a/arch/powerpc/include/asm/cache.h
> > +++ b/arch/powerpc/include/asm/cache.h
> > @@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
> >   #endif
> >   #endif /* ! __ASSEMBLY__ */
> >   
> > -#if defined(__ASSEMBLY__)
> > -/*
> > - * For a snooping icache, we still need a dummy icbi to purge all
> > the
> > - * prefetched instructions from the ifetch buffers. We also need a
> > sync
> > - * before the icbi to order the the actual stores to memory that
> > might
> > - * have modified instructions with the icbi.
> > - */
> > -#define PURGE_PREFETCHED_INS   \
> > -   sync;   \
> > -   icbi    0,r3;   \
> > -   sync;   \
> > -   isync
> > -
> > -#else
> > +#if !defined(__ASSEMBLY__)
> >   #define __read_mostly
> > __attribute__((__section__(".data..read_mostly")))
> >   
> >   #ifdef CONFIG_PPC_BOOK3S_32
> > @@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
> >   {
> > __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) :
> > "memory");
> >   }
> > +
> > +static inline void icbi(void *addr)
> > +{
> > +   __asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory");
> 
> I think "__asm__ __volatile__" is deprecated. Use "asm volatile"
> instead.
> 

Ok.

> > +}
> > +
> > +static inline void iccci(void *addr)
> > +{
> > +   __asm__ __volatile__ ("iccci 0, %0" : : "r"(addr) : "memory");
> > +}
> > +
> 
> Same
> 
> >   #endif /* !__ASSEMBLY__ */
> >   #endif /* __KERNEL__ */
> >   #endif /* _ASM_POWERPC_CACHE_H */
> > diff --git a/arch/powerpc/include/asm/cacheflush.h
> > b/arch/powerpc/include/asm/cacheflush.h
> > index ed57843ef452..4a1c9f0200e1 100644
> > --- a/arch/powerpc/include/asm/cacheflush.h
> > +++ b/arch/powerpc/include/asm/cacheflush.h
> > @@ -42,24 +42,20 @@ extern void flush_dcache_page(struct page
> > *page);
> >   #define flush_dcache_mmap_lock(mapping)   do { } while
> > (0)
> >   #define flush_dcache_mmap_unlock(mapping) do { } while (0)
> >   
> > -extern void flush_icache_range(unsigned long, unsigned long);
> > +void flush_icache_range(unsigned long start, unsigned long stop);
> >   extern void flush_icache_user_range(struct vm_area_struct *vma,
> > struct page *page, unsigned long
> > addr,
> > int len);
> > -extern void __flush_dcache_icache(void *page_va);
> >   extern void flush_dcache_icache_page(struct page *page);
> > -#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
> > -extern void __flush_dcache_icache_phys(unsigned long physaddr);
> > -#else
> > -static inline void __flush_dcache_icache_phys(unsigned long
> > physaddr)
> > -{
> > -   BUG();
> > -}
> > -#endif
> > -
> > -/*
> > - * Write any modified data cache blocks out to memory and
> > invalidate them.
> > - * Does not invalidate the corresponding instruction cache blocks.
> > +void __flush_dcache_icache(void *page);
> > +
> > +/**
> > + * flush_dcache_range(): Write any modified data cache blocks out
> > to memory and
> > + * invalidate them. Does not invalidate the corresponding
> > instruction cache
> > + * blocks.
> > + *
> > + * @start: the start address
> > + * @stop: the stop address (exclusive)
> >*/
> >   static inline void flush_dcache_range(unsigned long start,
> > unsigned long stop)
> >   {
> > diff --git a/arch/powerpc/kernel/misc_32.S
> > b/arch/powerpc/kernel/misc_32.S
> > index 

Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Gabriel Paubert
On Tue, Sep 03, 2019 at 01:31:57PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 03, 2019 at 07:05:19PM +0200, Christophe Leroy wrote:
> > On 03/09/2019 at 18:04, Segher Boessenkool wrote:
> > >(Why are they separate though?  It could just be one loop var).
> > 
> > Yes it could just be a single loop var, but in that case it would have 
> > to be reset at the start of the second loop, which means we would have 
> > to pass 'addr' for resetting the loop anyway,
> 
> Right, I noticed that after hitting send, as usual.
> 
> > so I opted to do it 
> > > outside the inline asm by using two separate loop vars set to their
> > starting value outside the inline asm.
> 
> The thing is, the way it is written now, it will get separate registers
> for each loop (with proper earlyclobbers added).  Not that that really
> matters of course, it just feels wrong :-)

After "mtmsr %3", it is always possible to copy %0 to %3 and use it as
an address register for the second loop. One register less to allocate
for the compiler. Constraints of course have to be adjusted.

Gabriel
> 
> 
> Segher


Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Segher Boessenkool
On Tue, Sep 03, 2019 at 07:05:19PM +0200, Christophe Leroy wrote:
> On 03/09/2019 at 18:04, Segher Boessenkool wrote:
> >(Why are they separate though?  It could just be one loop var).
> 
> Yes it could just be a single loop var, but in that case it would have 
> to be reset at the start of the second loop, which means we would have 
> to pass 'addr' for resetting the loop anyway,

Right, I noticed that after hitting send, as usual.

> so I opted to do it 
> outside the inline asm by using two separate loop vars set to their
> starting value outside the inline asm.

The thing is, the way it is written now, it will get separate registers
for each loop (with proper earlyclobbers added).  Not that that really
matters of course, it just feels wrong :-)


Segher


Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Christophe Leroy




On 03/09/2019 at 18:04, Segher Boessenkool wrote:
> On Tue, Sep 03, 2019 at 04:28:09PM +0200, Christophe Leroy wrote:
>> On 03/09/2019 at 15:04, Segher Boessenkool wrote:
>>> On Tue, Sep 03, 2019 at 03:23:57PM +1000, Alastair D'Silva wrote:
>>>> +   asm volatile(
>>>> +   "   mtctr %2;"
>>>> +   "   mtmsr %3;"
>>>> +   "   isync;"
>>>> +   "0: dcbst   0, %0;"
>>>> +   "   addi    %0, %0, %4;"
>>>> +   "   bdnz    0b;"
>>>> +   "   sync;"
>>>> +   "   mtctr %2;"
>>>> +   "1: icbi    0, %1;"
>>>> +   "   addi    %1, %1, %4;"
>>>> +   "   bdnz    1b;"
>>>> +   "   sync;"
>>>> +   "   mtmsr %5;"
>>>> +   "   isync;"
>>>> +   : "+r" (loop1), "+r" (loop2)
>>>> +   : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
>>>> +   : "ctr", "memory");
>>>
>>> This outputs as one huge assembler statement, all on one line.  That's
>>> going to be fun to read or debug.
>>
>> Do you mean \n has to be added after the ; ?
>
> Something like that.  There is no really satisfying way for doing huge
> inline asm, and maybe that is a good thing ;-)
>
> Often people write \n\t at the end of each line of inline asm.  This works
> pretty well (but then there are labels, oh joy).
>
>>> loop1 and/or loop2 can be assigned the same register as msr0 or nb.  They
>>> need to be made earlyclobbers.  (msr is fine, all of its reads are before
>>> any writes to loop1 or loop2; and bytes is fine, it's not a register).
>>
>> Can you elaborate, please? Doesn't '+r' mean that they are input and
>> output at the same time?
>
> That is what + means, yes -- that this output is an input as well.  It is
> the same to write
>
>   asm("mov %1,%0 ; mov %0,42" : "+r"(x), "=r"(y));
> or to write
>   asm("mov %1,%0 ; mov %0,42" : "=r"(x), "=r"(y) : "0"(x));
>
> (So not "at the same time" as in "in the same machine instruction", but
> more loosely, as in "in the same inline asm statement").
>
>> "to be made earlyclobbers", what does this mean exactly? How do I do that?
>
> You write &, like "+&r" in this case.  It means the machine code writes
> to this register before it has consumed all asm inputs (remember, GCC
> does not understand (or even parse!) the assembler string).
>
> So just
>
>   : "+&r" (loop1), "+&r" (loop2)
>
> will do.  (Why are they separate though?  It could just be one loop var).

Yes it could just be a single loop var, but in that case it would have
to be reset at the start of the second loop, which means we would have
to pass 'addr' for resetting the loop anyway, so I opted to do it
outside the inline asm by using two separate loop vars set to their
starting value outside the inline asm.


Christophe


Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Segher Boessenkool
On Tue, Sep 03, 2019 at 04:28:09PM +0200, Christophe Leroy wrote:
> On 03/09/2019 at 15:04, Segher Boessenkool wrote:
> >On Tue, Sep 03, 2019 at 03:23:57PM +1000, Alastair D'Silva wrote:
> >>+   asm volatile(
> >>+   "   mtctr %2;"
> >>+   "   mtmsr %3;"
> >>+   "   isync;"
> >>+   "0: dcbst   0, %0;"
> >>+   "   addi    %0, %0, %4;"
> >>+   "   bdnz    0b;"
> >>+   "   sync;"
> >>+   "   mtctr %2;"
> >>+   "1: icbi    0, %1;"
> >>+   "   addi    %1, %1, %4;"
> >>+   "   bdnz    1b;"
> >>+   "   sync;"
> >>+   "   mtmsr %5;"
> >>+   "   isync;"
> >>+   : "+r" (loop1), "+r" (loop2)
> >>+   : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
> >>+   : "ctr", "memory");
> >
> >This outputs as one huge assembler statement, all on one line.  That's
> >going to be fun to read or debug.
> 
> Do you mean \n has to be added after the ; ?

Something like that.  There is no really satisfying way for doing huge
inline asm, and maybe that is a good thing ;-)

Often people write \n\t at the end of each line of inline asm.  This works
pretty well (but then there are labels, oh joy).
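
To make the line-break point concrete: adjacent C string literals are simply concatenated, so a template that only uses `;` as a separator reaches the assembler as one long line, whereas `\n\t` terminators give one instruction per line in the generated `.s` file. The following is a minimal sketch in plain C that only builds the strings (nothing is assembled); the register numbers and the `count_lines()` helper are made up for illustration and are not part of the patch.

```c
#include <stddef.h>

/* Style used in the posted patch: ';' separators, one assembler line. */
static const char one_line[] =
	"   mtctr 5;"
	"   isync;"
	"0: dcbst 0, 3;";

/*
 * Common alternative: "\n\t" after each instruction. A label wants to
 * start at column 0, so the line before it ends in "\n" only.
 */
static const char multi_line[] =
	"mtctr 5\n\t"
	"isync\n"
	"0:\tdcbst 0, 3";

/* Hypothetical helper: how many lines the assembler would be handed. */
static size_t count_lines(const char *s)
{
	size_t lines = 1;

	for (; *s; s++)
		if (*s == '\n')
			lines++;
	return lines;
}
```

Here `count_lines(one_line)` yields 1 and `count_lines(multi_line)` yields 3, which is the difference one sees when reading compiler output or an assembler error message that reports a line number.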

> >loop1 and/or loop2 can be assigned the same register as msr0 or nb.  They
> >need to be made earlyclobbers.  (msr is fine, all of its reads are before
> >any writes to loop1 or loop2; and bytes is fine, it's not a register).
> 
> Can you elaborate, please? Doesn't '+r' mean that they are input and
> output at the same time?

That is what + means, yes -- that this output is an input as well.  It is
the same to write

  asm("mov %1,%0 ; mov %0,42" : "+r"(x), "=r"(y));
or to write
  asm("mov %1,%0 ; mov %0,42" : "=r"(x), "=r"(y) : "0"(x));

(So not "at the same time" as in "in the same machine instruction", but
more loosely, as in "in the same inline asm statement").
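
The equivalence between `+r` and an `=r` output tied to a matching `"0"` input can be tried with an empty template, which should build with GCC or Clang on any target. This is only a sketch: the function names are invented for illustration, and the `__asm__` spelling is used solely so the snippet also compiles in strict ISO C mode.

```c
/* Read-write operand: x is both an input and an output of the asm. */
static unsigned long pass_plus(unsigned long x)
{
	__asm__ volatile("" : "+r" (x));
	return x;
}

/* The same thing spelled out: an output, plus an input tied to operand 0. */
static unsigned long pass_matching(unsigned long x)
{
	unsigned long out;

	__asm__ volatile("" : "=r" (out) : "0" (x));
	return out;
}
```

In both functions the value enters and leaves the statement in a single register, so each returns its argument unchanged.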

> "to be made earlyclobbers", what does this mean exactly? How do I do that?

You write &, like "+&r" in this case.  It means the machine code writes
to this register before it has consumed all asm inputs (remember, GCC
does not understand (or even parse!) the assembler string).

So just

  : "+&r" (loop1), "+&r" (loop2)

will do.  (Why are they separate though?  It could just be one loop var).
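
A small, hedged illustration of why the earlyclobber modifier matters: in the sketch below the first `add` writes the output while the third input has not been read yet, so without `&` the compiler would be free to place the output and that input in the same register. The function `add3()` is hypothetical and unrelated to the patch; the PowerPC branch shows the constraint, and other targets fall back to plain C so the snippet still builds.

```c
/*
 * Returns x + y + z. On PowerPC the first add writes %0 while %3 (z)
 * is still unread, so %0 must not share a register with z.
 */
static unsigned long add3(unsigned long x, unsigned long y, unsigned long z)
{
	unsigned long out;

#if defined(__powerpc__)
	__asm__("add %0,%1,%2\n\t"
		"add %0,%0,%3"
		: "=&r" (out)	/* '&': written before all inputs are consumed */
		: "r" (x), "r" (y), "r" (z));
#else
	out = x + y + z;	/* portable stand-in for other targets */
#endif
	return out;
}
```

The same reasoning applies to a read-write operand such as `loop1`, where the modifier is combined with `+` rather than `=`.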


Segher


Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Christophe Leroy




On 03/09/2019 at 15:04, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Sep 03, 2019 at 03:23:57PM +1000, Alastair D'Silva wrote:
>> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>>
>> +#if !defined(CONFIG_PPC_8xx) & !defined(CONFIG_PPC64)
>
> Please write that as &&?  That is more usual, and thus, easier to read.
>
>> +static void flush_dcache_icache_phys(unsigned long physaddr)
>>
>> +   asm volatile(
>> +   "   mtctr %2;"
>> +   "   mtmsr %3;"
>> +   "   isync;"
>> +   "0: dcbst   0, %0;"
>> +   "   addi    %0, %0, %4;"
>> +   "   bdnz    0b;"
>> +   "   sync;"
>> +   "   mtctr %2;"
>> +   "1: icbi    0, %1;"
>> +   "   addi    %1, %1, %4;"
>> +   "   bdnz    1b;"
>> +   "   sync;"
>> +   "   mtmsr %5;"
>> +   "   isync;"
>> +   : "+r" (loop1), "+r" (loop2)
>> +   : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
>> +   : "ctr", "memory");
>
> This outputs as one huge assembler statement, all on one line.  That's
> going to be fun to read or debug.

Do you mean \n has to be added after the ; ?

> loop1 and/or loop2 can be assigned the same register as msr0 or nb.  They
> need to be made earlyclobbers.  (msr is fine, all of its reads are before
> any writes to loop1 or loop2; and bytes is fine, it's not a register).

Can you elaborate, please? Doesn't '+r' mean that they are input and
output at the same time?

"to be made earlyclobbers", what does this mean exactly? How do I do that?

Christophe


Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Segher Boessenkool
Hi!

On Tue, Sep 03, 2019 at 03:23:57PM +1000, Alastair D'Silva wrote:
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c

> +#if !defined(CONFIG_PPC_8xx) & !defined(CONFIG_PPC64)

Please write that as &&?  That is more usual, and thus, easier to read.

> +static void flush_dcache_icache_phys(unsigned long physaddr)

> + asm volatile(
> + "   mtctr %2;"
> + "   mtmsr %3;"
> + "   isync;"
> + "0: dcbst   0, %0;"
> + "   addi    %0, %0, %4;"
> + "   bdnz    0b;"
> + "   sync;"
> + "   mtctr %2;"
> + "1: icbi    0, %1;"
> + "   addi    %1, %1, %4;"
> + "   bdnz    1b;"
> + "   sync;"
> + "   mtmsr %5;"
> + "   isync;"
> + : "+r" (loop1), "+r" (loop2)
> + : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
> + : "ctr", "memory");

This outputs as one huge assembler statement, all on one line.  That's
going to be fun to read or debug.

loop1 and/or loop2 can be assigned the same register as msr0 or nb.  They
need to be made earlyclobbers.  (msr is fine, all of its reads are before
any writes to loop1 or loop2; and bytes is fine, it's not a register).


Segher


Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Michael Ellerman
Christophe Leroy  writes:
> On 03/09/2019 at 07:23, Alastair D'Silva wrote:
>> From: Alastair D'Silva 
>> 
>> Similar to commit 22e9c88d486a
>> ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
>> this patch converts the following ASM symbols to C:
>>  flush_icache_range()
>>  __flush_dcache_icache()
>>  __flush_dcache_icache_phys()
>> 
>> This was done as we discovered a long-standing bug where the length of the
>> range was truncated due to using a 32 bit shift instead of a 64 bit one.
>> 
>> By converting these functions to C, it becomes easier to maintain.
>> 
>> flush_dcache_icache_phys() retains a critical assembler section as we must
>> ensure there are no memory accesses while the data MMU is disabled
>> (authored by Christophe Leroy). Since this has no external callers, it has
>> also been made static, allowing the compiler to inline it within
>> flush_dcache_icache_page().
>> 
>> Signed-off-by: Alastair D'Silva 
>> Signed-off-by: Christophe Leroy 
>> ---
>>   arch/powerpc/include/asm/cache.h  |  26 ++---
>>   arch/powerpc/include/asm/cacheflush.h |  24 ++--
>>   arch/powerpc/kernel/misc_32.S | 117 
>>   arch/powerpc/kernel/misc_64.S | 102 -
>>   arch/powerpc/mm/mem.c | 152 +-
>>   5 files changed, 173 insertions(+), 248 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/cache.h 
>> b/arch/powerpc/include/asm/cache.h
>> index f852d5cd746c..91c808c6738b 100644
>> --- a/arch/powerpc/include/asm/cache.h
>> +++ b/arch/powerpc/include/asm/cache.h
>> @@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
>>   #endif
>>   #endif /* ! __ASSEMBLY__ */
>>   
>> -#if defined(__ASSEMBLY__)
>> -/*
>> - * For a snooping icache, we still need a dummy icbi to purge all the
>> - * prefetched instructions from the ifetch buffers. We also need a sync
>> - * before the icbi to order the the actual stores to memory that might
>> - * have modified instructions with the icbi.
>> - */
>> -#define PURGE_PREFETCHED_INS\
>> -sync;   \
>> -icbi    0,r3;   \
>> -sync;   \
>> -isync
>> -
>> -#else
>> +#if !defined(__ASSEMBLY__)
>>   #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>>   
>>   #ifdef CONFIG_PPC_BOOK3S_32
>> @@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
>>   {
>>  __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
>>   }
>> +
>> +static inline void icbi(void *addr)
>> +{
>> +__asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory");
>
> I think "__asm__ __volatile__" is deprecated. Use "asm volatile" instead.

Yes please.

>> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>> index 9191a66b3bc5..cd540123874d 100644
>> --- a/arch/powerpc/mm/mem.c
>> +++ b/arch/powerpc/mm/mem.c
>> @@ -321,6 +321,105 @@ void free_initmem(void)
>>  free_initmem_default(POISON_FREE_INITMEM);
>>   }
>>   
>> +/*
>> + * Warning: This macro will perform an early return if the CPU has
>> + * a coherent icache. The intent is to call this early in the function,
>> + * and handle the non-coherent icache variant afterwards.
>> + *
>> + * For a snooping icache, we still need a dummy icbi to purge all the
>> + * prefetched instructions from the ifetch buffers. We also need a sync
>> + * before the icbi to order the the actual stores to memory that might
>> + * have modified instructions with the icbi.
>> + */
>> +#define flush_coherent_icache_or_return(addr) { \
>> +if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) { \
>> +mb(); /* sync */\
>> +icbi(addr); \
>> +mb(); /* sync */\
>> +isync();\
>> +return; \
>> +}   \
>> +}
>
> I hate this kind of awful macro which kills code readability.

Yes I agree.

> Please do something like
>
> static bool flush_coherent_icache_or_return(unsigned long addr)
> {
>   if (!cpu_has_feature(CPU_FTR_COHERENT_ICACHE))
>   return false;
>
>   mb(); /* sync */
>   icbi(addr);
>   mb(); /* sync */
>   isync();
>   return true;
> }
>
> then callers will do:
>
>   if (flush_coherent_icache_or_return(addr))
>   return;

I don't think it needs the "_or_return" in the name.

eg, it can just be:

if (flush_coherent_icache(addr))
return;


Which reads fine I think, ie. flush the coherent icache, and if that
succeeds return, else continue.

cheers
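
For readers following along, the shape being suggested can be sketched in user-space C. Everything below other than the names `flush_coherent_icache()`, `flush_icache_range()`, `mb()`, `icbi()` and `isync()` is an assumption: the kernel primitives are replaced by counting stubs, and the `coherent_icache` flag stands in for `cpu_has_feature(CPU_FTR_COHERENT_ICACHE)`.

```c
#include <stdbool.h>

/* Stand-ins for kernel facilities (assumptions, not the real helpers). */
static bool coherent_icache;	/* models cpu_has_feature(CPU_FTR_COHERENT_ICACHE) */
static int barrier_calls, icbi_calls, slow_path_calls;

static void mb(void) { barrier_calls++; }
static void isync(void) { barrier_calls++; }
static void icbi(void *addr) { (void)addr; icbi_calls++; }

/* Returns true if the coherent-icache fast path handled the flush. */
static bool flush_coherent_icache(unsigned long addr)
{
	if (!coherent_icache)
		return false;

	mb();	/* sync */
	icbi((void *)addr);
	mb();	/* sync */
	isync();
	return true;
}

/* Caller: the early return is now visible at the call site. */
static void flush_icache_range(unsigned long start, unsigned long stop)
{
	if (flush_coherent_icache(start))
		return;

	(void)stop;
	slow_path_calls++;	/* placeholder for the per-cache-line loops */
}
```

Compared with the macro, the control flow is explicit: a reader of `flush_icache_range()` can see the `return` without having to expand anything.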


Re: [PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-03 Thread Christophe Leroy




On 03/09/2019 at 07:23, Alastair D'Silva wrote:

From: Alastair D'Silva 

Similar to commit 22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts the following ASM symbols to C:
 flush_icache_range()
 __flush_dcache_icache()
 __flush_dcache_icache_phys()

This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.

By converting these functions to C, it becomes easier to maintain.

flush_dcache_icache_phys() retains a critical assembler section as we must
ensure there are no memory accesses while the data MMU is disabled
(authored by Christophe Leroy). Since this has no external callers, it has
also been made static, allowing the compiler to inline it within
flush_dcache_icache_page().

Signed-off-by: Alastair D'Silva 
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/include/asm/cache.h  |  26 ++---
  arch/powerpc/include/asm/cacheflush.h |  24 ++--
  arch/powerpc/kernel/misc_32.S | 117 
  arch/powerpc/kernel/misc_64.S | 102 -
  arch/powerpc/mm/mem.c | 152 +-
  5 files changed, 173 insertions(+), 248 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index f852d5cd746c..91c808c6738b 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
  #endif
  #endif /* ! __ASSEMBLY__ */
  
-#if defined(__ASSEMBLY__)

-/*
- * For a snooping icache, we still need a dummy icbi to purge all the
- * prefetched instructions from the ifetch buffers. We also need a sync
- * before the icbi to order the the actual stores to memory that might
- * have modified instructions with the icbi.
- */
-#define PURGE_PREFETCHED_INS   \
-   sync;   \
-   icbi    0,r3;   \
-   sync;   \
-   isync
-
-#else
+#if !defined(__ASSEMBLY__)
  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
  
  #ifdef CONFIG_PPC_BOOK3S_32

@@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
  {
__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
  }
+
+static inline void icbi(void *addr)
+{
+   __asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory");


I think "__asm__ __volatile__" is deprecated. Use "asm volatile" instead.


+}
+
+static inline void iccci(void *addr)
+{
+   __asm__ __volatile__ ("iccci 0, %0" : : "r"(addr) : "memory");
+}
+


Same


  #endif /* !__ASSEMBLY__ */
  #endif /* __KERNEL__ */
  #endif /* _ASM_POWERPC_CACHE_H */
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index ed57843ef452..4a1c9f0200e1 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -42,24 +42,20 @@ extern void flush_dcache_page(struct page *page);
  #define flush_dcache_mmap_lock(mapping)   do { } while (0)
  #define flush_dcache_mmap_unlock(mapping) do { } while (0)
  
-extern void flush_icache_range(unsigned long, unsigned long);

+void flush_icache_range(unsigned long start, unsigned long stop);
  extern void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
-extern void __flush_dcache_icache(void *page_va);
  extern void flush_dcache_icache_page(struct page *page);
-#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
-extern void __flush_dcache_icache_phys(unsigned long physaddr);
-#else
-static inline void __flush_dcache_icache_phys(unsigned long physaddr)
-{
-   BUG();
-}
-#endif
-
-/*
- * Write any modified data cache blocks out to memory and invalidate them.
- * Does not invalidate the corresponding instruction cache blocks.
+void __flush_dcache_icache(void *page);
+
+/**
+ * flush_dcache_range(): Write any modified data cache blocks out to memory and
+ * invalidate them. Does not invalidate the corresponding instruction cache
+ * blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
   */
  static inline void flush_dcache_range(unsigned long start, unsigned long stop)
  {
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index fe4bd321730e..12b95e6799d4 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -318,123 +318,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE)
  EXPORT_SYMBOL(flush_instruction_cache)
  #endif /* CONFIG_PPC_8xx */
  
-/*

- * Write any modified data cache blocks out to memory
- * and invalidate the corresponding instruction cache blocks.
- * This is a no-op on the 601.
- *
- * flush_icache_range(unsigned long start, unsigned long stop)
- */
-_GLOBAL(flush_icache_range)
-BEGIN_FTR_SECTION
-   PURGE_PREFETCHED_INS
-   blr 

[PATCH v2 3/6] powerpc: Convert flush_icache_range & friends to C

2019-09-02 Thread Alastair D'Silva
From: Alastair D'Silva 

Similar to commit 22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts the following ASM symbols to C:
flush_icache_range()
__flush_dcache_icache()
__flush_dcache_icache_phys()

This was done as we discovered a long-standing bug where the length of the
range was truncated due to using a 32 bit shift instead of a 64 bit one.
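
As a rough illustration of this class of bug (a sketch under stated assumptions, not the removed assembly itself): if the number of cache lines is computed with a shift that only sees the low 32 bits of the length, any range of 4 GiB or more is silently shortened. The 128-byte line size and both function names below are assumptions chosen for the example.

```c
#include <stdint.h>

#define LINE_SHIFT 7	/* assumed 128-byte cache lines, for illustration */

/* Models a 32-bit shift: only the low 32 bits of the length survive. */
static uint64_t lines_shift32(uint64_t len)
{
	return (uint32_t)len >> LINE_SHIFT;
}

/* Models a 64-bit shift: the full length is used. */
static uint64_t lines_shift64(uint64_t len)
{
	return len >> LINE_SHIFT;
}
```

For a length of 4 GiB plus 256 bytes, `lines_shift32()` reports 2 lines while `lines_shift64()` reports 33554434, so the truncated variant would flush only a tiny part of the range.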

By converting these functions to C, it becomes easier to maintain.

flush_dcache_icache_phys() retains a critical assembler section as we must
ensure there are no memory accesses while the data MMU is disabled
(authored by Christophe Leroy). Since this has no external callers, it has
also been made static, allowing the compiler to inline it within
flush_dcache_icache_page().

Signed-off-by: Alastair D'Silva 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/cache.h  |  26 ++---
 arch/powerpc/include/asm/cacheflush.h |  24 ++--
 arch/powerpc/kernel/misc_32.S | 117 
 arch/powerpc/kernel/misc_64.S | 102 -
 arch/powerpc/mm/mem.c | 152 +-
 5 files changed, 173 insertions(+), 248 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index f852d5cd746c..91c808c6738b 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void)
 #endif
 #endif /* ! __ASSEMBLY__ */
 
-#if defined(__ASSEMBLY__)
-/*
- * For a snooping icache, we still need a dummy icbi to purge all the
- * prefetched instructions from the ifetch buffers. We also need a sync
- * before the icbi to order the the actual stores to memory that might
- * have modified instructions with the icbi.
- */
-#define PURGE_PREFETCHED_INS   \
-   sync;   \
-   icbi    0,r3;   \
-   sync;   \
-   isync
-
-#else
+#if !defined(__ASSEMBLY__)
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
 #ifdef CONFIG_PPC_BOOK3S_32
@@ -145,6 +132,17 @@ static inline void dcbst(void *addr)
 {
__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
 }
+
+static inline void icbi(void *addr)
+{
+   __asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void iccci(void *addr)
+{
+   __asm__ __volatile__ ("iccci 0, %0" : : "r"(addr) : "memory");
+}
+
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CACHE_H */
diff --git a/arch/powerpc/include/asm/cacheflush.h 
b/arch/powerpc/include/asm/cacheflush.h
index ed57843ef452..4a1c9f0200e1 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -42,24 +42,20 @@ extern void flush_dcache_page(struct page *page);
 #define flush_dcache_mmap_lock(mapping)do { } while (0)
 #define flush_dcache_mmap_unlock(mapping)  do { } while (0)
 
-extern void flush_icache_range(unsigned long, unsigned long);
+void flush_icache_range(unsigned long start, unsigned long stop);
 extern void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
-extern void __flush_dcache_icache(void *page_va);
 extern void flush_dcache_icache_page(struct page *page);
-#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
-extern void __flush_dcache_icache_phys(unsigned long physaddr);
-#else
-static inline void __flush_dcache_icache_phys(unsigned long physaddr)
-{
-   BUG();
-}
-#endif
-
-/*
- * Write any modified data cache blocks out to memory and invalidate them.
- * Does not invalidate the corresponding instruction cache blocks.
+void __flush_dcache_icache(void *page);
+
+/**
+ * flush_dcache_range(): Write any modified data cache blocks out to memory and
+ * invalidate them. Does not invalidate the corresponding instruction cache
+ * blocks.
+ *
+ * @start: the start address
+ * @stop: the stop address (exclusive)
  */
 static inline void flush_dcache_range(unsigned long start, unsigned long stop)
 {
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index fe4bd321730e..12b95e6799d4 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -318,123 +318,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE)
 EXPORT_SYMBOL(flush_instruction_cache)
 #endif /* CONFIG_PPC_8xx */
 
-/*
- * Write any modified data cache blocks out to memory
- * and invalidate the corresponding instruction cache blocks.
- * This is a no-op on the 601.
- *
- * flush_icache_range(unsigned long start, unsigned long stop)
- */
-_GLOBAL(flush_icache_range)
-BEGIN_FTR_SECTION
-   PURGE_PREFETCHED_INS
-   blr /* for 601, do nothing */
-END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
-   rlwinm  r3,r3,0,0,31 - L1_CACHE_SHIFT
-   subf    r4,r3,r4
-