Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-07 Thread Guo Ren
On Fri, Sep 07, 2018 at 04:13:35PM +0200, Arnd Bergmann wrote:
> On Fri, Sep 7, 2018 at 2:55 PM Guo Ren  wrote:
> >
> > On Fri, Sep 07, 2018 at 10:14:38AM +0200, Arnd Bergmann wrote:
> > > On Fri, Sep 7, 2018 at 5:04 AM Guo Ren  wrote:
> > > > On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> > > Similarly, an MMIO read may be used to see if a DMA has completed
> > > and the device register tells you that the DMA has left the device,
> > > but without a barrier, the CPU may have prefetched the DMA
> > > data while waiting for the MMIO-read to complete. The __io_ar()
> > > barrier() in asm-generic/io.h prevents the compiler from reordering
> > > the two reads, but if a weakly ordered read (from a coherent DMA buffer)
> > > can bypass a strongly ordered read (MMIO), then it's still
> > > broken.
> > __io_ar() is barrier()? Not rmb()?! I've defined rmb() in asm/barrier.h, so
> > I get rmb() here, not barrier().
> >
> > Only __io_br() is barrier().
> 
> Ah right, I misremembered the defaults. It's probably ok then.
Thanks for the review and comments. They made me reconsider the MMIO
issues and will help improve the csky asm/io.h in the future.
> 
> > > > > - How does endianness work? Are there any buses that flip bytes around
> > > > >   when running big-endian, or do you always do that in software?
> > > > Currently we only support little-endian, and the SoC will follow it.
> > >
> > > Ok, that makes it easier. If you think that you won't even need big-endian
> > > support in the long run, you could also remove your asm/byteorder.h
> > > header. If you're not sure, it doesn't hurt to keep it of course.
> > Em... I'm not sure, so let me keep it for a while.
> 
> Ok. I think overall the trend is to be little-endian only for most
> architectures: powerpc64 moved from big-endian only to little-endian
> by default, ARM rarely uses big-endian (basically only for legacy
> applications ported from BE MIPS or ppc), and all new architectures
we added in recent years are little-endian (OpenRISC being the
> main exception).
Good news. I really don't want to support big-endian; it would double
our CI effort.

Best Regards
 Guo Ren


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-07 Thread Arnd Bergmann
On Fri, Sep 7, 2018 at 2:55 PM Guo Ren  wrote:
>
> On Fri, Sep 07, 2018 at 10:14:38AM +0200, Arnd Bergmann wrote:
> > On Fri, Sep 7, 2018 at 5:04 AM Guo Ren  wrote:
> > > On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> > Similarly, an MMIO read may be used to see if a DMA has completed
> > and the device register tells you that the DMA has left the device,
> > but without a barrier, the CPU may have prefetched the DMA
> > data while waiting for the MMIO-read to complete. The __io_ar()
> > barrier() in asm-generic/io.h prevents the compiler from reordering
> > the two reads, but if a weakly ordered read (from a coherent DMA buffer)
> > can bypass a strongly ordered read (MMIO), then it's still
> > broken.
> __io_ar() is barrier()? Not rmb()?! I've defined rmb() in asm/barrier.h, so
> I get rmb() here, not barrier().
>
> Only __io_br() is barrier().

Ah right, I misremembered the defaults. It's probably ok then.

> > > > - How does endianness work? Are there any buses that flip bytes around
> > > >   when running big-endian, or do you always do that in software?
> > > Currently we only support little-endian, and the SoC will follow it.
> >
> > Ok, that makes it easier. If you think that you won't even need big-endian
> > support in the long run, you could also remove your asm/byteorder.h
> > header. If you're not sure, it doesn't hurt to keep it of course.
> Em... I'm not sure, so let me keep it for a while.

Ok. I think overall the trend is to be little-endian only for most
architectures: powerpc64 moved from big-endian only to little-endian
by default, ARM rarely uses big-endian (basically only for legacy
applications ported from BE MIPS or ppc), and all new architectures
we added in recent years are little-endian (OpenRISC being the
main exception).

 Arnd


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-07 Thread Guo Ren
On Fri, Sep 07, 2018 at 10:14:38AM +0200, Arnd Bergmann wrote:
> On Fri, Sep 7, 2018 at 5:04 AM Guo Ren  wrote:
> >
> > On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> > > On Wed, Sep 5, 2018 at 2:08 PM Guo Ren  wrote:
> > >
> > > Can you describe how C-Sky hardware implements MMIO?
> > Our MMIO is uncached, strongly ordered address space, so there is no need
> > for barriers when accessing these I/O addresses.
> >
> >  #define ioremap_wc ioremap_nocache
> >  #define ioremap_wt ioremap_nocache
> >
> > The current ioremap_wc and ioremap_wt implementations are too simple, and
> > we'll improve them in the future.
> >
> > > In particular:
> > >
> > > - Is a read from uncached memory always serialized with DMA, and with
> > >   other CPUs doing MMIO access to a different address?
> > The CPU uses ld.w to load data from uncached, strongly ordered memory.
> > Other CPUs use the same MMIO vaddr to access the uncached, strongly ordered
> > memory paddr.
> 
> Ok, but what about the DMA? The most common requirement for
> serialization here is with a DMA transfer, where you first write
> into a buffer in memory, then write to an MMIO register to trigger
> a DMA-load, and then the device reads the data from memory.
> Without a barrier before the MMIO, the data may still be in a
> store queue of the CPU, and the DMA gets stale data.

> 
> Similarly, an MMIO read may be used to see if a DMA has completed
> and the device register tells you that the DMA has left the device,
> but without a barrier, the CPU may have prefetched the DMA
> data while waiting for the MMIO-read to complete. The __io_ar()
> barrier() in asm-generic/io.h prevents the compiler from reordering
> the two reads, but if a weakly ordered read (from a coherent DMA buffer)
> can bypass a strongly ordered read (MMIO), then it's still
> broken.
__io_ar() is barrier()? Not rmb()?! I've defined rmb() in asm/barrier.h, so
I get rmb() here, not barrier().

Only __io_br() is barrier().
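
For reference, the relevant defaults look roughly like this (paraphrased
from the 4.19-era asm-generic/io.h, not a verbatim copy):

 /* before an MMIO write: compiler barrier only by default */
 #ifndef __io_br
 #define __io_br()	barrier()
 #endif

 /* after an MMIO read: a real rmb() whenever the arch provides one, so a
  * coherent-DMA load cannot be satisfied ahead of the MMIO status read */
 #ifndef __io_ar
 #ifdef rmb
 #define __io_ar()	rmb()
 #else
 #define __io_ar()	barrier()
 #endif
 #endif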

> > > - How does endianness work? Are there any buses that flip bytes around
> > >   when running big-endian, or do you always do that in software?
> > Currently we only support little-endian, and the SoC will follow it.
> 
> Ok, that makes it easier. If you think that you won't even need big-endian
> support in the long run, you could also remove your asm/byteorder.h
> header. If you're not sure, it doesn't hurt to keep it of course.
Em... I'm not sure, so let me keep it for a while.

Best Regards
 Guo Ren


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-07 Thread Arnd Bergmann
On Fri, Sep 7, 2018 at 5:04 AM Guo Ren  wrote:
>
> On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> > On Wed, Sep 5, 2018 at 2:08 PM Guo Ren  wrote:
> >
> > Can you describe how C-Sky hardware implements MMIO?
> Our MMIO is uncached, strongly ordered address space, so there is no need
> for barriers when accessing these I/O addresses.
>
>  #define ioremap_wc ioremap_nocache
>  #define ioremap_wt ioremap_nocache
>
> The current ioremap_wc and ioremap_wt implementations are too simple, and
> we'll improve them in the future.
>
> > In particular:
> >
> > - Is a read from uncached memory always serialized with DMA, and with
> >   other CPUs doing MMIO access to a different address?
> The CPU uses ld.w to load data from uncached, strongly ordered memory.
> Other CPUs use the same MMIO vaddr to access the uncached, strongly ordered
> memory paddr.

Ok, but what about the DMA? The most common requirement for
serialization here is with a DMA transfer, where you first write
into a buffer in memory, then write to an MMIO register to trigger
a DMA-load, and then the device reads the data from memory.
Without a barrier before the MMIO, the data may still be in a
store queue of the CPU, and the DMA gets stale data.
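
To make the trigger side concrete, here is a minimal sketch (hypothetical
device; struct my_dev, DMA_CTRL and DMA_GO are made up for illustration,
with the usual kernel headers assumed):

 /* hypothetical example, assuming <linux/io.h> and <linux/types.h> */
 struct my_dev {
 	void __iomem *regs;	/* ioremap()ed MMIO registers */
 	u32 *desc;		/* descriptor in a coherent DMA buffer */
 };

 static void my_dev_start_dma(struct my_dev *dev, dma_addr_t buf, u32 len)
 {
 	dev->desc[0] = lower_32_bits(buf);	/* plain stores to memory */
 	dev->desc[1] = len;
 	/*
 	 * writel() issues __io_bw() (wmb() when the arch defines one)
 	 * before the MMIO store, so the descriptor stores are drained
 	 * from the store queue before the doorbell; writel_relaxed()
 	 * would need an explicit wmb() here instead.
 	 */
 	writel(DMA_GO, dev->regs + DMA_CTRL);
 }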

Similarly, an MMIO read may be used to see if a DMA has completed
and the device register tells you that the DMA has left the device,
but without a barrier, the CPU may have prefetched the DMA
data while waiting for the MMIO-read to complete. The __io_ar()
barrier() in asm-generic/io.h prevents the compiler from reordering
the two reads, but if a weakly ordered read (from a coherent DMA buffer)
can bypass a strongly ordered read (MMIO), then it's still
broken.
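
And the completion-polling side, continuing the same sketch (DMA_STAT and
DMA_DONE are again made up):

 static bool my_dev_dma_done(struct my_dev *dev, u32 *out)
 {
 	if (!(readl(dev->regs + DMA_STAT) & DMA_DONE))
 		return false;
 	/*
 	 * readl() ends with __io_ar(); if that were only a compiler
 	 * barrier and the hardware let this coherent-memory load be
 	 * satisfied ahead of the MMIO load above, *out could be stale,
 	 * prefetched data.
 	 */
 	*out = dev->desc[2];
 	return true;
 }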

> > - How does endianness work? Are there any buses that flip bytes around
> >   when running big-endian, or do you always do that in software?
> Currently we only support little-endian, and the SoC will follow it.

Ok, that makes it easier. If you think that you won't even need big-endian
support in the long run, you could also remove your asm/byteorder.h
header. If you're not sure, it doesn't hurt to keep it of course.

Arnd


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-06 Thread Guo Ren
On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> On Wed, Sep 5, 2018 at 2:08 PM Guo Ren  wrote:
> 
> > diff --git a/arch/csky/include/asm/io.h b/arch/csky/include/asm/io.h
> > new file mode 100644
> > index 000..fcb2142
> > --- /dev/null
> > +++ b/arch/csky/include/asm/io.h
> > @@ -0,0 +1,23 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
> > +#ifndef __ASM_CSKY_IO_H
> > +#define __ASM_CSKY_IO_H
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +extern void __iomem *ioremap(phys_addr_t offset, size_t size);
> > +
> > +extern void iounmap(void *addr);
> > +
> > +extern int remap_area_pages(unsigned long address, phys_addr_t phys_addr,
> > +   size_t size, unsigned long flags);
> > +
> > +#define ioremap_nocache(phy, sz)   ioremap(phy, sz)
> > +#define ioremap_wc ioremap_nocache
> > +#define ioremap_wt ioremap_nocache
> > +
> > +#include <asm-generic/io.h>
> 
> It is very unusual for an architecture to not need special handling in
> asm/io.h, to do the proper barriers etc.
> 
> Can you describe how C-Sky hardware implements MMIO?
Our MMIO is uncached, strongly ordered address space, so there is no need
for barriers when accessing these I/O addresses.

 #define ioremap_wc ioremap_nocache
 #define ioremap_wt ioremap_nocache

The current ioremap_wc and ioremap_wt implementations are too simple, and
we'll improve them in the future.

> In particular:
> 
> - Is a read from uncached memory always serialized with DMA, and with
>   other CPUs doing MMIO access to a different address?
The CPU uses ld.w to load data from uncached, strongly ordered memory.
Other CPUs use the same MMIO vaddr to access the uncached, strongly ordered
memory paddr.
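
Note: since csky defines rmb()/wmb() in asm/barrier.h, the asm-generic
accessors already pick those up by default. If an implementation ever let a
coherent-DMA access pass an MMIO access regardless, the fix would be explicit
overrides ahead of the generic include, along these lines (a hypothetical
sketch, not the submitted code):

 /* hypothetical overrides in arch/csky/include/asm/io.h */
 #define __io_bw()	wmb()	/* drain stores before an MMIO write */
 #define __io_ar()	rmb()	/* no loads satisfied past an MMIO read */

 #include <asm-generic/io.h>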

> - How does endianness work? Are there any buses that flip bytes around
>   when running big-endian, or do you always do that in software?
Currently we only support little-endian, and the SoC will follow it.

 Guo Ren


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-06 Thread Arnd Bergmann
On Wed, Sep 5, 2018 at 2:08 PM Guo Ren  wrote:

> diff --git a/arch/csky/include/asm/io.h b/arch/csky/include/asm/io.h
> new file mode 100644
> index 000..fcb2142
> --- /dev/null
> +++ b/arch/csky/include/asm/io.h
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
> +#ifndef __ASM_CSKY_IO_H
> +#define __ASM_CSKY_IO_H
> +
> +#include 
> +#include 
> +#include 
> +
> +extern void __iomem *ioremap(phys_addr_t offset, size_t size);
> +
> +extern void iounmap(void *addr);
> +
> +extern int remap_area_pages(unsigned long address, phys_addr_t phys_addr,
> +   size_t size, unsigned long flags);
> +
> +#define ioremap_nocache(phy, sz)   ioremap(phy, sz)
> +#define ioremap_wc ioremap_nocache
> +#define ioremap_wt ioremap_nocache
> +
> +#include <asm-generic/io.h>

It is very unusual for an architecture to not need special handling in asm/io.h,
to do the proper barriers etc.

Can you describe how C-Sky hardware implements MMIO?

In particular:

- Is a read from uncached memory always serialized with DMA, and with
  other CPUs doing MMIO access to a different address?

- How does endianness work? Are there any buses that flip bytes around
  when running big-endian, or do you always do that in software?

Arnd


[PATCH V3 06/26] csky: Cache and TLB routines

2018-09-05 Thread Guo Ren
Signed-off-by: Guo Ren 
---
 arch/csky/abiv1/cacheflush.c  |  50 
 arch/csky/abiv1/inc/abi/cacheflush.h  |  41 +++
 arch/csky/abiv1/inc/abi/tlb.h |  11 ++
 arch/csky/abiv2/cacheflush.c  |  54 +
 arch/csky/abiv2/inc/abi/cacheflush.h  |  38 ++
 arch/csky/abiv2/inc/abi/tlb.h |  12 ++
 arch/csky/include/asm/barrier.h   |  45 +++
 arch/csky/include/asm/cache.h |  28 +
 arch/csky/include/asm/cacheflush.h|   8 ++
 arch/csky/include/asm/io.h|  23 
 arch/csky/include/asm/tlb.h   |  19 +++
 arch/csky/include/asm/tlbflush.h  |  22 
 arch/csky/include/uapi/asm/cachectl.h |  13 +++
 arch/csky/mm/cachev1.c| 126 
 arch/csky/mm/cachev2.c|  79 +
 arch/csky/mm/syscache.c   |  28 +
 arch/csky/mm/tlb.c| 214 ++
 17 files changed, 811 insertions(+)
 create mode 100644 arch/csky/abiv1/cacheflush.c
 create mode 100644 arch/csky/abiv1/inc/abi/cacheflush.h
 create mode 100644 arch/csky/abiv1/inc/abi/tlb.h
 create mode 100644 arch/csky/abiv2/cacheflush.c
 create mode 100644 arch/csky/abiv2/inc/abi/cacheflush.h
 create mode 100644 arch/csky/abiv2/inc/abi/tlb.h
 create mode 100644 arch/csky/include/asm/barrier.h
 create mode 100644 arch/csky/include/asm/cache.h
 create mode 100644 arch/csky/include/asm/cacheflush.h
 create mode 100644 arch/csky/include/asm/io.h
 create mode 100644 arch/csky/include/asm/tlb.h
 create mode 100644 arch/csky/include/asm/tlbflush.h
 create mode 100644 arch/csky/include/uapi/asm/cachectl.h
 create mode 100644 arch/csky/mm/cachev1.c
 create mode 100644 arch/csky/mm/cachev2.c
 create mode 100644 arch/csky/mm/syscache.c
 create mode 100644 arch/csky/mm/tlb.c

diff --git a/arch/csky/abiv1/cacheflush.c b/arch/csky/abiv1/cacheflush.c
new file mode 100644
index 000..4c6fede
--- /dev/null
+++ b/arch/csky/abiv1/cacheflush.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+void flush_dcache_page(struct page *page)
+{
+   struct address_space *mapping = page_mapping(page);
+   unsigned long addr;
+
+   if (mapping && !mapping_mapped(mapping)) {
+   set_bit(PG_arch_1, &(page)->flags);
+   return;
+   }
+
+   /*
+* We could delay the flush for the !page_mapping case too.  But that
+* case is for exec env/arg pages and those are 99% certainly going to
+* get faulted into the tlb (and thus flushed) anyways.
+*/
+   addr = (unsigned long) page_address(page);
+   dcache_wb_range(addr, addr + PAGE_SIZE);
+}
+
+void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *pte)
+{
+   unsigned long addr;
+   struct page *page;
+   unsigned long pfn;
+
+   pfn = pte_pfn(*pte);
+   if (unlikely(!pfn_valid(pfn)))
+   return;
+
+   page = pfn_to_page(pfn);
+   addr = (unsigned long) page_address(page);
+
+   if (vma->vm_flags & VM_EXEC ||
+   pages_do_alias(addr, address & PAGE_MASK))
+   cache_wbinv_all();
+
+   clear_bit(PG_arch_1, &(page)->flags);
+}
diff --git a/arch/csky/abiv1/inc/abi/cacheflush.h b/arch/csky/abiv1/inc/abi/cacheflush.h
new file mode 100644
index 000..ba5071e
--- /dev/null
+++ b/arch/csky/abiv1/inc/abi/cacheflush.h
@@ -0,0 +1,41 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
+#ifndef __ABI_CSKY_CACHEFLUSH_H
+#define __ABI_CSKY_CACHEFLUSH_H
+
+#include 
+#include 
+#include 
+
+#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
+extern void flush_dcache_page(struct page *);
+
+#define flush_cache_mm(mm) cache_wbinv_all()
+#define flush_cache_page(vma,page,pfn) cache_wbinv_all()
+#define flush_cache_dup_mm(mm) cache_wbinv_all()
+
+#define flush_cache_range(mm,start,end)	cache_wbinv_range(start, end)
+#define flush_cache_vmap(start, end)   cache_wbinv_range(start, end)
+#define flush_cache_vunmap(start, end)  cache_wbinv_range(start, end)
+
+#define flush_icache_page(vma, page)   cache_wbinv_all()
+#define flush_icache_range(start, end) cache_wbinv_range(start, end)
+#define flush_icache_user_range(vma,pg,adr,len)	cache_wbinv_range(adr, adr + len)
+
+#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
+do{ \
+   cache_wbinv_all(); \
+   memcpy(dst, src, len); \
+   icache_inv_all(); \
+}while(0)
+
+#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
+do{ \
+   cache_wbinv_all(); \
+   memcpy(dst, src, len); \
+}while(0)
+
+#define flush_dcache_mmap_lock(mapping)	do{}while(0)
+#define flush_dcache_mmap_unlock(mapping)  do{}while(0)
+
+#endif /* __ABI_CSKY_CACHEFLUSH_H */
diff --git 
