Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-07 Thread Guo Ren
On Fri, Sep 07, 2018 at 04:13:35PM +0200, Arnd Bergmann wrote:
> On Fri, Sep 7, 2018 at 2:55 PM Guo Ren  wrote:
> >
> > On Fri, Sep 07, 2018 at 10:14:38AM +0200, Arnd Bergmann wrote:
> > > On Fri, Sep 7, 2018 at 5:04 AM Guo Ren  wrote:
> > > > On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> > > Similarly, an MMIO read may be used to see if a DMA has completed
> > > and the device register tells you that the DMA has left the device,
> > > but without a barrier, the CPU may have prefetched the DMA
> > > data while waiting for the MMIO-read to complete. The __io_ar()
> > > barrier() in asm-generic/io.h prevents the compiler from reordering
> > > the two reads, but if a weakly ordered read (in coherent DMA buffer)
> > > can bypass a strongly ordered read (MMIO), then it's still
> > > broken.
> > __io_ar() is barrier()? Not rmb()? I've defined rmb() in asm/barrier.h, so
> > I get rmb() here, not barrier().
> >
> > Only __io_br() is barrier().
> 
> Ah right, I misremembered the defaults. It's probably ok then.
Thanks for the review and comments. They made me reconsider the MMIO
issues and will help improve the csky asm/io.h in the future.
> 
> > > > > - How does endianness work? Are there any buses that flip bytes around
> > > > >   when running big-endian, or do you always do that in software?
> > > > Currently we only support little-endian, and SoCs will follow that.
> > >
> > > Ok, that makes it easier. If you think that you won't even need big-endian
> > > support in the long run, you could also remove your asm/byteorder.h
> > > header. If you're not sure, it doesn't hurt to keep it of course.
> > Em... I'm not sure, so let me keep it for a while.
> 
> Ok. I think overall the trend is to be little-endian only for most
> architectures: powerpc64 moved from big-endian only to little-endian
> by default, ARM rarely uses big-endian (basically only for legacy
> applications ported from BE MIPS or ppc), and all new architectures
> we added in the last years are little-endian (OpenRISC being the
> main exception).
Good news. I really don't want to support big-endian, since it would
double the CI load.

Best Regards
 Guo Ren


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-07 Thread Arnd Bergmann
On Fri, Sep 7, 2018 at 2:55 PM Guo Ren  wrote:
>
> On Fri, Sep 07, 2018 at 10:14:38AM +0200, Arnd Bergmann wrote:
> > On Fri, Sep 7, 2018 at 5:04 AM Guo Ren  wrote:
> > > On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> > Similarly, an MMIO read may be used to see if a DMA has completed
> > and the device register tells you that the DMA has left the device,
> > but without a barrier, the CPU may have prefetched the DMA
> > data while waiting for the MMIO-read to complete. The __io_ar()
> > barrier() in asm-generic/io.h prevents the compiler from reordering
> > the two reads, but if a weakly ordered read (in coherent DMA buffer)
> > can bypass a strongly ordered read (MMIO), then it's still
> > broken.
> __io_ar() is barrier()? Not rmb()? I've defined rmb() in asm/barrier.h, so
> I get rmb() here, not barrier().
>
> Only __io_br() is barrier().

Ah right, I misremembered the defaults. It's probably ok then.

> > > > - How does endianness work? Are there any buses that flip bytes around
> > > >   when running big-endian, or do you always do that in software?
> > > Currently we only support little-endian, and SoCs will follow that.
> >
> > Ok, that makes it easier. If you think that you won't even need big-endian
> > support in the long run, you could also remove your asm/byteorder.h
> > header. If you're not sure, it doesn't hurt to keep it of course.
> Em... I'm not sure, so let me keep it for a while.

Ok. I think overall the trend is to be little-endian only for most
architectures: powerpc64 moved from big-endian only to little-endian
by default, ARM rarely uses big-endian (basically only for legacy
applications ported from BE MIPS or ppc), and all new architectures
we added in the last years are little-endian (OpenRISC being the
main exception).

 Arnd


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-07 Thread Guo Ren
On Fri, Sep 07, 2018 at 10:14:38AM +0200, Arnd Bergmann wrote:
> On Fri, Sep 7, 2018 at 5:04 AM Guo Ren  wrote:
> >
> > On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> > > On Wed, Sep 5, 2018 at 2:08 PM Guo Ren  wrote:
> > >
> > > Can you describe how C-Sky hardware implements MMIO?
> > Our MMIO is uncacheable and strongly ordered, so no barriers are
> > needed to access these I/O addresses.
> >
> >  #define ioremap_wc ioremap_nocache
> >  #define ioremap_wt ioremap_nocache
> >
> > The current ioremap_wc and ioremap_wt implementations are too simple, and
> > we'll improve them in the future.
> >
> > > In particular:
> > >
> > > - Is a read from uncached memory always serialized with DMA, and with
> > >   other CPUs doing MMIO access to a different address?
> > The CPU uses ld.w to load data from uncached, strongly ordered memory.
> > Other CPUs use the same MMIO vaddr to access the uncacheable, strongly
> > ordered memory paddr.
> 
> Ok, but what about the DMA? The most common requirement for
> serialization here is with a DMA transfer, where you first write
> into a buffer in memory, then write to an MMIO register to trigger
> a DMA-load, and then the device reads the data from memory.
> Without a barrier before the MMIO, the data may still be in a
> store queue of the CPU, and the DMA gets stale data.

> 
> Similarly, an MMIO read may be used to see if a DMA has completed
> and the device register tells you that the DMA has left the device,
> but without a barrier, the CPU may have prefetched the DMA
> data while waiting for the MMIO-read to complete. The __io_ar()
> barrier() in asm-generic/io.h prevents the compiler from reordering
> the two reads, but if a weakly ordered read (in coherent DMA buffer)
> can bypass a strongly ordered read (MMIO), then it's still
> broken.
__io_ar() is barrier()? Not rmb()? I've defined rmb() in asm/barrier.h, so
I get rmb() here, not barrier().

Only __io_br() is barrier().
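
For reference, the default selection being discussed can be modelled in userspace roughly as below. This is a sketch from memory of the asm-generic/io.h fallbacks of that period, not a copy of the kernel header. The counters, the sketch_readl() name, and the C11 fences standing in for barrier() and rmb() are all assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counters so the chosen barrier is observable. */
static int compiler_barriers;
static int read_barriers;

/* Stand-in for the kernel's barrier(): a compiler-only fence. */
#define barrier() \
	do { compiler_barriers++; atomic_signal_fence(memory_order_seq_cst); } while (0)

/* Stand-in for an arch-provided rmb(), such as the one csky defines. */
#define rmb() \
	do { read_barriers++; atomic_thread_fence(memory_order_acquire); } while (0)

/* Before a read: compiler barrier only. */
#ifndef __io_br
#define __io_br() barrier()
#endif

/* After a read: rmb() when the architecture defines it, else barrier(). */
#ifndef __io_ar
#ifdef rmb
#define __io_ar() rmb()
#else
#define __io_ar() barrier()
#endif
#endif

static inline uint32_t sketch_readl(const volatile uint32_t *addr)
{
	uint32_t val;

	__io_br();
	val = *addr;	/* the real helper also converts from little-endian */
	__io_ar();
	return val;
}
```

Because rmb is a macro here, the #ifdef picks it up for __io_ar(), which matches the point above: an architecture that defines rmb() gets a real read barrier after each MMIO read, and only __io_br() stays a plain compiler barrier.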

> > > - How does endianness work? Are there any buses that flip bytes around
> > >   when running big-endian, or do you always do that in software?
> > Currently we only support little-endian, and SoCs will follow that.
> 
> Ok, that makes it easier. If you think that you won't even need big-endian
> support in the long run, you could also remove your asm/byteorder.h
> header. If you're not sure, it doesn't hurt to keep it of course.
Em... I'm not sure, so let me keep it for a while.

Best Regards
 Guo Ren


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-07 Thread Arnd Bergmann
On Fri, Sep 7, 2018 at 5:04 AM Guo Ren  wrote:
>
> On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> > On Wed, Sep 5, 2018 at 2:08 PM Guo Ren  wrote:
> >
> > Can you describe how C-Sky hardware implements MMIO?
> Our MMIO is uncacheable and strongly ordered, so no barriers are
> needed to access these I/O addresses.
>
>  #define ioremap_wc ioremap_nocache
>  #define ioremap_wt ioremap_nocache
>
> The current ioremap_wc and ioremap_wt implementations are too simple, and
> we'll improve them in the future.
>
> > In particular:
> >
> > - Is a read from uncached memory always serialized with DMA, and with
> >   other CPUs doing MMIO access to a different address?
> The CPU uses ld.w to load data from uncached, strongly ordered memory.
> Other CPUs use the same MMIO vaddr to access the uncacheable, strongly
> ordered memory paddr.

Ok, but what about the DMA? The most common requirement for
serialization here is with a DMA transfer, where you first write
into a buffer in memory, then write to an MMIO register to trigger
a DMA-load, and then the device reads the data from memory.
Without a barrier before the MMIO, the data may still be in a
store queue of the CPU, and the DMA gets stale data.

Similarly, an MMIO read may be used to see if a DMA has completed
and the device register tells you that the DMA has left the device,
but without a barrier, the CPU may have prefetched the DMA
data while waiting for the MMIO-read to complete. The __io_ar()
barrier() in asm-generic/io.h prevents the compiler from reordering
the two reads, but if a weakly ordered read (in coherent DMA buffer)
can bypass a strongly ordered read (MMIO), then it's still
broken.
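
The two orderings described above can be sketched in userspace as follows. This only models the intent: struct fake_dev, its fields, and both function names are hypothetical, and C11 fences do not by themselves order accesses against a real device. In a kernel driver the fences would be the wmb()/rmb() implied by writel()/readl().

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device model: two registers plus a DMA buffer. */
struct fake_dev {
	volatile uint32_t doorbell;	/* stands in for an MMIO trigger register */
	volatile uint32_t status;	/* device sets bit 0 when the DMA is done */
	uint8_t dma_buf[64];		/* stands in for a coherent DMA buffer */
};

#define DMA_START 1u
#define DMA_DONE  1u

/* CPU to device: buffer stores must be visible before the doorbell write. */
static int start_dma(struct fake_dev *dev, const void *data, size_t len)
{
	if (len > sizeof(dev->dma_buf))
		return -1;
	memcpy(dev->dma_buf, data, len);
	atomic_thread_fence(memory_order_release);	/* role of wmb() */
	dev->doorbell = DMA_START;
	return 0;
}

/* Device to CPU: buffer loads must not be satisfied before the status read. */
static int read_dma_result(struct fake_dev *dev, void *out, size_t len)
{
	if (len > sizeof(dev->dma_buf))
		return -1;
	if (!(dev->status & DMA_DONE))
		return -1;
	atomic_thread_fence(memory_order_acquire);	/* role of rmb() */
	memcpy(out, dev->dma_buf, len);
	return 0;
}
```

Dropping the first fence corresponds to the stale-data case (the buffer contents still sit in a store queue when the device starts reading). Dropping the second corresponds to the prefetch case (the buffer is read before the status register confirms completion).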

> > - How does endianness work? Are there any buses that flip bytes around
> >   when running big-endian, or do you always do that in software?
> Currently we only support little-endian, and SoCs will follow that.

Ok, that makes it easier. If you think that you won't even need big-endian
support in the long run, you could also remove your asm/byteorder.h
header. If you're not sure, it doesn't hurt to keep it of course.
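
As an illustration of handling byte order "in software", a little-endian device value can be assembled byte by byte so the result is the same on either CPU endianness. The helper name below is hypothetical; kernel code would use le32_to_cpu() from the byteorder headers instead.

```c
#include <stdint.h>

/* Hypothetical helper: build a 32-bit value from little-endian bytes. */
static inline uint32_t sketch_le32_to_cpu(const uint8_t bytes[4])
{
	return (uint32_t)bytes[0] |
	       ((uint32_t)bytes[1] << 8) |
	       ((uint32_t)bytes[2] << 16) |
	       ((uint32_t)bytes[3] << 24);
}
```

On a little-endian-only port this conversion is a no-op, which is why the header can be dropped if big-endian support is never needed.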

Arnd


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-06 Thread Guo Ren
On Thu, Sep 06, 2018 at 04:31:16PM +0200, Arnd Bergmann wrote:
> On Wed, Sep 5, 2018 at 2:08 PM Guo Ren  wrote:
> 
> > diff --git a/arch/csky/include/asm/io.h b/arch/csky/include/asm/io.h
> > new file mode 100644
> > index 000..fcb2142
> > --- /dev/null
> > +++ b/arch/csky/include/asm/io.h
> > @@ -0,0 +1,23 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
> > +#ifndef __ASM_CSKY_IO_H
> > +#define __ASM_CSKY_IO_H
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +extern void __iomem *ioremap(phys_addr_t offset, size_t size);
> > +
> > +extern void iounmap(void *addr);
> > +
> > +extern int remap_area_pages(unsigned long address, phys_addr_t phys_addr,
> > +   size_t size, unsigned long flags);
> > +
> > +#define ioremap_nocache(phy, sz)   ioremap(phy, sz)
> > +#define ioremap_wc ioremap_nocache
> > +#define ioremap_wt ioremap_nocache
> > +
> > +#include 
> 
> It is very unusual for an architecture to not need special handling in
> asm/io.h, to do the proper barriers etc.
> 
> Can you describe how C-Sky hardware implements MMIO?
Our MMIO is uncacheable and strongly ordered, so no barriers are
needed to access these I/O addresses.

 #define ioremap_wc ioremap_nocache
 #define ioremap_wt ioremap_nocache

The current ioremap_wc and ioremap_wt implementations are too simple, and
we'll improve them in the future.

> In particular:
> 
> - Is a read from uncached memory always serialized with DMA, and with
>   other CPUs doing MMIO access to a different address?
The CPU uses ld.w to load data from uncached, strongly ordered memory.
Other CPUs use the same MMIO vaddr to access the uncacheable, strongly
ordered memory paddr.

> - How does endianness work? Are there any buses that flip bytes around
>   when running big-endian, or do you always do that in software?
Currently we only support little-endian, and SoCs will follow that.

 Guo Ren


Re: [PATCH V3 06/26] csky: Cache and TLB routines

2018-09-06 Thread Arnd Bergmann
On Wed, Sep 5, 2018 at 2:08 PM Guo Ren  wrote:

> diff --git a/arch/csky/include/asm/io.h b/arch/csky/include/asm/io.h
> new file mode 100644
> index 000..fcb2142
> --- /dev/null
> +++ b/arch/csky/include/asm/io.h
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
> +#ifndef __ASM_CSKY_IO_H
> +#define __ASM_CSKY_IO_H
> +
> +#include 
> +#include 
> +#include 
> +
> +extern void __iomem *ioremap(phys_addr_t offset, size_t size);
> +
> +extern void iounmap(void *addr);
> +
> +extern int remap_area_pages(unsigned long address, phys_addr_t phys_addr,
> +   size_t size, unsigned long flags);
> +
> +#define ioremap_nocache(phy, sz)   ioremap(phy, sz)
> +#define ioremap_wc ioremap_nocache
> +#define ioremap_wt ioremap_nocache
> +
> +#include 

It is very unusual for an architecture to not need special handling in asm/io.h,
to do the proper barriers etc.

Can you describe how C-Sky hardware implements MMIO?

In particular:

- Is a read from uncached memory always serialized with DMA, and with
  other CPUs doing MMIO access to a different address?

- How does endianness work? Are there any buses that flip bytes around
  when running big-endian, or do you always do that in software?

Arnd

