On Monday 14 September 2015 13:49:12 Arnd Bergmann wrote:
> > > If all hardware can do 32-bit accesses here and the size is guaranteed
> > > to be a multiple of four bytes, you can probably improve performance by
> > > using a __raw_writel() loop there. Using __raw_writel() in general is
> > > almost always a bug, but here it actually makes sense. See also the
> > > powerpc implementation of _memcpy_toio().
> >
> > AFAICT, the buffers passed to ->write_buf() are not necessarily aligned
> > on 32 bits, so using writel() here might require copying the data into
> > temporary buffers :-/.
> >
> > Don't hesitate to point out where I'm wrong ;-).
>
> Brian or dwmw2 should know for sure. I think it's definitely worth
> trying, as the potential performance gains could be huge if you replace
>
> for (p = start; p < start + length; data++, p++) {
> 	writeb(*data, p);
> 	wmb();
> }
>
> with
>
> for (p = start; p < start + length; data++, p += 4) {
> 	writel(*data, p);
> }
> wmb();
>

As Boris pointed out on IRC, we already have an optimized version of
memcpy_toio() on little-endian, which does exactly this. I'm not
completely sure why we don't use it for big-endian architectures as well.
Powerpc uses the same method on big-endian, but it's possible that it
does not do the right thing on one of the older platforms using BE32
mode, or on one that has a weird bus mode.
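
For what it's worth, here is a rough userspace sketch of the idea, not
kernel code. fake_raw_writel() and fake_wmb() are made-up stand-ins for
__raw_writel() and wmb(), and copy_words_toio() is a hypothetical name.
It assumes the destination is 32-bit aligned and the length is a multiple
of four, and it loads each word with memcpy() so that an unaligned source
buffer would not need a temporary copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for __raw_writel(): a plain 32-bit store with no
 * byte swapping and no barrier. In the kernel the destination would be an
 * __iomem mapping rather than ordinary memory.
 */
static inline void fake_raw_writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/* Hypothetical stand-in for wmb(); a C11 fence is used for illustration. */
static inline void fake_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Copy len bytes (assumed to be a multiple of 4) from src, which may be
 * unaligned, to a 32-bit-aligned destination, one word per store.
 */
static void copy_words_toio(volatile uint32_t *dst, const uint8_t *src,
			    size_t len)
{
	size_t i;

	assert(len % 4 == 0);

	for (i = 0; i < len / 4; i++) {
		uint32_t word;

		/* memcpy() loads the word safely even if src is unaligned. */
		memcpy(&word, src + 4 * i, sizeof(word));
		fake_raw_writel(word, &dst[i]);
	}

	/* One barrier after the whole transfer instead of one per byte. */
	fake_wmb();
}
```

Because the word is stored without byte swapping, the bytes end up in the
destination in the same order as in the source on either endianness,
which is why a raw accessor fits a copy loop like this one.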
Arnd