Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Will Deacon
On Tue, Feb 26, 2019 at 10:43:24AM -0800, Linus Torvalds wrote:
> On Tue, Feb 26, 2019 at 10:38 AM Will Deacon  wrote:
> >
> > That makes sense to me for this Alpha-specific case, but in general I
> > don't think we require that I/O accesses to different endpoints are
> > ordered with respect to each other, which was implied by one of Maciej's
> > examples. e.g.
> >
> > writeb(0x42, _mmio_reg);
> > readb(_mmio_reg);
> 
> If they are the same device (just different data ports), I'd
> *definitely* expect them to be ordered.
> 
> We have tons of code that depends on that. Almost every driver out
> there, in fact.
> 
> So we need the mb() on alpha to guarantee the access ordering on the
> CPU side, and then PCI itself ends up guaranteeing that accesses to
> the same device will remain ordered outside the CPU.
> 
> Agreed?

Yup, agreed. I'd consider all those ports to be the same endpoint, so we're
good.

Will


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2019 at 10:38 AM Will Deacon  wrote:
>
> That makes sense to me for this Alpha-specific case, but in general I
> don't think we require that I/O accesses to different endpoints are
> ordered with respect to each other, which was implied by one of Maciej's
> examples. e.g.
>
> writeb(0x42, _mmio_reg);
> readb(_mmio_reg);

If they are the same device (just different data ports), I'd
*definitely* expect them to be ordered.

We have tons of code that depends on that. Almost every driver out
there, in fact.

So we need the mb() on alpha to guarantee the access ordering on the
CPU side, and then PCI itself ends up guaranteeing that accesses to
the same device will remain ordered outside the CPU.

Agreed?

Linus


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Will Deacon
On Tue, Feb 26, 2019 at 10:12:03AM -0800, Linus Torvalds wrote:
> On Tue, Feb 26, 2019 at 9:52 AM Maciej W. Rozycki  
> wrote:
> >
> >  The thing is taking for example `ioreadX' and `iowriteX' we have `mb'
> > before a write and `mb' after a read.  So if we do say (addresses are
> > arbitrary for illustration only):
> >
> > iowrite8(0, 1);
> > ioread8(2);
> >
> > then these effectively expand to chipset-specific:
> >
> > mb();
> > foo_iowrite8(0, 1);
> > foo_ioread8(2);
> > mb();
> 
> Yeah, looking at the current alpha io.c file, I'd suggest that all IO
> operations just do the "mb()" _before_ doing the op.
> 
> That way all IO ops are at least ordered wrt each other, any any IO
> ops are ordered wrt any memory writes before it (in particular the
> "people wrote to memory, and then did a IO write to start DMA" case).
> 
> Also, since "spin_unlock()" on alpha has a memory barrier before
> releasing the lock, it means that IO ends up being ordered wrt locks
> too.
> 
> Does it mean that a "ioread()" can then be delayed until after later
> non-IO memory operations? Yes. But that seems harmless
> 
> That said, I didn't spend a _ton_ of time thinking about this and
> looking at the code, but it does seem like the simplest model really
> is: "just always do a mb() before any IO ops". I think the "order IO
> wrt each other, and wrt any _preceding_ memory traffic (and all
> locking) is the important part, and that would seem to guarantee it.
> 
> Obviously there can then be other re-ordering at the IO layer level
> (ie posted writes and IO being re-ordered across different devices
> etc), but that's universal and not alpha-specific.
> 
> Does anybody see any worries with the "just move the mb() earlier in
> the ioread functions" model?

That makes sense to me for this Alpha-specific case, but in general I
don't think we require that I/O accesses to different endpoints are
ordered with respect to each other, which was implied by one of Maciej's
examples. e.g.

writeb(0x42, _mmio_reg);
readb(_mmio_reg);

are not ordered with respect to each other (and I think if you wanted
this ordering in general you're going to need more than just fences).

I've been trying to write this up in memory-barriers.txt, so if anybody
has comments or concerns I'm keen to hear them, especially if there's
a feeling that we should be providing stronger guarantees for port I/O:

http://lkml.kernel.org/r/20190222175525.1198-1-will.dea...@arm.com

Cheers,

Will


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Linus Torvalds
On Tue, Feb 26, 2019 at 9:52 AM Maciej W. Rozycki  wrote:
>
>  The thing is taking for example `ioreadX' and `iowriteX' we have `mb'
> before a write and `mb' after a read.  So if we do say (addresses are
> arbitrary for illustration only):
>
> iowrite8(0, 1);
> ioread8(2);
>
> then these effectively expand to chipset-specific:
>
> mb();
> foo_iowrite8(0, 1);
> foo_ioread8(2);
> mb();

Yeah, looking at the current alpha io.c file, I'd suggest that all IO
operations just do the "mb()" _before_ doing the op.

That way all IO ops are at least ordered wrt each other, any any IO
ops are ordered wrt any memory writes before it (in particular the
"people wrote to memory, and then did a IO write to start DMA" case).

Also, since "spin_unlock()" on alpha has a memory barrier before
releasing the lock, it means that IO ends up being ordered wrt locks
too.

Does it mean that a "ioread()" can then be delayed until after later
non-IO memory operations? Yes. But that seems harmless

That said, I didn't spend a _ton_ of time thinking about this and
looking at the code, but it does seem like the simplest model really
is: "just always do a mb() before any IO ops". I think the "order IO
wrt each other, and wrt any _preceding_ memory traffic (and all
locking) is the important part, and that would seem to guarantee it.

Obviously there can then be other re-ordering at the IO layer level
(ie posted writes and IO being re-ordered across different devices
etc), but that's universal and not alpha-specific.

Does anybody see any worries with the "just move the mb() earlier in
the ioread functions" model?

Linus


Re: [PATCH] add delay between port write and port read

2019-02-26 Thread Maciej W. Rozycki
Linus,

 I have added you to this discussion in hope you can clear any uncertainty 
there might be about MMIO ordering properties of the Alpha; a question for 
you is towards the end of this e-mail.

On Thu, 21 Feb 2019, Mikulas Patocka wrote:

> Do you think that we should fix this by identifying hardware that needs 
> the delays and adding the delays there?

 First of all we need to correctly identify what is really going on there.  

 And I've had a look at our arch/alpha/kernel/io.c and 
arch/alpha/include/asm/io.h as they stand now and AFAICT what they have is 
completely broken and if it works anywhere, then by pure luck only, e.g. 
because external bus accesses complete quickly enough for code execution 
not to progress to a subsequent external bus access.

 The thing is taking for example `ioreadX' and `iowriteX' we have `mb' 
before a write and `mb' after a read.  So if we do say (addresses are 
arbitrary for illustration only):

iowrite8(0, 1);
ioread8(2);

then these effectively expand to chipset-specific:

mb();
foo_iowrite8(0, 1);
foo_ioread8(2);
mb();

Which means `foo_iowrite8' and `foo_ioread8' are unordered WRT each other 
as there is no barrier between them.  Worse yet for:

iowrite8(0, 1);
ioread8(1);

the `ioread8' operation may not even appear externally, as the CPU may 
peek the value in the CPU's write buffer while the write is still pending.  
A similar consideration can be done for `readX' and `writeX', as well as a 
mixture between the two kinds of these accesses.

 We do need to add a preceding `mb' to all the `*readX' accessors to 
satisfy the strong ordering requirement WRT any preceding `*writeX' 
access.  We need that leading `mb' in `*_relaxed' accessors as well.  And 
we need to alias `mmiowb' to `wmb' (unless it's removed as I've seen 
with Will's recent patches).

 NB, the 82378ZB SIO adds a minimum of 5 ISA clocks between back-to-back 
accesses from PCI to ISA, which AFAIU should be enough for the PC87332 
Super I/O.  More can be configured if required, but I doubt that is 
needed, because it wasn't for 20+ years.

> In my opinion, adding mb() to the port accessing functions is safer - it 
> is 6 line patch.

 The amount of code does not matter as long as it is wrong.  We need to 
get this right.

> Reading all the hardware manuals is time consuming and hardly anyone will 
> do it for 25 years old hardware.

 I do this regularly and I'm fine doing it this time too, also for the 
value of the education gained.  I'll see if I can experiment with my 
hardware too, but regrettably that'll have to wait a couple of months at 
the very least (due to Brexit :( -- I have left my Alpha in the UK and the 
very last weekend I literally headed off to stay away from all of the 
upcoming mess).

 Meanwhile here is what I have tracked down in the architecture manual[1]:

"Access to local I/O space does not cause any implicit read/write 
ordering; explicit barrier instructions must be used to ensure any desired 
ordering.  Barrier instructions must be used:

* After updating a memory-resident data structure and before writing a 
  local I/O space location to notify the device of the updates.

* Between multiple consecutive direct accesses to local I/O space, e.g.  
  device control registers, if those accesses are expected to be ordered
  at the device.

Again, note that implementations may cache not only memory-resident data
structures, but also local I/O space locations."

-- which in my opinion makes it clear that we need these barriers that 
were removed with commit 92d7223a7423 ("alpha: io: reorder barriers to 
guarantee writeX() and iowriteX() ordering #2"), in addition to ones added 
there (the `*_relaxed' accessors don't need the latter ones though).

 NB for some reason the whole chapter on I/O has been reduced in later 
revisions of the architecure manual[2][3] (as well as the shortened 
version of the book called the architecture handbook) to a single-page 
overview lacking any details.  I think that does not make the architecture 
ordered any more strongly WRT MMIO though; instead it makes it even less 
specified.

 Linus, can you confirm or contradict my conclusions taken from the 
manual in the context of their later removal?

 You did the Alpha port back in mid-1990s (BTW, Jon 'maddog' spoke very 
fondly about that effort at FOSDEM earlier this month) and certainly had 
to know these details to complete it and you worked with people directly 
involved with the design of the Alpha at the time it was being developed, 
and given the somewhat terse formal specification you could therefore be 
the only person around I know of who can give a definite answer.  I hope 
you still remember the bits.

 Questions, thoughts?

References:

[1] Richard L. Sites, ed., "Alpha Architecture Reference Manual", 
Digital Press, 1992, Chapter 8, "Input/Output (I)", Section 8.2.1 
"Read/Write Ordering",