Re: svn commit: r295557 - head/sys/dev/uart

2016-02-16 Thread Ian Lepore
On Tue, 2016-02-16 at 12:46 +1100, Bruce Evans wrote:
> On Mon, 15 Feb 2016, Ian Lepore wrote:
> 
> > On Tue, 2016-02-16 at 11:01 +1100, Bruce Evans wrote:
> >> On Mon, 15 Feb 2016, Ian Lepore wrote:
> >>
> >>> On Tue, 2016-02-16 at 09:28 +1100, Bruce Evans wrote:
>  On Mon, 15 Feb 2016, Michal Meloun wrote:
> 
> > [...]
> > Please note that ARM architecture does not have vectored interrupts,
> > CPU must read actual interrupt source from external interrupt
> > controller (GIC) register. This register contain predefined value if
> > none of interrupts are active.
> >
> > 1 - CPU1: enters ns8250_bus_transmit() and sets IER_ETXRDY.
> > 2 - HW: UART interrupt is asserted, processed by GIC and signaled
> >to CPU2.
> > 3 - CPU2: enters interrupt service.
> 
>  It is blocked by uart_lock(), right?
> 
> > 4 - CPU1: writes character to into REG_DATA register.
> > 5 - HW: UART clear its interrupt request
> > 6 - CPU2: reads interrupt source register. No active interrupt is
> >found, spurious interrupt is signaled, and CPU leaves interrupted
> >state.
> > 7 - CPU1: executes uart_barrier(). This function is not empty on ARM,
> >and can be slow in some cases.
> 
>  It is not empty even on x86, although it probably should be.
> 
>  BTW, if arm needs the barrier, then how does it work with
>  bus_space_barrier() referenced in just 25 files in all of /sys/dev?
> >>>
> >>> With a hack, of course.  In the arm interrupt-controller drivers we
> >>> always call bus_space_barrier() right before doing an EOI.  It's not a
> >>> 100% solution, but in practice it seems to work pretty well.
> >>
> >> I thought about the x86 behaviour a bit more and now see that it does
> >> need barriers but not the ones given by bus_space_barrier().  All (?)
> >> interrupt handlers use mutexes (if not driver ones, then higher-level
> >> ones).   These might give stronger or different ordering than given by
> >> bus_space_barrier().  On x86, they use the same memory bus lock as
> >> the bus_space_barrier().  This is needed to give ordering across
> >> CPUs.  But for accessing a single device, you only need program order
> >> for a single CPU.  This is automatic on x86 provided a mutex is used
> >> to prevent other CPUs accessing the same device.  And if you don't use
> >> a mutex, then bus_space_barrier() cannot give the necessary ordering
> >> since if cannot prevent other CPUs interfering.
> >>
> >> So how does bus_space_barrier() before EOI make much difference?  It
> >> doesn't affect the order for a bunch of accesses on a single CPU.
> >> It must do more than a mutex to do something good across CPUs.
> >> Arguably, it is a bug in mutexes is they don't gives synchronization
> >> for device memory.
> >>> ...
> >>> The hack code does a drain-write-buffer which doesn't g'tee that the
> >>> slow peripheral write has made it all the way to the device, but it
> >>> does at least g'tee that the write to the bus the perhiperal is on has
> >>> been posted and ack'd by any bus<->bus bridge, and that seems to be
> >>> good enough in practice.  (If there were multiple bridged busses
> >>> downstream it probably wouldn't be, but so far things aren't that
> >>> complicated inside the socs we support.)
> >>
> >> Hmm, so there is some automatic strong ordering but mutexes don't
> >> work for device memory?
> >
> > I guess you keep mentioning mutexes because on x86 their implementation
> > uses some of the same instructions that are involved in bus_space
> > barriers on x86?  Otherwise I can't see what they have to do with
> > anything related to the spurious interrupts that happen on arm.  (You
> > also mentioned multiple CPUs, which is not a requirement for this
> > trouble on arm, it'll happen with a single core.)
> 
> Partly.  I wasn't worrying about this "spurious" interrupt but locking
> in general.  Now I don't see how mutexes can work unless they order
> device accesses in exactly the same way as ordinary memory accesses.
> 

Mutexes on arm are implemented with entirely different instructions,
which do not cause any of this buffer draining or influence the
ordering of surrouding accesses to non-mutex data.  When a mutex is
used to prevent concurrent hardware access between the interrupt and
non-interrupt parts of a driver, or multiple cores accessing the
hardware, a bus_space_barrier() is required after writing to the
hardware and before releasing the mutex.

Of course, such barriers don't exist in most drivers, especially ones
not written for soc-specific hardware, and to the degree those drivers
work, it's by accident.  ::sigh::  There's no easy way to slip in a
"mostly fixes all drivers" hack like the EOI hack, short of doing a
barrier at the end of every bus_space access, and that's too expensive.

> > The piece of info you're missing might be the fact that memory-mapped
> > device registers on arm are mapped with the Device 

Re: svn commit: r295557 - head/sys/dev/uart

2016-02-15 Thread Bruce Evans

On Mon, 15 Feb 2016, Ian Lepore wrote:


On Tue, 2016-02-16 at 11:01 +1100, Bruce Evans wrote:

On Mon, 15 Feb 2016, Ian Lepore wrote:


On Tue, 2016-02-16 at 09:28 +1100, Bruce Evans wrote:

On Mon, 15 Feb 2016, Michal Meloun wrote:


[...]
Please note that ARM architecture does not have vectored interrupts,
CPU must read actual interrupt source from external interrupt
controller (GIC) register. This register contain predefined value if
none of interrupts are active.

1 - CPU1: enters ns8250_bus_transmit() and sets IER_ETXRDY.
2 - HW: UART interrupt is asserted, processed by GIC and signaled
   to CPU2.
3 - CPU2: enters interrupt service.


It is blocked by uart_lock(), right?


4 - CPU1: writes character to into REG_DATA register.
5 - HW: UART clear its interrupt request
6 - CPU2: reads interrupt source register. No active interrupt is
   found, spurious interrupt is signaled, and CPU leaves interrupted
   state.
7 - CPU1: executes uart_barrier(). This function is not empty on ARM,
   and can be slow in some cases.


It is not empty even on x86, although it probably should be.

BTW, if arm needs the barrier, then how does it work with
bus_space_barrier() referenced in just 25 files in all of /sys/dev?


With a hack, of course.  In the arm interrupt-controller drivers we
always call bus_space_barrier() right before doing an EOI.  It's not a
100% solution, but in practice it seems to work pretty well.


I thought about the x86 behaviour a bit more and now see that it does
need barriers but not the ones given by bus_space_barrier().  All (?)
interrupt handlers use mutexes (if not driver ones, then higher-level
ones).   These might give stronger or different ordering than given by
bus_space_barrier().  On x86, they use the same memory bus lock as
the bus_space_barrier().  This is needed to give ordering across
CPUs.  But for accessing a single device, you only need program order
for a single CPU.  This is automatic on x86 provided a mutex is used
to prevent other CPUs accessing the same device.  And if you don't use
a mutex, then bus_space_barrier() cannot give the necessary ordering
since if cannot prevent other CPUs interfering.

So how does bus_space_barrier() before EOI make much difference?  It
doesn't affect the order for a bunch of accesses on a single CPU.
It must do more than a mutex to do something good across CPUs.
Arguably, it is a bug in mutexes is they don't gives synchronization
for device memory.

...
The hack code does a drain-write-buffer which doesn't g'tee that the
slow peripheral write has made it all the way to the device, but it
does at least g'tee that the write to the bus the perhiperal is on has
been posted and ack'd by any bus<->bus bridge, and that seems to be
good enough in practice.  (If there were multiple bridged busses
downstream it probably wouldn't be, but so far things aren't that
complicated inside the socs we support.)


Hmm, so there is some automatic strong ordering but mutexes don't
work for device memory?


I guess you keep mentioning mutexes because on x86 their implementation
uses some of the same instructions that are involved in bus_space
barriers on x86?  Otherwise I can't see what they have to do with
anything related to the spurious interrupts that happen on arm.  (You
also mentioned multiple CPUs, which is not a requirement for this
trouble on arm, it'll happen with a single core.)


Partly.  I wasn't worrying about this "spurious" interrupt but locking
in general.  Now I don't see how mutexes can work unless they order
device accesses in exactly the same way as ordinary memory accesses.


The piece of info you're missing might be the fact that memory-mapped
device registers on arm are mapped with the Device attribute which
gives stronger ordering than Normal memory.  In particular, writes are
in order and not combined, but they are buffered.


arm doesn't need the barrier after each output byte then, and in general
bus_space_write_multi_N() shouldn't need a barrier after each write, but
only it can reasonably know if it does, so any barrier(s) for it belong
in it and not in callers.

Drivers for arm could also do a bunch of unrelated writes (and reads?)
followed by a single barrier.  But this is even more MD.


In some designs
there are multiple buffers, so there can be multiple writes that
haven't reached the hardware yet.  A read from the same region will
stall until all writes to that region are done, and there is also an
instruction that specifically forces out the buffers and stalls until
they're empty.


The multiple regions should be in separate bus spaces so that the barriers
can be optimized to 1 at the end.  I now see how the optimization can
almost be done at the bus_space level -- drivers call
bus_space_maybe_barrier() after every access, but this is a no-op on bus
spaces with sufficient ordering iff the bus space hasn't changed;
drivers call bus_space_sync_barriers() at the end.  I think the DMA sync
functions work a bit like this.  A 

Re: svn commit: r295557 - head/sys/dev/uart

2016-02-15 Thread Ian Lepore
On Tue, 2016-02-16 at 11:01 +1100, Bruce Evans wrote:
> On Mon, 15 Feb 2016, Ian Lepore wrote:
> 
> > On Tue, 2016-02-16 at 09:28 +1100, Bruce Evans wrote:
> >> On Mon, 15 Feb 2016, Michal Meloun wrote:
> >>
> >>> [...]
> >>> Please note that ARM architecture does not have vectored interrupts,
> >>> CPU must read actual interrupt source from external interrupt
> >>> controller (GIC) register. This register contain predefined value if
> >>> none of interrupts are active.
> >>>
> >>> 1 - CPU1: enters ns8250_bus_transmit() and sets IER_ETXRDY.
> >>> 2 - HW: UART interrupt is asserted, processed by GIC and signaled
> >>>to CPU2.
> >>> 3 - CPU2: enters interrupt service.
> >>
> >> It is blocked by uart_lock(), right?
> >>
> >>> 4 - CPU1: writes character to into REG_DATA register.
> >>> 5 - HW: UART clear its interrupt request
> >>> 6 - CPU2: reads interrupt source register. No active interrupt is
> >>>found, spurious interrupt is signaled, and CPU leaves interrupted
> >>>state.
> >>> 7 - CPU1: executes uart_barrier(). This function is not empty on ARM,
> >>>and can be slow in some cases.
> >>
> >> It is not empty even on x86, although it probably should be.
> >>
> >> BTW, if arm needs the barrier, then how does it work with
> >> bus_space_barrier() referenced in just 25 files in all of /sys/dev?
> >
> > With a hack, of course.  In the arm interrupt-controller drivers we
> > always call bus_space_barrier() right before doing an EOI.  It's not a
> > 100% solution, but in practice it seems to work pretty well.
> 
> I thought about the x86 behaviour a bit more and now see that it does
> need barriers but not the ones given by bus_space_barrier().  All (?)
> interrupt handlers use mutexes (if not driver ones, then higher-level
> ones).   These might give stronger or different ordering than given by
> bus_space_barrier().  On x86, they use the same memory bus lock as
> the bus_space_barrier().  This is needed to give ordering across
> CPUs.  But for accessing a single device, you only need program order
> for a single CPU.  This is automatic on x86 provided a mutex is used
> to prevent other CPUs accessing the same device.  And if you don't use
> a mutex, then bus_space_barrier() cannot give the necessary ordering
> since if cannot prevent other CPUs interfering.
> 
> So how does bus_space_barrier() before EOI make much difference?  It
> doesn't affect the order for a bunch of accesses on a single CPU.
> It must do more than a mutex to do something good across CPUs.
> Arguably, it is a bug in mutexes is they don't gives synchronization
> for device memory.
> 
> > ...
> > The hack code does a drain-write-buffer which doesn't g'tee that the
> > slow peripheral write has made it all the way to the device, but it
> > does at least g'tee that the write to the bus the perhiperal is on has
> > been posted and ack'd by any bus<->bus bridge, and that seems to be
> > good enough in practice.  (If there were multiple bridged busses
> > downstream it probably wouldn't be, but so far things aren't that
> > complicated inside the socs we support.)
> 
> Hmm, so there is some automatic strong ordering but mutexes don't
> work for device memory?
> 

I guess you keep mentioning mutexes because on x86 their implementation
uses some of the same instructions that are involved in bus_space
barriers on x86?  Otherwise I can't see what they have to do with
anything related to the spurious interrupts that happen on arm.  (You
also mentioned multiple CPUs, which is not a requirement for this
trouble on arm, it'll happen with a single core.)

The piece of info you're missing might be the fact that memory-mapped
device registers on arm are mapped with the Device attribute which
gives stronger ordering than Normal memory.  In particular, writes are
in order and not combined, but they are buffered.  In some designs
there are multiple buffers, so there can be multiple writes that
haven't reached the hardware yet.  A read from the same region will
stall until all writes to that region are done, and there is also an
instruction that specifically forces out the buffers and stalls until
they're empty.

Without doing the drain-write-buffer (or a device read) after each
write, the only g'tee you'd get is that each device sees the writes
directed at it in the order they were issued.  With devices A and B,
you could write a sequence of A1 B1 A2 B2 A3 B3 and they could arrive
at the devices as A1 A2 B1 B2 A3 B3, or any other permutation, as long
as device A sees 123 and device B sees 123.

So on arm the need for barriers arises primarily when two different
devices interact with each other in some way and it matters that a
series of interleaved writes reaches the devices in the same relative
order they were issued by the cpu.  That condition mostly comes up only
in terms of the PIC interacting with basically every other device. 

I expect trouble to show up any time now as we start implementing DMA
drivers in socs that have generic DMA 

Re: svn commit: r295557 - head/sys/dev/uart

2016-02-15 Thread Bruce Evans

On Mon, 15 Feb 2016, Ian Lepore wrote:


On Tue, 2016-02-16 at 09:28 +1100, Bruce Evans wrote:

On Mon, 15 Feb 2016, Michal Meloun wrote:


[...]
Please note that ARM architecture does not have vectored interrupts,
CPU must read actual interrupt source from external interrupt
controller (GIC) register. This register contain predefined value if
none of interrupts are active.

1 - CPU1: enters ns8250_bus_transmit() and sets IER_ETXRDY.
2 - HW: UART interrupt is asserted, processed by GIC and signaled
   to CPU2.
3 - CPU2: enters interrupt service.


It is blocked by uart_lock(), right?


4 - CPU1: writes character to into REG_DATA register.
5 - HW: UART clear its interrupt request
6 - CPU2: reads interrupt source register. No active interrupt is
   found, spurious interrupt is signaled, and CPU leaves interrupted
   state.
7 - CPU1: executes uart_barrier(). This function is not empty on ARM,
   and can be slow in some cases.


It is not empty even on x86, although it probably should be.

BTW, if arm needs the barrier, then how does it work with
bus_space_barrier() referenced in just 25 files in all of /sys/dev?


With a hack, of course.  In the arm interrupt-controller drivers we
always call bus_space_barrier() right before doing an EOI.  It's not a
100% solution, but in practice it seems to work pretty well.


I thought about the x86 behaviour a bit more and now see that it does
need barriers but not the ones given by bus_space_barrier().  All (?)
interrupt handlers use mutexes (if not driver ones, then higher-level
ones).   These might give stronger or different ordering than given by
bus_space_barrier().  On x86, they use the same memory bus lock as
the bus_space_barrier().  This is needed to give ordering across
CPUs.  But for accessing a single device, you only need program order
for a single CPU.  This is automatic on x86 provided a mutex is used
to prevent other CPUs accessing the same device.  And if you don't use
a mutex, then bus_space_barrier() cannot give the necessary ordering
since if cannot prevent other CPUs interfering.

So how does bus_space_barrier() before EOI make much difference?  It
doesn't affect the order for a bunch of accesses on a single CPU.
It must do more than a mutex to do something good across CPUs.
Arguably, it is a bug in mutexes is they don't gives synchronization
for device memory.


...
The hack code does a drain-write-buffer which doesn't g'tee that the
slow peripheral write has made it all the way to the device, but it
does at least g'tee that the write to the bus the perhiperal is on has
been posted and ack'd by any bus<->bus bridge, and that seems to be
good enough in practice.  (If there were multiple bridged busses
downstream it probably wouldn't be, but so far things aren't that
complicated inside the socs we support.)


Hmm, so there is some automatic strong ordering but mutexes don't
work for device memory?


The g'teed way is to do a readback of the register written-to, or some
nearby register if it's write-only.  That's a hard thing to do in a
bus_space_barrier implementation that has no knowledge of things like
which registers in a device might be write-only.

And of course, then you're still left with the problem of hundreds of
drivers that don't do any barrier calls at all.


Mostly unimportant ones for fast NICs :-).  These could be made even slower
using lots of barriers.  But the fast ones already have to use special
interfaces DMA since single-word bus accesses are too slow.


[...]


Also, at this time, UART driver is last one known to generate spurious
interrupts in ARM world.

So, whats now? I can #ifdef __arm__ change made in r295557 (for maximum
safety), if you want this. Or we can just wait to see if someone reports
a problem ...


Use better methods.


How about a dev.uart..broken_txrdy tunable sysctl that defaults
to the new behavior but can be turned on by anyone with the buggy
hardware?


It's difficult to know that such knobs even exist.

Bruce
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r295557 - head/sys/dev/uart

2016-02-15 Thread Ian Lepore
On Tue, 2016-02-16 at 09:28 +1100, Bruce Evans wrote:
> On Mon, 15 Feb 2016, Michal Meloun wrote:
> 
> > [...]
> > Please note that ARM architecture does not have vectored interrupts,
> > CPU must read actual interrupt source from external interrupt
> > controller (GIC) register. This register contain predefined value if
> > none of interrupts are active.
> >
> > 1 - CPU1: enters ns8250_bus_transmit() and sets IER_ETXRDY.
> > 2 - HW: UART interrupt is asserted, processed by GIC and signaled
> >to CPU2.
> > 3 - CPU2: enters interrupt service.
> 
> It is blocked by uart_lock(), right?
> 
> > 4 - CPU1: writes character to into REG_DATA register.
> > 5 - HW: UART clear its interrupt request
> > 6 - CPU2: reads interrupt source register. No active interrupt is
> >found, spurious interrupt is signaled, and CPU leaves interrupted
> >state.
> > 7 - CPU1: executes uart_barrier(). This function is not empty on ARM,
> >and can be slow in some cases.
> 
> It is not empty even on x86, although it probably should be.
> 
> BTW, if arm needs the barrier, then how does it work with
> bus_space_barrier() referenced in just 25 files in all of /sys/dev?
> 

With a hack, of course.  In the arm interrupt-controller drivers we
always call bus_space_barrier() right before doing an EOI.  It's not a
100% solution, but in practice it seems to work pretty well.

Spurious interrupts which are not caused by driver bugs on arm tend to
happen when the interrupt controller and some peripheral are on
different internal busses, especially when the peripheral is on a slow
bus and access to the PIC's registers is faster.  In that case it often
happens that the clearing of the condition at the source goes slower
than the EOI and the PIC immediately re-interrupts.  By time we've re
-vectored back to the PIC driver and read the pending-irq register the
slow write to the peripheral has completed and the interrupt has been
deasserted, so the PIC returns the spurious-irq indication.

The hack code does a drain-write-buffer which doesn't g'tee that the
slow peripheral write has made it all the way to the device, but it
does at least g'tee that the write to the bus the perhiperal is on has
been posted and ack'd by any bus<->bus bridge, and that seems to be
good enough in practice.  (If there were multiple bridged busses
downstream it probably wouldn't be, but so far things aren't that
complicated inside the socs we support.)

The g'teed way is to do a readback of the register written-to, or some
nearby register if it's write-only.  That's a hard thing to do in a
bus_space_barrier implementation that has no knowledge of things like
which registers in a device might be write-only.

And of course, then you're still left with the problem of hundreds of
drivers that don't do any barrier calls at all.

> [...]
> 
> > Also, at this time, UART driver is last one known to generate spurious
> > interrupts in ARM world.
> > 
> > So, whats now? I can #ifdef __arm__ change made in r295557 (for maximum
> > safety), if you want this. Or we can just wait to see if someone reports
> > a problem ...
> 
> Use better methods.
> 

How about a dev.uart..broken_txrdy tunable sysctl that defaults
to the new behavior but can be turned on by anyone with the buggy
hardware?

-- Ian

___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r295557 - head/sys/dev/uart

2016-02-15 Thread Adrian Chadd
On 15 February 2016 at 14:28, Bruce Evans  wrote:


>
> BTW, if arm needs the barrier, then how does it work with
> bus_space_barrier() referenced in just 25 files in all of /sys/dev?

likely the same reason it does for mips - everything's cache coherent
on hardware that has pci/pcie slots, or people just didn't notice.

For ppc they put an implicit barrier into the bus calls because too
many drivers don't do barriers (but USB does, yay) and fixing it all
was likely a terrible task. I'm doing it for the MIPS related pieces
that need it, but it's slow going.


-adrian
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r295557 - head/sys/dev/uart

2016-02-15 Thread Bruce Evans

On Mon, 15 Feb 2016, Michal Meloun wrote:


Dne 13.02.2016 v 1:58 Marius Strobl napsal(a):

On Sat, Feb 13, 2016 at 06:53:25AM +1100, Bruce Evans wrote:

On Fri, 12 Feb 2016, Marius Strobl wrote:


On Fri, Feb 12, 2016 at 05:14:58AM +, Michal Meloun wrote:

Author: mmel
Date: Fri Feb 12 05:14:58 2016
New Revision: 295557
URL: https://svnweb.freebsd.org/changeset/base/295557

Log:
  UART: Fix spurious interrupts generated by ns8250 and lpc drivers:
   - don't enable transmitter empty interrupt before filling TX FIFO.


Are you sure this doesn't create a race that leads to lost TX ready
interrupts? For a single character, the TX FIFO very well may be empty
again at the point in time IER_ETXRDY is enabled now.  With the varying
behavior of 8250/16x50 chips - some of which is documented in sio(4) -


That is mostly FUD.  More likely driver bugs than chip bugs.

A non-broken xx50 interrupts after you (re)enable tx interrupts, iff
the fifo is already empty.  This gives a "spurious" interrupt.  But
perhaps depending on this is too fragile.  Normal operation is to keep
...

I'd expect there are many that no longer generate a TX ready at all
with this change in place. In this case, receiving spurious interrupts
(which ones? IIR_NOPEND? IIR_TXRDY?) with some devices appears to be
the lesser evil.


Not many.  Only broken ones.


In my experience many xx50 are broken, especially the integrated
on-board ones you still have in workstations and servers today.


I haven't seen any with this bug.  But I haven't seen many newer ones
or any in workstations or servers since I only use consumer-grade
motherboards -- maybe those are better :-).  Why would a new ASIC
reimplement bugs that were in the original 8250?  (IIRC, there was
an 8250 with lots of bugs and limited production, and an 8250A that
was better.  FreeBSD is much newer than the 8250A so it might never
have been used on an 8250.)


The "spurious" interrupts are just normal
ones from bon-broken chips:

- uart first does a potentially-unbounded busy-wait before the doing
   anything to ensure that the fifo is empty.  This should be unecessary
   since this function should not be called unless sc_txbusy is 0 and
   sc_txbusy should be nonzero if the fifo is not empty.  If it is called
   when the fifo is not emptu, then the worst-case busy-wait is approx.
   640 seconds for a 128-byte fifo at 2 bps. The 'broken_txfifo case'
   busy-waits for a long time in normal operation.
- enabling the tx interrupt causes one immediately on non-broken uarts
- the interrupt handler is normally called immediately.  Then it always
   blocks on uart_lock()
- then the main code fills the fifo and unlocks
- then the interrupt handler runs.  It normally finds that the fifo is
   not empty (since it has just been filled) and does nothing
- another tx interrupt occurs later and the interrupt handler runs again.
   It normally doesn't hit the lock again, and normally finds the fifo
   empty, so it does something.


You correctly describe what happens at r295556 with a non-broken xx50.
That revision causes a spurious interrupt with non-broken xx50 but
also ensures that the relevant TX interrupt isn't missed with broken
xx50 that do not issue an interrupt when enabling IER_ETXRDY. Besides,
as you say, the general approach of dynamically enabling TX interrupts
works around the common brokenness of these interrupts no longer going
away when they should.


But you are probably correct that a 1-byte write to the fifo often
loses the race.  This depends on how fast the hardware moves the byte
from the fifo to the tx register.  Actually, since we didn't wait
for the tx register to become empty, it will often take a full character
time before the move.  After that, I think it might take 1 bit time but
no more.


My concern is that with r295557, when this race is lost no TX interrupt
is seen at all with broken xx50 that do not trigger an interrupt when
enabling IER_ETXRDY.


Certainly that is a concern if there are chips with the bug.


No, I?m not sure, nobody can be sure if we talking about ns8250
compatible device(s). Also, all UARTs known to me, generates an
interrupt on TX unmasking (assuming level sensitive interrupt).
Only IIR can reports bad priority so some very old 8250 (if
memory still serve me).


Nah, compatible means only having bugs that are compatible, and a
noremal xx50 doesn't have this bug :-).  Strict compatibility with
the original 8250 would give lots of bugs but it is unlikely that
anything new is compatible with that.

Driver bugs are still more likely.  This reminds me that the most
likely source of them is edge triggered ISA interrupts and drivers
not doing enough in the interrupt handler to ensure getting a new
interrupt.  I think you remember correctly that bad priority was
one of the bugs in the original 8250.


I only found following scenario on multiple ARM SOCs.

Please note that ARM architecture does not have vectored interrupts,
CPU must read actual interrupt 

Re: svn commit: r295557 - head/sys/dev/uart

2016-02-15 Thread Michal Meloun
Dne 13.02.2016 v 1:58 Marius Strobl napsal(a):
> On Sat, Feb 13, 2016 at 06:53:25AM +1100, Bruce Evans wrote:
>> On Fri, 12 Feb 2016, Marius Strobl wrote:
>>
>>> On Fri, Feb 12, 2016 at 05:14:58AM +, Michal Meloun wrote:
 Author: mmel
 Date: Fri Feb 12 05:14:58 2016
 New Revision: 295557
 URL: https://svnweb.freebsd.org/changeset/base/295557

 Log:
   UART: Fix spurious interrupts generated by ns8250 and lpc drivers:
- don't enable transmitter empty interrupt before filling TX FIFO.
>>>
>>> Are you sure this doesn't create a race that leads to lost TX ready
>>> interrupts? For a single character, the TX FIFO very well may be empty
>>> again at the point in time IER_ETXRDY is enabled now.  With the varying
>>> behavior of 8250/16x50 chips - some of which is documented in sio(4) -
>>
>> That is mostly FUD.  More likely driver bugs than chip bugs.
>>
>> A non-broken xx50 interrupts after you (re)enable tx interrupts, iff
>> the fifo is already empty.  This gives a "spurious" interrupt.  But
>> perhaps depending on this is too fragile.  Normal operation is to keep
>> the tx interrupt enabled and depend on writing to the fifo causing a
>> tx interrupt later.  But it is a more common chip bug for tx interrupts
>> later to not go away when they should (normally by reading the IIR),
>> so some drivers toggle the tx interrupt enable dynamically.
>>
>> An example of a driver bug is only enabling tx interrupts for this.
>> It takes a transition of the interrupt enable bit from off to on to
>> get the interrupt.  Other driver bugs may result in a missing transition
>> because the bit was supposed to be off but is actually on.
>>
>>> I'd expect there are many that no longer generate a TX ready at all
>>> with this change in place. In this case, receiving spurious interrupts
>>> (which ones? IIR_NOPEND? IIR_TXRDY?) with some devices appears to be
>>> the lesser evil.
>>
>> Not many.  Only broken ones.
> 
> In my experience many xx50 are broken, especially the integrated
> on-board ones you still have in workstations and servers today.
> 
>> The "spurious" interrupts are just normal
>> ones from bon-broken chips:
>>
>> - uart first does a potentially-unbounded busy-wait before the doing
>>anything to ensure that the fifo is empty.  This should be unecessary
>>since this function should not be called unless sc_txbusy is 0 and
>>sc_txbusy should be nonzero if the fifo is not empty.  If it is called
>>when the fifo is not emptu, then the worst-case busy-wait is approx.
>>640 seconds for a 128-byte fifo at 2 bps. The 'broken_txfifo case'
>>busy-waits for a long time in normal operation.
>> - enabling the tx interrupt causes one immediately on non-broken uarts
>> - the interrupt handler is normally called immediately.  Then it always
>>blocks on uart_lock()
>> - then the main code fills the fifo and unlocks
>> - then the interrupt handler runs.  It normally finds that the fifo is
>>not empty (since it has just been filled) and does nothing
>> - another tx interrupt occurs later and the interrupt handler runs again.
>>It normally doesn't hit the lock again, and normally finds the fifo
>>empty, so it does something.
> 
> You correctly describe what happens at r295556 with a non-broken xx50.
> That revision causes a spurious interrupt with non-broken xx50 but
> also ensures that the relevant TX interrupt isn't missed with broken
> xx50 that do not issue an interrupt when enabling IER_ETXRDY. Besides,
> as you say, the general approach of dynamically enabling TX interrupts
> works around the common brokenness of these interrupts no longer going
> away when they should.
> 
>> But you are probably correct that a 1-byte write to the fifo often
>> loses the race.  This depends on how fast the hardware moves the byte
>> from the fifo to the tx register.  Actually, since we didn't wait
>> for the tx register to become empty, it will often take a full character
>> time before the move.  After that, I think it might take 1 bit time but
>> no more.
> 
> My concern is that with r295557, when this race is lost no TX interrupt
> is seen at all with broken xx50 that do not trigger an interrupt when
> enabling IER_ETXRDY.
> 
> Marius
> 

No, I’m not sure, nobody can be sure if we talking about ns8250
compatible device(s). Also, all UARTs known to me, generates an
interrupt on TX unmasking (assuming level sensitive interrupt).
Only IIR can reports bad priority so some very old 8250 (if
memory still serve me).

I only found following scenario on multiple ARM SOCs.

Please note that ARM architecture does not have vectored interrupts,
CPU must read actual interrupt source from external interrupt
controller (GIC) register. This register contain predefined value if
none of interrupts are active.

1 - CPU1: enters ns8250_bus_transmit() and sets IER_ETXRDY.
2 - HW: UART interrupt is asserted, processed by GIC and signaled
to CPU2.
3 - CPU2: enters interrupt service.
4 

Re: svn commit: r295557 - head/sys/dev/uart

2016-02-12 Thread Marius Strobl
On Fri, Feb 12, 2016 at 05:14:58AM +, Michal Meloun wrote:
> Author: mmel
> Date: Fri Feb 12 05:14:58 2016
> New Revision: 295557
> URL: https://svnweb.freebsd.org/changeset/base/295557
> 
> Log:
>   UART: Fix spurious interrupts generated by ns8250 and lpc drivers:
>- don't enable transmitter empty interrupt before filling TX FIFO.

Are you sure this doesn't create a race that leads to lost TX ready
interrupts? For a single character, the TX FIFO very well may be empty
again at the point in time IER_ETXRDY is enabled now. With the varying
behavior of 8250/16x50 chips - some of which is documented in sio(4) -
I'd expect there are many that no longer generate a TX ready at all
with this change in place. In this case, receiving spurious interrupts
(which ones? IIR_NOPEND? IIR_TXRDY?) with some devices appears to be
the lesser evil.

Marius

___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r295557 - head/sys/dev/uart

2016-02-12 Thread Bruce Evans

On Fri, 12 Feb 2016, Marius Strobl wrote:


On Fri, Feb 12, 2016 at 05:14:58AM +, Michal Meloun wrote:

Author: mmel
Date: Fri Feb 12 05:14:58 2016
New Revision: 295557
URL: https://svnweb.freebsd.org/changeset/base/295557

Log:
  UART: Fix spurious interrupts generated by ns8250 and lpc drivers:
   - don't enable transmitter empty interrupt before filling TX FIFO.


Are you sure this doesn't create a race that leads to lost TX ready
interrupts? For a single character, the TX FIFO very well may be empty
again at the point in time IER_ETXRDY is enabled now.  With the varying
behavior of 8250/16x50 chips - some of which is documented in sio(4) -


That is mostly FUD.  More likely driver bugs than chip bugs.

A non-broken xx50 interrupts after you (re)enable tx interrupts, iff
the fifo is already empty.  This gives a "spurious" interrupt.  But
perhaps depending on this is too fragile.  Normal operation is to keep
the tx interrupt enabled and depend on writing to the fifo causing a
tx interrupt later.  But it is a more common chip bug for tx interrupts
later to not go away when they should (normally by reading the IIR),
so some drivers toggle the tx interrupt enable dynamically.

An example of a driver bug is only enabling tx interrupts for this.
It takes a transition of the interrupt enable bit from off to on to
get the interrupt.  Other driver bugs may result in a missing transition
because the bit was supposed to be off but is actually on.


I'd expect there are many that no longer generate a TX ready at all
with this change in place. In this case, receiving spurious interrupts
(which ones? IIR_NOPEND? IIR_TXRDY?) with some devices appears to be
the lesser evil.


Not many.  Only broken ones.  The "spurious" interrupts are just normal
ones from bon-broken chips:

- uart first does a potentially-unbounded busy-wait before the doing
  anything to ensure that the fifo is empty.  This should be unecessary
  since this function should not be called unless sc_txbusy is 0 and
  sc_txbusy should be nonzero if the fifo is not empty.  If it is called
  when the fifo is not emptu, then the worst-case busy-wait is approx.
  640 seconds for a 128-byte fifo at 2 bps. The 'broken_txfifo case'
  busy-waits for a long time in normal operation.
- enabling the tx interrupt causes one immediately on non-broken uarts
- the interrupt handler is normally called immediately.  Then it always
  blocks on uart_lock()
- then the main code fills the fifo and unlocks
- then the interrupt handler runs.  It normally finds that the fifo is
  not empty (since it has just been filled) and does nothing
- another tx interrupt occurs later and the interrupt handler runs again.
  It normally doesn't hit the lock again, and normally finds the fifo
  empty, so it does something.

But you are probably correct that a 1-byte write to the fifo often
loses the race.  This depends on how fast the hardware moves the byte
from the fifo to the tx register.  Actually, since we didn't wait
for the tx register to become empty, it will often take a full character
time before the move.  After that, I think it might take 1 bit time but
no more.

sio uses a timeout to recover from lost output interrupts in case they
occur because the driver or hardware has unforseen bugs.  This works
too well.  It hides bugs like interrupts not working at all provided
the fifo size is larger enough for the low timeout frequency to give
enough throughput.  On newer systems, the overhead for timeouts is in
the noise compared with the overhead for bus accesses (especially
reads), so a simple timeout method is almost as good as an
interrupt-driven method.  It can probably be even better for output
by calculating times to avoid slow reads of status registers to see
how far the chip got.

Bruce
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


Re: svn commit: r295557 - head/sys/dev/uart

2016-02-12 Thread Marius Strobl
On Sat, Feb 13, 2016 at 06:53:25AM +1100, Bruce Evans wrote:
> On Fri, 12 Feb 2016, Marius Strobl wrote:
> 
> > On Fri, Feb 12, 2016 at 05:14:58AM +, Michal Meloun wrote:
> >> Author: mmel
> >> Date: Fri Feb 12 05:14:58 2016
> >> New Revision: 295557
> >> URL: https://svnweb.freebsd.org/changeset/base/295557
> >>
> >> Log:
> >>   UART: Fix spurious interrupts generated by ns8250 and lpc drivers:
> >>- don't enable transmitter empty interrupt before filling TX FIFO.
> >
> > Are you sure this doesn't create a race that leads to lost TX ready
> > interrupts? For a single character, the TX FIFO very well may be empty
> > again at the point in time IER_ETXRDY is enabled now.  With the varying
> > behavior of 8250/16x50 chips - some of which is documented in sio(4) -
> 
> That is mostly FUD.  More likely driver bugs than chip bugs.
> 
> A non-broken xx50 interrupts after you (re)enable tx interrupts, iff
> the fifo is already empty.  This gives a "spurious" interrupt.  But
> perhaps depending on this is too fragile.  Normal operation is to keep
> the tx interrupt enabled and depend on writing to the fifo causing a
> tx interrupt later.  But it is a more common chip bug for tx interrupts
> later to not go away when they should (normally by reading the IIR),
> so some drivers toggle the tx interrupt enable dynamically.
> 
> An example of a driver bug is only enabling tx interrupts for this.
> It takes a transition of the interrupt enable bit from off to on to
> get the interrupt.  Other driver bugs may result in a missing transition
> because the bit was supposed to be off but is actually on.
> 
> > I'd expect there are many that no longer generate a TX ready at all
> > with this change in place. In this case, receiving spurious interrupts
> > (which ones? IIR_NOPEND? IIR_TXRDY?) with some devices appears to be
> > the lesser evil.
> 
> Not many.  Only broken ones.

In my experience many xx50 are broken, especially the integrated
on-board ones you still have in workstations and servers today.

> The "spurious" interrupts are just normal
> ones from bon-broken chips:
> 
> - uart first does a potentially-unbounded busy-wait before the doing
>anything to ensure that the fifo is empty.  This should be unecessary
>since this function should not be called unless sc_txbusy is 0 and
>sc_txbusy should be nonzero if the fifo is not empty.  If it is called
>when the fifo is not emptu, then the worst-case busy-wait is approx.
>640 seconds for a 128-byte fifo at 2 bps. The 'broken_txfifo case'
>busy-waits for a long time in normal operation.
> - enabling the tx interrupt causes one immediately on non-broken uarts
> - the interrupt handler is normally called immediately.  Then it always
>blocks on uart_lock()
> - then the main code fills the fifo and unlocks
> - then the interrupt handler runs.  It normally finds that the fifo is
>not empty (since it has just been filled) and does nothing
> - another tx interrupt occurs later and the interrupt handler runs again.
>It normally doesn't hit the lock again, and normally finds the fifo
>empty, so it does something.

You correctly describe what happens at r295556 with a non-broken xx50.
That revision causes a spurious interrupt with non-broken xx50 but
also ensures that the relevant TX interrupt isn't missed with broken
xx50 that do not issue an interrupt when enabling IER_ETXRDY. Besides,
as you say, the general approach of dynamically enabling TX interrupts
works around the common brokenness of these interrupts no longer going
away when they should.

> But you are probably correct that a 1-byte write to the fifo often
> loses the race.  This depends on how fast the hardware moves the byte
> from the fifo to the tx register.  Actually, since we didn't wait
> for the tx register to become empty, it will often take a full character
> time before the move.  After that, I think it might take 1 bit time but
> no more.

My concern is that with r295557, when this race is lost no TX interrupt
is seen at all with broken xx50 that do not trigger an interrupt when
enabling IER_ETXRDY.

Marius

___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


svn commit: r295557 - head/sys/dev/uart

2016-02-11 Thread Michal Meloun
Author: mmel
Date: Fri Feb 12 05:14:58 2016
New Revision: 295557
URL: https://svnweb.freebsd.org/changeset/base/295557

Log:
  UART: Fix spurious interrupts generated by ns8250 and lpc drivers:
   - don't enable transmitter empty interrupt before filling TX FIFO.
   - add missing uart_barrier() call in interrupt service routine

Modified:
  head/sys/dev/uart/uart_dev_lpc.c
  head/sys/dev/uart/uart_dev_ns8250.c

Modified: head/sys/dev/uart/uart_dev_lpc.c
==
--- head/sys/dev/uart/uart_dev_lpc.cFri Feb 12 02:53:44 2016
(r295556)
+++ head/sys/dev/uart/uart_dev_lpc.cFri Feb 12 05:14:58 2016
(r295557)
@@ -659,6 +659,7 @@ lpc_ns8250_bus_ipend(struct uart_softc *
if (iir & IIR_TXRDY) {
ipend |= SER_INT_TXIDLE;
uart_setreg(bas, REG_IER, lpc_ns8250->ier);
+   uart_barrier(bas);
} else
ipend |= SER_INT_SIGCHG;
}
@@ -892,12 +893,12 @@ lpc_ns8250_bus_transmit(struct uart_soft
uart_lock(sc->sc_hwmtx);
while ((uart_getreg(bas, REG_LSR) & LSR_THRE) == 0)
;
-   uart_setreg(bas, REG_IER, lpc_ns8250->ier | IER_ETXRDY);
-   uart_barrier(bas);
for (i = 0; i < sc->sc_txdatasz; i++) {
uart_setreg(bas, REG_DATA, sc->sc_txbuf[i]);
uart_barrier(bas);
}
+   uart_setreg(bas, REG_IER, lpc_ns8250->ier | IER_ETXRDY);
+   uart_barrier(bas);
sc->sc_txbusy = 1;
uart_unlock(sc->sc_hwmtx);
return (0);

Modified: head/sys/dev/uart/uart_dev_ns8250.c
==
--- head/sys/dev/uart/uart_dev_ns8250.c Fri Feb 12 02:53:44 2016
(r295556)
+++ head/sys/dev/uart/uart_dev_ns8250.c Fri Feb 12 05:14:58 2016
(r295557)
@@ -708,6 +708,7 @@ ns8250_bus_ipend(struct uart_softc *sc)
if (iir & IIR_TXRDY) {
ipend |= SER_INT_TXIDLE;
uart_setreg(bas, REG_IER, ns8250->ier);
+   uart_barrier(bas);
} else
ipend |= SER_INT_SIGCHG;
}
@@ -979,12 +980,12 @@ ns8250_bus_transmit(struct uart_softc *s
uart_lock(sc->sc_hwmtx);
while ((uart_getreg(bas, REG_LSR) & LSR_THRE) == 0)
;
-   uart_setreg(bas, REG_IER, ns8250->ier | IER_ETXRDY);
-   uart_barrier(bas);
for (i = 0; i < sc->sc_txdatasz; i++) {
uart_setreg(bas, REG_DATA, sc->sc_txbuf[i]);
uart_barrier(bas);
}
+   uart_setreg(bas, REG_IER, ns8250->ier | IER_ETXRDY);
+   uart_barrier(bas);
if (broken_txfifo)
ns8250_drain(bas, UART_DRAIN_TRANSMITTER);
else
___
svn-src-head@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"