On Thu, Apr 25, 2019 at 1:23 AM Jan Kiszka <jan.kis...@siemens.com> wrote:

> On 25.04.19 09:15, C Smith wrote:
> > Hi Jan,
> >
> > Your patch worked somewhat but not completely. It prevents my app from
> > stalling forever, but I caught the serial transmission itself stalling on
> > the oscilloscope for quite a long time. My 72 byte TX packet from the
> > xenomai periodic task gets cut in half and there is no transmission for
> > 7msec, then the transmission resumes. (I'll send you a screenshot)
>
> What is driver and application state during that phase? Who is waiting on
> what? This will be the key to resolve that issue as I'm not yet seeing
> another mistake in the driver.
>

I don't think there is a bug in the serial driver, per se, but my strange
UART requires more from a driver to prevent stalls.
This is a BCM corp 'BCM87Q' industrial motherboard. They are still sold,
not yet EOL.

We do know a lot about the state the serial driver is in: it is just
waiting, thinking it doesn't have any more bytes to transmit. Remember that
in previous tests the IIR indicated no pending bytes in the THR. I've
demonstrated how to get past this state with my TX "polling patch". I ran
my latest test for 12+ hours using your patch plus my polling patch, and
there were no stalls whatsoever of the serial driver, as verified by an
oscilloscope which triggers on a TX stall. The maximum inter-packet jitter
of my TX packet was also fairly low, at <= 450us. In my polling patch,
during an RX interrupt, the code redundantly checks the high level transmit
buffer to see if rt_16550_tx_fill() should be called. Sure, this workaround
only helps when you have full-duplex communications; it would not help
during simplex communications.
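
To make the idea concrete, here is a rough, simulated sketch of that
redundant check. It is not the actual patch and does not touch the real
driver: struct sim_uart, sim_tx_fill() and sim_rx_irq_poll_tx() are
hypothetical stand-ins, and checking the LSR THRE bit before refilling is my
assumption about how one would guard the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LSR_THRE      0x20  /* 16550 LSR bit 5: transmit holding register empty */
#define TX_FIFO_DEPTH 16    /* 16550A transmit FIFO depth */
#define TX_RING_SIZE  256   /* hypothetical software buffer size */

/* Hypothetical, simplified model of driver state (not the rt_16550 context). */
struct sim_uart {
    uint8_t lsr;                 /* simulated line status register */
    uint8_t ring[TX_RING_SIZE];  /* high level (software) TX buffer */
    size_t head, tail;           /* free-running write/read counters */
    size_t wire_bytes;           /* bytes handed to the simulated FIFO */
};

static size_t tx_pending(const struct sim_uart *u)
{
    return u->head - u->tail;
}

/* Queue a packet into the software buffer; fails if it does not fit. */
static bool tx_queue(struct sim_uart *u, const uint8_t *data, size_t len)
{
    assert(u != NULL && data != NULL);
    if (len > TX_RING_SIZE - tx_pending(u))
        return false;
    for (size_t i = 0; i < len; i++)
        u->ring[u->head++ % TX_RING_SIZE] = data[i];
    return true;
}

/* Stand-in for rt_16550_tx_fill(): move up to one FIFO's worth of bytes. */
static size_t sim_tx_fill(struct sim_uart *u)
{
    size_t n = 0;
    while (n < TX_FIFO_DEPTH && tx_pending(u) > 0) {
        (void)u->ring[u->tail++ % TX_RING_SIZE]; /* a real driver writes THR here */
        n++;
    }
    u->wire_bytes += n;
    if (n > 0)
        u->lsr &= (uint8_t)~LSR_THRE; /* FIFO is busy again */
    return n;
}

/* Redundant check done from the RX interrupt path: if bytes are still
 * queued and the transmitter reports empty, refill even though no THRE
 * interrupt was seen. Returns the number of bytes pushed. */
static size_t sim_rx_irq_poll_tx(struct sim_uart *u)
{
    if (tx_pending(u) > 0 && (u->lsr & LSR_THRE))
        return sim_tx_fill(u);
    return 0;
}
```

In this model a lost TX interrupt shows up as THRE being set while
tx_pending() is still nonzero; the next RX interrupt then restarts the
transfer, which is why it only helps when RX traffic keeps arriving.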

Since a device driver can't be reliably polled, I'd prefer some
self-correcting mechanism in the driver which sets a callback when it thinks
it has transmitted the last byte, then wakes up and checks one more time
about 100us later to see if it needs to transmit anything else.
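
A minimal sketch of what I mean, with time passed in explicitly so it can be
exercised without any real timer. Everything here (struct tx_watch,
tx_watch_arm(), tx_watch_expire()) is a hypothetical illustration, not
existing driver code; in a real driver the expiry would come from a one-shot
timer and the "true" case would call the TX fill routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RECHECK_DELAY_NS 100000ULL /* about 100us */

/* Hypothetical bookkeeping for the one-shot recheck. */
struct tx_watch {
    size_t pending;       /* bytes still in the software TX buffer */
    bool armed;           /* recheck scheduled? */
    uint64_t deadline_ns; /* when the recheck is due */
    unsigned kicks;       /* how often the recheck had to restart TX */
};

/* Call when the driver believes it has just sent its last byte. */
static void tx_watch_arm(struct tx_watch *w, uint64_t now_ns)
{
    assert(w != NULL);
    w->armed = true;
    w->deadline_ns = now_ns + RECHECK_DELAY_NS;
}

/* Call from the timer callback. Returns true if bytes turned up in the
 * meantime and transmission has to be restarted by the caller. */
static bool tx_watch_expire(struct tx_watch *w, uint64_t now_ns)
{
    assert(w != NULL);
    if (!w->armed || now_ns < w->deadline_ns)
        return false;   /* nothing scheduled, or not due yet */
    w->armed = false;   /* one-shot: re-arm only after the next "last byte" */
    if (w->pending == 0)
        return false;   /* driver really was idle */
    w->kicks++;
    return true;
}
```

The recheck is deliberately one-shot, so an idle port costs one extra wakeup
per burst rather than a permanent polling timer.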

> > Also, I made the .rx_timeout/.tx_timeout change Jeff found, and it had
> > the obvious effect. I can make a patch for xeno 2.6.5 if you want. But
> > I'll point out that this fix may break people's code functionally, so it
> > may be a bad idea to fix it on 2.x. Older code was written with a
> > dependence on a truly different timeout. It broke my app to fix this
> > because there was suddenly a new unexpected timeout. What's your policy
> > on this issue?
>
> The 2.6 repo won't be touched anymore, it's officially dead. Of course,
> you can share your patch on the list in case there are other remaining
> users.
>

Oh your fine work in 2.6 is very much alive!
But I can agree that adding fixes to it is not appropriate.

-C Smith
