Re: [Linuxptp-devel] Handling missing hardware timestamps on embedded systems

2018-02-21 Thread Richard Cochran
On Wed, Feb 21, 2018 at 04:26:06PM +0100, Oliver Westermann wrote:
> I got curious after your last messages and again dove deeper into the
> kernel and drivers and tried to find out what happens when.
> Just FYI: I inherited the code, so feel free to criticize and suggest
> improvements where needed ;-)

We can't really comment on invisible code, so my first criticism is,
post the driver to linux netdev as a RFC.

Thanks,
Richard

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] Handling missing hardware timestamps on embedded systems

2018-02-21 Thread Oliver Westermann
2018-02-16 17:51 GMT+01:00 Keller, Jacob E :
>
> Even the interrupt itself takes up to 12 ms sometimes?
>

I got curious after your last messages and again dove deeper into the
kernel and drivers and tried to find out what happens when.
Just FYI: I inherited the code, so feel free to criticize and suggest
improvements where needed ;-) Also I'm quite new to anything on linux
kernel level so forgive me if I mess up somewhere.

My Setup was again two devices, I've created extra CPU Load AND Network
Load to increase the chance of big delays and I got some.
Here is a small extract from a kernel trace including trace messages from
ptp4l (hoping gmail does properly send the formatting):
# tracer: nop
#
# entries-in-buffer/entries-written: 6694/6694   #P:2
#
#  _-=> irqs-off
# / _=> need-resched
#| / _---=> hardirq/softirq
#|| / _--=> preempt-depth
#||| / delay
#   TASK-PID   CPU#  TIMESTAMP  FUNCTION
#  | |   |      | |
   ptp4l-587   [000] ...1  2799.157884: tracing_mark_write: send
delay_req
   ptp4l-587   [000] ...1  2799.157900: tracing_mark_write:
transport_send() seq id: 31495/1915
   ptp4l-587   [000] d.H2  2799.158000: phy_interrupt:
phy_interrupt "handled"
   ptp4l-587   [000] ...1  2799.158048: tracing_mark_write: enter
sk_receive
 kworker/0:1-1710  [000]   2799.162773: add_event.constprop.9:
add_event id 1915
 kworker/0:1-1710  [000]   2799.170782: clock_find_ts.part.0:
matched id 1915
   ptp4l-587   [000] ...1  2799.170791: tracing_mark_write: poll
worked

What I now notice is that the interrupt in fact is not delayed (takes
~0.1ms).
But by comments in the phy_interrupt() function (in phy.c) of the linux
kernel thats not the place to handle the interrupt, so it schedules a work
item in a work queue.

This work queue is handled by the phy_change() function (in phy.c), which
in a subfunction retrieves the timestamp from the phy (this is the
add_event call from the trace above). This is delayed by 4ms.

IMHO, this looks like proper design of this driver. The actual interrupt
handler is short and according to a comment in the function phy_interrupt ()
"The MDIO bus is not allowed to be written in interrupt context", so I
cant't read the timestamps there.

The package send also sends a skb into the txtstamp function, which does
set the SKBTX_IN_PROGRESS flag as described by Jake. As the package is not
send out yet, is queues the skb in a workqueue which is part of the phy
driver.
The handler of this work queue matches the skbs to the timestamps queued by
the add_event() function and sends them back up to ptp4l. This handler
shows itself above by the " clock_find_ts" line and is as well delayed by
another 8ms, resulting in a 12 ms delay from sending to skb returning.

I do understand that this is an issue due to two scheduled workers, so what
would be a proper design of the "second stage" of the timestamp readout
process? And is there something like a good, readable phy driver I can look
into for this?

Best regards, Olli
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] Handling missing hardware timestamps on embedded systems

2018-02-21 Thread Richard Cochran
On Wed, Feb 21, 2018 at 04:26:06PM +0100, Oliver Westermann wrote:
> This work queue is handled by the phy_change() function (in phy.c), which
> in a subfunction retrieves the timestamp from the phy (this is the
> add_event call from the trace above). This is delayed by 4ms.

That code in phy.c is for the phy state machine.  It is not time
critical, and so using a work item is appropriate.

However your packet time stamps *are* time critical.  I wouldn't put
the time stamp code there.  You can't really increase the scheduling
priority of work queues.

We have a ready made kworker in ptp_clock.c.

ptp->kworker = kthread_create_worker(0, worker_name ?
 worker_name : info->name);

You can use that, and then set the priority of the kthread using chrt.

HTH,
Richard

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] Handling missing hardware timestamps on embedded systems

2018-02-18 Thread Keller, Jacob E
Hi,

> -Original Message-
> From: Oliver Westermann [mailto:owesterm...@gmail.com]
> Sent: Friday, February 16, 2018 1:20 AM
> To: Richard Cochran <richardcoch...@gmail.com>
> Cc: linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] Handling missing hardware timestamps on
> embedded systems
> 
> I may have run into a design constraint which I understood now, hopefully.
> 
> The issue were to strict timeouts that sometimes ran into an issue.
> The current process is like this:
> 
> - ptp4l sends a DELAY_REQ package into the network stack and sends a SKB to
> retrieve the timestamp
> - when the PHY handles it, it saves the timestamp and generates an interrupt
> - the interrupt handler of my PHY reads seq_id and timestamp and places an
> event with both into a queue
> - the PHY worker fetches the SKBs and searches for matching events in the
> queue
> 


So you're likely adding increased delays and potential pitfalls for a couple 
reasons here:

When ptp4l sends a delay request, it indicates an SKB which sets the 
SKBTX_HW_TSTAMP bit.

Your driver responds to this by setting the SKB_TX_IN_PROGRESS bit which 
indicates that you will be doing a Tx timestamp. You should only set this bit 
if you are capable of generating a timestamp. If your hardware can only 
timestamp one packet at a time, then you should be sure to never set this again 
until the outstanding request is finished, and it is expected that most 
hardware only supports one outstanding request at once. That's one pitfall you 
should be aware of.

Next, you have an interrupt which the PHY signals that a timestamp is ready, 
followed by a separate worker thread to handle this?

The problem with an extra worker thread here, is that it will potentially be 
hit by unknown delays due to scheduling. If possible you may be able to handle 
the work directly in the interrupt thread. Especially easy if you only support 
a single outstanding timestamp request, since there wouldn't need to be a queue 
at all.

> I added some tracing into ptp4l and my phy driver and measured the timings 
> over
> some minutes of PTP (with additional load on the network interface, additional
> load on cpus, all timeouts set to 100ms, 2170 measurements):
> 
> send to interrupt, mean/standard deviation/max in ms: 2,97 / 1,48 / 12,85
> 
> interrupt to match: 0,55 / 0,16 / 6,86
> send to match: 3,52 / 1,62 / 13,21
> 
> From that perspective my tx_timestamp_timeout of 10ms was to short. I would
> raise it to 18ms (~50% extra margin).
>

So if I understand this correctly, the mean time to match is 3.5 seconds, but 
there's occasional stalls?

Even the interrupt itself takes up to 12 ms sometimes?

It's quite possible that's just time until you service the interrupt, which 
could be impacted by a number of things.

A number of other drivers forgo using interrupts, and instead poll using the 
"do_work" ptp kernel thread, which is likely more resillient against delays 
compared to something such as a workqueue item.
 
> Is there a reason not to do this?

You can change the configuration to fit what you need, but I suspect this can 
be fixed by improving the driver.

Thanks,
Jake

> 
> Thanks, Olli


 

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] Handling missing hardware timestamps on embedded systems

2018-02-16 Thread Oliver Westermann
2018-02-13 19:36 GMT+01:00 Richard Cochran :

> On Tue, Feb 13, 2018 at 07:01:22PM +0100, Oliver Westermann wrote:
> > The issue is that the PHY only has two slots for timestamps, one for
> > outgoing packages and one for incoming packages. If the device in
> question
> > is a ptp master and has multiple slave, it sometimes happen that both
> > DELAY_REQ packages come in short succession, which results in the PHY
> > driver/interrupt handler being to slow to get all timestamps.
>
> This is actually a bug in your driver.  If an incoming delay request
> misses a time stamp, no fault is generated.  The *transmit* time stamp
> is still available, but your driver fails to fetch it.
>

Thank you for your hints, you pointed me in the right direction.


> This should not happen.  You should fix your driver so that it always
> fetches the Tx time stamp after sending a marked packet.  No interrupt
> is necessary.
>

I may have run into a design constraint which I understood now, hopefully.

The issue were to strict timeouts that sometimes ran into an issue.
The current process is like this:

- ptp4l sends a DELAY_REQ package into the network stack and sends a SKB to
retrieve the timestamp
- when the PHY handles it, it saves the timestamp and generates an interrupt
- the interrupt handler of my PHY reads seq_id and timestamp and places an
event with both into a queue
- the PHY worker fetches the SKBs and searches for matching events in the
queue

I added some tracing into ptp4l and my phy driver and measured the timings
over some minutes of PTP (with additional load on the network interface,
additional load on cpus, all timeouts set to 100ms, 2170 measurements):

send to interrupt, mean/standard deviation/max in ms: 2,97 / 1,48 / 12,85
interrupt to match: 0,55 / 0,16 / 6,86
send to match: 3,52 / 1,62 / 13,21

>From that perspective my tx_timestamp_timeout of 10ms was to short. I would
raise it to 18ms (~50% extra margin).

Is there a reason not to do this?

Thanks, Olli

2018-02-13 19:36 GMT+01:00 Richard Cochran :

> On Tue, Feb 13, 2018 at 07:01:22PM +0100, Oliver Westermann wrote:
> > The issue is that the PHY only has two slots for timestamps, one for
> > outgoing packages and one for incoming packages. If the device in
> question
> > is a ptp master and has multiple slave, it sometimes happen that both
> > DELAY_REQ packages come in short succession, which results in the PHY
> > driver/interrupt handler being to slow to get all timestamps.
>
> This is actually a bug in your driver.  If an incoming delay request
> misses a time stamp, no fault is generated.  The *transmit* time stamp
> is still available, but your driver fails to fetch it.
>
> > From a PTP perspective this is not a big issue: A single missing delay
> > measurement doesn't break a clock.
>
> Right.
>
> > But PTP4L threats this as a critical error, printing
> > "timed out while polling for tx timestamp",
>
> This should not happen.  You should fix your driver so that it always
> fetches the Tx time stamp after sending a marked packet.  No interrupt
> is necessary.
>
> Thanks,
> Richard
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] Handling missing hardware timestamps on embedded systems

2018-02-13 Thread Richard Cochran
On Tue, Feb 13, 2018 at 07:01:22PM +0100, Oliver Westermann wrote:
> The issue is that the PHY only has two slots for timestamps, one for
> outgoing packages and one for incoming packages. If the device in question
> is a ptp master and has multiple slave, it sometimes happen that both
> DELAY_REQ packages come in short succession, which results in the PHY
> driver/interrupt handler being to slow to get all timestamps.

This is actually a bug in your driver.  If an incoming delay request
misses a time stamp, no fault is generated.  The *transmit* time stamp
is still available, but your driver fails to fetch it.

> From a PTP perspective this is not a big issue: A single missing delay
> measurement doesn't break a clock.

Right.

> But PTP4L threats this as a critical error, printing
> "timed out while polling for tx timestamp",

This should not happen.  You should fix your driver so that it always
fetches the Tx time stamp after sending a marked packet.  No interrupt
is necessary.

Thanks,
Richard

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] Handling missing hardware timestamps on embedded systems

2018-02-13 Thread Richard Cochran
On Tue, Feb 13, 2018 at 07:01:22PM +0100, Oliver Westermann wrote:
> Is there a reason not to do this or a better idea to use linuxptp on
> systems with similar hardware constraints?

I have had such HW, and I usually just set "fault_reset_interval ASAP"
in the configuration.

> Are there design considerations within linuxptp that I should know of
> before implementing and submitting such a fix?

The vast majority of HW is not limited in this respect, and there are
no mainline drivers that support such HW.  My feeling is that the
linuxptp stack shouldn't be filled with special cases just for buggy
or deficient HW designs.

Thanks,
Richard

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


[Linuxptp-devel] Handling missing hardware timestamps on embedded systems

2018-02-13 Thread Oliver Westermann
 I've a setup of multiple devices with a ARM CPU using a Marvell PHY with
hardware timestamping capabilitys for networking.

These Marvell PHYs analyse incoming and outgoing packages for PTP packages,
saves a hardware timestamp and issues a interrupt.
The PHY driver catches the interrupt, gets the timestamp and places it in a
queue for ptp4l to get it via sk_receive().

The issue is that the PHY only has two slots for timestamps, one for
outgoing packages and one for incoming packages. If the device in question
is a ptp master and has multiple slave, it sometimes happen that both
DELAY_REQ packages come in short succession, which results in the PHY
driver/interrupt handler being to slow to get all timestamps.

>From a PTP perspective this is not a big issue: A single missing delay
measurement doesn't break a clock.
But PTP4L threats this as a critical error, printing
"timed out while polling for tx timestamp",
"increasing tx_timestamp_timeout may correct this issue, but it is likely
caused by a driver bug"
and going into the "PS_FAULTY" state.

My first idea for the fix is to add a EV_NONCRIT_FAULT_DETECTED event,
fired on such occasions.
This event would be counted in the fsm, reset by EV_NONE and resulting in
a PS_FAULTY after x appearances.

Is there a reason not to do this or a better idea to use linuxptp on
systems with similar hardware constraints?
Are there design considerations within linuxptp that I should know of
before implementing and submitting such a fix?
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel