Re: [Linuxptp-devel] tx_timestamp_timeout default

2021-07-07 Thread Hal Murray
jacob.e.kel...@intel.com said:
> We get into the kthread function within a few hundred usec or less, and then
> the firmware read takes a long time, sometimes over 2 milliseconds.

Why is it taking so long?

How long does it take when things go well?  Is there anything complicated 
going on with a "firmware read"?

Is there a software lock, maybe because some other code is using related 
hardware resources?  Is the thread not getting scheduled?

---

My first try was rejected by the list software because I sent it from the 
wrong address.


-- 
These are my opinions.  I hate spam.





___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] [PATCH] Increase the default tx_timestamp_timeout to 5

2021-07-07 Thread Eric Decker
If the timestamp is available in less than the timeout (5ms) will it still wait 
for the timeout, or continue processing after the timestamp is received?

Eric


-Original Message-
From: Jacob Keller  
Sent: Wednesday, July 7, 2021 8:02 PM
To: linuxptp-devel@lists.sourceforge.net
Subject: [Linuxptp-devel] [PATCH] Increase the default tx_timestamp_timeout to 5

The tx_timestamp_timeout configuration defines the number of milliseconds to 
wait for a Tx timestamp from the kernel stack. This delay is necessary as Tx 
timestamps are captured after a packet is sent and reported back via the socket 
error queue.

The current default is to poll for up to 1 millisecond. In practice, it turns 
out that this is not always enough time for hardware and software to capture 
the timestamp and report it back. Some hardware designs require reading 
timestamps over registers or other slow mechanisms.

This extra delay results in the timestamp not being sent back to userspace 
within the default 1 millisecond polling time. If that occurs the following can 
be seen from ptp4l:

  ptp4l[4756.840]: timed out while polling for tx timestamp
  ptp4l[4756.840]: increasing tx_timestamp_timeout may correct this issue,
   but it is likely caused by a driver bug
  ptp4l[4756.840]: port 1 (p2p1): send sync failed
  ptp4l[4756.840]: port 1 (p2p1): MASTER to FAULTY on FAULT_DETECTED
   (FT_UNSPECIFIED)

This can confuse users because it implies this is a bug, when the correct 
solution in many cases is to just increase the timeout to a slightly higher 
value.

Since we know this is a problem for many drivers and hardware designs, lets 
increase the default timeout. I chose 5 since it is a large enough increase to 
avoid the issues on test systems I have. We do want to keep this timeout small 
because it prevents ptp4l from doing any other processing while we wait for the 
timestamp.

An alternative solution would be to refactor ptp4l so that it does not stop and 
wait for a Tx timestamp, but instead handles the timestamps asynchronously. 
While this could be done, it adds significant complexity to the application 
with minimal or no gain. In most cases, hardware is only capable of a single 
outstanding timestamp at a time, so we cannot send another packet anyways until 
the first one has completed.

Signed-off-by: Jacob Keller 
---
 config.c| 2 +-
 configs/default.cfg | 2 +-
 ptp4l.8 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/config.c b/config.c
index 4472d3d9d6f9..f33f177c696a 100644
--- a/config.c
+++ b/config.c
@@ -323,7 +323,7 @@ struct config_item config_tab[] = {
GLOB_ITEM_INT("ts2phc.pulsewidth", 5, 100, 99900),
PORT_ITEM_ENU("tsproc_mode", TSPROC_FILTER, tsproc_enu),
GLOB_ITEM_INT("twoStepFlag", 1, 0, 1),
-   GLOB_ITEM_INT("tx_timestamp_timeout", 1, 1, INT_MAX),
+   GLOB_ITEM_INT("tx_timestamp_timeout", 5, 1, INT_MAX),
PORT_ITEM_INT("udp_ttl", 1, 1, 255),
PORT_ITEM_INT("udp6_scope", 0x0E, 0x00, 0x0F),
GLOB_ITEM_STR("uds_address", "/var/run/ptp4l"), diff --git 
a/configs/default.cfg b/configs/default.cfg index 64ef3bd7c81d..5e9444da57ee 
100644
--- a/configs/default.cfg
+++ b/configs/default.cfg
@@ -51,7 +51,7 @@ hybrid_e2e0
 inhibit_multicast_service  0
 net_sync_monitor   0
 tc_spanning_tree   0
-tx_timestamp_timeout   1
+tx_timestamp_timeout   5
 unicast_listen 0
 unicast_master_table   0
 unicast_req_duration   3600
diff --git a/ptp4l.8 b/ptp4l.8
index fe9e1502231c..024fad3d19b2 100644
--- a/ptp4l.8
+++ b/ptp4l.8
@@ -496,7 +496,7 @@ switches all implement this option together with the BMCA.
 .B tx_timestamp_timeout
 The number of milliseconds to poll waiting for the tx time stamp from the 
kernel  when a message has recently been sent.
-The default is 1.
+The default is 5.
 .TP
 .B check_fup_sync
 Because of packet reordering that can occur in the network, in the
--
2.31.1.331.gb0c09ab8796f



___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] [PATCH] Increase the default tx_timestamp_timeout to 5

2021-07-07 Thread Richard Cochran
On Wed, Jul 07, 2021 at 05:02:21PM -0700, Jacob Keller wrote:
> diff --git a/config.c b/config.c
> index 4472d3d9d6f9..f33f177c696a 100644
> --- a/config.c
> +++ b/config.c
> @@ -323,7 +323,7 @@ struct config_item config_tab[] = {
>   GLOB_ITEM_INT("ts2phc.pulsewidth", 5, 100, 99900),
>   PORT_ITEM_ENU("tsproc_mode", TSPROC_FILTER, tsproc_enu),
>   GLOB_ITEM_INT("twoStepFlag", 1, 0, 1),
> - GLOB_ITEM_INT("tx_timestamp_timeout", 1, 1, INT_MAX),
> + GLOB_ITEM_INT("tx_timestamp_timeout", 5, 1, INT_MAX),

Let's make it 10 ms.

Thanks,
Richard


___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] tx_timestamp_timeout default

2021-07-07 Thread Richard Cochran
On Wed, Jul 07, 2021 at 11:46:16PM +, Keller, Jacob E wrote:

> Either way, I found that whether I used a kthread or not I was
> unable to avoid the timeout issue with ice hardware because the
> delay is caused by the method we must use to access the Tx
> timestamps :( We get into the kthread function within a few hundred
> usec or less, and then the firmware read takes a long time,
> sometimes over 2 milliseconds.

Yes, but when using "work" that 2 ms can easily become 200 ms or more
on a busy RT system.  The work is sched_other, and as such it is
effectively running at the lowest priority WRT the sched_fifo tasks.

Thanks,
Richard


___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


[Linuxptp-devel] [PATCH] Increase the default tx_timestamp_timeout to 5

2021-07-07 Thread Jacob Keller
The tx_timestamp_timeout configuration defines the number of
milliseconds to wait for a Tx timestamp from the kernel stack. This
delay is necessary as Tx timestamps are captured after a packet is sent
and reported back via the socket error queue.

The current default is to poll for up to 1 millisecond. In practice, it
turns out that this is not always enough time for hardware and software
to capture the timestamp and report it back. Some hardware designs
require reading timestamps over registers or other slow mechanisms.

This extra delay results in the timestamp not being sent back to
userspace within the default 1 millisecond polling time. If that occurs
the following can be seen from ptp4l:

  ptp4l[4756.840]: timed out while polling for tx timestamp
  ptp4l[4756.840]: increasing tx_timestamp_timeout may correct this issue,
   but it is likely caused by a driver bug
  ptp4l[4756.840]: port 1 (p2p1): send sync failed
  ptp4l[4756.840]: port 1 (p2p1): MASTER to FAULTY on FAULT_DETECTED
   (FT_UNSPECIFIED)

This can confuse users because it implies this is a bug, when the
correct solution in many cases is to just increase the timeout to
a slightly higher value.

Since we know this is a problem for many drivers and hardware designs,
lets increase the default timeout. I chose 5 since it is a large enough
increase to avoid the issues on test systems I have. We do want to keep
this timeout small because it prevents ptp4l from doing any other
processing while we wait for the timestamp.

An alternative solution would be to refactor ptp4l so that it does not
stop and wait for a Tx timestamp, but instead handles the timestamps
asynchronously. While this could be done, it adds significant complexity
to the application with minimal or no gain. In most cases, hardware is
only capable of a single outstanding timestamp at a time, so we cannot
send another packet anyways until the first one has completed.

Signed-off-by: Jacob Keller 
---
 config.c| 2 +-
 configs/default.cfg | 2 +-
 ptp4l.8 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/config.c b/config.c
index 4472d3d9d6f9..f33f177c696a 100644
--- a/config.c
+++ b/config.c
@@ -323,7 +323,7 @@ struct config_item config_tab[] = {
GLOB_ITEM_INT("ts2phc.pulsewidth", 5, 100, 99900),
PORT_ITEM_ENU("tsproc_mode", TSPROC_FILTER, tsproc_enu),
GLOB_ITEM_INT("twoStepFlag", 1, 0, 1),
-   GLOB_ITEM_INT("tx_timestamp_timeout", 1, 1, INT_MAX),
+   GLOB_ITEM_INT("tx_timestamp_timeout", 5, 1, INT_MAX),
PORT_ITEM_INT("udp_ttl", 1, 1, 255),
PORT_ITEM_INT("udp6_scope", 0x0E, 0x00, 0x0F),
GLOB_ITEM_STR("uds_address", "/var/run/ptp4l"),
diff --git a/configs/default.cfg b/configs/default.cfg
index 64ef3bd7c81d..5e9444da57ee 100644
--- a/configs/default.cfg
+++ b/configs/default.cfg
@@ -51,7 +51,7 @@ hybrid_e2e0
 inhibit_multicast_service  0
 net_sync_monitor   0
 tc_spanning_tree   0
-tx_timestamp_timeout   1
+tx_timestamp_timeout   5
 unicast_listen 0
 unicast_master_table   0
 unicast_req_duration   3600
diff --git a/ptp4l.8 b/ptp4l.8
index fe9e1502231c..024fad3d19b2 100644
--- a/ptp4l.8
+++ b/ptp4l.8
@@ -496,7 +496,7 @@ switches all implement this option together with the BMCA.
 .B tx_timestamp_timeout
 The number of milliseconds to poll waiting for the tx time stamp from the 
kernel
 when a message has recently been sent.
-The default is 1.
+The default is 5.
 .TP
 .B check_fup_sync
 Because of packet reordering that can occur in the network, in the
-- 
2.31.1.331.gb0c09ab8796f



___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] tx_timestamp_timeout default

2021-07-07 Thread Keller, Jacob E



> -Original Message-
> From: Richard Cochran 
> Sent: Wednesday, July 07, 2021 4:29 PM
> To: Keller, Jacob E 
> Cc: linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] tx_timestamp_timeout default
> 
> On Wed, Jul 07, 2021 at 10:22:59PM +, Keller, Jacob E wrote:
> > I am wondering if there would be support for either (a) increasing
> > the default timeout, or (b) adding something to the PTP get
> > capabilities for indicating a sort of default timeout for the
> > device, and if it's not set in the config then the default timeout
> > is used?
> 
> I wouldn't mind increasing the default.
> 
> I doubt capabilities would help, because much depends on the system
> being used.  We really should replace "work" with the kthread in the
> drivers, and then tell people to give the kthreads sched_fifo using
> chrt.
> 
> Thanks,
> Richard

I implemented this as a kthread in the ice driver, and I am hoping to get some 
time to fix the other Intel drivers.

Note that ice did not use the ptp do_aux_work kthread because of needing to 
handle multiple ports simultaneously across multiple PFs (the clock is shared 
across all PFs, so they don't all have access to the do_aux_work function which 
is only controlled by a single PF which allocated the clock).

Either way, I found that whether I used a kthread or not I was unable to avoid 
the timeout issue with ice hardware because the delay is caused by the method 
we must use to access the Tx timestamps :( We get into the kthread function 
within a few hundred usec or less, and then the firmware read takes a long 
time, sometimes over 2 milliseconds.

Ok. I'll propose a patch to increase the default.

Thanks,
Jake


___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] tx_timestamp_timeout default

2021-07-07 Thread Richard Cochran
On Wed, Jul 07, 2021 at 10:22:59PM +, Keller, Jacob E wrote:
> I am wondering if there would be support for either (a) increasing
> the default timeout, or (b) adding something to the PTP get
> capabilities for indicating a sort of default timeout for the
> device, and if it's not set in the config then the default timeout
> is used?

I wouldn't mind increasing the default.

I doubt capabilities would help, because much depends on the system
being used.  We really should replace "work" with the kthread in the
drivers, and then tell people to give the kthreads sched_fifo using
chrt.

Thanks,
Richard


___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


[Linuxptp-devel] tx_timestamp_timeout default

2021-07-07 Thread Keller, Jacob E
Hi,

I've been working on implementing support for PTP in the E810 and E822 devices 
for Intel's ice driver. As part of this work we've discovered that (again...) 
timestamps can sometimes take longer than the 1millisecond default delay.

Although we get an interrupt within 10usec, the timestamp has to be retrieved 
from a PHY register, and this access is unfortunately mediated by firmware. The 
result is that while we can sometimes recover the timestamp within the 1000usec 
default delay, we sometimes need as much as 2000 or 2500 usec.

We've been recommending that people default to using tx_timestamp_timeout of 5 
or 10 to avoid issues.

I am wondering if there would be support for either (a) increasing the default 
timeout, or (b) adding something to the PTP get capabilities for indicating a 
sort of default timeout for the device, and if it's not set in the config then 
the default timeout is used?

The reason for this need is because it is confusing for users to be immediately 
prompted with "this is likely a driver bug" messaging. We know that the only 
resolution is to wait slightly longer. This is also not the first time we've 
had issues with the 1msec delay being too short, I believe we've often had to 
recommend increasing the delay for other Intel parts in the past for similar 
reasons.

Note I tried a number of changes to the driver implementation in order to see 
if I could reduce the delay, and although I was able to improve it, I was never 
able to get it consistently below 1000usec, as we're effectively limited by the 
need to read the register access over the firmware :(

Thanks,
Jake
___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel