Re: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

2021-05-28 Thread Richard Cochran
On Fri, May 28, 2021 at 09:26:07AM +, Amar Subramanyam via Linuxptp-devel 
wrote:
> We could see that ptp4l already has RTNL functionality and hence this 
> port_renew_transport() seems to be unnecessary redundancy.

The link notifications from the kernel are best effort and are not
guaranteed.  So renewing the transport is the safe thing to do.

Thanks,
Richard


___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.

2021-05-28 Thread Amar Subramanyam via Linuxptp-devel
Hi Miroslav,

Please find our response inline.

Thanks,
Amar B S

> -Original Message-
> From: Miroslav Lichvar [mailto:mlich...@redhat.com]
> Sent: 27 May 2021 15:14
> To: Amar Subramanyam 
> Cc: linuxptp-devel@lists.sourceforge.net
> Subject: Re: [PATCH v2 3/3] port: Don't check timestamps from non-slave ports.
> 
> CAUTION: This email originated from outside of Altiostar. Do not click on 
> links
> or open attachments unless you recognize the sender and you are sure the
> content is safe. You will never be asked to reset your Altiostar password via
> email.
> 
> 
> On Wed, May 26, 2021 at 05:45:29PM +, Amar Subramanyam wrote:
> > Hi Miroslav,
> >
> > We were able to reproduce the issue even without running phc2sys.
> > Please find the attached strace and ptp4l logs when this issue is seen.
> 
> Ok, thanks. That's very helpful.
> 
> > From the ptp4l log, we can see that BMCA took 148 msec to run(Mono Interval
> :: 148445967).
> > The same can be observed from strace logs. In the attached strace log, BMCA
> is executed between the timestamps 00:07:27.047830 (line number 53) to
> 00:07:27.196275(line number 102).
> 
> I don't see that in the log.
> 
> > 00:07:27.142942 close(4)= 0
> > 00:07:27.153733 close(15)   = 0
> > 00:07:27.167783 socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
> > 00:07:27.167829 ioctl(4, SIOCGIFHWADDR, {ifr_name="sriov0",
> ifr_hwaddr={sa_family=ARPHRD_ETHER, sa_data=64:4c:36:12:55:e0}}) = 0
> > 00:07:27.167878 close(4)= 0
> > 00:07:27.167964 socket(AF_PACKET, SOCK_RAW, 768) = 4
> > 00:07:27.168023 ioctl(4, SIOCGIFINDEX, {ifr_name="sriov0", }) = 0
> > 00:07:27.168095 bind(4, {sa_family=AF_PACKET,
> > sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("sriov0"),
> > sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) =
> > 0
> > 00:07:27.178781 setsockopt(4, SOL_SOCKET, SO_BINDTODEVICE, "sriov0",
> > 6) = 0
> > 00:07:27.178830 setsockopt(4, SOL_SOCKET, SO_ATTACH_FILTER, {len=12,
> > filter=0x635ae0}, 16) = 0
> > 00:07:27.178875 setsockopt(4, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
> > {mr_ifindex=if_nametoindex("sriov0"), mr_type=PACKET_MR_MULTICAST,
> > mr_alen=6, mr_address=011b1900}, 16) = 0
> > 00:07:27.179151 setsockopt(4, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
> > {mr_ifindex=if_nametoindex("sriov0"), mr_type=PACKET_MR_MULTICAST,
> > mr_alen=6, mr_address=0180c20e}, 16) = 0
> > 00:07:27.179415 socket(AF_PACKET, SOCK_RAW, 768) = 15
> > 00:07:27.179482 ioctl(15, SIOCGIFINDEX, {ifr_name="sriov0", }) = 0
> > 00:07:27.179518 bind(15, {sa_family=AF_PACKET,
> > sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("sriov0"),
> > sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) =
> > 0
> > 00:07:27.193807 setsockopt(15, SOL_SOCKET, SO_BINDTODEVICE, "sriov0",
> > 6) = 0
> 
> What I see is that it's the closing and binding of the raw sockets that is 
> slowing
> down ptp4l so much that a received message waits for up to ~40 milliseconds
> before the clock check gets its timestamp.
> 
> ptp4l is renewing the transport on announce/sync timeout, which according to
> the git log is needed in the client-only mode to not get multicast sockets 
> stuck
> when the link goes down. I think the fix here is to avoid renewing the raw
> transport. It shouldn't be necessary if I understand it correctly.
> 
> Can you please verify that the following change fixes the problem for you?
> 
> --- a/port.c
> +++ b/port.c
> @@ -1806,6 +1806,12 @@ static int port_renew_transport(struct port *p)
> if (!port_is_enabled(p)) {
> return 0;
> }
> +
> +   /* Closing and binding of raw sockets is too slow and unnecessary */
> +   if (transport_type(p->trp) == TRANS_IEEE_802_3) {
> +   return 0;
> +   }
> +
> transport_close(p->trp, &p->fda);
> port_clear_fda(p, FD_FIRST_TIMER);
> res = transport_open(p->trp, p->iface, &p->fda, p->timestamping);
> 

Yes, With this patch applied, the problem seems to be fixed. We even did a 
cable removal test and link status was correctly detected.
We could see that ptp4l already has RTNL functionality and hence this 
port_renew_transport() seems to be unnecessary redundancy.
Should bypassing the function entirely not just for 802_3 but also for UDPv4 
and UDPv6 transports be the right solution?
Let us know your comments.

> --
> Miroslav Lichvar



___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel