Re: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets received on the error queues of sockets

2019-12-17 Thread Keller, Jacob E
> -Original Message-
> From: Vladimir Oltean 
> Sent: Tuesday, December 17, 2019 12:34 PM
> To: Keller, Jacob E 
> Cc: richardcoch...@gmail.com; linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets
> received on the error queues of sockets
> 
> On Tue, 17 Dec 2019 at 22:04, Keller, Jacob E  
> wrote:
> 
> > > -Original Message-
> > > From: Vladimir Oltean 
> > > Sent: Monday, December 16, 2019 3:11 PM
> > > To: richardcoch...@gmail.com
> > > Cc: linuxptp-devel@lists.sourceforge.net
> > > Subject: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets
> received
> > > on the error queues of sockets
> > > For messages that the application does not / no longer expects, such as
> > > TX timestamps delivered late, duplicate TX timestamps, general
> > > exceptional messages enqueued by the kernel in the socket error queue
> > > etc, ptp4l will be taken by surprise in clock_poll() by these, and will
> > > think that there is data, since POLLIN is set (but in fact in the socket
> > > error queue etc, ptp4l will be taken by surprise in clock_poll() by
> > > these, and will think that there is data, since POLLIN is set (but in
> > > fact POLLERR is also set, and this has an entirely different meaning).
> >
> > This double parenthetical is a bit hard to parse.
> >
> 
> Sorry, this is hard for me to understand too. The paragraph should have been:
> 
> For messages that the application does not / no longer expects, such as
> TX timestamps delivered late, duplicate TX timestamps, general
> exceptional messages enqueued by the kernel in the socket error queue
> etc, ptp4l will be taken by surprise in clock_poll() by these, and will
> think that there is data, since POLLIN is set (but in fact POLLERR is
> also set, and this has an entirely different meaning), and will attempt
> to dequeue them from the wrong queue, which is empty.
> 

That sounds better. It still feels a bit like a run-on sentence. Maybe 
something like:

For messages that the application does not / no longer expects, such as Tx 
timestamps delivered late, duplicate Tx timestamps, or general exceptional 
messages enqueued by the kernel in the socket error queue, ptp4l will be taken 
by surprise in clock_poll(). It will think there is data, since POLLIN is set. 
In fact, POLLERR is also set which gives an entirely different meaning. Because 
of this, ptp4l will attempt to dequeue messages from the wrong (empty) queue.

> 
> > >
> > > A very, very simple reproducer is to take a DSA switch and run:
> > >
> > > tcpdump -i eth0 -j adapter_unsynced
> > >
> > > on its DSA master net device. The above command will enable timestamping
> > > on that net device, and if both the DSA switch and the master support
> > > PTP, this will make the kernel send duplicate TX timestamps for every
> > > sent event packet, which will completely kill ptp4l until a reboot, with
> > > no indication whatsoever of what's going on.
> > >
> > > Since the messages on the error queue are unexpected, we have no need
> > > for them. And they can be in theory anything, so simply hex dumping
> > > their content and moving along sounds like a good idea.
> > >
> > > Printing them to the user is optional (and helpful), but reading them is
> > > not. With this patch, even with extraneous data delivered by a buggy
> > > kernel (which the application now loudly complains about), the
> > > synchronization keeps chugging along. Otherwise the application starts
> > > reordering packets in recvmsg() due to misinterpreting which socket
> > > queue has data available.
> > >
> > > Signed-off-by: Vladimir Oltean 
> > > ---
> > >  clock.c | 11 +++
> > >  msg.c   | 12 
> > >  msg.h   |  7 +++
> > >  3 files changed, 30 insertions(+)
> > >
> > > diff --git a/clock.c b/clock.c
> > > index 146576ac589c..768bbb49513d 100644
> > > --- a/clock.c
> > > +++ b/clock.c
> > > @@ -1508,6 +1508,17 @@ int clock_poll(struct clock *c)
> > >   LIST_FOREACH(p, >ports, list) {
> > >   /* Let the ports handle their events. */
> > >   for (i = 0; i < N_POLLFD; i++) {
> > > + if (cur[i].revents & POLLERR) {
> > > + unsigned char pkt[1600];
> > > +
> > > + cnt = recv(cur[i].fd, pkt, sizeof(pkt),
> > > +  

Re: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets received on the error queues of sockets

2019-12-17 Thread Vladimir Oltean
On Tue, 17 Dec 2019 at 22:04, Keller, Jacob E  wrote:

> > -Original Message-
> > From: Vladimir Oltean 
> > Sent: Monday, December 16, 2019 3:11 PM
> > To: richardcoch...@gmail.com
> > Cc: linuxptp-devel@lists.sourceforge.net
> > Subject: [Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets 
> > received
> > on the error queues of sockets
> > For messages that the application does not / no longer expects, such as
> > TX timestamps delivered late, duplicate TX timestamps, general
> > exceptional messages enqueued by the kernel in the socket error queue
> > etc, ptp4l will be taken by surprise in clock_poll() by these, and will
> > think that there is data, since POLLIN is set (but in fact in the socket
> > error queue etc, ptp4l will be taken by surprise in clock_poll() by
> > these, and will think that there is data, since POLLIN is set (but in
> > fact POLLERR is also set, and this has an entirely different meaning).
>
> This double parenthetical is a bit hard to parse.
>

Sorry, this is hard for me to understand too. The paragraph should have been:

For messages that the application does not / no longer expects, such as
TX timestamps delivered late, duplicate TX timestamps, general
exceptional messages enqueued by the kernel in the socket error queue
etc, ptp4l will be taken by surprise in clock_poll() by these, and will
think that there is data, since POLLIN is set (but in fact POLLERR is
also set, and this has an entirely different meaning), and will attempt
to dequeue them from the wrong queue, which is empty.


> >
> > A very, very simple reproducer is to take a DSA switch and run:
> >
> > tcpdump -i eth0 -j adapter_unsynced
> >
> > on its DSA master net device. The above command will enable timestamping
> > on that net device, and if both the DSA switch and the master support
> > PTP, this will make the kernel send duplicate TX timestamps for every
> > sent event packet, which will completely kill ptp4l until a reboot, with
> > no indication whatsoever of what's going on.
> >
> > Since the messages on the error queue are unexpected, we have no need
> > for them. And they can be in theory anything, so simply hex dumping
> > their content and moving along sounds like a good idea.
> >
> > Printing them to the user is optional (and helpful), but reading them is
> > not. With this patch, even with extraneous data delivered by a buggy
> > kernel (which the application now loudly complains about), the
> > synchronization keeps chugging along. Otherwise the application starts
> > reordering packets in recvmsg() due to misinterpreting which socket
> > queue has data available.
> >
> > Signed-off-by: Vladimir Oltean 
> > ---
> >  clock.c | 11 +++
> >  msg.c   | 12 
> >  msg.h   |  7 +++
> >  3 files changed, 30 insertions(+)
> >
> > diff --git a/clock.c b/clock.c
> > index 146576ac589c..768bbb49513d 100644
> > --- a/clock.c
> > +++ b/clock.c
> > @@ -1508,6 +1508,17 @@ int clock_poll(struct clock *c)
> >   LIST_FOREACH(p, >ports, list) {
> >   /* Let the ports handle their events. */
> >   for (i = 0; i < N_POLLFD; i++) {
> > + if (cur[i].revents & POLLERR) {
> > + unsigned char pkt[1600];
> > +
> > + cnt = recv(cur[i].fd, pkt, sizeof(pkt),
> > +MSG_ERRQUEUE);
> > + pr_err("Unexpected data on socket err 
> > queue:");
> > + msg_raw_dump(pkt, cnt);
> > +
> > + continue;
> > + }
> > +
>
> Does it makes sense to always print these at pr_err level? I suppose we 
> almost never expect messages besides Tx timestamps, so warning loudly is 
> better than only warning if you enable the higher debug printing.. However.. 
> with this change it almost feels like these are now ignored/handled properly 
> and thus not an issue so printing them at like pr_debug might make more sense?

For my particular case, this is dead code because of the kernel fix.
The other "[PATCH 3/3] port: Signal sync/follow-up mismatch events
loudly" is dead code because of this one.

For anybody else, I suppose that this is the beginning of the end of
what ptp4l will tell them anyway. So I'm not too concerned that it's
too loud. It isn't guarded by anything else, it is the other way
around. First you poll on the fd, then you recvmsg and process it to
see what it is.

>
> Hmm. I think I'm fine with this either way.
>
> Thanks,
> Jake

Thanks,
-Vladimir


___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


[Linuxptp-devel] [PATCH 2/3] clock: Dump unexpected packets received on the error queues of sockets

2019-12-16 Thread Vladimir Oltean
In the current design of the SO_SELECT_ERR_QUEUE socket option (which is
enabled by sk_timestamping_init on the event fd), it is a bug to only
check revents & POLLIN, but not also POLLERR.

Normally the error queue events that the application expects (i.e. TX
timestamps) are received, within a given timeout, in-band with the
transmission of the timestampable message itself (for example in
raw_send).

For messages that the application does not / no longer expects, such as
TX timestamps delivered late, duplicate TX timestamps, general
exceptional messages enqueued by the kernel in the socket error queue
etc, ptp4l will be taken by surprise in clock_poll() by these, and will
think that there is data, since POLLIN is set (but in fact in the socket
error queue etc, ptp4l will be taken by surprise in clock_poll() by
these, and will think that there is data, since POLLIN is set (but in
fact POLLERR is also set, and this has an entirely different meaning).

A very, very simple reproducer is to take a DSA switch and run:

tcpdump -i eth0 -j adapter_unsynced

on its DSA master net device. The above command will enable timestamping
on that net device, and if both the DSA switch and the master support
PTP, this will make the kernel send duplicate TX timestamps for every
sent event packet, which will completely kill ptp4l until a reboot, with
no indication whatsoever of what's going on.

Since the messages on the error queue are unexpected, we have no need
for them. And they can be in theory anything, so simply hex dumping
their content and moving along sounds like a good idea.

Printing them to the user is optional (and helpful), but reading them is
not. With this patch, even with extraneous data delivered by a buggy
kernel (which the application now loudly complains about), the
synchronization keeps chugging along. Otherwise the application starts
reordering packets in recvmsg() due to misinterpreting which socket
queue has data available.

Signed-off-by: Vladimir Oltean 
---
 clock.c | 11 +++
 msg.c   | 12 
 msg.h   |  7 +++
 3 files changed, 30 insertions(+)

diff --git a/clock.c b/clock.c
index 146576ac589c..768bbb49513d 100644
--- a/clock.c
+++ b/clock.c
@@ -1508,6 +1508,17 @@ int clock_poll(struct clock *c)
LIST_FOREACH(p, >ports, list) {
/* Let the ports handle their events. */
for (i = 0; i < N_POLLFD; i++) {
+   if (cur[i].revents & POLLERR) {
+   unsigned char pkt[1600];
+
+   cnt = recv(cur[i].fd, pkt, sizeof(pkt),
+  MSG_ERRQUEUE);
+   pr_err("Unexpected data on socket err queue:");
+   msg_raw_dump(pkt, cnt);
+
+   continue;
+   }
+
if (cur[i].revents & (POLLIN|POLLPRI)) {
event = port_event(p, i);
if (EV_STATE_DECISION_EVENT == event) {
diff --git a/msg.c b/msg.c
index d1619d4973f1..aae6e141f2be 100644
--- a/msg.c
+++ b/msg.c
@@ -601,3 +601,15 @@ int msg_sots_missing(struct ptp_message *m)
}
return msg_sots_valid(m) ? 0 : 1;
 }
+
+void msg_raw_dump(unsigned char *pkt, int cnt)
+{
+   int k;
+
+   for (k = 0; k < cnt; k++) {
+   if (k % 16 == 0)
+   printf("\n%04x ", k);
+   printf("%02x ", pkt[k]);
+   }
+   printf("\n");
+}
diff --git a/msg.h b/msg.h
index c1294c23f6c3..e85fbb916aef 100644
--- a/msg.h
+++ b/msg.h
@@ -394,6 +394,13 @@ void msg_put(struct ptp_message *m);
  */
 int msg_sots_missing(struct ptp_message *m);
 
+/**
+ * Print a wireshark-compatible hex dump of a message buffer.
+ * @param pkt  Message buffer to print
+ * @param cnt  Length of message buffer
+ */
+void msg_raw_dump(unsigned char *pkt, int cnt);
+
 /**
  * Test whether a message has a valid SO_TIMESTAMPING time stamp.
  * @param m  Message to test.
-- 
2.17.1



___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel