Re: [Linuxptp-devel] PHC delay when calling clock_gettime
> -----Original Message-----
> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> Sent: Wednesday, December 05, 2018 10:01 PM
> To: Keller, Jacob E; linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> On 05.12.18 at 18:16, Keller, Jacob E wrote:
> > I've never heard of this symptom being reported before.
> >
> > My gut reaction is that this is caused by code in e1000e_read_systim,
> > which reads the SYSTIME multiple times. I am suspicious that multiple
> > readings is somehow impacting the clock time, which would give the
> > results that you see above.
> >
> > You mentioned another board that didn't have this problem using the
> > same driver?
> >
> > Thanks,
> > Jake
>
> Yes, my HP Elitebook 830 G5 doesn't show this problem.
>
> Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
>
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
>     Subsystem: Hewlett-Packard Company Ethernet Connection (4) I219-V
>     Flags: bus master, fast devsel, latency 0, IRQ 123
>     Memory at b640 (32-bit, non-prefetchable) [size=128K]
>     Capabilities:
>     Kernel driver in use: e1000e
>     Kernel modules: e1000e
>
> Linux Kernel 4.18.0-12-generic (Ubuntu 18.10)
>
> Best Regards,
> Thomas

Ok. The code I looked at for reading SYSTIME applies to all device
versions (there is no hardware type check there). However, it's possible
that it only triggers on that older hardware. I'm not really sure how to
verify that assumption easily, though.

Thanks,
Jake

_______________________________________________
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel
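One possible way to check the assumption from user space is to compare the PHC against an undisturbed reference clock before and after a burst of extra reads. The following is only a sketch: the helper names (`now_ns`, `offset_ns`, `shift_per_read_ns`) are made up for illustration and appear nowhere in the thread, and it assumes the reference clock is left unsynchronized during the test so that only natural drift and the suspected read side effect change the offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a clock and return its value in nanoseconds; 0 on success, -1 on error. */
static int now_ns(clockid_t clk, int64_t *out)
{
    struct timespec ts;
    if (clock_gettime(clk, &ts))
        return -1;
    *out = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
}

/* One bracketed sample of (clk - ref), taken at the midpoint of two ref reads. */
static int offset_ns(clockid_t clk, clockid_t ref, int64_t *out)
{
    int64_t a, p, b;
    if (now_ns(ref, &a) || now_ns(clk, &p) || now_ns(ref, &b))
        return -1;
    *out = p - (a + (b - a) / 2);
    return 0;
}

/*
 * Hypothetical experiment: sample the offset, read clk `reads` more times,
 * sample the offset again, and report the change per extra read.
 * A clock that is delayed by every read should give a clearly negative
 * value; an unaffected clock should give a value close to zero.
 */
static int shift_per_read_ns(clockid_t clk, clockid_t ref, int reads, double *out)
{
    int64_t before, after, dummy;
    assert(reads > 0);
    if (offset_ns(clk, ref, &before))
        return -1;
    for (int i = 0; i < reads; i++)
        if (now_ns(clk, &dummy))
            return -1;
    if (offset_ns(clk, ref, &after))
        return -1;
    *out = (double)(after - before) / reads;
    return 0;
}
```

For a real PHC, `clk` would be the dynamic clock ID derived from an open /dev/ptpN file descriptor, and `ref` could be CLOCK_REALTIME or CLOCK_MONOTONIC_RAW. Running the same burst on the 82579V and on the I219-V would show whether the per-read shift is specific to the older part. A single bracketed sample is noisy, so a large burst (a thousand reads or more) is needed before the result means much.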
Re: [Linuxptp-devel] PHC delay when calling clock_gettime
Dear gentlemen,

I've been following your debate, and while it's a sin to comment off
topic, it's difficult for me to hold back the following smirk:

i82579V/LM brings back a recollection of a pretty famous bug discovered a
few years ago, where PCI-e ASPM did not work properly, with the net
result originally described as "low double digit per cent" of packet
loss. Possibly originally reported for this particular chip... maybe
because it was so ubiquitous: the i82579 is the external PHY chip of the
LOM MAC integrated in the 6-series Intel south bridges (companion to
SandyBridge CPUs). But actually the ASPM bug used to plague several
Intel gigabit adaptor models of that era.

I can't seem to find the bugzilla entry or mailing list thread... but I'm
pretty sure someone close to the Linux kernel development finally nailed
the bug several months after its first report, and patched it in the
kernel by disabling ASPM in the Intel NIC driver. Not sure which one it
was, e1000e or igb or what, and what kernel version. A month or so later,
this also got "fixed" in the Windows driver by Intel.

I surely agree that a problem waking up the PCI-e lane from shallow sleep
doesn't sound like something that would freeze the on-chip PHC for a
fixed time quantum every time it gets asked :-) especially if the
workaround is just to disable ASPM altogether.

While googling in vain for traces of that bug, I've found a *different*
bug report related to the i82579:
https://bugzilla.redhat.com/show_bug.cgi?id=713315

Again... I'm not saying that this is necessarily related. Just that the
82579 probably had its share of issues ;-) that possibly got worked
around in drivers. So it seems that I superficially agree with Mr. Keller
on that one...

(P.S.: not mentioning the "occasional cfg EEPROM invalidation issue",
which seems totally unrelated and generic across the Intel NIC product
line.)

Don't get me wrong - I'm a great fan of Intel x86 silicon, including the
accompanying chipsets and peripheral stuff. Feels like home to me. Bugs
get fixed. I've seen worse elsewhere.

Frank Rysanek
Re: [Linuxptp-devel] PHC delay when calling clock_gettime
On 05.12.18 at 18:16, Keller, Jacob E wrote:
> I've never heard of this symptom being reported before.
>
> My gut reaction is that this is caused by code in e1000e_read_systim,
> which reads the SYSTIME multiple times. I am suspicious that multiple
> readings is somehow impacting the clock time, which would give the
> results that you see above.
>
> You mentioned another board that didn't have this problem using the
> same driver?
>
> Thanks,
> Jake

Yes, my HP Elitebook 830 G5 doesn't show this problem.

Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
    Subsystem: Hewlett-Packard Company Ethernet Connection (4) I219-V
    Flags: bus master, fast devsel, latency 0, IRQ 123
    Memory at b640 (32-bit, non-prefetchable) [size=128K]
    Capabilities:
    Kernel driver in use: e1000e
    Kernel modules: e1000e

Linux Kernel 4.18.0-12-generic (Ubuntu 18.10)

Best Regards,
Thomas
Re: [Linuxptp-devel] PHC delay when calling clock_gettime
> -----Original Message-----
> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> Sent: Tuesday, December 04, 2018 10:04 PM
> To: Keller, Jacob E; linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> On 04.12.18 at 19:54, Keller, Jacob E wrote:
> >> -----Original Message-----
> >> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> >> Sent: Tuesday, December 04, 2018 12:46 AM
> >> To: Keller, Jacob E; linuxptp-devel@lists.sourceforge.net
> >> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
> >
> > Ah. Ok. From the sound of the original statement it sounded like you
> > were developing your own hardware.
> >
> > I suspect that there is a workaround in the 82579V NIC that causes
> > this, but I don't know offhand. I'll have to go dig a bit, the e1000e
> > driver isn't one I worked on much myself.
> >
> > Thanks,
> > Jake
>
> Thanks again for your help!
>
> In case this really is caused by a workaround in the 82579V, are there
> other NICs that have similar impairments?
> I am asking because I want to make my implementation as robust as
> possible for all kinds of hardware.
>
> Thanks,
> Thomas

I've never heard of this symptom being reported before.

My gut reaction is that this is caused by code in e1000e_read_systim,
which reads the SYSTIME multiple times. I suspect that the multiple
readings are somehow affecting the clock time, which would give the
results that you see above.

You mentioned another board that didn't have this problem using the same
driver?

Thanks,
Jake
Re: [Linuxptp-devel] PHC delay when calling clock_gettime
On 04.12.18 at 19:54, Keller, Jacob E wrote:
>> -----Original Message-----
>> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
>> Sent: Tuesday, December 04, 2018 12:46 AM
>> To: Keller, Jacob E; linuxptp-devel@lists.sourceforge.net
>> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> Ah. Ok. From the sound of the original statement it sounded like you
> were developing your own hardware.
>
> I suspect that there is a workaround in the 82579V NIC that causes
> this, but I don't know offhand. I'll have to go dig a bit, the e1000e
> driver isn't one I worked on much myself.
>
> Thanks,
> Jake

Thanks again for your help!

In case this really is caused by a workaround in the 82579V, are there
other NICs that have similar impairments?
I am asking because I want to make my implementation as robust as
possible for all kinds of hardware.

Thanks,
Thomas
Re: [Linuxptp-devel] PHC delay when calling clock_gettime
> -----Original Message-----
> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> Sent: Tuesday, December 04, 2018 12:46 AM
> To: Keller, Jacob E; linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> > That seems like a bug in either the driver implementation, or in the
> > hardware.
>
> Might be. I am not using my own hardware, though. I saw this behaviour
> on a desktop PC running Linux Mint 19 (Kernel 4.15.0-38-generic) with
> onboard Intel 82579V (rev 05) NIC as well as on a similar system running
> openSUSE Leap 42.1 (Kernel 4.1.39-56-default) with the same NIC. Maybe
> this is a problem with that specific ethernet controller?
>
> I just found out that I am not seeing it on my HP notebook running
> Ubuntu 18.10 (Kernel 4.18.0-12-generic) with Intel I219-V (rev 21) NIC.
> All of the NICs are using the Intel e1000e driver.

Ah. Ok. From the sound of the original statement it sounded like you were
developing your own hardware.

I suspect that there is a workaround in the 82579V NIC that causes this,
but I don't know offhand. I'll have to go dig a bit, the e1000e driver
isn't one I worked on much myself.

Thanks,
Jake
Re: [Linuxptp-devel] PHC delay when calling clock_gettime
On 03.12.18 at 22:41, Keller, Jacob E wrote:
>> -----Original Message-----
>> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
>> Sent: Monday, December 03, 2018 11:31 AM
>> To: linuxptp-devel@lists.sourceforge.net
>> Subject: [Linuxptp-devel] PHC delay when calling clock_gettime
>>
>> Hi everyone,
>>
>> While developing a PTP slave using the Linux PHC, I noticed that every
>> time I call clock_gettime with the PHC ID, the time on the PHC jumps or
>> at least changes by around 200-300ns. I noticed this because I have
>> implemented an algorithm reading the PHC time ten times in a row to
>> find the best possible estimate of the offset to the system time (as
>> done in Ohly's timecompare). The second before the ten readings, my
>> measured PTP offset is a few ns, while directly after the readings it
>> is around 2.5us. The offset is stable below 200ns when not reading the
>> PHC and leaving the system time unsynchronized.

Just for further information, this is an excerpt of the linuxptp output.
phc2sys has been started before 288.414 and stopped before 302.415.

ptp4l[280.414]: master offset     43 s2 freq +14385 path delay  787
ptp4l[281.414]: master offset     -2 s2 freq +14353 path delay  791
ptp4l[282.414]: master offset    -42 s2 freq +14312 path delay  791
ptp4l[283.414]: master offset    -39 s2 freq +14302 path delay  791
ptp4l[284.414]: master offset    -35 s2 freq +14295 path delay  788
ptp4l[285.414]: master offset    -19 s2 freq +14300 path delay  788
ptp4l[286.414]: master offset    -26 s2 freq +14288 path delay  792
ptp4l[287.414]: master offset    -30 s2 freq +14276 path delay  793
ptp4l[288.414]: master offset  -1523 s2 freq +12774 path delay  793
ptp4l[289.414]: master offset  -1340 s2 freq +12500 path delay  793
ptp4l[290.414]: master offset   -907 s2 freq +12531 path delay  793
ptp4l[291.414]: master offset   -510 s2 freq +12656 path delay  793
ptp4l[292.414]: master offset   -204 s2 freq +12809 path delay  792
ptp4l[293.414]: master offset   -149 s2 freq +12803 path delay  792
ptp4l[294.414]: master offset     37 s2 freq +12944 path delay  773
ptp4l[295.414]: master offset    -32 s2 freq +12886 path delay  773
ptp4l[296.415]: master offset     -3 s2 freq +12905 path delay  733
ptp4l[297.415]: master offset    -75 s2 freq +12832 path delay  729
ptp4l[298.415]: master offset    -34 s2 freq +12851 path delay  713
ptp4l[299.415]: master offset    -45 s2 freq +12830 path delay  713
ptp4l[300.415]: master offset    -59 s2 freq +12802 path delay  711
ptp4l[301.415]: master offset     40 s2 freq +12884 path delay  711
ptp4l[302.415]: master offset   1445 s2 freq +14301 path delay  644
ptp4l[303.415]: master offset   1371 s2 freq +14660 path delay  644
ptp4l[304.415]: master offset    955 s2 freq +14655 path delay  647
ptp4l[305.415]: master offset    435 s2 freq +14422 path delay  729
ptp4l[306.415]: master offset    216 s2 freq +14333 path delay  749
ptp4l[307.415]: master offset    121 s2 freq +14303 path delay  755
ptp4l[308.415]: master offset     22 s2 freq +14240 path delay  788

> That seems like a bug in either the driver implementation, or in the
> hardware.

Might be. I am not using my own hardware, though. I saw this behaviour on
a desktop PC running Linux Mint 19 (Kernel 4.15.0-38-generic) with
onboard Intel 82579V (rev 05) NIC as well as on a similar system running
openSUSE Leap 42.1 (Kernel 4.1.39-56-default) with the same NIC. Maybe
this is a problem with that specific ethernet controller?

I just found out that I am not seeing it on my HP notebook running
Ubuntu 18.10 (Kernel 4.18.0-12-generic) with Intel I219-V (rev 21) NIC.
All of the NICs are using the Intel e1000e driver.

>> While trying to find the mistake in my implementation, I found out that
>> the same thing happens if I use linuxptp and phc2sys. As long as
>> phc2sys is not running, linuxptp reports a pretty stable offset below
>> 200ns, but as soon as I start phc2sys, the offset increases to about
>> 1us for a short time and is corrected by linuxptp a few seconds later.
>> Stopping phc2sys again results in the same behaviour, now with -1us
>> offset. From what I have seen in the code of phc2sys, it reads the PHC
>> only five times in a row, which explains that the offset is only 1us,
>> compared to my 2.5us.
>
> phc2sys is likely to call gettime at least once, so this I think is the
> same problem as above.
>
>> My conclusion is that each call of clock_gettime with the PHC ID delays
>> the time of the clock by around 200-300ns. Is this a bug or expected
>> behaviour? Or am I doing something wrong?
>
> I suggest you read your hardware spec sheet and see if somehow reading
> the clock time causes it to be paused?
>
> That, or you've got something weird going on in your
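As a rough cross-check of the 200-300ns figure, the ptp4l excerpt above can be read as a frequency measurement: if every PHC read delays the clock by a fixed amount, a steady stream of reads looks to the servo like a constant frequency error, and 1 ppb equals 1 ns of drift per second. The sketch below is an illustration, not something taken from the thread. It assumes phc2sys performed one update per second (an assumption; the thread only states five readings per update), that the freq values at 283-287 and 295-301 represent the settled servo in each state, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n integer samples. */
static double mean(const int *v, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/*
 * Hypothetical estimate: the settled ptp4l frequency (ppb) without extra
 * PHC reads minus the settled frequency with them is the apparent drift
 * in ns per second; dividing by the number of reads per second gives the
 * apparent delay per read in ns.
 */
static double shift_per_read_from_freq(const int *idle, size_t n_idle,
                                       const int *busy, size_t n_busy,
                                       double reads_per_sec)
{
    double delta_ppb = mean(idle, n_idle) - mean(busy, n_busy);
    return delta_ppb / reads_per_sec;
}
```

Feeding in the freq column from 283.414-287.414 (phc2sys stopped) and from 295.415-301.415 (phc2sys running) with five reads per second gives about 287 ns per read, which is at least in line with the reported 200-300ns under the stated assumptions.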
Re: [Linuxptp-devel] PHC delay when calling clock_gettime
> -----Original Message-----
> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> Sent: Monday, December 03, 2018 11:31 AM
> To: linuxptp-devel@lists.sourceforge.net
> Subject: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> Hi everyone,
>
> While developing a PTP slave using the Linux PHC, I noticed that every
> time I call clock_gettime with the PHC ID, the time on the PHC jumps or
> at least changes by around 200-300ns. I noticed this because I have
> implemented an algorithm reading the PHC time ten times in a row to find
> the best possible estimate of the offset to the system time (as done in
> Ohly's timecompare). The second before the ten readings, my measured PTP
> offset is a few ns, while directly after the readings it is around
> 2.5us. The offset is stable below 200ns when not reading the PHC and
> leaving the system time unsynchronized.

That seems like a bug in either the driver implementation, or in the
hardware.

> While trying to find the mistake in my implementation, I found out that
> the same thing happens if I use linuxptp and phc2sys. As long as phc2sys
> is not running, linuxptp reports a pretty stable offset below 200ns, but
> as soon as I start phc2sys, the offset increases to about 1us for a
> short time and is corrected by linuxptp a few seconds later. Stopping
> phc2sys again results in the same behaviour, now with -1us offset. From
> what I have seen in the code of phc2sys, it reads the PHC only five
> times in a row, which explains that the offset is only 1us, compared to
> my 2.5us.

phc2sys is likely to call gettime at least once, so this I think is the
same problem as above.

> My conclusion is that each call of clock_gettime with the PHC ID delays
> the time of the clock by around 200-300ns. Is this a bug or expected
> behaviour? Or am I doing something wrong?

I suggest you read your hardware spec sheet and see if somehow reading
the clock time causes it to be paused?

That, or you've got something weird going on in your gettime
implementation. Unfortunately, none of us on the list are going to be
experts in your hardware or software. I would be incredibly surprised if
this was a bug in linuxptp or the PTP kernel subsystem.

> Thanks in advance for your help!

Good luck!

Regards,
Jake
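For readers who want to reproduce the measurement described in the original post, the "read the PHC several times and keep the best sample" approach can be sketched roughly as follows. This is not Thomas's code, which was not posted; `best_offset` and `ts_to_ns` are hypothetical names, and the selection rule (keep the sample whose surrounding system-clock reads are closest together) is an assumption about what such an algorithm typically does. The `FD_TO_CLOCKID` macro is the usual way to turn an open /dev/ptpN descriptor into a clock ID.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Map an open /dev/ptpN file descriptor to a dynamic POSIX clock ID. */
#define CLOCKFD 3
#define FD_TO_CLOCKID(fd) ((~(clockid_t)(fd) << 3) | CLOCKFD)

/* Convert a timespec to nanoseconds. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Hypothetical helper: read `clk` n times, each read bracketed by two
 * CLOCK_REALTIME reads, and keep the sample with the narrowest bracket.
 * On success returns 0, stores (clk - system time) in *offset_ns and the
 * width of the winning bracket in *delay_ns. Returns -1 if any
 * clock_gettime() call fails.
 */
static int best_offset(clockid_t clk, int n, int64_t *offset_ns, int64_t *delay_ns)
{
    int64_t best_delay = INT64_MAX;

    assert(n > 0);
    for (int i = 0; i < n; i++) {
        struct timespec t1, tp, t2;

        if (clock_gettime(CLOCK_REALTIME, &t1) ||
            clock_gettime(clk, &tp) ||
            clock_gettime(CLOCK_REALTIME, &t2))
            return -1;

        int64_t a = ts_to_ns(&t1);
        int64_t delay = ts_to_ns(&t2) - a;

        if (delay < best_delay) {
            best_delay = delay;
            /* Assume the PHC was sampled halfway through the bracket. */
            *offset_ns = ts_to_ns(&tp) - (a + delay / 2);
        }
    }
    *delay_ns = best_delay;
    return 0;
}
```

With a real device, the caller would open /dev/ptp0 (the device number is system specific) and pass `FD_TO_CLOCKID(fd)` as `clk` with n set to 10, as in the post. Note that on hardware showing the symptom discussed in this thread, each call of this helper would itself disturb the PHC by n reads.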