Re: [Linuxptp-devel] PHC delay when calling clock_gettime

2018-12-06 Thread Keller, Jacob E
> -----Original Message-----
> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> Sent: Wednesday, December 05, 2018 10:01 PM
> To: Keller, Jacob E ; linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> On 05.12.18 at 18:16, Keller, Jacob E wrote:
> > I've never heard of this symptom being reported before.
> >
> > My gut reaction is that this is caused by code in e1000e_read_systim, which
> > reads the SYSTIME multiple times. I am suspicious that multiple readings is
> > somehow impacting the clock time, which would give the results that you see
> > above.
> >
> > You mentioned another board that didn't have this problem using the same
> > driver?
> >
> > Thanks,
> > Jake
> 
> Yes, my HP Elitebook 830 G5 doesn't show this problem.
> 
> 
> Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
> 
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
>      Subsystem: Hewlett-Packard Company Ethernet Connection (4) I219-V
>      Flags: bus master, fast devsel, latency 0, IRQ 123
>      Memory at b640 (32-bit, non-prefetchable) [size=128K]
>      Capabilities: 
>      Kernel driver in use: e1000e
>      Kernel modules: e1000e
> 
> Linux Kernel 4.18.0-12-generic (Ubuntu 18.10)
> 
> 
> Best Regards,
> Thomas
> 

OK. The SYSTIM read code I looked at runs for all device versions (there is
no hardware type check there). However, it's possible that it only triggers
the problem on that older hardware.

I'm not really sure how to verify that assumption easily though.

Thanks,
Jake
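
One way to probe this from user space, without touching the driver, might
be a burst of back-to-back bracketed reads (system clock, PHC, system
clock). If reading the clock really disturbs it, the PHC-minus-system offset
should move by a roughly constant amount from one read to the next, while on
unaffected hardware such as the I219-V the mean step should be close to
zero. The sketch below only does the arithmetic on such samples. The
function names and the sample layout are assumptions made for illustration,
and the result cannot tell a driver effect from a hardware one.

```python
from statistics import mean


def offsets_from_samples(samples):
    """Turn bracketed reads into PHC-minus-system offsets.

    Each sample is (t_sys_before, t_phc, t_sys_after), all in nanoseconds.
    The system time is taken as the midpoint of the two system clock reads.
    """
    return [t_phc - (t1 + t2) // 2 for t1, t_phc, t2 in samples]


def mean_step_per_read(offsets):
    """Average change of the offset between consecutive reads.

    A clearly negative value would suggest that each read sets the PHC back;
    a value near zero would suggest that reads do not disturb the clock.
    """
    steps = [later - earlier for earlier, later in zip(offsets, offsets[1:])]
    return mean(steps)
```

Within a burst the reads are only microseconds apart, so the natural drift
between two free-running clocks should be far smaller than the 200-300ns
effect in question.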

___
Linuxptp-devel mailing list
Linuxptp-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linuxptp-devel


Re: [Linuxptp-devel] PHC delay when calling clock_gettime

2018-12-06 Thread Frantisek Rysanek
Dear gentlemen,

I've been following your debate, and while it's a sin to comment off 
topic, it's difficult for me to hold back the following smirk:

The i82579V/LM brings back a recollection of a pretty famous bug
discovered a few years ago, where PCI-e ASPM did not work properly,
with the net result originally described as "low double digit
percent" of packet loss.
It was possibly first reported for this particular chip... maybe
because it was so ubiquitous: the i82579 is the external PHY chip of
the LOM MAC integrated in the 6-series Intel south bridges (companion
to Sandy Bridge CPUs). But actually the ASPM bug used to plague
several Intel gigabit adapter models of that era.

I can't seem to find the bugzilla entry or mailing list thread...
but I'm pretty sure someone close to Linux kernel development
finally nailed the bug several months after its first report, and
patched it in the kernel by disabling ASPM in the Intel NIC driver.
I'm not sure which driver it was, e1000e or igb or something else,
nor which kernel version.
A month or so later, this also got "fixed" in the Windows driver by
Intel.

I surely agree that a problem waking up the PCI-e lane from shallow 
sleep doesn't sound like something that would freeze the on-chip PHC
for a fixed time quantum every time it gets asked :-) especially if 
the workaround is just to disable ASPM altogether.

While googling in vain for traces of that bug, I've found a 
*different* bug report related to the i82579:

https://bugzilla.redhat.com/show_bug.cgi?id=713315

Again... I'm not saying that this is necessarily related.
Just that the 82579 probably had its share of issues ;-)
that possibly got worked around in drivers.
So it seems that I superficially agree with Mr. Keller on that one...

(P.S.: not mentioning the "occasional cfg EEPROM invalidation issue",
which seems totally unrelated and generic across the Intel NIC 
product line.)

Don't get me wrong - I'm a great fan of Intel x86 silicon, including 
the accompanying chipsets and peripheral stuff.
Feels like home to me.
Bugs get fixed. I've seen worse elsewhere.

Frank Rysanek





Re: [Linuxptp-devel] PHC delay when calling clock_gettime

2018-12-05 Thread Thomas Behn

On 05.12.18 at 18:16, Keller, Jacob E wrote:

> I've never heard of this symptom being reported before.
>
> My gut reaction is that this is caused by code in e1000e_read_systim, which
> reads the SYSTIME multiple times. I am suspicious that multiple readings is
> somehow impacting the clock time, which would give the results that you see
> above.
>
> You mentioned another board that didn't have this problem using the same driver?
>
> Thanks,
> Jake


Yes, my HP Elitebook 830 G5 doesn't show this problem.


Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-V (rev 21)
    Subsystem: Hewlett-Packard Company Ethernet Connection (4) I219-V
    Flags: bus master, fast devsel, latency 0, IRQ 123
    Memory at b640 (32-bit, non-prefetchable) [size=128K]
    Capabilities: 
    Kernel driver in use: e1000e
    Kernel modules: e1000e

Linux Kernel 4.18.0-12-generic (Ubuntu 18.10)


Best Regards,
Thomas






Re: [Linuxptp-devel] PHC delay when calling clock_gettime

2018-12-05 Thread Keller, Jacob E
> -----Original Message-----
> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> Sent: Tuesday, December 04, 2018 10:04 PM
> To: Keller, Jacob E ; linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> On 04.12.18 at 19:54, Keller, Jacob E wrote:
> >> -----Original Message-----
> >> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> >> Sent: Tuesday, December 04, 2018 12:46 AM
> >> To: Keller, Jacob E ; linuxptp-de...@lists.sourceforge.net
> >> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
> > Ah. Ok. From the sound of the original statement it sounded like you were
> > developing your own hardware.
> >
> > I suspect that there is a work around in the 82579V NIC that causes this,
> > but I don't know offhand. I'll have to go dig a bit, the e1000e driver isn't
> > one I worked on much myself.
> >
> > Thanks,
> > Jake
> 
> Thanks again for your help!
> 
> In case this really is caused by a work around in the 82579V, are there
> other NICs, which have similar impairments?
> I am asking, because I want to make my implementation as robust as
> possible for all kinds of hardware.
> 
> Thanks,
> Thomas
> 

I've never heard of this symptom being reported before.

My gut reaction is that this is caused by code in e1000e_read_systim, which
reads the SYSTIME multiple times. I suspect that the multiple reads are
somehow affecting the clock time, which would give the results that you see
above.

You mentioned another board that didn't have this problem using the same driver?

Thanks,
Jake



Re: [Linuxptp-devel] PHC delay when calling clock_gettime

2018-12-04 Thread Thomas Behn

On 04.12.18 at 19:54, Keller, Jacob E wrote:

>> -----Original Message-----
>> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
>> Sent: Tuesday, December 04, 2018 12:46 AM
>> To: Keller, Jacob E ; linuxptp-devel@lists.sourceforge.net
>> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> Ah. Ok. From the sound of the original statement it sounded like you were
> developing your own hardware.
>
> I suspect that there is a work around in the 82579V NIC that causes this, but I
> don't know offhand. I'll have to go dig a bit, the e1000e driver isn't one I
> worked on much myself.
>
> Thanks,
> Jake


Thanks again for your help!

If this really is caused by a workaround in the 82579V, are there
other NICs that have similar impairments?
I am asking because I want to make my implementation as robust as
possible across all kinds of hardware.


Thanks,
Thomas






Re: [Linuxptp-devel] PHC delay when calling clock_gettime

2018-12-04 Thread Keller, Jacob E
> -----Original Message-----
> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> Sent: Tuesday, December 04, 2018 12:46 AM
> To: Keller, Jacob E ; linuxptp-devel@lists.sourceforge.net
> Subject: Re: [Linuxptp-devel] PHC delay when calling clock_gettime

> > That seems like a bug in either the driver implementation, or in the 
> > hardware.
> Might be. I am not using own hardware, though. I saw this behaviour on a
> desktop PC running Linux Mint 19 (Kernel 4.15.0-38-generic) with onboard
> Intel 82579V (rev 05) NIC as well as on a similar system running
> openSUSE Leap 42.1 (Kernel 4.1.39-56-default) with the same NIC. Maybe
> this is a problem with that specific ethernet controller?
> 
> I just found out, that I am not seeing it on my HP notebook running
> Ubuntu 18.10 (Kernel 4.18.0-12-generic) with Intel I219-V (rev 21) NIC.
> All of the NICs are using the Intel e1000e driver.

Ah, OK. From the original statement, it sounded like you were developing
your own hardware.

I suspect that there is a workaround in the 82579V NIC that causes this, but I
don't know offhand. I'll have to dig a bit; the e1000e driver isn't one I have
worked on much myself.

Thanks,
Jake



Re: [Linuxptp-devel] PHC delay when calling clock_gettime

2018-12-04 Thread Thomas Behn

On 03.12.18 at 22:41, Keller, Jacob E wrote:

>> -----Original Message-----
>> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
>> Sent: Monday, December 03, 2018 11:31 AM
>> To: linuxptp-devel@lists.sourceforge.net
>> Subject: [Linuxptp-devel] PHC delay when calling clock_gettime
>>
>> Hi everyone,
>>
>> While developing a PTP slave using the Linux PHC, I noticed that every time I
>> call clock_gettime with the PHC ID, the time on the PHC jumps or at least
>> changes by around 200-300ns. I noticed this because I have implemented an
>> algorithm that reads the PHC time ten times in a row to find the best possible
>> estimate of the offset to the system time (as done in Ohly's timecompare). In
>> the second before the ten readings, my measured PTP offset is a few ns, while
>> directly after the readings it is around 2.5us. The offset is stable below
>> 200ns when not reading the PHC and leaving the system time unsynchronized.



Just for further information, this is an excerpt of the linuxptp output.
phc2sys has been started before 288.414 and stopped before 302.415.

ptp4l[280.414]: master offset     43 s2 freq  +14385 path delay   787
ptp4l[281.414]: master offset     -2 s2 freq  +14353 path delay   791
ptp4l[282.414]: master offset    -42 s2 freq  +14312 path delay   791
ptp4l[283.414]: master offset    -39 s2 freq  +14302 path delay   791
ptp4l[284.414]: master offset    -35 s2 freq  +14295 path delay   788
ptp4l[285.414]: master offset    -19 s2 freq  +14300 path delay   788
ptp4l[286.414]: master offset    -26 s2 freq  +14288 path delay   792
ptp4l[287.414]: master offset    -30 s2 freq  +14276 path delay   793
ptp4l[288.414]: master offset  -1523 s2 freq  +12774 path delay   793
ptp4l[289.414]: master offset  -1340 s2 freq  +12500 path delay   793
ptp4l[290.414]: master offset   -907 s2 freq  +12531 path delay   793
ptp4l[291.414]: master offset   -510 s2 freq  +12656 path delay   793
ptp4l[292.414]: master offset   -204 s2 freq  +12809 path delay   792
ptp4l[293.414]: master offset   -149 s2 freq  +12803 path delay   792
ptp4l[294.414]: master offset     37 s2 freq  +12944 path delay   773
ptp4l[295.414]: master offset    -32 s2 freq  +12886 path delay   773
ptp4l[296.415]: master offset     -3 s2 freq  +12905 path delay   733
ptp4l[297.415]: master offset    -75 s2 freq  +12832 path delay   729
ptp4l[298.415]: master offset    -34 s2 freq  +12851 path delay   713
ptp4l[299.415]: master offset    -45 s2 freq  +12830 path delay   713
ptp4l[300.415]: master offset    -59 s2 freq  +12802 path delay   711
ptp4l[301.415]: master offset     40 s2 freq  +12884 path delay   711
ptp4l[302.415]: master offset   1445 s2 freq  +14301 path delay   644
ptp4l[303.415]: master offset   1371 s2 freq  +14660 path delay   644
ptp4l[304.415]: master offset    955 s2 freq  +14655 path delay   647
ptp4l[305.415]: master offset    435 s2 freq  +14422 path delay   729
ptp4l[306.415]: master offset    216 s2 freq  +14333 path delay   749
ptp4l[307.415]: master offset    121 s2 freq  +14303 path delay   755
ptp4l[308.415]: master offset     22 s2 freq  +14240 path delay   788
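
The two steps in this excerpt can also be located mechanically. The
following is a small sketch, not part of linuxptp: the regular expression
assumes that each ptp4l record sits on a single line, the 1000ns threshold
is an arbitrary choice for illustration, and the sample lines are copied
from the excerpt.

```python
import re

# One ptp4l "master offset" record per line (assumed layout).
LINE_RE = re.compile(
    r"ptp4l\[(?P<ts>\d+\.\d+)\]: master offset\s+(?P<offset>-?\d+)\s+s\d\s+"
    r"freq\s+(?P<freq>[+-]\d+)\s+path delay\s+(?P<delay>\d+)"
)


def parse(lines):
    """Return (timestamp, offset_ns, freq_ppb, path_delay_ns) per matching line."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            records.append(
                (float(m["ts"]), int(m["offset"]), int(m["freq"]), int(m["delay"]))
            )
    return records


def find_steps(records, threshold_ns=1000):
    """Return (timestamp, change_ns) where the offset jumps between records."""
    steps = []
    for prev, cur in zip(records, records[1:]):
        change = cur[1] - prev[1]
        if abs(change) >= threshold_ns:
            steps.append((cur[0], change))
    return steps


around_start = parse([
    "ptp4l[286.414]: master offset    -26 s2 freq  +14288 path delay   792",
    "ptp4l[287.414]: master offset    -30 s2 freq  +14276 path delay   793",
    "ptp4l[288.414]: master offset  -1523 s2 freq  +12774 path delay   793",
    "ptp4l[289.414]: master offset  -1340 s2 freq  +12500 path delay   793",
])
around_stop = parse([
    "ptp4l[300.415]: master offset    -59 s2 freq  +12802 path delay   711",
    "ptp4l[301.415]: master offset     40 s2 freq  +12884 path delay   711",
    "ptp4l[302.415]: master offset   1445 s2 freq  +14301 path delay   644",
    "ptp4l[303.415]: master offset   1371 s2 freq  +14660 path delay   644",
])
print(find_steps(around_start), find_steps(around_stop))
```

On these lines the offset steps by -1493ns at 288.414 and by +1405ns at
302.415, which are the two moments when phc2sys was started and stopped.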



> That seems like a bug in either the driver implementation, or in the hardware.

Might be. I am not using my own hardware, though. I saw this behaviour on a
desktop PC running Linux Mint 19 (Kernel 4.15.0-38-generic) with an onboard
Intel 82579V (rev 05) NIC as well as on a similar system running
openSUSE Leap 42.1 (Kernel 4.1.39-56-default) with the same NIC. Maybe
this is a problem with that specific ethernet controller?

I just found out that I am not seeing it on my HP notebook running
Ubuntu 18.10 (Kernel 4.18.0-12-generic) with an Intel I219-V (rev 21) NIC.

All of the NICs are using the Intel e1000e driver.

>> While trying to find the mistake in my implementation, I found out that the
>> same thing happens if I use linuxptp and phc2sys. As long as phc2sys is not
>> running, linuxptp reports a pretty stable offset below 200ns, but as soon as
>> I start phc2sys, the offset increases to about 1us for a short time and is
>> corrected by linuxptp a few seconds later. Stopping phc2sys again results in
>> the same behaviour, now with -1us offset.
>> From what I have seen in the code of phc2sys, it reads the PHC only five
>> times in a row, which explains why the offset is only 1us, compared to my
>> 2.5us.
>
> phc2sys is likely to call gettime at least once, so this I think is the same
> problem as above.

>> My conclusion is that each call of clock_gettime with the PHC ID delays the
>> time of the clock by around 200-300ns. Is this a bug or expected behaviour?
>> Or am I doing something wrong?

> I suggest you read your hardware spec sheet and see if somehow reading the
> clock time causes it to be paused? That, or you've got something weird going on
> in your
Re: [Linuxptp-devel] PHC delay when calling clock_gettime

2018-12-03 Thread Keller, Jacob E
> -----Original Message-----
> From: Thomas Behn [mailto:thomas.b...@meinberg.de]
> Sent: Monday, December 03, 2018 11:31 AM
> To: linuxptp-devel@lists.sourceforge.net
> Subject: [Linuxptp-devel] PHC delay when calling clock_gettime
>
> Hi everyone,
>
> While developing a PTP slave using the Linux PHC, I noticed that every time I
> call clock_gettime with the PHC ID, the time on the PHC jumps or at least
> changes by around 200-300ns. I noticed this because I have implemented an
> algorithm that reads the PHC time ten times in a row to find the best possible
> estimate of the offset to the system time (as done in Ohly's timecompare). In
> the second before the ten readings, my measured PTP offset is a few ns, while
> directly after the readings it is around 2.5us. The offset is stable below
> 200ns when not reading the PHC and leaving the system time unsynchronized.
> 
> 

That seems like a bug in either the driver implementation, or in the hardware.

> While trying to find the mistake in my implementation, I found out that the
> same thing happens if I use linuxptp and phc2sys. As long as phc2sys is not
> running, linuxptp reports a pretty stable offset below 200ns, but as soon as
> I start phc2sys, the offset increases to about 1us for a short time and is
> corrected by linuxptp a few seconds later. Stopping phc2sys again results in
> the same behaviour, now with -1us offset.
> From what I have seen in the code of phc2sys, it reads the PHC only five
> times in a row, which explains why the offset is only 1us, compared to my
> 2.5us.

phc2sys is likely to call gettime at least once, so I think this is the same
problem as above.
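
The two figures can be compared per read. This is a minimal sketch of the
arithmetic only, assuming the total shift is split evenly among the reads;
the read counts (ten and five) and the shifts (2.5us and about 1us) are the
ones given in the report, and the function name is made up for illustration.

```python
def shift_per_read_ns(total_shift_ns, reads):
    """Average shift per PHC read, assuming every read contributes equally."""
    return total_shift_ns / reads


own_tool = shift_per_read_ns(2500, 10)    # ten reads, about 2.5us in total
phc2sys_run = shift_per_read_ns(1000, 5)  # five reads, about 1us in total
print(own_tool, phc2sys_run)
```

Both results (250ns and 200ns per read) fall within the reported range of
200-300ns per call, so the two observations are at least consistent with a
single per-read effect.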

> 
> 
> My conclusion is that each call of clock_gettime with the PHC ID delays the
> time of the clock by around 200-300ns. Is this a bug or expected behaviour?
> Or am I doing something wrong?
> 

I suggest you read your hardware spec sheet and see if somehow reading the 
clock time causes it to be paused? That, or you've got something weird going on 
in your gettime implementation.

Unfortunately, none of us on the list are going to be experts in your hardware 
or software. I would be incredibly surprised if this was a bug in linuxptp or 
the PTP kernel subsystem.

> 
> Thanks in advance for your help!
> 
> 

Good luck!

Regards,
Jake
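
For reference, the kind of measurement described in the report can be
sketched as follows: every PHC read is bracketed by two system clock reads,
and the sample with the shortest bracket is kept. This is only an
illustration under stated assumptions, not the reporter's code and not the
phc2sys implementation. The device path /dev/ptp0 and all function names are
hypothetical; the fd-to-clock-id mapping is the usual one for dynamic POSIX
clocks on Linux.

```python
import os
import time

CLOCKFD = 3  # low bits that mark a clock id as fd-based ("dynamic") on Linux


def fd_to_clockid(fd):
    """Map an open /dev/ptpN file descriptor to a dynamic POSIX clock id."""
    return (~fd << 3) | CLOCKFD


def open_phc(path="/dev/ptp0"):
    """Open a PHC device (path is an assumption) and return (fd, clock id)."""
    fd = os.open(path, os.O_RDONLY)
    return fd, fd_to_clockid(fd)


def bracketed_reads(phc_clkid, count=10):
    """Read the PHC `count` times, each between two system clock reads.

    Returns a list of (t_sys_before, t_phc, t_sys_after) tuples in ns.
    """
    samples = []
    for _ in range(count):
        t1 = time.clock_gettime_ns(time.CLOCK_REALTIME)
        t_phc = time.clock_gettime_ns(phc_clkid)
        t2 = time.clock_gettime_ns(time.CLOCK_REALTIME)
        samples.append((t1, t_phc, t2))
    return samples


def best_offset(samples):
    """Return (offset_ns, bracket_ns) for the sample with the shortest bracket.

    The offset is the PHC time minus the midpoint of the two system reads.
    """
    t1, t_phc, t2 = min(samples, key=lambda s: s[2] - s[0])
    bracket = t2 - t1
    return t_phc - (t1 + bracket // 2), bracket
```

On a machine with a PHC, something like open_phc() followed by
best_offset(bracketed_reads(clkid)) would give one offset estimate. Note
that on the affected 82579V this measurement would itself disturb the clock
it is measuring, if the observation in this thread holds.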

