On 2012-06-10, Rick Jones <[email protected]> wrote:
> unruh <[email protected]> wrote:
>> On 2012-06-08, Rick Jones <[email protected]> wrote:
>> > I would suggest then trying disabling of the interrupt coalescing
>> > via ethtool on the 1GbE NIC of your server and a few select
>> > clients and see what that does. If things start to look cleaner
>> > then you know it is an implementation-specific detail of one or
>> > more GbE NICs.
>
>> It looks to me that interrupt coalescing is not enabled, according
>> to ethtool.
>
> I'd like to see the full output of ethtool, ethtool -i and ethtool -c
> for your interfaces if I may. Feel free to send as direct email if
> you prefer.
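(For the record, disabling the coalescing as suggested would presumably be something like the following; the interface name eth0 is assumed, it needs root, and the driver may clamp or reject values it does not implement:)

```shell
# Request an interrupt per packet, i.e. no RX interrupt coalescing.
ethtool -C eth0 rx-usecs 0
# Re-read the coalescing parameters to verify the driver accepted it.
ethtool -c eth0
```

Note also that "rx-usecs: 3" in the output below need not mean a literal 3 us: on the e1000 family that value reportedly selects the driver's dynamic interrupt-throttling mode, so some coalescing may be in effect even though the settings look essentially off.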
info:10.0[unruh]>ethtool -i eth0
driver: e1000
version: 7.3.21-k8-NAPI
firmware-version: N/A
bus-info: 0000:06:00.0

info:10.0[unruh]>ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 3
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

>> > If it is possible to connect a client "back-to-back" to your server at
>> > the same time (via a second port) - still with interrupt coalescing
>> > disabled at both ends that would be an excellent addition. That will
>> > help evaluate the switch.
>> >
>> > I trust there were no OS changes when going from 100BT to GbE? Though
>> > even if not, there is still the prospect of the drivers for the 100BT
>> > cards not doing what linux calls "napi" and the drivers for the GbE
>> > cards doing it, which may introduce some timing changes.
>
>> What is napi?
>
> Napi is a mechanism whereby interrupts on a NIC get disabled, and
> packets are polled for over a certain length of time.
>
> http://www.linuxfoundation.org/collaborate/workgroups/networking/napi
> http://en.wikipedia.org/wiki/New_API
>
>> >> So yes, I think it is the Gb technology that is causing trouble.
>> >
>> > I split what may seem a hair between Gb technology being the IEEE
>> > specification and Gb implementation being what specific NIC vendors
>> > do. So, to me, interrupt coalescing is implementation, not technology.
>
>> For me, I do not care which it is, it is all Gb.
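(If NAPI polling, rather than per-packet interrupts, were shaping the delays, one symptom would be a flat, "square" distribution of round-trip times instead of a peaked one. A crude way to look at that distribution, assuming a hypothetical file delays.txt holding one measured delay in ms per line:)

```shell
# Bin round-trip delays (ms, one per line in delays.txt) into 0.05 ms
# buckets and print a count per bucket.  Roughly equal counts across the
# occupied buckets point at a uniform "square pulse" distribution; a
# sharp peak points at interrupt-driven delivery.
awk '{ n[int($1 / 0.05)]++ }
     END { for (b in n) printf "%.2f-%.2f ms: %d\n", b * 0.05, (b + 1) * 0.05, n[b] }' delays.txt | sort -n
```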
>
> I suspect that my caring about Gb technology/specification vs Gb
> implementation may be not all that far from a timekeeper's desire to
> distinguish between accuracy and precision, even when laypeople start
> to mix the two :)
>
>> Note that on one of the clients, there are two separate clusters of
>> roundtrip delays, one from .15 to about .4 ms, and the other from
>> about 1.3 to 1.6 ms. The slope within each cluster is as above, but
>> the slope between the clusters is the opposite. I.e., within a
>> cluster the client-to-server packet is being delayed, while the
>> separation between the clusters is due to a huge delay in the
>> server-to-client direction (if I have the signs right).
>
>> In http://www.theory.physics.ubc.ca/scatter/scatter.html I have the
>> scatter plots (offset vs return time) for two clients to two
>> different servers. One of the servers is a Gb server, while the
>> other is a 100Mb server. Both servers are disciplined by a GPS PPS
>> device. The offset fluctuations on both servers are about 4 us, so
>> none of the offset fluctuations come from the server clocks
>> themselves.
>
> It would be good to include the specific card name and driver rev etc.
> in subsequent writeups. Over the years there have been several Intel
> gigabit cards and 100BT cards. I believe just about all the Intel GbE
> cards have had support for interrupt coalescing in some form or
> another. At least those which have crossed my path.
>
> rick jones
>
> lspci -v can help if you don't already know the card name(s)

On the misbehaving machine:

Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

The fact that the distribution of round-trip times is almost a perfect
square pulse (i.e., roughly constant probability between the minimum
of about .15 ms and the maximum of about .4 ms) suggests that maybe it
is polling rather than interrupt-driven, although the card certainly
has an interrupt.

From dmesg:

[ 14.333930] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[ 14.333936] e1000: Copyright (c) 1999-2006 Intel Corporation.
[ 14.334031] e1000 0000:06:00.0: PCI INT A -> GSI 21 (level, low) -> IRQ 21
[ 14.766662] e1000 0000:06:00.0: eth0: (PCI:33MHz:32-bit) 00:1b:21:1d:c0:2d
[ 14.766675] e1000 0000:06:00.0: eth0: Intel(R) PRO/1000 Network Connection
[ 68.420253] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 68.812100] e1000: eth0 NIC Link is Down
[ 79.713724] e1000: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX

(I have no idea what that change to 100 Mbps means. I am checking that
the switch has not been configured to force 100 Mbps on this port. I
still do not see how this could explain the problem, but I will check.)

The ethernet controller on the first client in the scatterplots is

Intel Corporation 82562EZ 10/100 Ethernet Controller (rev 01)

The controller on the second one (the one with the two clusters) is

Intel Corporation 82557/8/9 Ethernet Pro 100 (rev 08)

_______________________________________________
questions mailing list
[email protected]
http://lists.ntp.org/listinfo/questions
