Re: Corrected gettimeofday() test code
On Sat, 29 Nov 2003, Kris Kennaway wrote: > I forwarded the reports of timecounter problems to phk, and he asked > that people who are seeing timecounter problems provide FULL details > of their system configuration, including: Please note that there are quite a few broken timecounters out there, a lot of them break when HZ != 100. I encountered this with a newer revision of the dell 1550, the old revision was fine in this respect oddly enough. These problems do however show up quickly when running ntpd which will complain about a borked system clock every 10 minutes or so. Make sure that your hardware is good before blaming software :). HTH, HAND. -- Sten Spans There is a crack in everything that's how the light gets in. ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Corrected gettimeofday() test code
In message <[EMAIL PROTECTED]>, "Marc G. Fournier" writes: >On Sat, 29 Nov 2003, Don Bowman wrote: > >> For this config (below), kern.timecounter.method=0 reproduces the >> problem, kern.timecounter.method=1 does not. > >Can anyone comment on what effect setting method to 1 has on the system? >Like, would one notice a degradation in performance? Setting it to 1 forces the timecounter to update every HZ. You can say that 0 was an optimization which it seems may not be safe on certain hardware combinations... -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Corrected gettimeofday() test code
In message <[EMAIL PROTECTED]>, "Marc G. Fournier" writes: Hmm, I'm not saying this makes sense, but at least there is an emergent pattern... Col#1 is duration of the shift, sorted in increasing order. Col#2 is the delta to the line above. Notice stratification on 50msec boundaries: 695.426189 - 695.426190 0.01 695.426191 0.01 695.426191 0.00 695.426192 0.01 695.476192 0.05 695.476193 0.01 695.476194 0.01 695.526193 0.04 695.526194 0.01 695.526196 0.02 695.526196 0.00 695.526198 0.02 695.526201 0.03 695.576193 0.049992 695.576195 0.02 695.626195 0.05 695.676192 0.049997 695.676195 0.03 The only place I can think of 50msec in relation to the timecounter is NTIMECOUNTER * 1/HZ. Try to modify some of that, and see if you can affect the results. In particular, try to yank NTIMECOUNTER _way_ up, like a thousand and see if the problem goes away. Another (uglier) option is that ACPI/SMI/APM or similar BIOS magic screws up the i8254 on a regular basis, or even that the latch functionality of the i8254 doesn't work properly. This is not unheard off. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
RE: Corrected gettimeofday() test code
On Sat, 29 Nov 2003, Don Bowman wrote: > For this config (below), kern.timecounter.method=0 reproduces the > problem, kern.timecounter.method=1 does not. Can anyone comment on what effect setting method to 1 has on the system? Like, would one notice a degradation in performance? Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
RE: Corrected gettimeofday() test code
From: Kris Kennaway [mailto:[EMAIL PROTECTED] > > I forwarded the reports of timecounter problems to phk, and he asked > that people who are seeing timecounter problems provide FULL details > of their system configuration, including: > > * dmesg > > * kernel configuration > > * compiler options > > * time-related system configuration (whether ntpd/timed/ntpdate is > running, and if so whether it's correcting for a seriously drifting > clock) > > * The kernel timecounter configuration, e.g. the > kern.timecounter.method and kern.timecounter.hardware sysctls, and > whether changing them has any effect. > > * The exact output of the corrected test program below (the original > would give spurious errors if it didn't run at least once a second, > which may have been confusing some people if their systems were > sufficiently loaded). > > * The system status when the problem is observed (i.e. does it only > occur under load; what else is running at the time) > For this config (below), kern.timecounter.method=0 reproduces the problem, kern.timecounter.method=1 does not. Output in 'error' case: 1070147643.248866 1070147651.028646 1070147643.248866 1070147651.028646 1070147656.287818 1070147664.067692 1070147656.287818 1070147664.067692 1070147659.326429 1070147667.106238 1070147659.326429 1070147667.106238 1070147668.370071 1070147676.149884 1070147668.370071 1070147676.149884 1070147681.433111 1070147689.212926 1070147681.433111 1070147689.212926 1070147683.418743 1070147691.198632 1070147683.418743 1070147691.198632 problem shows up within ~30s of starting the test program, and the messages will come out about once per 1-5s period after that, not regularly. kern.timecounter.hardware: TSC on this machine, i have others which are i8254 which do it too. hw.ncpu=1 compiler flags: COPTFLAGS= -O2 -pipe -malign-loops=4 -malign-jumps=4 -malign-functions=4 -mcpu=i686 -march=i686 -fno-gcse -g machine is running 4.7-RELEASE-p2. dmesg, kernel config attached. Intel-specific functions: Version 0673: Type 0 - Original OEM Family 6 - Pentium Pro Model 7 - Pentium III/Pentium III Xeon - external L2 cache Stepping 3 Reserved 0 from cpuid. ntpd -p /var/run/ntpd.pid -b -g -A runs. Clock does not drift much, less than 2s/day. Clock is stepped when system boots. There is a chimer on our router which broadcasts ntp time every minute. ntpd is not observed to step clock. On the 8254 machine [a dual 0f27 xeon, 533 FSB with HTT enabled] 1070147645.119531 1070148340.497729 1070148339.-693880469 1070148340.497729 both systems were unloaded. This was with the corrected program you included. --don dmesg.boot Description: Binary data ___# ___# GENERIC -- Generic kernel configuration file for FreeBSD/i386 ___# ___# For more information on this file, please read the handbook section on ___# Kernel Configuration Files: ___# ___#http://www.FreeBSD.org/handbook/kernelconfig-config.html ___# ___# The handbook is also available locally in /usr/share/doc/handbook ___# if you've installed the doc distribution, otherwise always see the ___# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the ___# latest information. ___# ___# An exhaustive list of options and more detailed explanations of the ___# device lines is also present in the ./LINT configuration file. If you are ___# in doubt as to the purpose or necessity of a line, check first in LINT. ___# ___# $FreeBSD: src/sys/i386/conf/GENERIC,v 1.246.2.39 2002/03/24 13:19:10 wilko Exp $ ___machine i386 ___#cpu I386_CPU ___#cpu I486_CPU ___cpu I586_CPU ___cpu I686_CPU ___identTPC ___maxusers 0 ___makeoptions DEBUG=-g#Build kernel with gdb(1) debug symbols ___#options MATH_EMULATE#Support for x87 emulation ___options INET#InterNETworking ___#options INET6 #IPv6 communications protocols ___options FFS #Berkeley Fast Filesystem ___options FFS_ROOT#FFS usable as root device [keep this!] ___options SOFTUPDATES #Enable FFS soft updates support ___options UFS_DIRHASH #Improve performance on big directories ___options MFS #Memory Filesystem ___options MD_ROOT #MD is a potential root device ___options NFS #Network Filesystem ___options NFS_ROOT#NFS usable as root device, NFS required ___options MSDOSFS #MSDOS Filesystem ___options CD9660 #ISO 9660 Filesystem ___options CD9660_ROOT #CD-ROM usable as root, CD9660 required ___options PROCFS #Process filesystem ___options COMPAT_43 #Compatible with BSD 4.3 [KEEP THIS!] ___options SCSI_DELAY=2000 #Delay (in ms) before probing SCSI ___options UCONSOLE#Allow users to grab t
Re: Corrected gettimeofday() test code
On Sat, 29 Nov 2003, Kris Kennaway wrote: > * dmesg Attached ... > * kernel configuration Attached ... > * compiler options CFLAGS= -O -mpentium -pipe -g -DKVA_PAGES=512 COPTFLAGS= -O -mpentium -pipe -DKVA_PAGES=512 > * time-related system configuration (whether ntpd/timed/ntpdate is > running, and if so whether it's correcting for a seriously drifting > clock) ntpdate running twice a day (11:59 & 23:59) > * The kernel timecounter configuration, e.g. the > kern.timecounter.method and kern.timecounter.hardware sysctls, and > whether changing them has any effect. neptune# sysctl kern.timecounter kern.timecounter.method: 0 kern.timecounter.hardware: i8254 > * The exact output of the corrected test program below (the original > would give spurious errors if it didn't run at least once a second, > which may have been confusing some people if their systems were > sufficiently loaded). neptune# ./timetest 1070147310.273323 1070148005.799516 1070147310.273323 1070148005.799516 1070147313.623382 1070148009.299574 1070147313.623382 1070148009.299574 1070147320.223573 1070148015.699766 1070147320.223573 1070148015.699766 1070147351.373946 1070148046.800136 1070147351.373946 1070148046.800136 1070147361.174180 1070148056.700378 1070147361.174180 1070148056.700378 1070147374.524295 1070148070.100490 1070147374.524295 1070148070.100490 1070147378.674361 1070148074.100552 1070147378.674361 1070148074.100552 1070147396.124712 1070148091.700905 1070147396.124712 1070148091.700905 1070147413.724886 1070148109.401081 1070147413.724886 1070148109.401081 1070147447.925401 1070148143.401593 1070147447.925401 1070148143.401593 1070147450.475440 1070148146.101635 1070147450.475440 1070148146.101635 1070147453.175562 1070148148.701758 1070147453.175562 1070148148.701758 1070147474.275892 1070148169.702084 1070147474.275892 1070148169.702084 1070147527.826607 1070148223.302801 1070147527.826607 1070148223.302801 1070147531.876673 1070148227.302864 1070147531.876673 1070148227.302864 1070147531.876665 1070148227.402859 1070147531.876665 1070148227.402859 1070147545.176941 1070148240.703142 1070147545.176941 1070148240.703142 1070147616.077946 1070148311.504135 1070147616.077946 1070148311.504135 1070147616.178065 1070148311.704261 1070147616.178065 1070148311.704261 > * The system status when the problem is observed (i.e. does it only > occur under load; what else is running at the time) neptune# uptime 7:24PM up 47 days, 6:34, 5 users, load averages: 1.98, 1.23, 0.79 neptune# ps aux | wc -l 346 Anything else that I can provide, let me know ... Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: [EMAIL PROTECTED] Yahoo!: yscrappy ICQ: 7615664 kernel.gz Description: Binary data dmesg.boot.gz Description: Binary data ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Corrected gettimeofday() test code
I forwarded the reports of timecounter problems to phk, and he asked that people who are seeing timecounter problems provide FULL details of their system configuration, including: * dmesg * kernel configuration * compiler options * time-related system configuration (whether ntpd/timed/ntpdate is running, and if so whether it's correcting for a seriously drifting clock) * The kernel timecounter configuration, e.g. the kern.timecounter.method and kern.timecounter.hardware sysctls, and whether changing them has any effect. * The exact output of the corrected test program below (the original would give spurious errors if it didn't run at least once a second, which may have been confusing some people if their systems were sufficiently loaded). * The system status when the problem is observed (i.e. does it only occur under load; what else is running at the time) It's hard to track down this kind of problem when people don't provide this level of detail, and half-reports just confuse the issue. Thanks, Kris - Forwarded message from Poul-Henning Kamp <[EMAIL PROTECTED]> - X-Original-To: [EMAIL PROTECTED] Delivered-To: [EMAIL PROTECTED] Delivered-To: [EMAIL PROTECTED] Date: Sat, 29 Nov 2003 23:48:12 +0100 From: Poul-Henning Kamp <[EMAIL PROTECTED]> Subject: Re: gettimeofday() test code In-reply-to: "Your message of Sat, 29 Nov 2003 14:21:24 PST." <[EMAIL PROTECTED]> To: Kris Kennaway <[EMAIL PROTECTED]> X-UIDL: 1f9b02cde6c0ab3cc62d0309b681fcc3 X-Bogosity: No, tests=bogofilter, spamicity=0.00, version=0.15.7 Have them run this one instead... #include #include int main() { struct timeval prevtime; struct timeval curtime; double last, now; gettimeofday(&prevtime, NULL); last = prevtime.tv_sec + prevtime.tv_usec * 1e-6; for (;;) { gettimeofday(&curtime, NULL); now = curtime.tv_sec + curtime.tv_usec * 1e-6; if (now < last) { printf("%f %f %d.%06d %d.%06d\n", now, last, curtime.tv_sec, curtime.tv_usec, prevtime.tv_sec, prevtime.tv_usec); } last = now; prevtime = curtime; } return 0; } -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 [EMAIL PROTECTED] | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. - End forwarded message - pgp0.pgp Description: PGP signature