Re: clock jump in 0.2.0.23-rc, too

2008-04-05 Thread Scott Bennett
 Here is some of yesterday's notice-level log from my tor (0.2.0.23-rc)
server:

Apr 04 05:34:37.285 [warn] Your system clock just jumped 133 seconds forward; 
assuming established circuits no longer work.
Apr 04 05:43:23.629 [warn] Your system clock just jumped 140 seconds forward; 
assuming established circuits no longer work.
Apr 04 09:30:27.029 [warn] Your system clock just jumped 121 seconds forward; 
assuming established circuits no longer work.
Apr 04 09:35:30.627 [warn] Your system clock just jumped 104 seconds forward; 
assuming established circuits no longer work.
Apr 04 09:49:34.378 [warn] Your system clock just jumped 106 seconds forward; 
assuming established circuits no longer work.
Apr 04 10:00:42.032 [warn] Your system clock just jumped 111 seconds forward; 
assuming established circuits no longer work.
Apr 04 10:24:09.684 [warn] Your system clock just jumped 110 seconds forward; 
assuming established circuits no longer work.
Apr 04 11:17:36.702 [warn] Your system clock just jumped 126 seconds forward; 
assuming established circuits no longer work.
Apr 04 11:21:22.129 [warn] Your system clock just jumped 133 seconds forward; 
assuming established circuits no longer work.
Apr 04 11:25:26.342 [warn] Your system clock just jumped 163 seconds forward; 
assuming established circuits no longer work.
Apr 04 11:30:26.061 [warn] Your system clock just jumped 117 seconds forward; 
assuming established circuits no longer work.
Apr 04 11:33:36.188 [warn] Your system clock just jumped 126 seconds forward; 
assuming established circuits no longer work.
Apr 04 11:43:34.353 [warn] Your system clock just jumped 110 seconds forward; 
assuming established circuits no longer work.
Apr 04 11:49:46.641 [warn] Your system clock just jumped 112 seconds forward; 
assuming established circuits no longer work.
Apr 04 11:57:48.956 [warn] Your system clock just jumped 116 seconds forward; 
assuming established circuits no longer work.
Apr 04 12:08:02.819 [warn] Your system clock just jumped 122 seconds forward; 
assuming established circuits no longer work.
Apr 04 12:23:38.767 [warn] Your system clock just jumped 118 seconds forward; 
assuming established circuits no longer work.
Apr 04 12:27:52.359 [warn] Your system clock just jumped 193 seconds forward; 
assuming established circuits no longer work.
Apr 04 12:34:25.515 [warn] Your system clock just jumped 149 seconds forward; 
assuming established circuits no longer work.

 Each time one of those messages occurs, tor kills all its open circuits,
so you can see that it was being basically pretty useless as a server.  I gave
it a SIGINT to shut it down, and once again it took approximately *55 seconds*
to acknowledge receipt of SIGINT in the log.

Apr 04 13:16:18.867 [notice] Interrupt: will shut down in 1500 seconds. 
Interrupt again to exit now.
Apr 04 13:41:18.700 [notice] Clean shutdown finished. Exiting.

 Once the shutdown had completed, I swapped 0.2.0.23-rc for 0.2.0.19-alpha,
the last release that did *not* have the clock jump bug, to the best of my
recollection.  I have since given it a SIGHUP once to get it to change circuits
for me for a long download.

Apr 04 14:01:33.196 [notice] Tor 0.2.0.19-alpha (r13450) opening log file.
Apr 04 14:01:33.196 [warn] Can't log to stdout with RunAsDaemon set; skipping 
stdout
Apr 04 14:01:33.767 [notice] Your Tor server's identity key fingerprint is 
'MYCROFTsOtherChild 0232 2050 5A55 0D6F DF0C 20FF 7C48 E66B A06A 49A6'
Apr 04 14:01:34.177 [warn] Please upgrade! This version of Tor (0.2.0.19-alpha) 
is not recommended, according to the directory authorities. Recommended 
versions are: 0.1.2.19,0.2.0.22-rc,0.2.0.23-rc
Apr 04 14:01:42.424 [notice] We now have enough directory information to build 
circuits.
Apr 04 14:01:42.558 [warn] Nickname list includes 'yourd0f670b45a' which isn't 
a known router.
Apr 04 14:01:44.518 [notice] Self-testing indicates your ORPort is reachable 
from the outside. Excellent. Publishing server descriptor.
Apr 04 14:01:45.345 [notice] Tor has successfully opened a circuit. Looks like 
client functionality is working.
Apr 04 14:02:45.827 [notice] Self-testing indicates your DirPort is reachable 
from the outside. Excellent.
Apr 04 14:03:34.106 [notice] Performing bandwidth self-test...done.
Apr 04 20:15:05.533 [notice] Client asked for the mirrored directory, but we 
don't have a good one yet. Sending 503 Dir not available.
Apr 04 20:48:48.228 [notice] Received reload signal (hup). Reloading config.
Apr 04 20:48:48.233 [notice] Tor 0.2.0.19-alpha (r13450) opening log file.
Apr 04 20:48:48.234 [warn] Can't log to stdout with RunAsDaemon set; skipping 
stdout
Apr 04 20:48:48.234 [warn] Can't log to stdout with RunAsDaemon set; skipping 
stdout
Apr 04 20:48:48.666 [warn] Nickname list includes 'yourd0f670b45a' which isn't 
a known router.
Apr 04 21:21:11.997 [warn] Please upgrade! This version of Tor (0.2.0.19-alpha) 
is not recommended, according to the directory authorities. 

Re: clock jump in 0.2.0.23-rc, too

2008-04-05 Thread coderman
On Sat, Apr 5, 2008 at 1:47 PM, Scott Bennett [EMAIL PROTECTED] wrote:
  ... Here is some of yesterday's notice-level log from my tor (0.2.0.23-rc) 
 ...
  Those are all of the messages issued thus far, so 0.2.0.19-alpha has been
  running with no problems for over 25 hours.  I will continue to watch the 
 list
  for news of a new release to try, but for now, the *-rc versions are quite
  unusable as servers.

i'm curious to know if your configure script is perhaps doing
something wonky; can you determine if there are changes in the link
flags, config.h, or -D defines between these two versions that
configure may be generating?

in particular, i'm curious if something in a failed configure setup is
causing your tor_gettimeofday wrapper to use ftime() instead of
gettimeofday().

best regards,


Re: clock jump in 0.2.0.23-rc, too

2008-04-02 Thread phobos
On Tue, Apr 01, 2008 at 04:58:22AM -0500, [EMAIL PROTECTED] wrote 4.5K bytes in 
68 lines about:
:   hellas # sysctl machdep.hyperthreading_allowed=0
:   machdep.hyperthreading_allowed: 1 - 0

Just for a test, have you tried disabling HT in the bios?  I'll setup a
fbsd-7-stable sytem with 0.2.0.23-rc on an HT system to see if I can
replicate.  

-- 
Andrew


Re: clock jump in 0.2.0.23-rc, too

2008-04-02 Thread Scott Bennett
On Wed, 2 Apr 2008 09:21:44 -0400 [EMAIL PROTECTED] wrote:
On Tue, Apr 01, 2008 at 04:58:22AM -0500, [EMAIL PROTECTED] wrote 4.5K bytes 
in 68 lines about:
:  hellas # sysctl machdep.hyperthreading_allowed=0
:  machdep.hyperthreading_allowed: 1 - 0

Just for a test, have you tried disabling HT in the bios?  I'll setup a

 No, I haven't.  Now that I think of it, I don't remember *ever* doing it
that way, probably because I had no reason to play around with it, and a reboot
would be required to change it and then again to change it back.  I suppose I
could also build a non-SMP kernel to try, as well.  The GENERIC kernel is a
non-SMP kernel, and it recognizes the two logical CPUs at boot time, but then
ignores cpu1 from that point on.

fbsd-7-stable sytem with 0.2.0.23-rc on an HT system to see if I can
replicate.  

 Okay, but that sounds like an inordinate amount of hassle.  Also, I'm
running 6.3-STABLE, not 7.0-STABLE.
 Shortly after midnight, my Internet connection went down, and it didn't
come back up till about 10 a.m.  While the link was down and more than 21 hours
after I had disabled hyperthreading in the kernel, tor put out the following
messages:

Apr 02 02:01:22.178 [warn] Your system clock just jumped 115 seconds forward; 
assuming established circuits no longer work.
Apr 02 02:03:14.235 [warn] Your system clock just jumped 112 seconds forward; 
assuming established circuits no longer work.

Notice that, once again, the second message is timestamped the same number of
seconds later than the first message as the number of seconds complained of in
the second message.  I.e., no time actually elapsed between the two messages.
(Note also that these messages *always* say the clock has jumped *forward*,
never backward, which also argues against the idea of being whipsawed between
two out-of-sync clocks on a SMP system.)  It did this at a time that was about
14 - 16 minutes before ntpdate would next have been run by cron, and while the
network connection was down, meaning that ntpdate would not have changed the
time anyway, and while hyperthreading was disabled.
 So I reenabled hyperthreading because it clearly is not the problem.  All
the evidence suggests, but does not prove, that some change was introduced in
0.2.0.21-rc that is causing the problem.  The change might possibly have been
made earlier than that, but with the symptoms not appearing for some reason
until 0.2.0.21-rc.


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at cs.niu.edu  *
**
* A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army.   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


Re: clock jump in 0.2.0.23-rc, too

2008-04-01 Thread Scott Bennett
 On Mon, 31 Mar 2008 22:49:05 -0700 Lucky Greeen [EMAIL PROTECTED]
wrote:
Scott Bennett wrote:
  It appears that the clock jump problem does persist into 0.2.0.23-rc.
 Here are the notice-level log messages since I started up the new version
 this afternoon:
 [remainder eldided]
   
Scott,
If  you are using a dual-core or multiprocessor system, it is likely
that you too are experiencing issues with the TSC being out of syc
between the cores. The TSC counters on physical CPUs can get out of sync
just as CPUs on virtual CPUs can. Physical CPUs just experience this
issue so rarely that the most users will never encounter it. The
likelihood that the TSCs between two cores on the same system get out of
sync appears to be system/motherboard/CPU dependent.

 It's actually a hyperthreading 3.4 GHz P4.  FreeBSD supports such chips
as SMP systems, although it's really a single core with two logical CPUs vying
for pipeline slots.  Running systat's vmstat display with two-second updates,
the time interrupt counts for cpu0 and cpu1 hover around 2000 and are usually
equal.  Only rarely do they differ between them by more than a single interrupt
in each two-second interval, so it's difficult to imagine the kernel tolerating
an accumulated difference of over 100 K interrupts in the two counts occurring.
It's also hard to imagine how it could happen when there's really almost no
load on the system.

See http://kerneltrap.org/node/14003 on the rationale for adding the
notsc option to the Linux kernel. Also see

 I notice it refers to a problem in the LINUX x86_64 support for dual core
systems.  Mine is only hyperthreading-capable, not dual cored, and is only
a 32-bit chip, so I run FreeBSD's i386 versions, not the amd64 versions.

http://lkml.org/lkml/2005/11/4/173 for a detailed explanation by an AMD
engieer of why and under which conditions the TSCs might go out of sync.

 Fascinating, indeed, but probably irrelevant to a single-cored, HTT-
enabled P4 chip.  sysctl reveals quite a few variables that are available on
such a system for the chip as a whole, but not for individual (logical) CPUs.

At least in the past FreeBSD was also impacted by TSCs getting out of
sync, causing the time to jump as the the process migrates between CPUs.
See
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2004-08/1105.html

 Interesting thread.  Thanks.

There is a very simple way for you to test if the time jumps you have
been seeing are caused by the TSCs between two CPU cores being out of
sync: simply temporarily disable SMP in your kernel. If the problem goes
away, the TSCs likely are out of sync. If so, you may want to find out
of there is a way to disable FreeBSD's use of the TSC similar to the
notsc option in Linux and see if that addresses the problem.

 Here's a bit more information, the clue for which came from the second
URL's article above.

[hellas] 333 % sysctl kern.timecounter
kern.timecounter.stepwarnings: 0
kern.timecounter.nbinuptime: 751724079
kern.timecounter.nnanouptime: 5273
kern.timecounter.nmicrouptime: 353617
kern.timecounter.nbintime: 295482161
kern.timecounter.nnanotime: 157914288
kern.timecounter.nmicrotime: 137569526
kern.timecounter.ngetbinuptime: 2285019
kern.timecounter.ngetnanouptime: 18897431
kern.timecounter.ngetmicrouptime: 133591454
kern.timecounter.ngetbintime: 0
kern.timecounter.ngetnanotime: 70820
kern.timecounter.ngetmicrotime: 602672586
kern.timecounter.nsetclock: 4
kern.timecounter.hardware: ACPI-fast
kern.timecounter.choice: TSC(-100) ACPI-fast(1000) i8254(0) dummy(-100)
kern.timecounter.tick: 1
kern.timecounter.smp_tsc: 0
[hellas] 334 % sysctl hw.acpi
hw.acpi.supported_sleep_state: S1 S3 S4 S5
hw.acpi.power_button_state: S5
hw.acpi.sleep_button_state: S1
hw.acpi.lid_switch_state: NONE
hw.acpi.standby_state: S1
hw.acpi.suspend_state: S3
hw.acpi.sleep_delay: 1
hw.acpi.s4bios: 1
hw.acpi.verbose: 0
hw.acpi.disable_on_reboot: 0
hw.acpi.handle_reboot: 0
hw.acpi.reset_video: 0
hw.acpi.cpu.cx_lowest: C1
hw.acpi.acline: 1
hw.acpi.battery.life: 100
hw.acpi.battery.time: -1
hw.acpi.battery.state: 0
hw.acpi.battery.units: 1
hw.acpi.battery.info_expire: 5
hw.acpi.thermal.min_runtime: 0
hw.acpi.thermal.polling_rate: 10
hw.acpi.thermal.user_override: 0
hw.acpi.thermal.tz0.temperature: 55.5C
hw.acpi.thermal.tz0.active: -1
hw.acpi.thermal.tz0.passive_cooling: 0
hw.acpi.thermal.tz0.thermal_flags: 0
hw.acpi.thermal.tz0._PSV: -1
hw.acpi.thermal.tz0._HOT: -1
hw.acpi.thermal.tz0._CRT: 89.0C
hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
hw.acpi.thermal.tz0._TC1: -1
hw.acpi.thermal.tz0._TC2: -1
hw.acpi.thermal.tz0._TSP: -1
[hellas] 335 %

Now, I make no claim to knowing the significance of all of the above.  My
guess is that the value of kern.timecounter.hardware means that the kernel is
not using the TSC anyway, so the problem you describe above would not be what
is happening here.  Another guess is that kern.timecounter.hardware may be set
that way because 

Re: clock jump in 0.2.0.23-rc, too

2008-04-01 Thread Scott Bennett
 On Mon, 31 Mar 2008 22:49:05 -0700 Lucky Greeen [EMAIL PROTECTED]
wrote:
[much deleted  --SB]

There is a very simple way for you to test if the time jumps you have
been seeing are caused by the TSCs between two CPU cores being out of
sync: simply temporarily disable SMP in your kernel. If the problem goes
away, the TSCs likely are out of sync. If so, you may want to find out
of there is a way to disable FreeBSD's use of the TSC similar to the
notsc option in Linux and see if that addresses the problem.

 Okay.  First off, here is the latest batch of messages.

Mar 31 22:30:27.110 [warn] Your system clock just jumped 101 seconds forward; 
assuming established circuits no longer work.
Apr 01 00:11:46.119 [notice] Tor has successfully opened a circuit. Looks like 
client functionality is working.
Apr 01 03:33:23.582 [warn] Your system clock just jumped 105 seconds forward; 
assuming established circuits no longer work.
Apr 01 03:35:44.206 [warn] Your system clock just jumped 141 seconds forward; 
assuming established circuits no longer work.
Apr 01 03:37:45.574 [warn] Your system clock just jumped 121 seconds forward; 
assuming established circuits no longer work.
Apr 01 03:38:43.885 [notice] Tor has successfully opened a circuit. Looks like 
client functionality is working.
Apr 01 04:15:30.446 [warn] Your system clock just jumped 110 seconds forward; 
assuming established circuits no longer work.
Apr 01 04:16:27.713 [notice] Tor has successfully opened a circuit. Looks like 
client functionality is working.
Apr 01 04:18:59.194 [warn] Your system clock just jumped 148 seconds forward; 
assuming established circuits no longer work.
Apr 01 04:21:19.749 [warn] Your system clock just jumped 140 seconds forward; 
assuming established circuits no longer work.
Apr 01 04:25:01.172 [warn] Your system clock just jumped 127 seconds forward; 
assuming established circuits no longer work.

 Right about this point--I don't know the exact time, but a date command
entered soon after showed Tue Apr  1 04:25:46 CDT 2008--I gave it a

hellas # sysctl machdep.hyperthreading_allowed=0
machdep.hyperthreading_allowed: 1 - 0

However, another message soon arrived:

Apr 01 04:27:34.551 [warn] Your system clock just jumped 153 seconds forward; 
assuming established circuits no longer work.
Apr 01 04:29:33.772 [notice] Tried for 120 seconds to get a connection to 
[scrubbed]:80. Giving up.
Apr 01 04:29:56.323 [notice] Tor has successfully opened a circuit. Looks like 
client functionality is working.

So it would appear that the SMP issue is not operative here.
 Also, I'd like to reiterate that I never saw this behavior before running
0.2.0.21-rc, and now I've seen it with 0.2.0.22-rc and 0.2.0.23-rc.  I've been
using tor on my FreeBSD system since sometime in 2005 and as a server on the
same system since April 2007, well before 0.2.0.21-rc was made available.  I
don't know what is causing it, but it definitely is a problem.  I'll leave
hyperthreading disabled for a while to see what else happens, but eventually
I do want to turn it back on again.  Despite all the bashing by AMD fanatics,
hyperthreading really does get a lot more work done per core in a given time,
provided that work is broken into multiple threads or processes that are safe
to run in parallel.  It will be interesting indeed to see the difference in
throughput on Intel's forthcoming eight-cored chips for hyperthreading enabled
vs. disabled.
 Another tidbit of information is that I checked systat's vmstat display
once again after disabling hyperthreading and found that the total of ~4000
timer interrupts in each two-second interval are no longer split evenly between
cpu0 and cpu1.  Instead, all ~4000 interrupts occur on cpu0 and none at all
occur on cpu1.  Hmmm...I suppose I ought to comment out the NumCPUs 2 line
in torrc until I reenable hyperthreading, but with the machine running at over
99% idle most of the day, it probably makes no real difference anyway.


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at cs.niu.edu  *
**
* A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army.   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


clock jump in 0.2.0.23-rc, too

2008-03-31 Thread Scott Bennett
 It appears that the clock jump problem does persist into 0.2.0.23-rc.
Here are the notice-level log messages since I started up the new version
this afternoon:

Mar 31 15:18:56.468 [notice] Tor 0.2.0.23-rc (r14173) opening log file.
Mar 31 15:18:56.468 [warn] Can't log to stdout with RunAsDaemon set; skipping 
stdout
Mar 31 15:18:56.756 [notice] Your Tor server's identity key fingerprint is 
'MYCROFTsOtherChild 0232 2050 5A55 0D6F DF0C 20FF 7C48 E66B A06A 49A6'
Mar 31 15:19:04.921 [notice] We now have enough directory information to build 
circuits.
Mar 31 15:19:05.311 [warn] Nickname list includes 'yourd0f670b45a' which isn't 
a known router.
Mar 31 15:19:11.443 [notice] Self-testing indicates your ORPort is reachable 
from the outside. Excellent. Publishing server descriptor.
Mar 31 15:19:15.489 [notice] Tor has successfully opened a circuit. Looks like 
client functionality is working.
Mar 31 15:20:09.312 [notice] Self-testing indicates your DirPort is reachable 
from the outside. Excellent.
Mar 31 15:21:25.926 [notice] Performing bandwidth self-test...done.

 [At this point, I SIGHUPped tor to reload torrc with ExcludeNodes abutor
added.]

Mar 31 16:12:18.016 [notice] Received reload signal (hup). Reloading config.
Mar 31 16:12:18.020 [notice] Tor 0.2.0.23-rc (r14173) opening log file.
Mar 31 16:12:18.021 [warn] Can't log to stdout with RunAsDaemon set; skipping 
stdout
Mar 31 16:12:18.021 [warn] Can't log to stdout with RunAsDaemon set; skipping 
stdout
Mar 31 16:42:39.312 [warn] Nickname list includes 'yourd0f670b45a' which isn't 
a known router.
Mar 31 22:30:27.110 [warn] Your system clock just jumped 101 seconds forward; 
assuming established circuits no longer work.

 So I now have one example of the same problem with the new release
candidate that was present in 0.2.0.22-rc.
 Does anyone have any ideas/thoughts/suggestions?  Are any others running
tor under FreeBSD 6.3-STABLE or any other FreeBSD release seeing these
messages?


  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at cs.niu.edu  *
**
* A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army.   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**


Re: clock jump in 0.2.0.23-rc, too

2008-03-31 Thread Lucky Greeen
Scott Bennett wrote:
  It appears that the clock jump problem does persist into 0.2.0.23-rc.
 Here are the notice-level log messages since I started up the new version
 this afternoon:
 [remainder eldided]
   
Scott,
If  you are using a dual-core or multiprocessor system, it is likely
that you too are experiencing issues with the TSC being out of syc
between the cores. The TSC counters on physical CPUs can get out of sync
just as CPUs on virtual CPUs can. Physical CPUs just experience this
issue so rarely that the most users will never encounter it. The
likelihood that the TSCs between two cores on the same system get out of
sync appears to be system/motherboard/CPU dependent.

See http://kerneltrap.org/node/14003 on the rationale for adding the
notsc option to the Linux kernel. Also see
http://lkml.org/lkml/2005/11/4/173 for a detailed explanation by an AMD
engieer of why and under which conditions the TSCs might go out of sync.
At least in the past FreeBSD was also impacted by TSCs getting out of
sync, causing the time to jump as the the process migrates between CPUs.
See
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2004-08/1105.html

There is a very simple way for you to test if the time jumps you have
been seeing are caused by the TSCs between two CPU cores being out of
sync: simply temporarily disable SMP in your kernel. If the problem goes
away, the TSCs likely are out of sync. If so, you may want to find out
of there is a way to disable FreeBSD's use of the TSC similar to the
notsc option in Linux and see if that addresses the problem.

--Lucky Green [EMAIL PROTECTED]