Hi Pete,

In your first mail about this problem you wrote:
There has long been a bug in the getRulebase script using wget which 
causes the rulebase file that is downloaded to have the local system's 
timestamp. Under normal circumstances this does not cause a problem 
because most system clocks are synchronized and the local timestamp is 
generally newer than the timestamp of the rulebase file on our servers.

What I was getting at:
If the rulebase with the old wget software were to get a local timestamp on my 
server when downloaded, mine would always be "far" into the future from your 
original as my server is at GMT+1 or +2 during DST.
So if your server is at GMT-5 my rulebase would get a timestamp of the original 
+6 hours. So it would then NOT download another rulebase for the next 6 hours 
as every new rulebase would still be in its past.

Or.... should wget have compensated for timezones as should curl? Because my 
rulebase files on my server seem to have a local timestamp.
However, this is where we probably get beyond my techlevel.
Does Windows always use UTC internally and then calculate the local time when 
displaying the timestamp for a file?
Is that what I'm missing? Because I think I've read that somewhere about 
problems with timestamps on FAT and NTFS.
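
A small sketch of the timezone question, assuming the timestamp is kept as a UTC instant (as NTFS and Unix file systems do, and as HTTP dates are expressed in GMT) and local time is only calculated for display. FAT is the exception, since it stores local time. The GMT-5 and GMT+1 offsets are the ones guessed at above, fixed here for illustration with no DST handling:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets, for illustration only (no DST handling).
SERVER_TZ = timezone(timedelta(hours=-5))   # e.g. a server at GMT-5
CLIENT_TZ = timezone(timedelta(hours=+1))   # e.g. a client at GMT+1

# One instant, as it would be stored: seconds since the epoch, in UTC.
posted = datetime(2009, 3, 12, 14, 0, tzinfo=timezone.utc)
epoch_seconds = posted.timestamp()

# Each machine derives its own wall-clock display from the same instant.
server_view = datetime.fromtimestamp(epoch_seconds, SERVER_TZ)
client_view = datetime.fromtimestamp(epoch_seconds, CLIENT_TZ)

# The displayed wall-clock times differ by six hours...
wall_clock_gap = client_view.replace(tzinfo=None) - server_view.replace(tzinfo=None)

# ...but timezone-aware comparison looks at the instant, which is identical.
same_instant = server_view == client_view
```

If both ends compare instants like this, the six hour difference in displayed time never enters the "is the server's file newer?" decision.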

Kind regards,
Bonno Bloksma
senior system administrator

tio 

hogeschool hospitality en toerisme 
begijnenhof 8-12 / 5611 el eindhoven
t 040 296 28 28 / f 040 237 35 20

b.blok...@tio.nl  / www.tio.nl 


----- Original Message ----- 
  From: Pete McNeil 
  To: Message Sniffer Community 
  Sent: Thursday, March 12, 2009 3:33 PM
  Subject: [sniffer] Re: New IMPROVED getRulebase.cmd script


  Bonno Bloksma wrote: 
    Hi Pete,

    I get what you said. But:
     I'm nowhere near your timezone, I'm at GMT+1 or +2. So should there not 
have been a problem long before where my system would see older files at your 
system several times a day when in fact there would be a newer one?
    Does that mean my system has been getting only two or three updates a day 
where it should have gotten over a dozen?

  If two systems agree on the time, and then only one of them advances their 
clock by an hour the two clocks will still be different. Anyway - we've learned 
more since then (below)



    I've switched to curl so everything should work ok by now. According to my 
logs I'm getting a new rulebase about every hour.

  Once per hour is just about right. 
  Pacing is currently set to 55 minutes.

  ---

  More that has been learned (technical stuff) and a story (skip if you like, 
but some might find this interesting):

  Yesterday while working on this problem and testing on one of our inbound 
spamtrap processors I noticed that things still weren't quite right. This 
discovery led me to break a paradigm in my thinking and begin to see another 
problem (perhaps the key problem). 

  Paradigm: I had been very focused on the one hour time difference, DST, and 
the obvious coincidence with the "DST storm" -- Our countermeasures at the 
server and deployment of the new getRulebase script had essentially mitigated 
the problem... so I was expecting everything to work fine.

  Having loaded the new getRulebase script on the system I was monitoring it 
didn't make sense that there was still a problem. Even worse, the telemetry was 
showing timestamps that were close, but off by a few minutes -- as if the 
server had picked up the time shifted file instead of the original posting... 
but that didn't make sense. I wondered if something else was going on and so I 
loaded up the UTC as a reference:

  http://www.worldtimeserver.com/current_time_in_UTC.aspx

  To my wonder and amazement the telemetry I was looking at showed the UTC 
reference for the rulebase on the server in the future by one hour! "That can't 
be right", I said to myself, and then I checked the timestamp again on the 
delivery server. I rechecked the math and sure enough the timestamp on the 
delivery server was correct! I hate a mystery.

  I went to the main SYNC server to see if something had happened to it -- Why 
would it report the file's timestamp in the future when the timestamp on the 
file system is correct? We hadn't made any changes to the software. The only 
thing that had happened was DST.

  I made my priority getting the reported timestamp correct, and I made the 
assumption that there might be some obscure DST bug in this version of RedHat 
or one of the libraries that I would solve later. I began looking for a way to 
tweak the SYNC server code to adjust the time stamp before reporting it when 
these conditions were detected... A way to work around the bug. I would fix the 
bug later.

  Of course, to do this tweak I would need to find a way to detect the 
condition so I started to look for ways to do that reliably. I know it's a 
funny notion -- looking for a reliable way to leverage a system that you have 
already determined is unreliable... but that is the nature of what we do. 
Nothing is perfect and a lot of software development for high availability is 
figuring out how to "stay solid on shifting sands"... but I digress.

  One of the first things I did was list a directory of the rulebase files from 
that system... Then I saw something weird that started to break my paradigm. 
Breaking a paradigm always requires new information in some form ;-)

  Some of the files listed had times -- and others only had dates! I'd never 
seen that before. Digging deeper I determined that the ones that had dates had 
current timestamps at the delivery server and the ones that had times had been 
pushed back. (Recall that one of the things we did to mitigate this problem was 
to push the timestamp on a rulebase back by one hour after it had been posted 
for 5 minutes. This would prevent systems with DST conflicts from seeing the 
files as perpetually in the future after (at most) one or two downloads).
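
The push-back rule described here could be sketched roughly as follows. This is only an illustration of the idea, not the code running on the delivery servers; the function name, the constants, and the assumption that the caller tracks the original posting time are all hypothetical:

```python
import os
import time

PUSH_BACK_AFTER = 5 * 60   # seconds a file stays posted before the push-back
PUSH_BACK_BY = 60 * 60     # how far the timestamp is moved back: one hour


def push_back_if_due(path, posted_at, now=None):
    """Move the file's timestamp back one hour once it has been posted 5 minutes.

    posted_at is the original posting time in epoch seconds (assumed to be
    tracked by the caller). Returns True if the timestamp was changed.
    """
    now = time.time() if now is None else now
    if now - posted_at < PUSH_BACK_AFTER:
        return False                      # still inside the first 5 minutes
    target = posted_at - PUSH_BACK_BY
    if abs(os.stat(path).st_mtime - target) < 1:
        return False                      # already pushed back
    os.utime(path, (target, target))      # set both atime and mtime
    return True
```

A client whose clock is an hour off would see the original timestamp at most once or twice, and after that the pushed-back one, which is no longer "in the future" for it.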

  Now the paradigm started to unravel... The timestamps that were seen at the 
SYNC server were one hour in the future... So the files seen without times 
(only dates) might be so far in the future that the ls software can't make 
sense of them (or something like that anyway).
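
For what it's worth, if the ls in question is GNU ls, its documented default is to show a time only for "recent" timestamps (less than six months old and not dated in the future) and a year otherwise, so even a file one hour in the future would be listed with only a date. A rough sketch of that rule, with the output format approximated and the function name made up for illustration:

```python
from datetime import datetime, timedelta

# Approximation of "six months" as half an average year.
SIX_MONTHS = timedelta(days=365.2425 / 2)


def ls_style(mtime, now):
    """Format mtime roughly the way GNU ls -l does by default."""
    recent = now - SIX_MONTHS < mtime <= now
    if recent:
        return mtime.strftime("%b %d %H:%M")   # recent: date and time
    return mtime.strftime("%b %d  %Y")         # old or in the future: date and year


now = datetime(2009, 3, 12, 10, 0)
past_listing = ls_style(now - timedelta(hours=1), now)
future_listing = ls_style(now + timedelta(hours=1), now)
```

With these inputs the file from an hour ago is listed with a time and the file from an hour in the future is listed with a year instead.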

  Now I was onto a different path. Why would the SYNC server see the timestamps 
differently?

  Some technical background (it does matter, bear with me..):

  In order to deliver rulebase files at high volumes I had decided that the 
best solution would be a shared RAM drive that could be fed by the rulebase 
compiler bots and consumed by the delivery servers. No sense putting a rulebase 
on a physical disc when it would be thrown away minutes later -- let alone 
thousands of them!

  At the time I was planning this upgrade it was determined that using a 
hardware based RAM drive was too experimental for the hosting guys. We could 
use it-- but they would not support it. I hit upon an alternate solution: tmpfs!

  Any mount point on a linux box can be turned into a RAM drive using tmpfs. 
It's a fantastic piece of software and it's ubiquitous in linux distros.

  Trouble is-- NFS cannot export a tmpfs file system -- or at least it couldn't 
at the time. I haven't checked recently.

  I discovered that samba / cifs CAN export tmpfs file systems -- SO a new 
solution was born. We built a system with plenty of RAM, set up a tmpfs file 
system to hold rulebase files, and exported that to our cluster of servers via 
samba over our private network.

  It works beautifully and everything is "off the shelf" so ordinary hosting 
folks know how to manage it.

  Here is where that becomes important.

  There is a bug in samba! (It took some "googling" to find this)

  Samba apparently calculates the difference between the local clock and utc 
when it starts up and then NEVER CHECKS IT AGAIN. As a result, if samba is 
started before DST begins then when DST starts samba will report file 
timestamps one hour into the future! 
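
The effect of a cached offset can be shown with a toy model. This is not samba's code, only a sketch of the behaviour described above; the class name, the direction of the conversion, and the GMT-5 / GMT-4 offsets are assumptions made for illustration:

```python
from datetime import datetime, timedelta


class CachedOffsetClock:
    """Converts local wall-clock times to UTC with an offset captured at startup."""

    def __init__(self, utc_offset_at_startup):
        # Captured once and never refreshed: this is the bug being modelled.
        self.utc_offset = utc_offset_at_startup

    def to_utc(self, local_wall_time):
        return local_wall_time - self.utc_offset


STANDARD = timedelta(hours=-5)   # assumed offset before DST
DAYLIGHT = timedelta(hours=-4)   # assumed offset once DST has started

service = CachedOffsetClock(STANDARD)      # service started before DST

# After DST begins, a file stamped 12:00 local time is really 16:00 UTC.
local_mtime = datetime(2009, 3, 12, 12, 0)
true_utc = local_mtime - DAYLIGHT
reported_utc = service.to_utc(local_mtime)
error = reported_utc - true_utc            # the stale offset's contribution

# Restarting the service re-reads the offset and the error disappears.
restarted = CachedOffsetClock(DAYLIGHT)
error_after_restart = restarted.to_utc(local_mtime) - true_utc
```

In this model the stale offset puts the reported timestamp exactly one hour into the future, and swapping the two offsets gives the one hour in the past case.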

  Presumably the opposite is also true-- If samba is started during DST then 
when DST ends samba will report file timestamps one hour in the past. We shall 
see this fall -- or rather, we won't see it because I plan to make sure we 
restart samba at the close of DST so that it has no impact, of course :-)

  In case you missed it-- that was the fix. Restarting the samba server 
software caused it to re-calculate its time reference and as a result it began 
reporting the correct timestamp. The SYNC server software got accurate 
timestamps; the telemetry returned to normal; and everything has been fine 
since.

  Best,

  _M
