Hi Pete,

In your first mail about this problem you wrote:

> There has long been a bug in the getRulebase script using wget which causes the rulebase file that is downloaded to have the local system's timestamp. Under normal circumstances this does not cause a problem because most system clocks are synchronized and the local timestamp is generally newer than the timestamp of the rulebase file on our servers.
What I was getting at: If the rulebase downloaded with the old wget software were to get a local timestamp on my server, mine would always be "far" into the future from your original, as my server is at GMT+1 (or +2 during DST). So if your server is at GMT-5, my rulebase would get a timestamp of the original +6 hours. It would then NOT download another rulebase for the next 6 hours, as every new rulebase would still be in its past.

Or... should wget have compensated for timezones, as curl should? Because the rulebase files on my server seem to have a local timestamp.

However, this is where we probably get beyond my tech level. Does Windows always use UTC internally and then calculate the local time when displaying the timestamp for a file? Is that what I'm missing? Because I think I've read that somewhere about problems with timestamps on FAT and NTFS.

Kind regards,

Bonno Bloksma
senior systems administrator

tio hogeschool hospitality en toerisme
begijnenhof 8-12 / 5611 el eindhoven
t 040 296 28 28 / f 040 237 35 20
[email protected] / www.tio.nl

----- Original Message -----
From: Pete McNeil
To: Message Sniffer Community
Sent: Thursday, March 12, 2009 3:33 PM
Subject: [sniffer] Re: New IMPROVED getRulebase.cmd script

Bonno Bloksma wrote:

> Hi Pete,
>
> I get what you said. But: I'm nowhere near your timezone, I'm at GMT+1 or +2. So should there not have been a problem long before, where my system would see older files at your system several times a day when in fact there would be a newer one? Does that mean my system has been getting only two or three updates a day where it should have gotten over a dozen?

If two systems agree on the time, and then only one of them advances its clock by an hour, the two clocks will be different. Anyway - we've learned more since then (below).

> I've switched to curl so everything should work ok by now. According to my logs I'm getting a new rulebase about every hour.

Once per hour is just about right.
Pacing is currently set to 55 minutes.

---

More that has been learned (technical stuff) and a story (skip if you like, but some might find this interesting):

Yesterday, while working on this problem and testing on one of our inbound spamtrap processors, I noticed that things still weren't quite right. This discovery led me to break a paradigm in my thinking and begin to see another problem (perhaps the key problem).

Paradigm: I had been very focused on the one hour time difference, DST, and the obvious coincidence with the "DST storm" -- Our countermeasures at the server and deployment of the new getRulebase script had essentially mitigated the problem... so I was expecting everything to work fine.

Having loaded the new getRulebase script on the system I was monitoring, it didn't make sense that there was still a problem. Even worse, the telemetry was showing timestamps that were close, but off by a few minutes -- as if the server had picked up the time shifted file instead of the original posting... but that didn't make sense.

I wondered if something else was going on, so I loaded up UTC as a reference:

http://www.worldtimeserver.com/current_time_in_UTC.aspx

To my wonder and amazement, the telemetry I was looking at showed the UTC reference for the rulebase on the server in the future by one hour!

"That can't be right", I said to myself, and then I checked the timestamp again on the delivery server. I rechecked the math and sure enough the timestamp on the delivery server was correct!

I hate a mystery.

I went to the main SYNC server to see if something had happened to it -- Why would it report the file's timestamp in the future when the timestamp on the file system is correct? We hadn't made any changes to the software. The only thing that had happened was DST.

I made my priority getting the reported timestamp correct, and I made the assumption that there might be some obscure DST bug in this version of RedHat or one of the libraries that I would solve later.
I began looking for a way to tweak the SYNC server code to adjust the timestamp before reporting it when these conditions were detected... A way to work around the bug. I would fix the bug later.

Of course, to do this tweak I would need to find a way to detect the condition, so I started to look for ways to do that reliably. I know it's a funny notion -- looking for a reliable way to leverage a system that you have already determined is unreliable... but that is the nature of what we do. Nothing is perfect and a lot of software development for high availability is figuring out how to "stay solid on shifting sands"... but I digress.

One of the first things I did was list a directory of the rulebase files from that system... Then I saw something weird that started to break my paradigm. Breaking a paradigm always requires new information in some form ;-)

Some of the files listed had times -- and others only had dates! I'd never seen that before.

Digging deeper, I determined that the ones that had dates had current timestamps at the delivery server and the ones that had times had been pushed back. (Recall that one of the things we did to mitigate this problem was to push the timestamp on a rulebase back by one hour after it had been posted for 5 minutes. This would prevent systems with DST conflicts from seeing the files as perpetually in the future after, at most, one or two downloads.)

Now the paradigm started to unravel... The timestamps that were seen at the SYNC server were one hour in the future... So the files seen without times (only dates) might be so far in the future that the ls software can't make sense of them (or something like that anyway).

Now I was onto a different path. Why would the SYNC server see the timestamps differently?
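As an aside, the mitigation recalled above (pushing a rulebase's timestamp back by one hour once it has been posted for 5 minutes) could be sketched roughly as follows. This is only a minimal illustration in Python, not the actual server code: the function name `push_back_if_due`, the `posted_at` bookkeeping, and the assumption that the file's modification time starts out equal to `posted_at` are all hypothetical.

```python
import os
import time

POST_GRACE_SECONDS = 5 * 60   # file keeps its true timestamp this long (from the mail)
PUSHBACK_SECONDS = 60 * 60    # then the timestamp moves back one hour (from the mail)


def push_back_if_due(path, posted_at, now=None):
    """Move the file's mtime back one hour once it has been posted for 5 minutes.

    posted_at is the epoch time at which the file was posted; tracking it is
    hypothetical bookkeeping, not something described in the original mail.
    Returns True if this call changed the timestamp.
    """
    now = time.time() if now is None else now
    if now - posted_at < POST_GRACE_SECONDS:
        return False              # too fresh: leave the real timestamp alone
    st = os.stat(path)
    if st.st_mtime < posted_at - 1:
        return False              # mtime is already earlier than posting time
    # Keep the access time, move only the modification time.
    os.utime(path, (st.st_atime, posted_at - PUSHBACK_SECONDS))
    return True
```

Run periodically over the posted files, a check like this would leave each file at its real timestamp for the first 5 minutes and move it back exactly once afterwards.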
Some technical background (it does matter, bear with me..):

In order to deliver rulebase files at high volumes I had decided that the best solution would be a shared RAM drive that could be fed by the rulebase compiler bots and consumed by the delivery servers. No sense putting a rulebase on a physical disc when it would be thrown away minutes later -- let alone thousands of them!

At the time I was planning this upgrade it was determined that using a hardware based RAM drive was too experimental for the hosting guys. We could use it -- but they would not support it.

I hit upon an alternate solution: tmpfs! Any mount point on a Linux box can be turned into a RAM drive using tmpfs. It's a fantastic piece of software and it's ubiquitous in Linux distros.

Trouble is -- NFS cannot export a tmpfs file system -- or at least it couldn't at the time. I haven't checked recently.

I discovered that samba / cifs CAN export tmpfs file systems -- SO a new solution was born. We built a system with plenty of RAM, set up a tmpfs file system to hold rulebase files, and exported that to our cluster of servers via samba over our private network. It works beautifully and everything is "off the shelf" so ordinary hosting folks know how to manage it.

Here is where that becomes important. There is a bug in samba! (It took some "googling" to find this.)

Samba apparently calculates the difference between the local clock and UTC when it starts up and then NEVER CHECKS IT AGAIN. As a result, if samba is started before DST begins, then when DST starts samba will report file timestamps one hour into the future!

Presumably the opposite is also true -- If samba is started during DST, then when DST ends samba will report file timestamps one hour in the past. We shall see this fall -- or rather, we won't see it because I plan to make sure we restart samba at the close of DST so that it has no impact, of course :-)

In case you missed it -- that was the fix.
Restarting the samba server software caused it to re-calculate its time reference and as a result it began reporting the correct timestamp. The SYNC server software got accurate timestamps; the telemetry returned to normal; and everything has been fine since.

Best,

_M
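For readers who want to see the arithmetic, here is a toy model in Python of how a UTC offset cached at startup could produce the one-hour error described above. It is a sketch under stated assumptions, not a description of samba's actual code: it assumes the file time is first turned into local wall-clock time with the live offset and then converted back to UTC with the stale cached offset, and it assumes offsets of -5 and -4 hours. The names `StaleOffsetServer` and `reporting_error` are made up for illustration.

```python
from datetime import datetime, timedelta

STANDARD_OFFSET = timedelta(hours=-5)  # assumed UTC offset before DST begins
DST_OFFSET = timedelta(hours=-4)       # assumed UTC offset while DST is in effect


class StaleOffsetServer:
    """Toy server that reads its UTC offset once, at startup, and caches it."""

    def __init__(self, offset_at_startup):
        self.cached_offset = offset_at_startup

    def reported_utc(self, true_utc, live_offset):
        # Assumption: wall-clock time is derived with the live offset...
        local_wall_clock = true_utc + live_offset
        # ...but converted back to UTC with the offset cached at startup.
        return local_wall_clock - self.cached_offset


def reporting_error(offset_at_startup, live_offset):
    """Return reported time minus true time for one sample file timestamp."""
    true_utc = datetime(2009, 3, 12, 12, 0)  # naive datetime, treated as UTC
    server = StaleOffsetServer(offset_at_startup)
    return server.reported_utc(true_utc, live_offset) - true_utc
```

In this model, `reporting_error(STANDARD_OFFSET, DST_OFFSET)` is plus one hour (started before DST, now in DST), `reporting_error(DST_OFFSET, STANDARD_OFFSET)` is minus one hour, and a restart corresponds to building a new server whose cached offset equals the live one, which brings the error back to zero.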
