[sniffer] Re: New IMPROVED getRulebase.cmd script
Bonno Bloksma wrote:

> Hi Pete,
>
> In your first mail about this problem you wrote:
>
>> There has long been a bug in the getRulebase script using wget which causes the rulebase file that is downloaded to have the local system's timestamp. Under normal circumstances this does not cause a problem because most system clocks are synchronized and the local timestamp is generally newer than the timestamp of the rulebase file on our servers.
>
> What I was getting at: if the old wget software gave the downloaded rulebase a local timestamp on my server, mine would always be "far" into the future from your original, as my server is at GMT+1, or +2 during DST. So if your server is at GMT-5, my rulebase would get a timestamp of the original +6 hours. It would then NOT download another rulebase for the next 6 hours, as every new rulebase would still be in its past.

When comparing timestamps of files across systems, the apparent UTC timestamp is used (generally). When displaying timestamps of files, the underlying UTC timestamp is converted to the local time (generally).

The first problem with using the local timestamp when downloading a rulebase file is that the timestamp on the local file will be later than the timestamp of the file on the server. This can delay your updates (even when the system clocks are synchronized and working correctly). Consider:

  New rulebase is created at UTC 0100.
  Download occurs near 0130 and the rulebase is given a local timestamp of 0131.
  New rulebase is created at UTC 0130.
  Download does not occur because the local timestamp is later than the server timestamp.

That scenario is clearly going to be a rare occurrence when system clocks are aligned and download processing is quick... but if a clock is not synchronized and the time on the local server is shifted in some way, the window of opportunity for missed updates increases.
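The missed-update scenario above can be sketched in a few lines of Python. This is an illustration only, not the logic of the actual getRulebase script: the `needs_download` and `utc` helpers and the calendar date are assumptions made for the example, and only the times of day come from the scenario.

```python
from datetime import datetime, timezone


def needs_download(local_stamp: datetime, server_stamp: datetime) -> bool:
    """Hypothetical update check: fetch only if the server copy is newer."""
    return server_stamp > local_stamp


def utc(hour: int, minute: int) -> datetime:
    # The date is arbitrary; only the time of day matters for this example.
    return datetime(2009, 3, 12, hour, minute, tzinfo=timezone.utc)


first_rulebase = utc(1, 0)    # created on the server at 01:00 UTC
second_rulebase = utc(1, 30)  # created on the server at 01:30 UTC

# Old behaviour: the download finishes at 01:31 and the file is stamped
# with the local clock.
stamped_with_local_clock = utc(1, 31)

# Corrected behaviour: the file keeps the server's original timestamp.
stamped_with_server_time = first_rulebase

missed = not needs_download(stamped_with_local_clock, second_rulebase)
fetched = needs_download(stamped_with_server_time, second_rulebase)

print("local-clock stamp skips the 01:30 rulebase:", missed)
print("server-time stamp fetches the 01:30 rulebase:", fetched)
```

With the local-clock stamp the 01:30 rulebase looks older than the file already on disk, so it is skipped until a later rulebase appears; with the server's own stamp the comparison behaves as intended.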
I realize this is splitting hairs a bit -- after all, a missed rulebase update will only increase leakage a bit, and a later update will fix that soon enough. Nonetheless, it is more correct to stamp the downloaded file with its original time and thereby eliminate any possibility of problems associated with system clock differences or download processing time. That is the best solution.

> Or should wget have compensated for timezones, as curl should? Because my rulebase files on my server seem to have a local timestamp. However, this is where we probably get beyond my technical level. Does Windows always use UTC internally and then calculate the local time when displaying the timestamp for a file? Is that what I'm missing? Because I think I've read that somewhere about problems with timestamps on FAT and NTFS.

Is this what you mean? http://support.microsoft.com/kb/127830

I believe that Windows uses UTC internally and checks that against the system's RTC every hour or so to see if the time is accurate. What you see when you look at a timestamp is converted to local time based on your settings.

_M
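As a rough sketch of "stamping the downloaded file with its original time", the Python below sets a file's modification time from an HTTP `Last-Modified` value and then shows the same stored instant rendered as UTC and as local time. It is an assumption that the delivery server exposes the rulebase time this way; the `apply_server_timestamp` helper, the header value and the temporary file are invented for the example and are not taken from the getRulebase script.

```python
import os
import tempfile
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def apply_server_timestamp(path: str, last_modified: str) -> float:
    """Set the file's access and modification time from an HTTP date string."""
    server_time = parsedate_to_datetime(last_modified)  # timezone-aware for "GMT"
    epoch = server_time.timestamp()                      # seconds since 1970-01-01 UTC
    os.utime(path, (epoch, epoch))
    return epoch


# Stand-in for a freshly downloaded rulebase file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"placeholder rulebase bytes")
    demo_path = handle.name

stamp = apply_server_timestamp(demo_path, "Thu, 12 Mar 2009 01:00:00 GMT")
mtime = os.stat(demo_path).st_mtime

# One stored instant, two renderings: UTC and whatever the local zone is.
print("as UTC:  ", datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat())
print("as local:", datetime.fromtimestamp(mtime).isoformat())

os.remove(demo_path)
```

Because the comparison value is the epoch-based modification time rather than the displayed string, the local timezone and DST setting only change how the time is shown, not whether two systems agree on it.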
[sniffer] Re: New IMPROVED getRulebase.cmd script
Stefan Paege wrote:

> Pete,
>
> to make your long (and interesting) story short: There is no need to use the "updated improved Script"! We can continue to use the "old" WGet/NonCURL script! Correct?

Almost, but not quite. The new script is still better and should be used instead of the old script. Otherwise time differences on the local system can cause problems. The problems are likely to be less severe than what we've seen recently, though.

Please do upgrade to the new script. Use of the old script is deprecated.

_M

# This message is sent to you because you are subscribed to the mailing list . To unsubscribe, E-mail to: To switch to the DIGEST mode, E-mail to To switch to the INDEX mode, E-mail to Send administrative queries to
[sniffer] Re: New IMPROVED getRulebase.cmd script
Hi Pete,

In your first mail about this problem you wrote:

> There has long been a bug in the getRulebase script using wget which causes the rulebase file that is downloaded to have the local system's timestamp. Under normal circumstances this does not cause a problem because most system clocks are synchronized and the local timestamp is generally newer than the timestamp of the rulebase file on our servers.

What I was getting at: if the old wget software gave the downloaded rulebase a local timestamp on my server, mine would always be "far" into the future from your original, as my server is at GMT+1, or +2 during DST. So if your server is at GMT-5, my rulebase would get a timestamp of the original +6 hours. It would then NOT download another rulebase for the next 6 hours, as every new rulebase would still be in its past.

Or should wget have compensated for timezones, as curl should? Because my rulebase files on my server seem to have a local timestamp. However, this is where we probably get beyond my technical level. Does Windows always use UTC internally and then calculate the local time when displaying the timestamp for a file? Is that what I'm missing? Because I think I've read that somewhere about problems with timestamps on FAT and NTFS.

Kind regards,
Bonno Bloksma
senior system administrator
tio hogeschool hospitality en toerisme
begijnenhof 8-12 / 5611 el eindhoven
t 040 296 28 28 / f 040 237 35 20
b.blok...@tio.nl / www.tio.nl

----- Original Message -----
From: Pete McNeil
To: Message Sniffer Community
Sent: Thursday, March 12, 2009 3:33 PM
Subject: [sniffer] Re: New IMPROVED getRulebase.cmd script
[sniffer] Re: New IMPROVED getRulebase.cmd script
Pete,

to make your long (and interesting) story short: There is no need to use the "updated improved Script"! We can continue to use the "old" WGet/NonCURL script! Correct?

--
elektronik-labor CARLS GmbH & Co. KG
Stefan Paege
Fon: +49 5973 9497-23
Fax: +49 5973 9497-19

elektronik-labor CARLS GmbH & Co. KG
Limited partnership: registered office Neuenkirchen, register court Steinfurt HRA 3310
General partner: elektronik-labor CARLS, Beteiligungsgesellschaft mbH, registered office Neuenkirchen, register court Steinfurt HRB 4175
Managing directors: Irmgard Carls, Joachim Schulte
[sniffer] Re: New IMPROVED getRulebase.cmd script
Bonno Bloksma wrote:

> Hi Pete,
>
> I get what you said. But: I'm nowhere near your timezone, I'm at GMT+1 or +2. So should there not have been a problem long before, where my system would see older files at your system several times a day when in fact there would be a newer one?
>
> Does that mean my system has been getting only two or three updates a day where it should have gotten over a dozen?

If two systems agree on the time, and then only one of them advances its clock by an hour, the two clocks will still be different. Anyway - we've learned more since then (below).

> I've switched to curl so everything should work OK by now. According to my logs I'm getting a new rulebase about every hour.

Once per hour is just about right. Pacing is currently set to 55 minutes.

---

More that has been learned (technical stuff) and a story (skip if you like, but some might find this interesting):

Yesterday, while working on this problem and testing on one of our inbound spamtrap processors, I noticed that things still weren't quite right. This discovery led me to break a paradigm in my thinking and begin to see another problem (perhaps the key problem).

Paradigm: I had been very focused on the one-hour time difference, DST, and the obvious coincidence with the "DST storm" -- Our countermeasures at the server and deployment of the new getRulebase script had essentially mitigated the problem... so I was expecting everything to work fine.

Having loaded the new getRulebase script on the system I was monitoring, it didn't make sense that there was still a problem. Even worse, the telemetry was showing timestamps that were close, but off by a few minutes -- as if the server had picked up the time-shifted file instead of the original posting... but that didn't make sense.
I wondered if something else was going on, and so I loaded up the current UTC time as a reference: http://www.worldtimeserver.com/current_time_in_UTC.aspx

To my wonder and amazement, the telemetry I was looking at showed the UTC reference for the rulebase on the server in the future by one hour! "That can't be right", I said to myself, and then I checked the timestamp again on the delivery server. I rechecked the math and, sure enough, the timestamp on the delivery server was correct!

I hate a mystery. I went to the main SYNC server to see if something had happened to it -- Why would it report the file's timestamp in the future when the timestamp on the file system is correct? We hadn't made any changes to the software. The only thing that had happened was DST.

I made my priority getting the reported timestamp correct, and I made the assumption that there might be some obscure DST bug in this version of RedHat or one of the libraries that I would solve later. I began looking for a way to tweak the SYNC server code to adjust the timestamp before reporting it when these conditions were detected... A way to work around the bug. I would fix the bug later.

Of course, to do this tweak I would need to find a way to detect the condition, so I started to look for ways to do that reliably. I know it's a funny notion -- looking for a reliable way to leverage a system that you have already determined is unreliable... but that is the nature of what we do. Nothing is perfect, and a lot of software development for high availability is figuring out how to "stay solid on shifting sands"... but I digress.

One of the first things I did was list a directory of the rulebase files from that system... Then I saw something weird that started to break my paradigm. Breaking a paradigm always requires new information in some form ;-)

Some of the files listed had times -- and others only had dates! I'd never seen that before.
Digging deeper, I determined that the ones that had dates had current timestamps at the delivery server and the ones that had times had been pushed back. (Recall that one of the things we did to mitigate this problem was to push the timestamp on a rulebase back by one hour after it had been posted for 5 minutes. This would prevent systems with DST conflicts from seeing the files as perpetually in the future after, at most, one or two downloads.)

Now the paradigm started to unravel... The timestamps that were seen at the SYNC server were one hour in the future... So the files seen without times (only dates) might be so far in the future that the ls software can't make sense of them (or something like that anyway).

Now I was onto a different path. Why would the SYNC server see the timestamps differently?

Some technical background (it does matter, bear with me..): In order to deliver rulebase files at high volumes I had decided that the best solution would be a shared RAM drive that could be fed by the rulebase compiler bots and consumed by the delivery servers. No sense putting a rulebase on a physical disk when it would be thrown away minutes later -- let alone thousands of them! At the time I wa
[sniffer] Re: New IMPROVED getRulebase.cmd script
Hi Pete,

I get what you said. But: I'm nowhere near your timezone, I'm at GMT+1 or +2. So should there not have been a problem long before, where my system would see older files at your system several times a day when in fact there would be a newer one? Does that mean my system has been getting only two or three updates a day where it should have gotten over a dozen?

I've switched to curl so everything should work OK by now. According to my logs I'm getting a new rulebase about every hour.

Kind regards,
Bonno Bloksma
senior system administrator
tio hogeschool hospitality en toerisme
begijnenhof 8-12 / 5611 el eindhoven
t 040 296 28 28 / f 040 237 35 20
b.blok...@tio.nl / www.tio.nl

----- Original Message -----
From: Pete McNeil
To: Message Sniffer Community
Sent: Wednesday, March 11, 2009 1:57 PM
Subject: [sniffer] Re: New IMPROVED getRulebase.cmd script

Bonno Bloksma wrote:

> Why does this problem start just now with a DST shift somewhere? I'm nowhere near your timezone (GMT+1 or +2), so should there not have been a problem long before, where my system would see older files at your system several times a day when in fact there would be a newer one? Does that mean my system has been getting only two or three updates a day where it should have gotten over a dozen? Unfortunately I disabled logging a while ago when everything seemed to run smoothly. :-(
>
> Someone to your west would have seen a new rulebase every time they checked, no matter what DST. Or is it just that you finally noticed it due to the DST shift?

The reason DST is an issue is that the previous wget-based script stamps the downloaded rulebase with the local clock instead of the timestamp that came with the file from the delivery server. As a result the timestamps might not agree. The recent change in the start of DST in the US is not reflected everywhere AND some locations use different DST start dates.
The result of this is that, when using the old script, the local timestamp created using the local clock is likely to be behind the delivery server's timestamp by an hour.

The new update-script mechanism in SNFServer compares the local file's timestamp to the timestamp reported by the delivery server once every minute. When the local timestamp is used and the local time is behind the clock on the delivery server, the freshly downloaded rulebase file _appears_ to be an hour old, and this does not change no matter how many times the file is downloaded.

Before DST the local clock and the delivery server's clock would generally agree, and so there was no problem.

Hope this helps,

_M
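The repeated-download effect described above can be simulated in a few lines of Python. This is a simplified model, not SNFServer's actual code: it assumes the DST mismatch acts as a flat one-hour skew on the local clock, and the `looks_stale`, `stamp_with_old_script` and `stamp_with_new_script` names, the date and the check times are all invented for the illustration.

```python
from datetime import datetime, timedelta, timezone


def looks_stale(local_stamp: datetime, server_stamp: datetime) -> bool:
    """Hypothetical per-minute check: the server copy appears newer."""
    return server_stamp > local_stamp


# Timestamp the delivery server reports for the current rulebase (arbitrary).
server_stamp = datetime(2009, 3, 11, 12, 0, tzinfo=timezone.utc)

# Assumption: the local clock runs one hour behind the delivery server.
clock_skew = timedelta(hours=-1)


def stamp_with_old_script(true_time: datetime) -> datetime:
    # Old behaviour: the file is stamped with the (skewed) local clock.
    return true_time + clock_skew


def stamp_with_new_script(true_time: datetime) -> datetime:
    # New behaviour: the file keeps the timestamp reported by the server.
    return server_stamp


# Three consecutive checks, each followed by a fresh download.
checks = [server_stamp + timedelta(minutes=m) for m in (1, 2, 3)]

old_results = [looks_stale(stamp_with_old_script(t), server_stamp) for t in checks]
new_results = [looks_stale(stamp_with_new_script(t), server_stamp) for t in checks]

print("old script, still looks stale after each download:", old_results)
print("new script, still looks stale after each download:", new_results)
```

Under the skewed clock every freshly downloaded file is stamped about an hour before the server's timestamp, so each check triggers another download; keeping the server's timestamp ends the loop after the first fetch.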