wget -N -nr ftp://ftp.eps.gov/*
The FTP site is "FEDBIZOPPS", which holds a set of daily files of business opportunities (solicitations) issued by the U.S. Government. The daily file routinely has a time stamp somewhere between 12:00 a.m. and 12:10 a.m. It is a text file containing a concatenation of the synopses posted by various government agencies the previous day.
Right after the change back to standard time, I noticed that all of the files beginning with March 31 of this year (which, curiously, is one week before the time change) had their time stamps changed to 11:5x of the day that the postings were made, presumably exactly one hour earlier (but I had not kept a copy of the old directory listing).
Logging onto the FTP site with a GUI FTP client shows that the time stamps on all files older than about six months have been set to 00:00 rather than their original time of day, a practice I think I've seen before. I guess the file time stamp comparison (-N) compares only the dates and not the time of day in this instance.
The old files never change. A new file is added once each day. Another solution would be to have a second switch that orders downloading only files later than a specified date.
Another issue is that WGET deletes the old file and only then begins to download the new one, so if the process is manually aborted when the problem is noticed, the old file is already lost (I restored it from backup). It would be nice if it kept the old file until the new file is completely downloaded, and perhaps even had an option to rename the old file with a .bak extension or some other scheme.
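A minimal sketch of the "keep the old file until the new one is complete" idea, in C. This is not how Wget behaves; commit_download and the .part/.bak names are hypothetical, and the sketch assumes the download has already been written in full to a temporary file before it is called.

```c
#include <stdio.h>

/* Hypothetical helper, not part of Wget: call it only after the new copy
   has been fully written to tmp_path. If backup_path is non-NULL the old
   file is kept under that name; otherwise the old file is removed.
   Returns 0 on success, nonzero if the new file could not be moved
   into place. */
int commit_download(const char *tmp_path, const char *final_path,
                    const char *backup_path)
{
    if (backup_path != NULL) {
        (void) remove(backup_path);             /* drop a stale backup, if any */
        (void) rename(final_path, backup_path); /* fails harmlessly if there is no old file */
    } else {
        (void) remove(final_path);              /* needed where rename() will not overwrite */
    }
    return rename(tmp_path, final_path);
}
```

With this arrangement an aborted download leaves only a partial temporary file behind, and the old copy is untouched until the final rename.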
I'm going to have to research the topic some more. I'm not a programmer except for dabbling in Visual Basic and VBA. I've found WGET to be a marvelous tool for getting the newest file(s) from several ftp sites. Instead of launching a gui ftp client, I just double-click on a batch file to run the check (which I do manually but could run automatically with some sort of scheduler).
I've had this problem (continue to have this problem) in updating archives with ARJ, using the "chapter" archive feature that adds only new/newer files to the archive.
I'll run the tests you describe below tomorrow. I have to run to a meeting.
Thanks for your help.
Fred Holmes
At 05:02 PM 11/4/2003, Hrvoje Niksic wrote:
Fred Holmes <[EMAIL PROTECTED]> writes:
> It appears to be a [default] characteristic of Windows [Win2k] that on the change from daylight savings time to standard time (and reverse) Windows changes the indicated time stamp of local files to reflect the time change, at least for all files that are time stamped less than one year ago.
It is reasonable for the indication to change. The modification and access times are most likely stored in UTC or as some other offset from the epoch, so the time shift changes how they are displayed.
But none of this should affect Wget, at least in theory. The file time stamps are read using _stat(), which returns times in UTC. The web server returns GMT. Comparing the two should work regardless of time zones. That it doesn't work might indicate a bug.
Could you check whether the information the web server reports is consistent with what is on the file system? For example, use `-S' and mail us the contents of the `Last-Modified' header. Also paste the time stamp reported by DIR or by the file manager.
It would be really nice if Wget's debugging output reported the value of st_mtime and the return value of http_atotm, but it doesn't. If you have a compiler, you might want to change it to do so.
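A minimal sketch of what such debugging output could look like. It is not taken from Wget: format_utc and debug_compare_mtime are hypothetical names, the remote time stamp is assumed to have been parsed into a time_t already, and POSIX stat() stands in for the _stat() call used on Windows.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Format a time_t as "YYYY-MM-DD HH:MM:SS UTC". Returns the number of
   characters written, or 0 on failure. */
size_t format_utc(time_t t, char *buf, size_t size)
{
    struct tm *tm = gmtime(&t);
    if (tm == NULL)
        return 0;
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S UTC", tm);
}

/* Hypothetical debugging helper: print the local file's st_mtime next to
   the already-parsed remote time stamp. Returns -1 if the file cannot be
   stat'ed, otherwise 1 if the remote copy is newer and 0 if it is not. */
int debug_compare_mtime(const char *path, time_t remote, FILE *out)
{
    struct stat st;
    char local_buf[32] = "", remote_buf[32] = "";

    if (stat(path, &st) != 0)
        return -1;
    format_utc(st.st_mtime, local_buf, sizeof local_buf);
    format_utc(remote, remote_buf, sizeof remote_buf);
    fprintf(out, "local  st_mtime: %ld (%s)\n", (long) st.st_mtime, local_buf);
    fprintf(out, "remote mtime:    %ld (%s)\n", (long) remote, remote_buf);
    return remote > st.st_mtime;
}
```

Printing both the raw number and its UTC rendering makes a one-hour (3600 second) discrepancy easy to spot.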
In case you're wondering what Wget might possibly be doing wrong with the time zones, take a look at this lovely function:
/* Converts struct tm to time_t, assuming the data in tm is UTC rather than local timezone.
mktime is similar but assumes struct tm, also known as the "broken-down" form of time, is in local time zone. mktime_from_utc uses mktime to make the conversion understanding that an offset will be introduced by the local time assumption.
mktime_from_utc then measures the introduced offset by applying gmtime to the initial result and applying mktime to the resulting "broken-down" form. The difference between the two mktime results is the measured offset which is then subtracted from the initial mktime result to yield a calendar time which is the value returned.
tm_isdst in struct tm is set to 0 to force mktime to introduce a consistent offset (the non DST offset) since tm and tm+o might be on opposite sides of a DST change.
Some implementations of mktime return -1 for the nonexistent localtime hour at the beginning of DST. In this event, use mktime(tm - 1hr) + 3600.
Schematically
  mktime(tm)   --> t+o
  gmtime(t+o)  --> tm+o
  mktime(tm+o) --> t+2o
  t+o - (t+2o - t+o) = t
Note that glibc contains a function of the same purpose named `timegm' (reverse of gmtime). But obviously, it is not universally available, and unfortunately it is not straightforwardly extractable for use here. Perhaps configure should detect timegm and use it where available.
Contributed by Roger Beeman <[EMAIL PROTECTED]>, with the help of Mark Baushke <[EMAIL PROTECTED]> and the rest of the Gurus at CISCO. Further improved by Roger with assistance from Edward J. Sabol based on input by Jamie Zawinski. */
static time_t
mktime_from_utc (struct tm *t)
{
  time_t tl, tb;
  struct tm *tg;
  tl = mktime (t);
  if (tl == -1)
    {
      t->tm_hour--;
      tl = mktime (t);
      if (tl == -1)
        return -1; /* can't deal with output from strptime */
      tl += 3600;
    }
  tg = gmtime (&tl);
  tg->tm_isdst = 0;
  tb = mktime (tg);
  if (tb == -1)
    {
      tg->tm_hour--;
      tb = mktime (tg);
      if (tb == -1)
        return -1; /* can't deal with output from gmtime */
      tb += 3600;
    }
  return (tl - (tb - tl));
}
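For comparison, here is a minimal sketch of the alternative the comment hints at: use timegm where it is available, and otherwise convert with plain calendar arithmetic that never touches the local time zone. This is not Wget's code. HAVE_TIMEGM is an assumed configure macro, utc_tm_to_time_t and days_from_civil are hypothetical names, and the arithmetic fallback is a substitute technique rather than the mktime/gmtime round trip shown above.

```c
#include <time.h>

/* Days since 1970-01-01 for a proleptic Gregorian date.
   month is 1..12, day is 1..31. */
static long days_from_civil(long y, int m, int d)
{
    long era, yoe, doy, doe;

    y -= (m <= 2);                       /* treat Jan/Feb as end of previous year */
    era = (y >= 0 ? y : y - 399) / 400;  /* 400-year cycle number */
    yoe = y - era * 400;                 /* year within the cycle, 0..399 */
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  /* day of year, March-based */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            /* day within the cycle */
    return era * 146097 + doe - 719468;
}

/* Hypothetical helper: interpret *t as UTC and return the matching time_t.
   tm_isdst is ignored in the fallback branch. */
time_t utc_tm_to_time_t(const struct tm *t)
{
#ifdef HAVE_TIMEGM
    /* timegm is a non-standard extension (glibc, BSD); on glibc it may
       need _DEFAULT_SOURCE defined before the includes. */
    struct tm copy = *t;
    return timegm(&copy);
#else
    long days = days_from_civil(t->tm_year + 1900L, t->tm_mon + 1, t->tm_mday);
    return (time_t) days * 86400 + t->tm_hour * 3600L + t->tm_min * 60L + t->tm_sec;
#endif
}
```

Because the fallback never calls mktime, its result cannot depend on the machine's time zone or on which side of a DST change the date falls, which makes it a handy cross-check for mktime_from_utc when chasing a one-hour discrepancy.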
> P.S. I wonder what happens when someone takes a notebook into a different time zone and changes the time zone that the notebook OS reflects?
I assume that the time on the files would appear to change, but that's a feature because the files really were changed at a different time, in the local time zones.
