On Tue, Jul 28, 2009 at 7:19 AM, Anamika Jindal<anamika.jin...@tcs.com> wrote:
> Hi,
>
> We have an open audit issue regarding the files that are pulled from
> external interfaces. We download these files using wget utility. wget
> commands are being called from Pro*C batches e.g. for reference, code is
> something like
> << sprintf (WGET, "%s%s%s/%s.%s", "wget -P ",FEEDFILE_PATH,"
> ftp://username:passw...@host", FileName, "Z");>>
>
> Now, the audit issue is to ensure the data integrity and data completeness
> for the file that has been downloaded using wget.
> Option 1-> Recommended option is ofcourse checksum approach, in which we
> can get the checksum (any checksum e.g. MD5, SH1)of the file on remote
> server. After that, we can get the checksum of file on local server(just
> downloaded using wget). Then we can compare checksum to ensure the file
> has been successfully(and completely) downloaded. I checked on google/wget
> manual. wget does not provide any option to get the checksum but there
> were functions like gnu_md5.c, don't know why these are used..
>
> Option 2 -> is to check the File size on remote FTP server. After
> retrieving the file (using wget), our application can compare this file
> size with the file size of retrieved file. If file size does not match,
> error will be raised. Now wget does not provide any direct option for
> getting the file size. But it gives that information in the output message
>
> *snip*
>
> Now, my requirement is very simple. To ensure the data
> completeness/integrity. Can somebody please suggest which options I should
> use or I can use?? My first preference is to compare checksum.

hi Anamika,

as you know, file size tells you very little about integrity: if the sizes
differ then the checksums can't match, but identical sizes don't mean the
contents are identical.

the easiest solution, if you're in control of the server, would probably be
to use the Content-MD5 header and a download program that supports it. note
that Content-MD5 is an HTTP header, so this means serving the files over
HTTP rather than FTP. I don't know if wget supports it; probably not.

another (biased) solution is to use metalinks, which are XML files that list
mirrors, checksums, & signatures. metalink clients are numerous (wget does
not support metalinks yet), & there are GUI and lightweight command line
clients like metalink-checker (python), mulk (libcurl based) and aria2.

here's an example metalink:

<?xml version="1.0" encoding="UTF-8"?>
<metalink version="3.0" xmlns="http://www.metalinker.org">
  <files>
    <file name="example.ext">
      <verification>
        <hash type="md5">example-md5-hash</hash>
        <hash type="sha1">example-sha1-hash</hash>
      </verification>
      <resources>
        <url type="ftp" location="uk" preference="90">ftp://ftp.example.net/example.ext</url>
        <url type="http" location="us" preference="90">http://example.com/example.ext</url>
      </resources>
    </file>
  </files>
</metalink>

more info at http://en.wikipedia.org/wiki/Metalink

-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ] ))
  Easier, More Reliable, Self Healing Downloads
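
For readers who want to see the checksum comparison from this thread spelled out, here is a minimal sketch in Python. It assumes you already have a metalink 3.0 document like the example above (with real hex digests in place of the placeholders) and a file that wget has already downloaded. The function names `expected_hashes`, `file_digest` and `verify_download` are hypothetical helpers made up for this illustration; they are not part of wget or of any metalink client.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace URI taken from the example metalink in this thread.
NS = {"ml": "http://www.metalinker.org"}


def expected_hashes(metalink_xml, file_name):
    """Return {hash_type: hex_digest} for one <file> entry of a metalink 3.0 document."""
    root = ET.fromstring(metalink_xml)
    for file_el in root.findall("ml:files/ml:file", NS):
        if file_el.get("name") == file_name:
            return {
                h.get("type"): (h.text or "").strip().lower()
                for h in file_el.findall("ml:verification/ml:hash", NS)
            }
    raise KeyError(file_name)


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a local file in chunks so large feed files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, metalink_xml, file_name, algorithm="sha1"):
    """Return True if the local file's digest equals the one listed in the metalink."""
    expected = expected_hashes(metalink_xml, file_name)[algorithm]
    return file_digest(path, algorithm) == expected
```

A batch job could call `verify_download(local_path, metalink_text, "example.ext")` after wget returns and raise an error on `False`. The check is only as trustworthy as the source of the expected digest, so the metalink (or any other published checksum) should come from the party that produced the file.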