I wrote:

I use wget to mirror the contents of a remote directory (containing patches for SuSE Linux, if you want to know the details).

It works quite well, but I can't find an option that makes wget remove files locally that are no longer on the server.

Example: If the file foo-1.2.3-45.rpm is replaced by foo-1.2.3-46.rpm, wget happily downloads the new file, but the old one remains locally.

For now, I have to remove the old files semi-automagically to keep the disk from filling up.

I wrote a workaround outside of wget: a Python script that runs after wget is done, recurses through a directory, and removes all local files that are not in '.listing'.


This means, of course, that it works only for FTP URLs, and only if wget is run with the option -nr ("don't remove listing") or with -m, which implies it.
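
To illustrate the idea, here is a minimal sketch of the comparison step for a single directory. It is not the attached script: it assumes Python 3, the function name stale_files is made up for this example, and it only reports the candidates instead of deleting anything.

```python
import os

def stale_files(directory, remote_names):
    """Return the sorted names of plain files in `directory`
    that do not appear in `remote_names`."""
    remote = set(remote_names)
    stale = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # never report '.listing' itself, and leave subdirectories alone
        if name == '.listing' or not os.path.isfile(path):
            continue
        if name not in remote:
            stale.append(name)
    return sorted(stale)
```

Printing the result first and deleting only after a look at the list is a cheap safety net, given how easily a misparsed listing could mark every file as obsolete.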

If anybody is interested: the Python script is attached at the bottom. It is used as in this example:

#### BEGIN mirror-suse-patches.bat ####
REM Mirror the SuSE 9.1 patches to a windoze PC
REM needs python 2.3 from www.python.org
wget -o wget.log -nH -m -X /suse/i386/update/9.1/rpm/src ftp://ftp.suse.com/pub/suse/i386/update/9.1/
python cleanup-wget.py suse >> wget.log
#### END mirror-suse-patches.bat ####


NOTE: USE AT OWN RISK!

The script is pretty crude and most likely doesn't work for you. It assumes that the file names in the listing start at column 56, and it cannot deal with symlinks. I tested it only for the batch file above.

You have been warned.
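
A possible way around the fixed column, sketched under assumptions: if the listing is in the usual Unix 'ls -l' layout, the name is whatever follows the first eight whitespace-separated fields (permissions, link count, owner, group, size, month, day, time or year). The function name and the fields_before_name parameter are made up for this example, the code assumes Python 3, and it has not been tried against other servers' listings.

```python
def name_from_listing_line(line, fields_before_name=8):
    """Extract the file name from one 'ls -l'-style listing line.

    Returns None if the line has too few fields (blank line,
    'total N' header, or a format this sketch does not know).
    """
    # split off the leading fields; the remainder keeps embedded spaces
    parts = line.rstrip('\r\n').split(None, fields_before_name)
    if len(parts) <= fields_before_name:
        return None
    name = parts[fields_before_name]
    # symlinks are listed as 'name -> target'; keep only the link name
    if parts[0].startswith('l') and ' -> ' in name:
        name = name.split(' -> ', 1)[0]
    return name
```

Splitting on fields instead of counting characters should survive wider size or owner columns, but it still breaks on listings that are not in this layout at all, so the warning above applies here as well.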

Regards,

Heiko

#### BEGIN cleanup-wget.py ####
"""
Clean up a directory downloaded with wget.

The problem is that wget doesn't remove local files
if they no longer exist remotely. This leads to a pileup of old files
and a waste of disk space.

This script scans the files '.listing' in each directory and removes all files
(except '.listing') that are not there.


This works only for ftp downloads (otherwise there would be no '.listing').
"""

def fileNameFromLine(line):
   """
   Extract a file name from a line of listing.

   It is assumed that the file name starts at the 56th character.
   This is probably a bit crude, because the listing may be formatted
   in a different way. For the moment, I don't have examples of other
   .listings, so let it be.
   """

   # truncate the first 55 characters
   fileName=line[55:]

   # remove a trailing '\n' (which is a whitespace character)
   fileName=fileName.rstrip()

   return fileName

def parseListing(filename):
   """
   Open a file and extract a filename from each line,
   returning a list of strings.
   """

   listingFile=open(filename,'r')

   returnList=[]

   # iterate over the lines; the loop ends at end of file
   for line in listingFile:
       returnList.append(fileNameFromLine(line))

   listingFile.close()

   return returnList

import os
import sys

def recurse(dir):
   print 'entering',dir
   # get all files in local directory
   localFileList=os.listdir(dir)
   # go through local file list
   for filename in localFileList:
       subdir=dir+os.sep+filename
       # treat all subdirectories
       if os.path.isdir(subdir):
           recurse(subdir)

   # if there is a .listing
   if '.listing' in localFileList:
       # extract a list of remote files from it
       remoteFileList=parseListing(dir+os.sep+'.listing')
       # remove all local files that have no remote counterpart
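       for listFile in localFileList:
           localPath=dir+os.sep+listFile
           # skip '.listing' itself and anything that is not a plain file
           # (os.remove fails on directories)
           if listFile!='.listing' and os.path.isfile(localPath):
               if listFile not in remoteFileList:
                   print 'removing',listFile
                   os.remove(localPath)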

if __name__=='__main__':
   if 2 == len(sys.argv):
       dir=sys.argv[1]
       print
       print 'removing obsolete local files...'
       recurse(dir)
   else:
       print
       print 'usage:',sys.argv[0],'<directory name>'
#### END cleanup-wget.py ####


