I have a simple Windows batch file doing most of what I want (the data's on a Windows PC) but I want to learn more about Bash scripting so I want to try the same thing in Linux (and then add a few bits to it).

The basic idea is that I have thousands of .zip files that I need to extract and then recompress in one of several ways depending on their content. (At present they all get recompressed as zip files; just the file extensions vary.) I need to retain the file date/time information from the original .zip in the process. The content I'm looking for is just the presence of certain files (e.g. a mimetype file in the root of the .zip usually means the file is actually an OpenOffice .odt document, so I rename it accordingly).
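In case it helps frame the question, here's a rough sketch of the "rename by content" step in Bash, assuming Info-ZIP's unzip is available and working on a clean (already recompressed) archive. The function name and the list of mimetypes are just placeholders for illustration:

```shell
# Sketch: pick a better extension for a clean .zip by peeking at the
# "mimetype" member that OpenDocument-style archives keep in their root.
classify_zip() {
    local f=$1 mime
    # unzip -p prints a single member to stdout without extracting to disk;
    # errors (e.g. no such member) are discarded so $mime is just empty
    mime=$(unzip -p "$f" mimetype 2>/dev/null)
    case "$mime" in
        application/vnd.oasis.opendocument.text)        echo odt ;;
        application/vnd.oasis.opendocument.spreadsheet) echo ods ;;
        application/epub+zip)                           echo epub ;;
        *)                                              echo zip ;;
    esac
}
```

Used as something like `ext=$(classify_zip recovered.zip); mv recovered.zip "recovered.$ext"`. Other signature files (e.g. install.rdf for old Firefox add-ons) could be tested the same way with `unzip -l`.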

At present, in Windows, I use 7-Zip's command-line tool to extract the files and recompress them, but it loses the date/time information on directories, and that's causing me a problem. Note that the .zip files have a lot of rubbish data tagged on the end of them, and Info-ZIP's unzip doesn't seem to like them as a result. That's the reason I need to recompress them all (to lose the garbage) rather than just rename them.

Background: The files result from a data recovery attempt from a corrupt NTFS partition, so they've been located based on their file signatures. The resulting files are all 10M in size as the simple data recovery algorithm can't detect the file sizes. In many cases this means the real .zip file is probably only 1% of the total file with the rest being garbage. It also means that file types that "look like" zip files get recovered as .zip files, such as OpenOffice docs, Firefox plugins, etc. After the files are extracted I can often determine the correct filetype to use after recompression.

One other side effect of this recovery process is that a lot of the files actually contain duplicate data, so once the garbage has been removed I can check for duplicate files and dispose of the copies. The .zip files store the file dates and times, though, so if the original timestamps are lost in the process I end up with .zip files which only differ in the dates, but that's enough that a hash of the .zip file used to detect duplicates will find them to be different. A potential solution is to just "touch" all the directories to a specific date, which is better than the status quo when it comes to detecting duplicates, but still not as good as retaining the original timestamps.

Mark

_______________________________________________
Peterboro mailing list
[email protected]
https://mailman.lug.org.uk/mailman/listinfo/peterboro
