On Tue, Mar 29, 2011 at 10:14 PM, Jay Hacker <[email protected]> wrote:
> On Tue, Mar 29, 2011 at 11:20 AM, Hans Schou <[email protected]> wrote:
>> On Tue, 29 Mar 2011, Jay Hacker wrote:
>>
>>> I have a large gzipped tar archive containing many small files; just
>>> untarring it takes a lot of time and space. I'd like to be able to process
>>> each file in the archive, ideally without untarring the whole thing first,
:
>> tar xvf big-file.tar.gz | parallel echo "Proc this file {}"
>>
>> Parallel will start when the first file is untarred.
:
> That is a great idea. However, can I be sure the file is completely
> written to disk before tar prints the filename?
While I loved Hans' idea, it does indeed have a race condition. This
should run 'ls -l' on each file after decompressing, and it clearly
fails now and then:

$ tar xvf ../i.tgz | parallel ls -l > ls-l
ls: cannot access 1792: No such file or directory
ls: cannot access 209: No such file or directory
ls: cannot access 21: No such file or directory
ls: cannot access 2256: No such file or directory
ls: cannot access 2349: No such file or directory
ls: cannot access 2363: No such file or directory
ls: cannot access 246: No such file or directory
ls: cannot access 2712: No such file or directory

But you could unpack in a new dir and use:

http://www.gnu.org/software/parallel/man.html#example__gnu_parallel_as_dir_processor

That seems to work.

/Ole
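[Editor's note: a minimal sketch of the race-free alternative discussed above. All filenames and the demo data are hypothetical, and `xargs -P` is used as a stand-in for GNU parallel in case it is not installed; the point is simply that letting tar finish before any processing starts removes the race, at the cost of waiting for the full extraction.]

```shell
set -e
# Demo setup (hypothetical data): build a small gzipped tar of a few files.
mkdir -p src
for i in 1 2 3; do echo "data $i" > "src/f$i"; done
tar czf big.tgz -C src .

# Race-free variant: let tar finish completely before processing anything.
# GNU tar prints each member name to stdout with 'v'; we capture that list
# (dropping directory entries, which end in '/'), then fan out over the
# already-written files. xargs -P stands in for GNU parallel here.
mkdir -p out
tar xzvf big.tgz -C out | grep -v '/$' > filelist.txt
(cd out && xargs -I{} -P 4 wc -c {} < ../filelist.txt)
```

This trades the pipelining of Hans' one-liner for correctness; the dir-processor recipe linked above recovers the pipelining by watching the target directory for files that have been fully closed.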
