Hmmm, use tar -t to list the filenames, pipe that into parallel, and have parallel call tar again to extract just that one file to stdout and pipe it to some other command:
tar -tzf big-file.tar.gz | parallel tar -xzOf big-file.tar.gz {} '|' someCommandThatReadsFromStdIn

Malcolm Cook
Stowers Institute for Medical Research - Bioinformatics
Kansas City, Missouri USA

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf
> Of Ole Tange
> Sent: Tuesday, March 29, 2011 4:14 PM
> To: Jay Hacker
> Cc: [email protected]
> Subject: Re: Processing files from a tar archive in parallel
>
> On Tue, Mar 29, 2011 at 10:14 PM, Jay Hacker
> <[email protected]> wrote:
> > On Tue, Mar 29, 2011 at 11:20 AM, Hans Schou <[email protected]> wrote:
> >> On Tue, 29 Mar 2011, Jay Hacker wrote:
> >>
> >>> I have a large gzipped tar archive containing many small files; just
> >>> untarring it takes a lot of time and space. I'd like to be able to
> >>> process each file in the archive, ideally without untarring the
> >>> whole thing first,
> :
> >> tar xvf big-file.tar.gz | parallel echo "Proc this file {}"
> >>
> >> Parallel will start when the first file is untarred.
> :
> > That is a great idea. However, can I be sure the file is completely
> > written to disk before tar prints the filename?
>
> While I loved Hans' idea, it does indeed have a race
> condition. This should run 'ls -l' on each file after
> decompressing and clearly fails now and then:
>
> $ tar xvf ../i.tgz | parallel ls -l > ls-l
> ls: cannot access 1792: No such file or directory
> ls: cannot access 209: No such file or directory
> ls: cannot access 21: No such file or directory
> ls: cannot access 2256: No such file or directory
> ls: cannot access 2349: No such file or directory
> ls: cannot access 2363: No such file or directory
> ls: cannot access 246: No such file or directory
> ls: cannot access 2712: No such file or directory
>
> But you could unpack in a new dir and use:
> http://www.gnu.org/software/parallel/man.html#example__gnu_parallel_as_dir_processor
>
> That seems to work.
>
> /Ole
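
P.S. The list-then-extract idea at the top of this message can be tried end to end with something like the sketch below. It is only an illustration under a few assumptions: the demo archive, its file names, and the use of wc -l as a stand-in for someCommandThatReadsFromStdIn are all made up, and the sequential fallback is there only so the script still runs on a machine without GNU parallel.

```shell
#!/bin/sh
# Sketch: feed each member of a gzipped tar archive to a command on stdin,
# without unpacking the archive to disk first.
set -e

workdir=$(mktemp -d)
archive=$workdir/demo.tar.gz

# Build a tiny demo archive; the file names and contents are made up.
mkdir "$workdir/src"
printf 'alpha\n' > "$workdir/src/a.txt"
printf 'beta\ngamma\n' > "$workdir/src/b.txt"
tar -czf "$archive" -C "$workdir/src" a.txt b.txt

# 'wc -l' stands in for someCommandThatReadsFromStdIn.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -t lists the members; for each one, -x -O writes that member to stdout.
    # -k keeps parallel's output in the same order as the input names.
    result=$(tar -tzf "$archive" |
        parallel -k tar -xzOf "$archive" {} '|' wc -l)
else
    # Sequential fallback with the same shape, for machines without GNU parallel.
    result=$(tar -tzf "$archive" |
        while IFS= read -r member; do
            tar -xzOf "$archive" "$member" | wc -l
        done)
fi

rm -rf "$workdir"
printf '%s\n' "$result"
```

Two caveats if you try this on a real archive. Every tar -x call has to decompress the archive again from the start, so the total work grows roughly with the square of the number of members; for many small files, unpacking into a fresh directory as Ole suggests may well be cheaper. Also, tar -t lists directory entries too (names ending in /), and extracting a directory name pulls in everything under it, so you would probably want to filter those out first, for example with grep -v '/$'.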
