Re: question on using tarfile to read a *.tar.gzip file

Tim Chase Sun, 07 Feb 2010 15:04:09 -0800

Is there a way to do this, without decompressing each file to a temp
dir?  Like is there a method using some tarfile interface adapter to
read a compressed file?  Otherwise I'll just access each file, extract
it, grab the 1st and last lines and then delete the temp file.

I think you're looking for the extractfile() method of the TarFile object:


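  from glob import glob
  from tarfile import TarFile
  for fname in glob('*.tgz'):
    print(fname)
    tf = TarFile.gzopen(fname)
    for ti in tf:
      print(' %s' % ti.name)
      f = tf.extractfile(ti)
      if not f: continue # directories etc. have no file data
      fi = iter(f) # f may not natively support next()
      # the None default covers an empty member; a one-line
      # member leaves last_line equal to first_line
      first_line = last_line = next(fi, None)
      for last_line in fi: pass # scan to the end, keeping the last line
      f.close()
      print("  First line: %r" % first_line)
      print("  Last line: %r" % last_line)
    tf.close()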

If you just want the first & last lines, it's a little more complex if
you don't want to scan the entire file (like I do with the for-loop),
but the file-like object returned by extractfile() is documented as
supporting seek() so you can skip to the end and then read backwards
until you have sufficient lines.  I wrote a "get the last line of a
large file using seeks from the EOF" function which you can find at
[1] which should handle the odd edge cases of $BUFFER_SIZE containing
more or less than a full line and then reading backwards in chunks (if
needed) until you have one full line, handling a one-line file, and
other odd/annoying edge-cases.  Hope it helps.
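
To make the seek-from-EOF idea concrete, here is a minimal sketch.  It
is not the function posted at [1]; the names last_line, chunk_size and
demo are made up for illustration, and it assumes Python 3, where
extractfile() returns a binary, seekable file-like object.  The demo
builds a small .tar.gz in memory so the example does not depend on any
file on disk.

```python
import io
import os
import tarfile

def last_line(f, chunk_size=4096):
    """Return the last line (as bytes) of a seekable binary file object.

    Reads backwards from EOF in chunk_size pieces until the newline that
    precedes the final line is found, or the start of the file is reached.
    """
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    data = b''
    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + data
        # A newline at the very end terminates the last line; it is not
        # the separator we are looking for, so leave it out of the search.
        body = data[:-1] if data.endswith(b'\n') else data
        idx = body.rfind(b'\n')
        if idx != -1:
            return data[idx + 1:]
    return data  # one-line file, or empty file (b'')

def demo():
    """Build a tiny .tar.gz in memory and read first/last lines from it."""
    payload = b"first\nmiddle\nlast\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode='w:gz') as tf:
        info = tarfile.TarInfo(name='sample.txt')  # hypothetical member
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    buf.seek(0)

    results = []
    with tarfile.open(fileobj=buf, mode='r:gz') as tf:
        for ti in tf:
            f = tf.extractfile(ti)
            if f is None:  # not a regular file
                continue
            first = f.readline()
            last = last_line(f)
            f.close()
            results.append((ti.name, first, last))
    return results

if __name__ == '__main__':
    for name, first, last in demo():
        print(name, first, last)
```

One caveat, as far as I know: on a gzip-compressed archive a backwards
seek is not free, because the gzip layer has to decompress again from
the start of the stream to reach the new position.  So this saves you
the line-by-line scan in Python, but probably not the decompression
work; on an uncompressed tar the seeks should be cheap.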


-tkc

[1] http://mail.python.org/pipermail/python-list/2009-January/1186176.html

