Is there a way to do this without decompressing each file to a temp
dir?  For example, is there some tarfile interface adapter that can
read a compressed file directly?  Otherwise I'll just access each file,
extract it, grab the first and last lines, and then delete the temp file.

I think you're looking for the extractfile() method of the TarFile object:

  from glob import glob
  import tarfile
  for fname in glob('*.tgz'):
    print(fname)
    tf = tarfile.open(fname, 'r:gz')
    for ti in tf:
      print(' %s' % ti.name)
      f = tf.extractfile(ti)
      if not f: continue # directories, links, etc. have no data
      fi = iter(f) # f may not natively support next()
      first_line = next(fi, None)
      if first_line is None: # empty file, nothing to report
        f.close()
        continue
      line = first_line # a one-line file: first line == last line
      for line in fi: pass
      f.close()
      print("  First line: %r" % first_line)
      print("  Last line: %r" % line)
    tf.close()

If you just want the first & last lines and don't want to scan the entire file (as my for-loop does), it's a little more complex. The file-like object returned by extractfile() is documented as supporting seek(), so you can skip to the end and read backwards until you have enough lines. I wrote a "get the last line of a large file using seeks from the EOF" function, which you can find at [1]. It reads backwards in chunks (if needed) until it has one full line, and should handle the annoying edge cases: $BUFFER_SIZE holding more or less than a full line, a one-line file, and so on. Hope it helps.
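
For what it's worth, here is a minimal sketch of the seek-from-the-end idea. It is not the function from [1]; the names last_line, first_and_last and chunk_size are made up for illustration. It assumes a seekable binary file object, which is what extractfile() hands back for regular files, and the demo builds a small .tgz in memory so it runs without any files on disk.

```python
import io
import os
import tarfile

def last_line(f, chunk_size=4096):
    """Return the last line of a seekable binary file object.

    Hypothetical helper: reads backwards from EOF in chunk_size pieces
    until a complete final line has been collected.
    """
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    data = b''
    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        data = f.read(step) + data
        # Ignore a single trailing newline when hunting for the line break.
        body = data[:-1] if data.endswith(b'\n') else data
        cut = body.rfind(b'\n')
        if cut != -1:
            return data[cut + 1:]
    # Reached the start of the file: it has one line or is empty.
    return data

def first_and_last(f):
    """Return (first_line, last_line) without iterating over every line."""
    f.seek(0)
    first = f.readline()
    return first, last_line(f)

# Demo: build a tiny gzip-compressed tar archive in memory.
payload = b"alpha\nbeta\ngamma\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    info = tarfile.TarInfo(name="demo.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tf:
    for ti in tf:
        f = tf.extractfile(ti)
        if f is None:  # not a regular file
            continue
        print(ti.name, first_and_last(f))
```

One caveat, as far as I know: a gzip stream can't be read at random, so on a .tgz the seek still decompresses everything up to that point behind the scenes. The saving is mostly in not splitting and looping over every line in Python; on an uncompressed .tar the seek should be cheap.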

-tkc

[1]
http://mail.python.org/pipermail/python-list/2009-January/1186176.html

