Is there an efficient way to multi-slice a fixed-width string
into individual fields that's logically equivalent to the way
one would slice a delimited string using .split()? Background:
I'm parsing some very large, fixed line-width text files that
have weekly columns of data (52 data columns plus related
data). My current strategy is to loop through a list of slice() objects to build a list of the specific field values for
each line. This is fine for small files, but seems
inefficient. I'm hoping that there's a built-in (C based)

I'm not sure if it's more efficient, but there's the struct module[1]:

  from struct import unpack
  for line in file('sample.txt'):
    (num, a, b, c, nl) = unpack("2s9s7s4sc", line)
    print "num:", repr(num)
    print "a:", repr(a)
    print "b:", repr(b)
    print "c:", repr(c)

Adjust the format string for your data. The last "c" captures the newline character; you might be able to use "x" (a pad byte) there instead to skip that byte so it doesn't get returned. The sample data I threw at it was 2/9/7/4-character data. The general pattern would be

  lengths = [3,18,24,5,1,8]
  FORMAT_STR = (
    ''.join("%ss" % length for length in lengths) +
    'c')
  for line in file(INFILE):
    (f1, f2,..., fn, _) = unpack(FORMAT_STR, line)
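
For anyone who wants to try the pattern end to end, here is a rough, self-contained sketch. It assumes Python 3, where the "s" format returns bytes, and it uses a precompiled struct.Struct with a trailing "x" so the newline is skipped rather than returned. The names LENGTHS, RECORD, parse_line, and parse_line_with_slices, as well as the sample record, are made up for illustration. The slice-based function is only there as a baseline to check that both approaches give the same fields.

```python
import struct

# Hypothetical field widths, reusing the example lengths from the post.
LENGTHS = [3, 18, 24, 5, 1, 8]

# Builds "3s18s24s5s1s8sx": one "Ns" per field, then "x" to skip the newline.
# Compiling the format once avoids re-parsing it for every line.
RECORD = struct.Struct("".join("%ds" % n for n in LENGTHS) + "x")


def parse_line(line):
    """Split one fixed-width record (bytes, newline included) into fields."""
    return RECORD.unpack(line)


def parse_line_with_slices(line):
    """Baseline for comparison: the same split using slice objects."""
    slices = []
    start = 0
    for n in LENGTHS:
        slices.append(slice(start, start + n))
        start += n
    return tuple(line[s] for s in slices)


# Made-up sample record: 59 bytes of data plus the newline.
sample = b"001" + b"A" * 18 + b"B" * 24 + b"CCCCC" + b"D" + b"20090101" + b"\n"
fields = parse_line(sample)
```

With a real file, this assumes the file is opened in binary mode ("rb") so each line is bytes. Note that unpack raises struct.error when a line's length differs from RECORD.size, so a short final line without a newline would need separate handling. Whether this is actually faster than slicing is something to measure on your own data.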


-tkc

[1] http://docs.python.org/library/struct.html






