Re: Efficient multi-slicing technique?

Tim Chase Sun, 25 Jan 2009 16:48:11 -0800

Is there an efficient way to multi-slice a fixed-width string
into individual fields that's logically equivalent to the way
one would slice a delimited string using .split()? Background:
I'm parsing some very large, fixed line-width text files that
have weekly columns of data (52 data columns plus related
data). My current strategy is to loop through a list of
slice()'s to build a list of the specific field values for
each line. This is fine for small files, but seems
inefficient. I'm hoping that there's a built-in (C based)

I'm not sure if it's more efficient, but there's the struct
module[1]:


  from struct import unpack
  for line in file('sample.txt'):
    (num, a, b, c, nl) = unpack("2s9s7s4sc", line)
    print "num:", repr(num)
    print "a:", repr(a)
    print "b:", repr(b)
    print "c:", repr(c)

Adjust the formatting string for your data (the last "c" is the
newline character -- you might be able to use "x" here to just
ignore the byte so it doesn't get returned). The sample data I
threw at it was 2/9/7/4 character data. The general pattern would be:


  lengths = [3,18,24,5,1,8]
  FORMAT_STR = (
    ''.join("%ss" % length for length in lengths) +
    'c')
  for line in file(INFILE):
    (f1, f2,..., fn, _) = unpack(FORMAT_STR, line)
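
For anyone trying this on Python 3, here is a minimal sketch of the same
pattern. It assumes the file is read in binary mode (struct works on
bytes there) and that every record ends in a single newline byte. The
field widths are the ones from the list above; the sample record and the
split_fixed() helper are made up for illustration. It uses "x" in place
of the trailing "c", so the newline is skipped instead of returned, and
struct.Struct so the format is only parsed once.

```python
import struct

# Field widths from the example list above.
lengths = [3, 18, 24, 5, 1, 8]

# Builds "3s18s24s5s1s8sx": one "Ns" per field, plus "x" to skip
# the trailing newline byte so it is not returned as a field.
FORMAT_STR = "".join("%ds" % n for n in lengths) + "x"
record = struct.Struct(FORMAT_STR)  # parse the format string once


def split_fixed(line):
    """Split one fixed-width record (bytes, newline included) into fields."""
    return record.unpack(line)


# Made-up sample record: each value is space-padded to its field width.
values = [b"001", b"alpha", b"beta", b"gamma", b"X", b"20090125"]
sample = b"".join(v.ljust(n) for v, n in zip(values, lengths)) + b"\n"

fields = split_fixed(sample)
print(fields)
```

unpack() raises struct.error if a line's length differs from
record.size, so short last lines or "\r\n" line endings need handling
before the call. Whether this beats a loop over slice() objects is
something to time on the real data.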


-tkc

[1] http://docs.python.org/library/struct.html







