Re: text file parsing (awk -> python)

bearophileHUGS Wed, 22 Nov 2006 10:06:05 -0800

Peter Otten, your solution is very nice: it uses groupby to split on
empty lines, so it doesn't need to read the whole file into memory.


But Daniel Nogradi says:
> But the names of the fields (node, x, y) keeps changing from file to
> file, even their number is not fixed, sometimes it is (node, x, y, z).

Your version with the converters dict fails to convert the numeric
values of the node, z, etc. fields. (In general such a converters dict
is an elegant solution: it lets you define string, float, etc. fields.)

> converters = dict(
>     x=int,
>     y=int
> )
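
One way to keep the converters dict while still coping with fields it
doesn't list is to fall back to a default converter. This is only a
minimal sketch, assuming every unlisted field (node, z, ...) holds an
integer; the names "default" and "convert", and the choice of float for
x and y, are illustrative and not taken from Peter's code:

```python
# Hypothetical converters: only the fields that need special handling.
converters = dict(
    x=float,
    y=float,
)
default = int  # assumption: unlisted fields (node, z, ...) are integers

def convert(key, value):
    # Use the field's own converter if there is one, else the default.
    return converters.get(key, default)(value)

record = dict((k, convert(k, v)) for k, v in
              [("node", "11"), ("x", "-2"), ("y", "1"), ("z", "5")])
print(record)
```

With this, a file that adds a z field (or any other integer field)
needs no change to the dict.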


I have created a version with a RE, but it's probably too rigid: it
expects exactly three fields per record, so it doesn't handle files
with the z field, etc.:

data = """node 10
y 1
x -1

node 11
x -2
y 1
z 5

node 12
x -3
y 1
z 6"""

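import re
# One "name  signed-integer" pair, repeated exactly three times per record.
unpack = re.compile(r"(\D+)   \s+  ([-+]?  \d+) \s+" * 3, re.VERBOSE)

result = []
for obj in unpack.finditer(data):
    # groups() is (name, value, name, value, name, value)
    block = obj.groups()
    d = dict((block[i], int(block[i+1])) for i in xrange(0, 6, 2))
    result.append(d)

print result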
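
A less rigid RE variant is possible if the text is first split into
blocks on blank lines and the pairs are then pulled out of each block
with findall, so the number of fields per record no longer matters. This
is a sketch, assuming one "name value" pair per line with integer
values; unlike groupby it needs the whole text in memory. The print call
is written so that it runs on both old and new Python versions:

```python
import re

data = """node 10
y 1
x -1

node 11
x -2
y 1
z 5

node 12
x -3
y 1
z 6"""

# One "name value" pair per line; the value is an optionally signed integer.
pair = re.compile(r"^\s*(\S+)\s+([-+]?\d+)\s*$", re.MULTILINE)

result = []
# Records are separated by one or more blank lines.
for block in re.split(r"\n\s*\n", data.strip()):
    result.append(dict((k, int(v)) for k, v in pair.findall(block)))

print(result)
```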


So I have just modified and simplified your quite nice solution (I have
removed the pprint, but otherwise it's the same):

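# Stand-in for the built-in open(), so the example runs on the data
# string above without needing a real records.txt file.
def open(filename):
    from cStringIO import StringIO
    return StringIO(data)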

from itertools import groupby

records = []
for empty, record in groupby(open("records.txt"), key=str.isspace):
    if not empty:
        pairs = ([k, int(v)] for k,v in map(str.split, record))
        records.append(dict(pairs))

print records
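
To keep the memory advantage of groupby for large files, the same loop
can be wrapped in a generator that yields one record at a time instead
of building a list. A minimal sketch, assuming every line ends with a
newline (as lines read from a file do) and every value is an integer;
"iter_records" and "sample" are hypothetical names:

```python
from itertools import groupby

def iter_records(lines):
    # Yield one dict per blank-line-separated block, one block at a time.
    for empty, block in groupby(lines, key=str.isspace):
        if not empty:
            yield dict((k, int(v)) for k, v in map(str.split, block))

# Any iterable of lines works here, including an open file object.
sample = ["node 10\n", "y 1\n", "x -1\n", "\n",
          "node 11\n", "x -2\n", "y 1\n", "z 5\n"]
for rec in iter_records(sample):
    print(rec)
```

Passing an open file instead of the sample list processes the records
lazily, with only the current block held in memory.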

Bye,
bearophile

-- 
http://mail.python.org/mailman/listinfo/python-list
