<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> I'm parsing some data of the form:
>
> OuterName1 InnerName1=5,InnerName2=7,InnerName3=34;
> OuterName2 InnerNameX=43,InnerNameY=67,InnerName3=21;
> OuterName3 ....
> and so on....
>

I wrote pyparsing for just this kind of job.  Using pyparsing, you can both parse the data and build up structured results - even results with keyed fields or dictionary-type access.

Here's the complete pyparsing program to parse your data:

---------------------------
data = """
OuterName1 InnerName1=5,InnerName2=7,InnerName3=34;
OuterName2 InnerNameX=43,InnerNameY=67,InnerName3=21;
"""
# or data = file(inputname).read()

from pyparsing import *

EQ = Literal("=").suppress()
SEMI = Literal(";").suppress()
ident = Word(alphas, alphanums+"_")
integer = Word(nums)

value = integer | quotedString

innerentry = Group(ident + EQ + value)
vallist = Dict(delimitedList(innerentry))
outerentry = Group(ident + vallist + SEMI)
datalist = Dict( ZeroOrMore(outerentry) )

vals = datalist.parseString(data)
print vals.keys()
print vals["OuterName1"]["InnerName2"]
print vals.OuterName2.InnerNameY
---------------------------

Prints:
['OuterName2', 'OuterName1']
7
67

Here's the same program, with a few more comments to explain what's going on:

---------------------------
from pyparsing import *

# define expressions for some basic elements - use pyparsing's basic
# building blocks, Literal and Word
EQ = Literal("=").suppress()
SEMI = Literal(";").suppress()
ident = Word(alphas, alphanums+"_")
integer = Word(nums)

# expand this list to include other items you end up finding in values
value = integer | quotedString

# define the format of the list of InnerName entries
innerentry = Group(ident + EQ + value)

# delimitedList is a pyparsing helper for a list of expressions, separated by
# some delimiter - default delimiter is a comma
vallist = delimitedList(innerentry)

# lastly, define the overall datalist
outerentry = Group(ident + vallist + SEMI)
datalist = ZeroOrMore( outerentry )

# extract the data into a structure using parseString
vals = datalist.parseString(data)

# prettyprint the results
import pprint
pprint.pprint(vals.asList())
print

# Refinement: have pyparsing build keyed results while
# it parses (accessible like a dict)
vallist = Dict(delimitedList(innerentry))
outerentry = Group(ident + vallist + SEMI)
datalist = Dict( ZeroOrMore(outerentry) )

# reparse using modified grammar
vals = datalist.parseString(data)

# view results using dict functions
print vals.keys()
print vals["OuterName1"]["InnerName2"]

# if keys are valid Python identifiers, can also access results
# like object fields
print vals.OuterName2.InnerNameY
---------------------------

Prints:
[['OuterName1', ['InnerName1', '5'], ['InnerName2', '7'], ['InnerName3', '34']],
 ['OuterName2', ['InnerNameX', '43'], ['InnerNameY', '67'], ['InnerName3', '21']]]

['OuterName2', 'OuterName1']
7
67

Download pyparsing at http://pyparsing.sourceforge.net.  (I'm also making a couple of pyparsing presentations at PyCon, next month.)

-- Paul
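
For comparison, the keyed structure that the Dict-based grammar exposes can be pictured as a nested dictionary. The following is only a rough standard-library sketch in Python 3 syntax, not pyparsing itself: the function name parse_records is hypothetical, and it assumes plain values with no quoted strings containing commas, semicolons or equals signs (which is exactly the kind of case the pyparsing grammar handles more gracefully). Values are left as strings, as in the parsed results above.

```python
def parse_records(text):
    """Build {outer_name: {inner_name: value}} from 'Outer a=1,b=2;' records."""
    result = {}
    # each record is terminated by a semicolon
    for record in text.split(";"):
        record = record.strip()
        if not record:
            continue  # skip the empty piece after the last semicolon
        # the outer name is separated from its entries by whitespace
        outer, entries = record.split(None, 1)
        inner = {}
        # entries are comma-separated name=value pairs
        for entry in entries.split(","):
            key, value = entry.split("=", 1)
            inner[key.strip()] = value.strip()
        result[outer] = inner
    return result


data = """
OuterName1 InnerName1=5,InnerName2=7,InnerName3=34;
OuterName2 InnerNameX=43,InnerNameY=67,InnerName3=21;
"""

vals = parse_records(data)
print(sorted(vals.keys()))
print(vals["OuterName1"]["InnerName2"])
print(vals["OuterName2"]["InnerNameY"])
```

A hand-rolled splitter like this has no notion of a grammar, so it breaks as soon as the value format grows; extending the `value` expression in the pyparsing version is the more robust route.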