On Tue, 2005-03-29 at 15:33 -0600, Greg Lindstrom wrote:
> Hello-
> I have a general (I guess) xml parsing question that I hope has an
> answer. I am busy parsing health care claim records using xpath and do
> not see a way to parse the following (stripped down) file (I've added
> lines to group my problem...)
>
>  1. + <seg id='ST'>
>  2. + <loop id='HEADER'>
>  3. - <loop id='DETAIL'>
>  4. - <loop id='2000A'>
>  5. + <seg id='HL'>
>  6. + <loop id='2000AA'>
>  7. + <loop id='2000B'>
>  8. + <seg id='HL'>       --------+
>  9. + <seg id='SBR'>              |
> 10. + <loop id='2010BA'>          | Group 1
> 11. + <loop id='2010BB'>          |
> 12. + <loop id='2300'>    --------+
> 13. + <seg id='HL'>       --------+
> 14. + <seg id='SBR'>              |
> 15. + <loop id='2010BA'>          |
> 16. + <loop id='2010BB'>          | Group 2
> 17. + <loop id='2300'>    --------+
> 18. </loop>
> 19. </loop>
> 20. </loop>
>
> What I need to do is process the records from lines 8-12 as a group,
> then the records from lines 13-17 as another group. Each of the "HL"
> segments indicates the beginning of a new set of records to process. I
> would think that the xml should (would/could) be defined so that each of
> the HL statements would start a new loop structure, but that's not how
> it's defined and I can't change it. There is no way of knowing how many
> lines will be in each set of records, or how many HL segments will be
> beneath the 2000B loop, so is there a way I can logically group the
> record segments together to form a packet of record to process?
>
> Thanks for any attention/help you can pass my way.
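For readers without Amara, here is a minimal sketch of the "each HL starts a new group" idea using only Python's standard library (xml.etree.ElementTree), a swapped-in technique rather than anything posted in the thread. It assumes the HL segments and the records that follow them are siblings under a single parent, taken here to be the 2000B loop per the question's wording; adjust if the real file nests differently. The sample document and the group_by_hl helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stripped-down claim data shaped like the listing above:
# one parent loop holding a flat run of siblings, where each HL seg
# marks the start of a new logical group.
CLAIM = """\
<loop id='2000B'>
  <seg id='HL'/>
  <seg id='SBR'/>
  <loop id='2010BA'/>
  <loop id='2010BB'/>
  <loop id='2300'/>
  <seg id='HL'/>
  <seg id='SBR'/>
  <loop id='2010BA'/>
  <loop id='2010BB'/>
  <loop id='2300'/>
</loop>
"""


def group_by_hl(parent):
    """Split parent's children into lists, opening a new list at each HL seg.

    Hypothetical helper for illustration, not part of any library.
    """
    groups = []
    for child in parent:
        if child.tag == "seg" and child.get("id") == "HL":
            groups.append([child])  # an HL segment opens a new group
        elif groups:
            groups[-1].append(child)  # anything else joins the current group
        else:
            # A child before the first HL belongs to no group; fail loudly.
            raise ValueError("element before first HL: %r" % child.get("id"))
    return groups


root = ET.fromstring(CLAIM)
for n, group in enumerate(group_by_hl(root), 1):
    print(n, [el.get("id") for el in group])
```

Running it prints one line per group, `1 ['HL', 'SBR', '2010BA', '2010BB', '2300']` followed by the same list numbered 2. Because the group boundary is decided only by "is this an HL seg?", it does not matter how many segments each group holds or how many HL segments appear.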
You've heard a lot of suggestions, and they're all good, but I couldn't
help posting a neat Amara recipe for such grouping:

-- % --
from amara import binderytools

XML="""\
<doc>
  <!-- Each a element is an implicit group extending to the next a -->
  <a id="1"/>
  <b id="1.1"/>
  <c id="1.2"/>
  <a id="2"/>
  <b id="2.1"/>
  <c id="2.2"/>
  <a id="3"/>
  <b id="3.1"/>
  <c id="3.2"/>
</doc>
"""

# Empty output document whose root element is <doc>
top = binderytools.create_document(u"doc")
container = None
# pushbind yields each child of /doc as a bound element, one at a time
for e in binderytools.pushbind('/doc/*', string=XML):
    if e.nodeName == u"a":
        # An <a> opens a new group: it becomes the current container
        container = e
        top.doc.xml_append(e)
    else:
        # Any other element is moved under the most recent <a>
        # (assumes the first child of <doc> is an <a>)
        container.xml_append(e)

print top.xml(indent=u"yes")
-- % --

The output is:

<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <a id="1">
    <b id="1.1"/>
    <c id="1.2"/>
  </a>
  <a id="2">
    <b id="2.1"/>
    <c id="2.2"/>
  </a>
  <a id="3">
    <b id="3.1"/>
    <c id="3.2"/>
  </a>
</doc>

-- 
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Use CSS to display XML, part 2 - http://www-128.ibm.com/developerworks/edu/x-dw-x-xmlcss2-i.html
Writing and Reading XML with XIST - http://www.xml.com/pub/a/2005/03/16/py-xml.html
Use XSLT to prepare XML for import into OpenOffice Calc - http://www.ibm.com/developerworks/xml/library/x-oocalc/
Be humble, not imperial (in design) - http://www.adtmag.com/article.asp?id=10286
State of the art in XML modeling - http://www.ibm.com/developerworks/xml/library/x-think30.html