I'd just use a SAX interface. When you see a seg element with
id='HL', close the old record and start a new one. At end of file,
close the last record the same way. Done.
Generally, if the structure is fairly fixed and you are extracting
the data, think about using SAX. If the shape of the structure
carries a lot of the information, you might need a DOM.
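
A minimal sketch of the SAX approach described above, using Python's standard xml.sax module. The handler name HLGrouper, the helper group_by_hl, and the sample document are made up for illustration (the NM1 segment in particular is only placeholder content), and the sketch assumes the HL segments of interest are direct children of the 2000B loop. It records only the id of each child; a real handler would collect whatever data the claim processing needs.

```python
import xml.sax


class HLGrouper(xml.sax.ContentHandler):
    """Group the direct children of one loop; each group starts at an HL seg."""

    def __init__(self, loop_id="2000B"):
        super().__init__()
        self.loop_id = loop_id
        self.depth = 0          # nesting depth of the element being visited
        self.loop_depth = None  # depth of the open target loop, if any
        self.current = None     # ids collected for the group in progress
        self.groups = []        # finished groups

    def startElement(self, name, attrs):
        self.depth += 1
        elem_id = attrs.get("id")
        if self.loop_depth is None:
            # Not inside the target loop yet; check whether this opens it.
            if name == "loop" and elem_id == self.loop_id:
                self.loop_depth = self.depth
            return
        if self.depth == self.loop_depth + 1:  # direct child of the loop
            if name == "seg" and elem_id == "HL":
                self._close_group()  # close the old record ...
                self.current = []    # ... and start a new one
            if self.current is not None:
                self.current.append(elem_id)

    def endElement(self, name):
        if self.loop_depth is not None and self.depth == self.loop_depth:
            self._close_group()  # the loop ended: close the last record
            self.loop_depth = None
        self.depth -= 1

    def _close_group(self):
        if self.current is not None:
            self.groups.append(self.current)
            self.current = None


def group_by_hl(xml_bytes, loop_id="2000B"):
    handler = HLGrouper(loop_id)
    xml.sax.parseString(xml_bytes, handler)
    return handler.groups


# Made-up sample shaped like the stripped-down file in the question.
SAMPLE = b"""<loop id='DETAIL'>
  <loop id='2000A'>
    <seg id='HL'/>
    <loop id='2000AA'/>
    <loop id='2000B'>
      <seg id='HL'/>
      <seg id='SBR'/>
      <loop id='2010BA'><seg id='NM1'/></loop>
      <loop id='2010BB'/>
      <loop id='2300'/>
      <seg id='HL'/>
      <seg id='SBR'/>
      <loop id='2010BA'/>
      <loop id='2300'/>
    </loop>
  </loop>
</loop>"""

for group in group_by_hl(SAMPLE):
    print(group)
```

Tracking the depth keeps the HL segment directly under 2000A, and anything nested inside the child loops, from starting or joining a group. Dropping that check gives the simpler "every HL starts a record" version.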
wunder
--On Tuesday, March 29, 2005 03:00:36 PM -0800 Dan Gunter <[EMAIL PROTECTED]>
wrote:
I can suggest where to start. You could use XSLT to transform it first,
i.e. into <HL> ..stuff.. </HL> sections. The XSLT Cookbook (O'Reilly),
recipe 6.8, "Deepening an XML hierarchy", should help. Or you could
stream the tree through (e.g. with PullDOM or ElementTree) and write the
program logic to transform it in Python. That assumes what's between HL
tags fits into memory, but the XSLT approach probably has the same
limitation. Hope that helps.
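
A rough sketch of the "do the grouping in Python" option, using xml.etree.ElementTree from the standard library. The function name hl_groups and the sample document are hypothetical, and for brevity the whole document is parsed into memory with ET.fromstring rather than streamed; it also assumes the HL segments are direct children of the 2000B loop.

```python
import xml.etree.ElementTree as ET


def hl_groups(loop):
    """Yield lists of child elements of `loop`, each starting at an HL seg."""
    group = None
    for child in loop:
        if child.tag == "seg" and child.get("id") == "HL":
            if group is not None:
                yield group  # the previous group is complete
            group = []       # an HL segment starts a new group
        if group is not None:
            group.append(child)
    if group is not None:
        yield group          # the last group ends with the loop


# Made-up sample shaped like the stripped-down file in the question.
SAMPLE_XML = """<loop id='DETAIL'>
  <loop id='2000A'>
    <seg id='HL'/>
    <loop id='2000AA'/>
    <loop id='2000B'>
      <seg id='HL'/>
      <seg id='SBR'/>
      <loop id='2010BA'/>
      <loop id='2300'/>
      <seg id='HL'/>
      <seg id='SBR'/>
      <loop id='2010BB'/>
      <loop id='2300'/>
    </loop>
  </loop>
</loop>"""

root = ET.fromstring(SAMPLE_XML)
for loop_2000b in root.iter("loop"):
    if loop_2000b.get("id") != "2000B":
        continue
    for group in hl_groups(loop_2000b):
        # Each group is a list of Element objects, so XPath-style lookups
        # such as group[0].find(...) still work on the individual pieces.
        print([el.get("id") for el in group])
```

Because each group holds real elements, the existing per-record extraction code can probably be reused on them with little change.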
-Dan
Greg Lindstrom wrote:
Hello-
I have a general (I guess) XML parsing question that I hope has an
answer. I am busy parsing health care claim records using XPath and
do not see a way to parse the following (stripped down) file (I've
added lines to group my problem...)
 1. + <seg id='ST'>
 2. + <loop id='HEADER'>
 3. - <loop id='DETAIL'>
 4.   - <loop id='2000A'>
 5.     + <seg id='HL'>
 6.     + <loop id='2000AA'>
 7.     + <loop id='2000B'>
 8.       + <seg id='HL'>        ---+
 9.       + <seg id='SBR'>          |
10.       + <loop id='2010BA'>      | Group 1
11.       + <loop id='2010BB'>      |
12.       + <loop id='2300'>     ---+
13.       + <seg id='HL'>        ---+
14.       + <seg id='SBR'>          |
15.       + <loop id='2010BA'>      |
16.       + <loop id='2010BB'>      | Group 2
17.       + <loop id='2300'>     ---+
18.     </loop>
19.   </loop>
20. </loop>
What I need to do is process the records from lines 8-12 as a group,
then the records from lines 13-17 as another group. Each of the "HL"
segments indicates the beginning of a new set of records to process.
I would think that the xml should (would/could) be defined so that
each of the HL statements would start a new loop structure, but that's
not how it's defined and I can't change it. There is no way of
knowing how many lines will be in each set of records, or how many HL
segments will be beneath the 2000B loop, so is there a way I can
logically group the record segments together to form a packet of
records to process?
Thanks for any attention/help you can pass my way.
--greg
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
--
Walter Underwood
Principal Architect
Verity Ultraseek