larry.martell...@gmail.com, 25.11.2013 23:22:
> I have an XML file that has an element called "Node". These can be nested to
> any depth and the depth of the nesting is not known to me. I need to parse
> the file and preserve the nesting. For example, if the XML file had:
>
> <Node Name="A">
>   <Node Name="B">
>     <Node Name="C">
>       <Node Name="D">
>         <Node Name="E">
>
> When I'm parsing Node "E" I need to know I'm in A/B/C/D/E. Problem is I don't
> know how deep this can be. This is the code I have so far:
>
> nodes = []
>
> def parseChild(c):
>     if c.tag == 'Node':
>         if 'Name' in c.attrib:
>             nodes.append(c.attrib['Name'])
>         for c1 in c:
>             parseChild(c1)
>     else:
>         for node in nodes:
>             print node,
>         print c.tag
>
> for parent in tree.getiterator():
>     for child in parent:
>         for x in child:
>             parseChild(x)

This seems hugely redundant. tree.getiterator() already returns a recursive
iterable, and then, for each node in your document, you are running
recursively over its entire subtree. That means you'll visit each node as
many times as its depth in the tree.

> My problem is that I don't know when I'm done with a node and I should
> remove a level of nesting. I would think this is a fairly common
> situation, but I could not find any examples of parsing a file like
> this. Perhaps I'm going about it completely wrong.

Your recursive traversal function tells you when you're done. If you drop
the getiterator() bit, reaching the end of parseChild() means that you're
done with the element and start backing up. So you can simply pass down a
list of element names that you append() to at the beginning of the function
and pop() from at the end, i.e. a stack. That list will then always give you
the current path from the root node.

Alternatively, if you want to use lxml.etree instead of ElementTree, you can
use its iterwalk() function, which gives you the same thing but without
recursion, as a plain iterator.

http://lxml.de/parsing.html#iterparse-and-iterwalk

Stefan
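
For illustration, here is a minimal sketch of the append()/pop() stack idea
described above, using the standard library's ElementTree and written for
Python 3. The sample document, the "Root" and "Item" tags, and the names
walk() and collect_paths() are made up for this example and are not from the
original post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; "Root" and "Item" are invented for illustration.
SAMPLE = """<Root>
  <Node Name="A">
    <Node Name="B">
      <Node Name="C">
        <Item/>
      </Node>
    </Node>
    <Node Name="D"/>
  </Node>
</Root>"""


def walk(element, path, found):
    """Record (path, tag) for every element, tracking nesting with a stack."""
    named = element.tag == 'Node' and 'Name' in element.attrib
    if named:
        path.append(element.attrib['Name'])  # entering this Node
    found.append(('/'.join(path), element.tag))
    for child in element:
        walk(child, path, found)
    if named:
        path.pop()  # done with this Node: back up one level


def collect_paths(xml_text):
    """Parse the text and return a list of (path, tag) pairs in document order."""
    root = ET.fromstring(xml_text)
    found = []
    walk(root, [], found)
    return found


if __name__ == '__main__':
    for path, tag in collect_paths(SAMPLE):
        print(path, tag)
```

Because the pop() happens only after all children have been processed, the
list never needs to know the maximum depth in advance. Each element is
visited exactly once.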