Jimmy Retzlaff wrote:

> I'm using cElementTree.iterparse to iterate over an XML file. I think
> iterparse is a wonderful idea - I've found it to be much more convenient
> than SAX for iterative processing. I have come across a problem
> though...
>
> For the majority of my elements, both the start and end events contain
> the text of the element (i.e., element.text). For a handful of the
> elements, the text is only in the end event (i.e., element.text is None
> in the start event but it is not None in the end event). The text is
> found without any problem when using cElementTree.parse on the file
> instead.
> Am I misunderstanding something or is this perhaps a bug?

it needs more documentation ;-)

here's what the comment in the CHANGES document says:

    The elem object is the current element; for "start" events,
    the element itself has been created (including attributes), but its
    contents may not be complete; for "end" events, all child elements
    has been processed as well.  You can use "start" tags to count
    elements, check attributes, and check if certain tags are present
    in a tree.  For all other purposes, use "end" handlers instead.

in that text, "may not" really means "may or may not".  that is, the
contents may be complete, but that's nothing you can or should rely on.

the reason for this is that events don't fire in perfect lockstep with
the build process; in the current version, the parser may be up to 16k
further ahead.  this means that when you get a "start" event, the parser
has often processed everything inside the element (especially if it's
small enough), but you cannot rely on that.

or in other words, for a start event, the following attributes are valid:

    elem.tag
    elem.attrib
    tags and attributes for parent elements
        (use a stack if you need to track them)
    (not elem.text)
    (not elem.tail)
    (not elem[:])

    you may modify the tag and attrib attributes
    you may stop parsing

and for an end event, the following applies:

    elem.tag
    elem.attrib
    elem.text
    elem[:] (i.e. the children)
    complete contents for all children (including the tail)
    (not elem.tail) (but all child tails)

    you may modify all attributes, except elem.tail
    you may reorder/update children
    you may remove children (e.g. calling elem.clear() to mark
        that you're done with this level)
    you may stop parsing

clearer?  I think I need to draw a couple of diagrams...

</F>

_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
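
to make the start/end rules above a bit more concrete, here's a minimal sketch. it assumes the standard-library xml.etree.ElementTree.iterparse (not cElementTree itself), and the sample document, the "item" tag, the "id" attribute and the collect_items helper are all made up for illustration. "start" events are only used to track tags on a stack; the text is read on "end" events, where it is guaranteed to be complete.

```python
import io
import xml.etree.ElementTree as ET

# hypothetical sample document, made up for this sketch
SAMPLE = b"<root><item id='1'>first</item><item id='2'>second</item></root>"


def collect_items(source):
    """Return (id, text) pairs for every <item> directly under <root>."""
    results = []
    stack = []  # tags of the currently open elements, tracked via "start" events
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # safe here: elem.tag and elem.attrib (elem.text may still be None)
            stack.append(elem.tag)
        else:
            # safe here: elem.text and the complete children (but not elem.tail)
            if stack[-2:] == ["root", "item"]:
                results.append((elem.get("id"), elem.text))
                elem.clear()  # done with this element; drop its contents
            stack.pop()
    return results


items = collect_items(io.BytesIO(SAMPLE))
print(items)  # [('1', 'first'), ('2', 'second')]
```

the same loop with the text read in the "start" branch might appear to work on small files and then return None for some elements on larger ones, which matches the behaviour described in the question.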