Fredrik Lundh wrote:
>
> Jimmy Retzlaff wrote:
>
> > I'm using cElementTree.iterparse to iterate over an XML file. I think
> > iterparse is a wonderful idea - I've found it to be much more convenient
> > than SAX for iterative processing. I have come across a problem
> > though...
> >
> > For the majority of my elements, both the start and end events contain
> > the text of the element (i.e., element.text). For a handful of the
> > elements, the text is only in the end event (i.e., element.text is None
> > in the start event but it is not None in the end event). The text is
> > found without any problem when using cElementTree.parse on the file
> > instead.
> >
> > Am I misunderstanding something or is this perhaps a bug?
>
> it needs more documentation ;-)
>
> here's what the comment in the CHANGES document says:
>
>     The elem object is the current element; for "start" events,
>     the element itself has been created (including attributes), but its
>     contents may not be complete; for "end" events, all child elements
>     has been processed as well. You can use "start" tags to count
>     elements, check attributes, and check if certain tags are present
>     in a tree. For all other purposes, use "end" handlers instead.
>
> in that text, "may not" really means "may or may not". that is, the
> contents may be complete, but that's nothing you can or should rely on.
>
> the reason for this is that events don't fire in perfect lockstep with the
> build process; in the current version, the parser may be up to 16k further
> ahead.
...
> clearer?

Yes, thanks!

Just a thought... would it be better to artificially hide the attributes that can't be counted on in a start event, or are the tradeoffs in doing so too ugly? With small elements like mine and a buffer as large as 16KB, things will almost always be available in the start event. That'll lead learn-by-trial-and-error folks (i.e., those of us who don't read :) to miss the distinction altogether. I was lucky enough to have a unit test that noticed I had 10 or so empty values out of many thousands, but otherwise I wouldn't have known about the problem (especially if empty values were occasionally expected).

Thanks for all the wonderful libraries.

Jimmy
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
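
For anyone who finds this thread later, here is a minimal sketch of the rule quoted above: use "start" events only for tags and attributes, and read element.text in the "end" event. It assumes the standard library's xml.etree.ElementTree (the module that later absorbed cElementTree) rather than the cElementTree release discussed here; the sample document and the collect() helper are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET  # assumption: stdlib module, not the original cElementTree

# Hypothetical sample document, for illustration only.
SAMPLE = b"<root><item id='1'>alpha</item><item id='2'>beta</item></root>"


def collect(source):
    """Count <item> elements on "start"; read their id and text on "end"."""
    start_count = 0
    texts = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # On "start" only the tag and attributes are guaranteed.
            # elem.text may or may not be filled in yet, so it is not read here.
            if elem.tag == "item":
                start_count += 1
        elif elem.tag == "item":
            # On "end" the element and all of its children are complete.
            texts.append((elem.get("id"), elem.text))
    return start_count, texts


count, texts = collect(io.BytesIO(SAMPLE))
print(count, texts)
```

With a document this small the text would very likely be present in the "start" event as well, which is exactly the trap described above; only the "end" event guarantees it.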