On Wed, 2008-03-26 at 15:12 +0800, Timothy Wu wrote:
> Hi, I post the following in the Python mailing list but no one
> responded. So I'm posting here again.
>
> ------------
>
> Hi,
>
> I have created a very, very simple parser for an XML.
>
> class FindGoXML2(ContentHandler):
>     def characters(self, content):
>         print content
>
> I have made it simple because I want to debug. This prints out any
> content enclosed by tags (right?).
>
> The XML is publicly available here:
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml
>
> I show a few line embedded in this XML:
>
> <Gene-commentary_source>
>   <Other-source>
>     <Other-source_src>
>       <Dbtag>
>         <Dbtag_db>GO</Dbtag_db>
>         <Dbtag_tag>
>           <Object-id>
>             <Object-id_id>3824</Object-id_id>
>           </Object-id>
>         </Dbtag_tag>
>       </Dbtag>
>     </Other-source_src>
>     <Other-source_anchor>catalytic activity</Other-source_anchor>
>     <Other-source_post-text>evidence: IEA</Other-source_post-text>
>   </Other-source>
> </Gene-commentary_source>
>
> Notice the third line before the last. I expect my content printout to
> print out "evidence:IEA".
> However this is what I get.
>
> -------------------------
> catalytic activity ==> this is the print out the line before
>
>
>
> e
> vidence: IEA
> -------------------------
>
> I don't understand why a few blank lines were printed after "catalytic
> activity". But that doesn't matter. What matters is where the string
> "evidence: IEA" is split into two printouts.
> First it prints only "e", then "vidence: IEA". I parsed 825 such XMLs
> without a problem, this occurs on my 826th XML.
>
> Any explanations??
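
The parser reads its input in chunks of unspecified size, so there is
no guarantee that a text block will be delivered in a single
characters() call. That is why "evidence: IEA" arrives as "e" followed
by "vidence: IEA": a chunk boundary happened to fall right after the
"e". The two pieces end up on separate lines because the print
statement adds a newline after each thing it prints. The blank lines
after "catalytic activity" are most likely the whitespace between the
tags, which is reported as character data too.

If you want to see the text itself, without phantom newlines, try
replacing print with sys.stdout.write().

Cheers,
Cliff

_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig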
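
P.S. If you need each element's text as one complete string rather than
just a cleaner printout, the usual approach is to collect the pieces in
characters() and join them in endElement(). Below is a minimal sketch of
that idea, not a drop-in replacement for your handler. It is written for
Python 3, so it does not use the print statement from your code, and the
names TextCollector and collect_text are made up for this illustration.
The input is fed in two pieces, cut in the middle of "evidence: IEA", to
imitate a chunk boundary.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Hypothetical handler: buffer character data and join it per element."""

    def __init__(self):
        super().__init__()
        self._chunks = []  # pieces of text seen since the last tag
        self.texts = []    # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # Start a fresh buffer for each element. Text that precedes a
        # child element (mixed content) is dropped in this simple sketch.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text, so only collect.
        self._chunks.append(content)

    def endElement(self, name):
        text = "".join(self._chunks).strip()
        if text:  # skip elements that held only whitespace
            self.texts.append((name, text))
        self._chunks = []


def collect_text(chunks):
    """Feed byte chunks to an incremental SAX parser; return the texts."""
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return handler.texts


# The input is cut in the middle of "evidence: IEA" on purpose.
sample = [
    b"<Other-source><Other-source_anchor>catalytic activity"
    b"</Other-source_anchor>\n<Other-source_post-text>e",
    b"vidence: IEA</Other-source_post-text></Other-source>",
]

for name, text in collect_text(sample):
    print(name, "=>", text)
```

However the parser decides to split the text internally, the joined
string for Other-source_post-text comes out whole, and the
whitespace-only runs between tags are skipped.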