> Hi,
>
> [EMAIL PROTECTED] wrote:
> > <node>
> > This text <thistag> is completely crap </thistag> because <anothertag>
> > blabla
> > </anothertag>
> > </node>
> > <node>
> > This is another <thisnotag> node </thisnotag> with <anothertaggy> random
> > tags
> > </anothertaggy>
> > </node>
> >
> > I would like to retrieve what is between the tags <node> ...</node> into
> > strings, the "subelements" being considered as simple strings and not
> > processed by ElementTree.
>
> You are trying to make an XML parser not parse XML, that's bound to fail.
>
> > In other words, this could be badly formed HTML, not processed, embedded
> > into well formed XML tags.
>
> If you really have something like "embedded HTML", it must be escaped in your
> data to be parsable. There is no way an XML parser can return what you want
> without modifying your 'data' (at least losing whitespace etc.).
>
> I think the easiest option (if you have it) is to talk to the idiots who sent
> you the data and have them fix it.
>
> Stefan

Thanks for your help.

The real problem is not about "badly formed HTML": each node corresponds to a
leaf of a wx.TreeCtrl, and the data associated with the leaf is the content of
a wx.RichTextCtrl. When saving the whole tree content to one file, I want to be
able to recover the structure of the tree and reattach the data to each leaf,
and definitely not touch the content, which is for the wx.RichTextCtrl to
parse.

I was hoping ElementTree could help with this, but maybe I am wrong and should
think of a simpler system of tags in the text.
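
For the record, here is a minimal sketch of the escaping approach Stefan describes, assuming each leaf's content is available as a plain string. The <tree> root element, the function names save_tree/load_tree and the sample strings are made up for illustration. ElementTree escapes the text when writing and unescapes it when parsing, so the tags inside each node come back as ordinary characters and are never treated as subelements:

```python
import xml.etree.ElementTree as ET

# Hypothetical leaf contents: whatever markup the rich text control produces.
leaves = [
    "This text <thistag> is completely crap </thistag> because <anothertag>\n"
    "blabla\n</anothertag>",
    "This is another <thisnotag> node </thisnotag> with <anothertaggy> random\n"
    "tags\n</anothertaggy>",
]


def save_tree(contents):
    """Serialize each content string as the text of one <node> element."""
    root = ET.Element("tree")  # "tree" is an assumed root element name
    for content in contents:
        node = ET.SubElement(root, "node")
        # Stored as character data: '<', '>' and '&' are escaped on output.
        node.text = content
    return ET.tostring(root, encoding="unicode")


def load_tree(xml_text):
    """Return the unescaped text of every <node>, in document order."""
    root = ET.fromstring(xml_text)
    # An empty element parses with text None, so map that back to "".
    return [node.text or "" for node in root.iter("node")]


xml_text = save_tree(leaves)
restored = load_tree(xml_text)
```

The saved file is then well-formed XML whose structure mirrors the tree, while the content of each leaf round-trips as an opaque string. This sketch is flat; nested <node> elements for a deeper tree would work the same way.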
_______________________________________________ XML-SIG maillist - XML-SIG@python.org http://mail.python.org/mailman/listinfo/xml-sig