On Tue, Aug 28, 2007 at 01:00:53PM -0400, Daniel Corbe wrote:
>
> Let me try asking the question a different way.
>
> I'm working with a pre-formatted (human generated) XML file, so there's
> text all throughout the document consisting of things like "\n\n\n\t"
> and " \n" etc.
>
> When I run into these characters, I see them as children of whichever
> node I happen to be working in and they're of the type XML_TEXT_NODE
>
> When I run calls to xmlDocDumpFormat(), it seems to be treating these
> nodes as if they contained more than white spaces, newlines and tabs.
>
> Is there a work-around for this?  Something that's a bit more
> intelligent than xmlDocDumpFormat()?

  You, and only you, can know whether those spaces are important for the
application. Don't hope or expect that the parser can decide it for you.
People tried for more than a decade to infer such rules in SGML and
failed; as a result, in XML all white space in content is significant and
must be reported to the application (or saved). Experience proved that
'intelligent' out-of-context detection of white space did not work, and I
doubt this has changed in the last 10 years.

> If not, I'm thinking any of the following would be the best course of
> action (looking for a recommendation):
>
> 1) Go through each node and their children one by one and simply
> remove any XML_TEXT_NODE node types that contain only white spaces,
> newlines and tabs.  Then simply call xmlDocDumpFormat()
>
> 2) Crawl through each node and their child and manually ADD these
> XML_TEXT_NODEs and call xmlDocDump()
>
> 3) ???

  You know which text nodes containing spaces are significant to your
application; there is no heuristic. If you don't need them, remove them;
if you want them, add them. The API allows both.
  The libxml2 serializer, when asked to indent, will try to do it, *but* if
it discovers an existing text node which is not a leaf, it will stop
indenting to avoid breaking elements which contain 'mixed content', i.e.
both text and elements.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
