Hi Bob,

Your question is not at all ignorant or simple.
In XML, all characters between a start tag "<tag ...>" and its matching end tag "</tag>" are significant: they are either part of this element's character content or part of a child element. Therefore you cannot decide, just by looking at the text, whether a blank or a newline was intended as content by the author of the XML or is merely a formatting artifact. XML processing can handle the content adequately only by taking the kind of document and element into account, or with the help of the information in an XML schema.

If you are dealing with XHTML, the content of the paragraph element <p> (and of some other elements) should be interpreted by trimming leading and trailing whitespace and collapsing each embedded run of whitespace to a single blank. (This corresponds to the value "collapse" of XML Schema's whiteSpace facet.) With (X)HTML, it is the task of the renderer (printer or browser), possibly assisted by style sheets, to supply spacing before and after a paragraph's text, indentation of the first line, alignment, line breaks, etc.

Also note that <body> has "content" too: it is made up of all the characters surrounding the contained <p> elements. But interpreting <body> does not require processing that content at all.

-W

From: Bob Sabiston <[email protected]>
To: [email protected]
Subject: [xml] how to interpret/reproduce this type of xml?
Message-ID: <[email protected]>
Content-Type: text/plain; charset="us-ascii"; Format="flowed"; DelSp="yes"

Hello,

I am new to xml and libxml, so please forgive me if the following is an ignorant or simple question. I'm trying to write some code that reads and writes to a pre-existing xml file format. I'm having trouble with one part of the file, where elements of text make up individual elements. See below, where the text "Notes 1", "Notes 2", "Notes 3" are each contained within <p></p> brackets?
I am having trouble figuring out how to write in that format or read it, because the content is text, but between the brackets there is also text that is NOT part of the content. By that I mean that the first <p> is followed by a newline and then some number of characters due to indenting. The end of the text is followed by another newline and more spaces. So it's all due to the formatting that I'm having trouble, but does anyone know how to do this?

Specifically, if I'm reading the file and I get the text between the brackets, how do I know where the formatting ends and the real text starts? If I'm writing the file, what do I do to write it in this format?

<richcontent TYPE="NOTE"><html>
  <head>

  </head>
  <body>
    <p>
      Notes 1
    </p>
    <p>
      Notes 2
    </p>
    <p>
      Notes 3
    </p>
  </body>
</html>
</richcontent>

I really appreciate any help anyone can offer!

Thanks
Bob

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
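
As a rough illustration of the "collapse" handling described in the reply above, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree rather than libxml, purely to keep the example short; the same normalization could presumably be applied to the string libxml returns for a node's content (for example from xmlNodeGetContent). The helper names collapse, read_notes and format_p_text are hypothetical, and the two-space indentation in the sample is an assumption, since the original indentation of the file is not known.

```python
import re
import xml.etree.ElementTree as ET

# Sample from the question; the exact indentation is assumed.
SAMPLE = """<richcontent TYPE="NOTE"><html>
  <head>

  </head>
  <body>
    <p>
      Notes 1
    </p>
    <p>
      Notes 2
    </p>
    <p>
      Notes 3
    </p>
  </body>
</html>
</richcontent>"""

# XML defines whitespace as space, tab, carriage return and line feed only.
_XML_WS = re.compile(r"[ \t\r\n]+")


def collapse(text):
    """Squeeze each whitespace run to one blank, then trim both ends."""
    return _XML_WS.sub(" ", text or "").strip(" ")


def read_notes(xml_text):
    """Return the collapsed text of every <p> element in the document."""
    root = ET.fromstring(xml_text)
    # itertext() also gathers text inside nested inline elements, if any.
    return [collapse("".join(p.itertext())) for p in root.iter("p")]


def format_p_text(text, depth, unit="  "):
    """Pad text with the newline-plus-indent layout assumed for the sample.

    depth is the nesting level of the <p> element itself; the text is
    placed one level deeper and the end tag returns to the <p> level.
    """
    return "\n" + unit * (depth + 1) + text + "\n" + unit * depth


notes = read_notes(SAMPLE)
print(notes)
```

When writing, format_p_text only reproduces the visual layout; a reader that collapses whitespace recovers the same text whether or not the padding is present, so the padding is a matter of matching the existing files rather than of correctness.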
