DO NOT REPLY [Bug 4455] - parser problem!

bugzilla Thu, 13 Dec 2001 12:50:13 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=4455>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=4455

parser problem!

------- Additional Comments From [EMAIL PROTECTED]  2001-12-13 12:51 -------
>1. Entity substitution: according to my understanding of the xml spec,
> "&amp;" is not an entity reference.

The DOM WG had to check this. The official answer was that &amp; _is_ an entity 
reference, not a numeric character reference... but since parsers are 
permitted to "flatten" (fully expand) entity references before presenting the 
document to the user, it's entirely reasonable for a parser to flatten this one 
even if it doesn't flatten user-defined entities.
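
To see what a flattened &amp; looks like from the application side, here is a 
small sketch. It uses Python's xml.dom.minidom purely as an illustration (an 
assumption on my part, it is not the parser this bug is about), and the helper 
name expanded_text is made up for the example. The reference shows up as a 
plain "&" inside Text data rather than as an EntityReference node.

```python
from xml.dom import minidom


def expanded_text(xml_source):
    """Return (nodeType, data) for each child of the document element.

    Hypothetical helper for illustration only.
    """
    doc = minidom.parseString(xml_source)
    element = doc.documentElement
    # Merge adjacent Text nodes so the result does not depend on how the
    # underlying parser happened to chunk the character data.
    element.normalize()
    return [(child.nodeType, child.data) for child in element.childNodes]


if __name__ == "__main__":
    # The children are Text nodes (nodeType 3); no EntityReference node
    # (nodeType 5) appears for the predefined entity.
    print(expanded_text("<a>x &amp; y</a>"))
```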

>2. The string is splitted into 3 strings: xml parsers are free to group 
>characters in chunks.

In SAX, this is definitely true. SAX may break text into multiple characters() 
calls for many reasons, and SAX applications have to be written so they can deal 
with that. The standard solution, if you need a single string, is to accumulate 
incoming characters() data until the first non-characters() call, and process 
the collected data at that time.
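
A minimal sketch of that accumulate-then-flush pattern, written against 
Python's xml.sax only because it is short to show (an assumption; the same idea 
applies to any SAX binding). The names TextCollector and collect_text are 
invented for the example, and it flushes only on element boundaries; a fuller 
handler would also flush on the other non-characters() callbacks.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Buffers characters() chunks and emits one string per text run."""

    def __init__(self):
        super().__init__()
        self._buffer = []   # chunks seen since the last non-characters() event
        self.texts = []     # completed text runs, one string each

    def characters(self, content):
        # May be called several times for what looks like one string.
        self._buffer.append(content)

    def _flush(self):
        # Process the collected data as a single string.
        if self._buffer:
            self.texts.append("".join(self._buffer))
            self._buffer = []

    def startElement(self, name, attrs):
        self._flush()

    def endElement(self, name):
        self._flush()


def collect_text(xml_bytes):
    """Hypothetical helper: parse xml_bytes and return the text runs."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts


if __name__ == "__main__":
    print(collect_text(b"<a>one &amp; two</a>"))
```

However many characters() calls the parser makes for "one &amp; two", the 
application sees one string once the end tag arrives.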

In the DOM, the answer is somewhat different. The DOM spec says that the initial 
state of a DOM as delivered by an "XML processor" (by which the XML spec means 
"parser") should be as if the normalize() operation had been called -- in other 
words, any adjacent non-CDATASection text should be coalesced into a single Text 
node.
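
For illustration, here is roughly what normalize() does to adjacent Text nodes. 
This sketch builds the tree by hand with Python's xml.dom.minidom (an 
assumption chosen for brevity, not the implementation under discussion), and 
coalesced_text_nodes is a made-up helper name.

```python
from xml.dom import minidom


def coalesced_text_nodes(parts):
    """Append each string in parts as its own Text node, then normalize.

    Returns (child count before, child count after, merged text).
    Hypothetical helper for illustration only.
    """
    doc = minidom.Document()
    root = doc.createElement("root")
    doc.appendChild(root)
    for part in parts:
        root.appendChild(doc.createTextNode(part))
    before = len(root.childNodes)
    # normalize() merges adjacent Text nodes into a single Text node.
    root.normalize()
    after = len(root.childNodes)
    return before, after, root.firstChild.data


if __name__ == "__main__":
    print(coalesced_text_nodes(["one ", "&", " two"]))
```

A DOM that arrives from the parser in three pieces would look like the 
"before" state here; the spec asks for the "after" state.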

