Aleksander Slominski wrote:
> document entity) on input, before parsing, by translating both the
> two-character sequence #xD #xA and any #xD that is not followed by
> #xA to a single #xA character. (...)
According to the wording of the spec and the behavior of Xerces
1.x, this seems to be a bug. It seems strange to me, though, that
DOS newline sequences are normalized to a single newline character,
whereas Mac newline sequences are not. (I haven't used a Mac in a
long time, so could someone confirm for me whether Mac newlines are
#x0A #x0D, or just #x0D?)
Anyway, I've fixed the problem and committed the changes to CVS.
Now, the output from Xerces2 using your sample file is the
following:
setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@b66cc)
startDocument()
startElement(uri="",localName="t",qname="t",attributes={})
characters(text="-")
characters(text="\n-")
characters(text="\n-")
characters(text="\n-")
characters(text="\n\n-")
endElement(uri="",localName="t",qname="t")
endDocument()
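
For reference, the end-of-line rule quoted at the top of this message can be sketched in a few lines of Java. This is only an illustration of the rule as worded in the spec, not the code committed to Xerces2; the class name LineEndNormalizer and the method normalize are hypothetical, and the sketch works on a whole in-memory string rather than on a buffered input stream.

```java
public class LineEndNormalizer {

    /**
     * Hypothetical helper: translates the two-character sequence
     * #xD #xA, and any #xD not followed by #xA, to a single #xA.
     */
    public static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                // Every #xD becomes exactly one #xA ...
                out.append('\n');
                // ... and the #xA of a #xD #xA pair is skipped so it
                // is not emitted a second time.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                // All other characters, including a lone #xA, pass through.
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under this sketch, both "a\r\nb" and "a\rb" come out as "a\nb", while "a\nb" is left unchanged. A streaming parser would also have to handle a #xD that falls at the very end of one buffer with its #xA at the start of the next, which this sketch does not attempt.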
--
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]