hi,
The problem is gone for element content, but it seems that EOL normalization
is still not working for attribute values. According to
http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize
after EOL normalization the parser must treat an undeclared attribute as CDATA
and normalize each whitespace character in its value to a single space...
However, it seems that Xerces2 is not doing this?
I used the same test file but changed it to use an attribute value (attached too):
$ od --format x1 test_attr_simple_r.xml
0000000 3c 74 65 73 74 20 61 3d 22 2d 0a 2d 0d 2d 0d 0a
0000020 2d 0a 0d 2d 22 2f 3e 0d 0a
0000031
$ od -c test_attr_simple_r.xml
0000000   <   t   e   s   t       a   =   "   -  \n   -  \r   -  \r  \n
0000020   -  \n  \r   -   "   /   >  \r  \n
0000031
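For anyone who wants to reproduce the test file without the attachment, the hex
dump above can be turned back into bytes. This is only a minimal sketch in
Python (used here for convenience, it has nothing to do with Xerces); the
helper name build_test_file is hypothetical and the default file name simply
follows the od command above.

```python
# Rebuild test_attr_simple_r.xml from the `od --format x1` dump shown above.
HEX_DUMP = (
    "3c 74 65 73 74 20 61 3d 22 2d 0a 2d 0d 2d 0d 0a "
    "2d 0a 0d 2d 22 2f 3e 0d 0a"
)


def build_test_file(path="test_attr_simple_r.xml"):
    """Write the bytes of the dump to `path` and return them."""
    data = bytes.fromhex(HEX_DUMP)
    # Binary mode, so the \r and \n bytes are written exactly as dumped.
    with open(path, "wb") as f:
        f.write(data)
    return data


if __name__ == "__main__":
    print(build_test_file())
```

The file should come out at 25 bytes, which matches the final offset 0000031
(octal) in the dump.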
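So according to the XML 1.0 spec the expected value should be "- - - -  -"
(two spaces before the last dash, because \n\r is normalized to two #xA
characters), but what I get is something like "- - \r- \r- -", with multiple
occurrences of \r that should have been normalized when the input was read...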
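To make the expected value concrete, here is a rough sketch of the two
normalization steps as I read them in the spec (sections 2.11 and 3.3.3). It is
plain Python, not Xerces code; the function names are hypothetical, and entity
and character references are deliberately not handled. Note that the \n\r pair
in the file counts as two line ends, so the sketch yields two spaces before the
last dash.

```python
import re

# Raw attribute value as it appears in the od dump of the test file.
RAW_VALUE = "-\n-\r-\r\n-\n\r-"


def normalize_eol(text):
    """Section 2.11: translate CRLF and any lone CR to a single LF."""
    return re.sub(r"\r\n?", "\n", text)


def normalize_cdata_attr(text):
    """Section 3.3.3, CDATA case: every whitespace character becomes a space.

    Runs on the EOL-normalized text; references are ignored in this sketch.
    """
    return re.sub(r"[ \t\r\n]", " ", normalize_eol(text))


if __name__ == "__main__":
    print(repr(normalize_cdata_attr(RAW_VALUE)))  # '- - - -  -'
```

No \r should survive either step, which is why the value reported by the
parser below looks wrong.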
It does not matter whether validation is on or off in SAX:
java sax.DocumentTracer -V src\etc\test_attr_simple_r.xml
setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@867e89)
startDocument()
startElement(uri="",localName="test",qname="test",attributes={{uri=null,localName="a",qname="a",type="CDATA",value="- - \r- \r- -"}})
endElement(uri="",localName="test",qname="test")
endDocument()
java -cp classes.cpr sax.DocumentTracer -v src\etc\test_attr_simple_r.xml
setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@ab95e6)
startDocument()
startElement(uri="",localName="test",qname="test",attributes={{uri=null,localName="a",qname="a",type="CDATA",value="- - \r- \r- -"}})
endElement(uri="",localName="test",qname="test")
endDocument()
For XNI, the non-normalized value also looks very interesting...
java xni.DocumentTracer -V src\etc\test_attr_simple_r.xml
...
emptyElement(element={prefix=null,localpart="test",rawname="test",uri=null},attributes={name={prefix=null,localpart="a",rawname="a",uri=null},type="CDATA",value="- - \r- \r- -",nonNormalizedValue="-\r\r -"}})
endDocument()
java xni.DocumentTracer src\etc\test_attr_simple_r.xml
...
emptyElement(element={prefix=null,localpart="test",rawname="test",uri=null},attributes={name={prefix=null,localpart="a",rawname="a",uri=null},type="CDATA",value="- - \r- \r- -",nonNormalizedValue="-\r\r -"}})
endDocument()
I checked out and recompiled the latest source code from CVS, and these are the
results.
thanks,
alek
Andy Clark wrote:
> Aleksander Slominski wrote:
> > document entity) on input, before parsing, by translating both the
> > two-character sequence #xD #xA and any #xD that is not followed by
> > #xA to a single #xA character. (...)
>
> According to the wording of the spec and the behavior of Xerces
> 1.x, this seems to be a bug. It seems strange to me, though, that
> DOS newline sequences are normalized to a single newline character,
> whereas Mac newline sequences are not. (I haven't used a Mac in a
> long time so could someone confirm for me that Mac newlines are
> #x0A #x0D? or are they just #x0D?)
>
> Anyway, I've fixed the problem and committed the changes to CVS.
> Now, the output from Xerces2 using your sample file is the
> following:
>
> setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@b66cc)
> startDocument()
> startElement(uri="",localName="t",qname="t",attributes={})
> characters(text="-")
> characters(text="\n-")
> characters(text="\n-")
> characters(text="\n-")
> characters(text="\n\n-")
> endElement(uri="",localName="t",qname="t")
> endDocument()
>
> --
> Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
<test a="-
-
-
-
-"/>