hi,

The problem is gone for element content, but EOL normalization still does not seem to
work for attribute values. According to
http://www.w3.org/TR/2000/REC-xml-20001006#AVNormalize
after EOL normalization the parser should treat an attribute with no declaration as CDATA
and replace each whitespace character (#x20, #xD, #xA, #x9) with a single space (#x20).

However, Xerces2 does not seem to be doing this.

I used the same test file, but changed it to put the text in an attribute value (attached as well):

$ od --format x1 test_attr_simple_r.xml
0000000 3c 74 65 73 74 20 61 3d 22 2d 0a 2d 0d 2d 0d 0a
0000020 2d 0a 0d 2d 22 2f 3e 0d 0a
0000031
$ od -c test_attr_simple_r.xml
0000000   <   t   e   s   t       a   =   "   -  \n   -  \r   -  \r  \n
0000020   -  \n  \r   -   "   /   >  \r  \n
0000031

So, according to the XML 1.0 spec, the expected value is "- - - -  -", but Xerces2
reports "- - \r- \r-  -", which still contains \r characters that should have been
normalized when the input was read.
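A minimal Java sketch of the two steps, applied to the attribute bytes from the od dump, confirms the expected value. Step one is end-of-line handling (XML 1.0 section 2.11); step two is attribute-value normalization for a CDATA attribute (section 3.3.3). The sketch assumes the value contains only literal characters (character and entity references are handled separately by the spec and are left out here), and the class and method names are made up for illustration.

```java
public class AttrNormalizeSketch {

    // Step 1 (XML 1.0, 2.11): translate "\r\n" and any lone "\r" to "\n".
    static String normalizeLineEnds(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\r') {
                out.append('\n');
                // Skip the "\n" of a "\r\n" pair so the pair yields one "\n".
                if (i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Step 2 (XML 1.0, 3.3.3, CDATA case): each literal whitespace character
    // (#x20, #xD, #xA, #x9) becomes one space. CDATA values are not trimmed
    // and runs of spaces are not collapsed.
    static String normalizeCdataAttribute(String literal) {
        String s = normalizeLineEnds(literal);
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(c == '\r' || c == '\n' || c == '\t' ? ' ' : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Attribute text between the quotes, per the od dump:
        // 2d 0a 2d 0d 2d 0d 0a 2d 0a 0d 2d
        String raw = "-\n-\r-\r\n-\n\r-";
        System.out.println("[" + normalizeCdataAttribute(raw) + "]");
    }
}
```

Running `main` prints `[- - - -  -]`; the brackets are only there to make the two spaces before the last dash visible.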

It does not matter whether validation is on or off in SAX:


     java sax.DocumentTracer -V src\etc\test_attr_simple_r.xml
     setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@867e89)
     startDocument()
      startElement(uri="",localName="test",qname="test",attributes={{uri=null,localName="a",qname="a",type="CDATA",value="- - \r- \r-  -"}})
      endElement(uri="",localName="test",qname="test")
     endDocument()

     java -cp classes.cpr sax.DocumentTracer -v src\etc\test_attr_simple_r.xml
     setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@ab95e6)
     startDocument()
      startElement(uri="",localName="test",qname="test",attributes={{uri=null,localName="a",qname="a",type="CDATA",value="- - \r- \r-  -"}})
      endElement(uri="",localName="test",qname="test")
     endDocument()
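To reproduce this without DocumentTracer, a minimal JAXP/SAX check is sketched below. It embeds the same bytes as the test file and reads the value of attribute `a` through the standard `SAXParserFactory`/`DefaultHandler` API. Which parser actually runs depends on what JAXP picks up (for example Xerces2 on the classpath, or the JDK's built-in parser); the class name `SaxAttrCheck` is only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxAttrCheck {

    // Same 25 bytes as test_attr_simple_r.xml in the od dump:
    // LF, CR, CR LF, LF CR inside the attribute, CR LF after the element.
    static final byte[] TEST_DOC =
        "<test a=\"-\n-\r-\r\n-\n\r-\"/>\r\n".getBytes(StandardCharsets.US_ASCII);

    // Makes CR, LF and TAB visible, similar to DocumentTracer output.
    static String escape(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    // Parses the document and returns the value of attribute "a" on the
    // first element that carries it, or null if no element does.
    static String attributeValue(byte[] doc) {
        final String[] value = new String[1];
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(doc), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (value[0] == null) {
                        value[0] = atts.getValue("a");
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return value[0];
    }

    public static void main(String[] args) {
        // A conformant parser should print: value="- - - -  -"
        System.out.println("value=\"" + escape(attributeValue(TEST_DOC)) + "\"");
    }
}
```

If the parser under test still leaks carriage returns, the escaped output shows them as `\r`, in the same form as the traces above.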

For XNI, the non-normalized value also looks very odd:


     java xni.DocumentTracer -V src\etc\test_attr_simple_r.xml
     ...
      emptyElement(element={prefix=null,localpart="test",rawname="test",uri=null},attributes={name={prefix=null,localpart="a",rawname="a",uri=null},type="CDATA",value="- - \r- \r-  -",nonNormalizedValue="-\r\r  -"}})
     endDocument()

     java xni.DocumentTracer src\etc\test_attr_simple_r.xml
     ...
      emptyElement(element={prefix=null,localpart="test",rawname="test",uri=null},attributes={name={prefix=null,localpart="a",rawname="a",uri=null},type="CDATA",value="- - \r- \r-  -",nonNormalizedValue="-\r\r  -"}})
     endDocument()



I checked out and recompiled the latest source code from CVS before getting these
results.

thanks,

alek


Andy Clark wrote:

> Aleksander Slominski wrote:
> >      document entity) on input, before parsing, by translating both the
> >      two-character sequence #xD #xA and any #xD that is not followed by
> >      #xA to a single #xA character. (...)
>
> According to the wording of the spec and the behavior of Xerces
> 1.x, this seems to be a bug. It seems strange to me, though, that
> DOS newline sequences are normalized to a single newline character,
> whereas Mac newline sequences are not. (I haven't used a Mac in a
> long time so could someone confirm for me that Mac newlines are
> #x0A #x0D? or are they just #x0D?)
>
> Anyway, I've fixed the problem and committed the changes to CVS.
> Now, the output from Xerces2 using your sample file is the
> following:
>
>   setDocumentLocator(locator=org.apache.xerces.parsers.AbstractSAXParser$LocatorProxy@b66cc)
>   startDocument()
>    startElement(uri="",localName="t",qname="t",attributes={})
>     characters(text="-")
>     characters(text="\n-")
>     characters(text="\n-")
>     characters(text="\n-")
>     characters(text="\n\n-")
>    endElement(uri="",localName="t",qname="t")
>   endDocument()
>
> --
> Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
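The fixed output for element content is consistent with section 2.11 on its own: concatenating the `characters()` chunks gives the end-of-line-normalized text, with line ends kept as "\n" rather than turned into spaces (that second step applies only to attribute values). A small Java check of this, assuming the element-content test file holds the same character sequence as the attribute in test_attr_simple_r.xml (that file is not shown in this thread); `EolCheck` is a hypothetical name:

```java
public class EolCheck {

    // XML 1.0 section 2.11: "\r\n" -> "\n" first, then any remaining
    // (lone) "\r" -> "\n". The order matters: replacing lone "\r" first
    // would turn every "\r\n" into "\n\n".
    static String normalizeLineEnds(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        // Assumed element content: LF, CR, CR LF, LF CR between the dashes.
        String raw = "-\n-\r-\r\n-\n\r-";
        // The characters() chunks reported by the fixed Xerces2, concatenated.
        String reported = "-" + "\n-" + "\n-" + "\n-" + "\n\n-";
        System.out.println(normalizeLineEnds(raw).equals(reported));
    }
}
```

It prints `true`.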



<test a="-
-
-
-
-"/>
