DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=12772>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

Xerces J2 is not correctly treating UTF-8 encoded characters in patterns.

           Summary: Xerces J2 is not correctly treating UTF-8 encoded
                    characters in patterns.
           Product: Xerces2-J
           Version: 2.0.1
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: XML Schema datatypes
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


Xerces J2 and Xerces J1 do not correctly handle UTF-8 encoded characters in
patterns.

The errant behaviour shows up with a pattern that contains the euro character
(files attached).  The schema pattern is recognised if the euro is written as a
character reference, but a literal UTF-8 encoded euro character is split into
two characters, and the file is validated as though the pattern consisted of
those two characters rather than the single, UTF-8 encoded, euro character.
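
As an illustration only, the following Java sketch shows one way a symptom of
this kind could arise: the UTF-8 bytes of the euro sign being decoded with a
single-byte charset.  That this is what happens inside Xerces is an assumption,
not something established here, and the class and method names are
hypothetical.  ISO-8859-1 is chosen just for the demonstration; with it the
three UTF-8 bytes become three characters, so the number of characters seen
depends on which decoding is wrongly applied.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical demonstration class; not part of Xerces.
class EuroDecodingSketch {
    // The euro sign, U+20AC, written as an escape so that the encoding of
    // this source file does not matter.
    static final String EURO = "\u20ac";

    // Encode the euro sign as UTF-8 (bytes E2 82 AC), then decode those
    // bytes with the given charset.
    static String decodeEuroBytes(Charset charset) {
        byte[] utf8 = EURO.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, charset);
    }

    public static void main(String[] args) {
        // A one-character class containing only the euro sign.
        Pattern euroClass = Pattern.compile("[" + EURO + "]");

        String right = decodeEuroBytes(StandardCharsets.UTF_8);
        String wrong = decodeEuroBytes(StandardCharsets.ISO_8859_1);

        // Correct decoding: one character, which matches the class.
        System.out.println(right.length() + " " + euroClass.matcher(right).matches());
        // Wrong decoding: several characters, which cannot match it.
        System.out.println(wrong.length() + " " + euroClass.matcher(wrong).matches());
    }
}
```

Run as written, this prints "1 true" and then "3 false".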


So, with
   1)  a pattern in the schema consisting of a UTF-8 encoded euro surrounded
       by square brackets, [e], where e is the UTF-8 encoded euro,
 and
   2)  a euro in the instance, coded either as a character reference, &#8364;,
       or as UTF-8,
 then
the instance is not seen as matching the pattern.

If the pattern is [&#x20ac;] then the instance is validated correctly.
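
The combinations described above can be checked with a small, self-contained
Java sketch.  It uses the javax.xml.validation API (JAXP 1.3), which is newer
than Xerces 2.0.1, so it is a way to try the cases on a current JDK rather than
a reproduction of the original setup.  The schema and instance are built
in-line and are only a guess at what the attached files contain; the element
name AsUTF8 is taken from the error message in this report, and the class and
method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class; the schema below is an assumed stand-in for
// the attached notEuros.xsd, which is not shown in the report.
class EuroPatternCheck {
    // Build a schema whose only element, AsUTF8, is a string restricted
    // by the given pattern.
    static String schemaFor(String pattern) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='AsUTF8'><xs:simpleType>"
             + "<xs:restriction base='xs:string'>"
             + "<xs:pattern value='" + pattern + "'/>"
             + "</xs:restriction></xs:simpleType></xs:element>"
             + "</xs:schema>";
    }

    // Serialise the XML as UTF-8 bytes so that the parser has to decode them.
    static StreamSource source(String xml) {
        String doc = "<?xml version='1.0' encoding='UTF-8'?>" + xml;
        return new StreamSource(new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)));
    }

    // True if <AsUTF8>content</AsUTF8> is valid against the pattern.
    static boolean isValid(String pattern, String content) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(source(schemaFor(pattern)));
            try {
                schema.newValidator().validate(source("<AsUTF8>" + content + "</AsUTF8>"));
                return true;
            } catch (SAXException invalid) {
                return false; // the default error handler throws on a validation error
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("could not set up validation", e);
        }
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // literal euro, serialised as UTF-8 by source()
        System.out.println(isValid("[" + euro + "]", euro));
        System.out.println(isValid("[" + euro + "]", "&#8364;"));
        System.out.println(isValid("[&#x20ac;]", euro));
        System.out.println(isValid("[&#x20ac;]", "&#8364;"));
    }
}
```

A processor that handles the encoding correctly should report all four
combinations as valid; the behaviour described in this report corresponds to
the first two being rejected.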


Result of validating the attached notEuros2.xml against the attached notEuros.xsd:

[Error] file: null notEuros2.xml:3:25: cvc-type.3.1.3: The value '?' of element
'AsUTF8' is not valid.

thanks
Reuben
