[ http://issues.apache.org/jira/browse/LOG4NET-22?page=comments#action_62521 ]

Nicko Cadell commented on LOG4NET-22:
-------------------------------------

The System.Xml.XmlTextWriter does not know which version of XML is being 
generated, and there is no API to configure it one way or the other. The 
XmlLayout does not generate a full XML document, only a fragment which must be 
included in a document.

If the XML output is included in an XML 1.1 document then the numeric character 
references in the additional ranges allowed by the 1.1 spec will be valid. 
However, enforcing this is outside the scope of log4net.

The XmlLayout must be told which XML version is being targeted, and must 
default to 1.0, not 1.1.
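
A minimal sketch of what "being told the version" could mean, written in 
Python purely for illustration; it is not log4net code. The function name 
classify and the table names are hypothetical. The 1.0 ranges and the 1.1 
restricted ranges are the ones quoted in the issue description, and the 1.1 
Char production is taken from the XML 1.1 recommendation.

```python
# XML 1.0 Char ranges, as quoted in the issue description.
XML10_CHAR = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
              (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# XML 1.1 Char ranges (everything except NULL, surrogates, #xFFFE and #xFFFF).
XML11_CHAR = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# XML 1.1 restricted characters: legal only as numeric character references.
XML11_RESTRICTED = [(0x1, 0x8), (0xB, 0xC), (0xE, 0x1F),
                    (0x7F, 0x84), (0x86, 0x9F)]


def _in_ranges(code_point, ranges):
    """True if code_point falls inside any (low, high) pair."""
    return any(low <= code_point <= high for low, high in ranges)


def classify(code_point, version="1.0"):
    """Return 'literal', 'reference' or 'invalid' for a code point.

    The version defaults to "1.0", mirroring the default proposed above.
    """
    if version == "1.0":
        return "literal" if _in_ranges(code_point, XML10_CHAR) else "invalid"
    if version == "1.1":
        if not _in_ranges(code_point, XML11_CHAR):
            return "invalid"
        if _in_ranges(code_point, XML11_RESTRICTED):
            return "reference"
        return "literal"
    raise ValueError("unsupported XML version: %r" % (version,))
```

With these tables 0x1E is invalid under 1.0 and allowed only as a character 
reference under 1.1, while 0x0 is invalid under both.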

For invalid characters such as 0x1e there are 3 possible solutions:

1) Discard the character from the output.

2) Replace the character with a numeric representation e.g. "0x1E".

3) Replace the character with an XML element e.g. <char code="30"/>

Regardless of the output version selected (1.0 or 1.1), one of the above 
choices will need to be made. XML version 1.1 does not allow a NULL (0x0) 
character to appear un-encoded or as a numeric character reference, so even 
when targeting 1.1 this character will need to be represented in some other way.
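
The three options can be sketched side by side. This is an illustrative 
Python sketch targeting XML 1.0, not the XmlLayout implementation; the 
function names (discard, as_hex_text, as_element) are hypothetical, and the 
element form <char code="30"/> follows the example in option 3.

```python
from xml.sax.saxutils import escape


def is_valid_xml10(ch):
    """True if ch is in the XML 1.0 Char ranges."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def discard(text):
    """Option 1: drop invalid characters, then escape the remainder."""
    return escape("".join(c for c in text if is_valid_xml10(c)))


def as_hex_text(text):
    """Option 2: replace invalid characters with text such as 0x1E."""
    return escape("".join(
        c if is_valid_xml10(c) else "0x%02X" % ord(c) for c in text))


def as_element(text):
    """Option 3: replace invalid characters with a <char code="N"/> element."""
    return "".join(
        escape(c) if is_valid_xml10(c) else '<char code="%d"/>' % ord(c)
        for c in text)
```

For the input "a\x1eb<" the three functions produce ab&lt;, a0x1Eb&lt; and 
a<char code="30"/>b&lt; respectively.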

Note that the invalid characters cannot be included in a CDATA block either. 
Some parsers do accept them there, but they should not, so this cannot be 
relied upon.

I favour option 3 above because no information is lost. Options 1 and 2 both 
lose information: 1 drops the character, and the encoding in 2 is not 
reversible because it cannot be told apart from the same text appearing 
literally in the message. With 3 the application reading the data requires 
additional smarts to pick up on the values encoded in the elements, but all 
the original information is preserved. If the app just asks for the text 
nodes, ignoring the child elements, then it will get back the same result as 
from 1.
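
A rough illustration of the reading side, again in Python and under stated 
assumptions: the fragment is wrapped in a hypothetical <message> element so 
that it parses as a document, the invalid characters were written as 
<char code="N"/> with a decimal code, and the function name decode is made up 
for this sketch.

```python
import xml.etree.ElementTree as ET


def decode(fragment, keep_encoded=True):
    """Rebuild the message text from a fragment produced by option 3.

    With keep_encoded=False the <char/> elements are ignored, which is what
    a reader that only collects text nodes would see.
    """
    root = ET.fromstring("<message>" + fragment + "</message>")
    parts = [root.text or ""]
    for child in root:
        if keep_encoded and child.tag == "char":
            # Restore the original character from its decimal code.
            parts.append(chr(int(child.get("code"))))
        # Text that follows the child element belongs to the message too.
        parts.append(child.tail or "")
    return "".join(parts)
```

Decoding a<char code="30"/>b&lt; returns the original "a\x1eb<", while 
keep_encoded=False returns "ab<", the same text that option 1 would have 
produced.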

> XmlLayout allows output of invalid control characters
> -----------------------------------------------------
>
>          Key: LOG4NET-22
>          URL: http://issues.apache.org/jira/browse/LOG4NET-22
>      Project: Log4net
>         Type: Bug
>   Components: Appenders
>     Versions: 1.2.9
>     Reporter: Nicko Cadell

>
> XmlLayout allows output of invalid control characters.
> Reported by Mike Blake-Knox with additional comments from Curt Arnold.
> The XmlLayout encodes the character 0x1e as &#x1E; using the standard XML 
> numeric character reference.
> This character code is in a range which is not allowed to appear in XML 1.0 
> either as an un-encoded value or as a numeric character reference.
> The valid character ranges are defined here in the XML recommendation:
> http://www.w3.org/TR/REC-xml/#charsets
> They are:
> #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> Numeric character references are not able to express characters from outside 
> these ranges.
> The System.Xml.XmlTextWriter does not verify if the Unicode character is 
> valid in XML, but it does encode it as a numeric character reference if it 
> cannot be expressed in the output encoding.
> To complicate matters further, XML 1.1 does allow additional, so-called 
> restricted, characters to be included in the output if they are encoded as 
> numeric character references. These ranges are:
> [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
> See http://www.w3.org/TR/2004/REC-xml11-20040204/#charsets for details.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira
