Using XXE Personal Edition 4.1.0 on Windows XP SP3 (Spanish).

A problem arises when external tools are used to perform complex
transformations of XHTML document fragments, for instance syntax
highlighting of program listings.

The XHTML configuration has been customized and now includes a command
for processing the selection with an external AWK script:

   <command name="awk">
     <macro>
       <sequence>
         <command name="run" parameter='"%C\awk" "%0" "%F"' />
         <command name="paste" parameter="to %_" />
       </sequence>
     </macro>
   </command>

It works OK for text fragments, but not for elements. After selecting a 
simple paragraph, the exported %F file looks like:

   <?xml version="1.0" encoding="UTF-8"?>
   <p
   xmlns="http://www.w3.org/1999/xhtml"
   xmlns:ns="http://www.w3.org/1999/xhtml"
   >WriteString("??Cantidad ?");</p
   >

If the external transformation does nothing and just reproduces its
input, non-ASCII characters are mangled. The original paragraph:

    WriteString("?Cantidad ?");

is changed into

    WriteString("??Cantidad ?");

The output of the external command keeps the original <?xml..>
declaration, but it seems that XXE ignores it, or at least ignores the
encoding specification, and pastes the text nodes as if they were
encoded in the default platform encoding (windows-1252 ~= latin1).
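
The suspected mechanism can be illustrated outside XXE with a minimal
Python sketch. It only assumes that UTF-8 bytes are being decoded as
windows-1252; the function name and the sample character (U+00BF) are
hypothetical, and this is not a statement about what XXE does
internally.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with another encoding.

    This mimics a reader that ignores the XML encoding declaration and
    falls back to the platform default (an assumption, not XXE's code).
    """
    raw = text.encode("utf-8")
    # errors="replace" keeps the sketch from failing on the few byte
    # values that windows-1252 leaves undefined.
    return raw.decode(wrong_encoding, errors="replace")


if __name__ == "__main__":
    sample = "\u00bfCantidad"  # hypothetical sample with one non-ASCII character
    # ascii() avoids console encoding problems when printing the result.
    print(ascii(misdecode(sample)))
```

Each non-ASCII character becomes two characters after the round trip,
which matches the doubled character seen in the pasted paragraph.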

The problem is probably not specific to awk: it should be reproducible
with any external utility that just copies its input to its output.
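
For anyone who wants to try this without awk, a byte-for-byte copy
utility could look like the following Python sketch. The function name
is hypothetical, and it assumes a Python interpreter is available to
the "run" command; it reads the file in binary mode so that no
re-encoding can happen on the way through.

```python
import sys


def copy_bytes(src_path: str, out) -> int:
    """Write the bytes of src_path to the binary stream out, unchanged.

    Returns the number of bytes copied.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    out.write(data)
    return len(data)


if __name__ == "__main__":
    # The first argument plays the role of the exported %F file.
    copy_bytes(sys.argv[1], sys.stdout.buffer)
```

If the pasted text is still mangled with such a utility, the external
tool can be ruled out as the cause.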

My workaround is to externally convert the data from UTF-8 to 
ISO-8859-1, but this is just a dirty hack, probably unreliable.
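
A minimal sketch of that kind of conversion, in Python, also shows
where it becomes unreliable. The function name is hypothetical.
Characters that ISO-8859-1 cannot represent are written here as
numeric character references, which is only safe inside text content
and attribute values.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1.

    Characters outside ISO-8859-1 are replaced by XML numeric character
    references (for example &#8364;) instead of raising an error.
    """
    text = data.decode("utf-8")
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(utf8_to_latin1("\u00bfCantidad".encode("utf-8")))
    print(utf8_to_latin1("10 \u20ac".encode("utf-8")))
```

The euro sign in the second call exists in windows-1252 but not in
ISO-8859-1, which is one case where treating the two encodings as
interchangeable breaks down.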

Is this a bug? Please tell me if I'm missing something relevant.

Regards.
-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado


