Manuel Collado wrote:
> Using XXE Personal Edition 4.1.0 on WindowsXP SP3 Spanish.
> 
> A problem arises when using external tools to make some complex 
> transformations of XHTML document fragments. For instance, syntax 
> highlighting of program listings.
> 
> The XHTML configuration has been customized and now includes a command 
> for processing the selection with an external AWK script:
> 
>    <command name="awk">
>      <macro>
>        <sequence>
>          <command name="run" parameter='"%C\awk" "%0" "%F"' />
>          <command name="paste" parameter="to %_" />
>        </sequence>
>      </macro>
>    </command>
> 
> It works OK for text fragments, but not for elements. The exported %F 
> file looks like:
> 
>    <?xml version="1.0" encoding="UTF-8"?>
>    <p
>    xmlns="http://www.w3.org/1999/xhtml"
>    xmlns:ns="http://www.w3.org/1999/xhtml"
>    >WriteString("??Cantidad ?");</p
>    >
> 
> If the external transformation does nothing, and just reproduces its 
> input, non-ASCII characters are mangled. The original paragraph:
> 
>     WriteString("?Cantidad ?");
> 
> is changed into
> 
>     WriteString("??Cantidad ?");
> 
> The output of the external command keeps the original <?xml..> 
> declaration, but it seems that XXE ignores it, or at least ignores the 
> encoding specification, and pastes the text nodes as if they were 
> encoded in the default platform encoding (windows-1252 ~= latin1).

--> There is no bug here.

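The run command, which captures the output of your awk command, assumes
that what is written to stdout uses the default platform encoding. Your
script writes UTF-8, so its output is decoded as windows-1252 and the
non-ASCII characters are already mangled at that point.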

The value of the %_ variable is the Java string captured by the run
command.

A Java string is a sequence of Unicode characters and, as such, has no
concept of encoding.

That's why XXE completely ignores <?xml...encoding="XXX"?> when it
parses a Java string.

We do not intend to change this behavior because we think it is the
correct one.



--> You need to write a slightly more complex macro as a workaround.

[1] Your awk script must save its output to a temporary file.

[2] Invoke a "process" command containing just a "read" child element.
See http://www.xmlmind.com/xmleditor/_distrib/doc/commands/read.html

- Specify the absolute filename of the above temporary file as its
"file" attribute.

- Specify "UTF-8" in its "encoding" attribute (yes, you must hardwire an
encoding).

[3] The end of the macro remains unchanged:
<command name="paste" parameter="to %_" />
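
Steps [1] to [3] could be put together roughly as follows. This is an
untested sketch, not a verified configuration. The temporary file name
C:\temp\awk_out.xml and the command names "awkViaFile" and
"readAwkOutput" are made up for illustration. It also assumes that run
hands its parameter to the system shell, so that the ">" redirection
works. Please check the "read" attributes against the documentation
linked above.

   <!-- Hypothetical helper: returns the content of the temporary file,
        decoded as UTF-8 instead of the default platform encoding. -->
   <command name="readAwkOutput">
     <process>
       <read file="C:\temp\awk_out.xml" encoding="UTF-8" />
     </process>
   </command>

   <command name="awkViaFile">
     <macro>
       <sequence>
         <!-- [1] awk output goes to a file (assumed shell redirection)
              rather than being captured from stdout. -->
         <command name="run"
           parameter='"%C\awk" "%0" "%F" > "C:\temp\awk_out.xml"' />
         <!-- [2] Read the file back with a hardwired encoding. -->
         <command name="readAwkOutput" />
         <!-- [3] Unchanged: paste the result of the previous command. -->
         <command name="paste" parameter="to %_" />
       </sequence>
     </macro>
   </command>

The point of the detour through a file is that the bytes written by awk
are no longer decoded by run, but by "read", where the encoding is
stated explicitly.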



--> Invoking an XSLT stylesheet wrapped in a process command rather than
using awk allows you to do what you want without having to resort to
dirty tricks. (Even the free Personal Edition allows you to write such a
process command.)


