Manuel Collado wrote:
> Using XXE Personal Edition 4.1.0 on Windows XP SP3 Spanish.
>
> A problem arises when using external tools to make some complex
> transformations of XHTML document fragments. For instance, syntax
> highlighting of program listings.
>
> The XHTML configuration has been customized and now includes a command
> for processing the selection with an external AWK script:
>
> <command name="awk">
>   <macro>
>     <sequence>
>       <command name="run" parameter='"%C\awk" "%0" "%F"' />
>       <command name="paste" parameter="to %_" />
>     </sequence>
>   </macro>
> </command>
>
> It works OK for text fragments, but not for elements. The exported %F
> file looks like:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <p
>   xmlns="http://www.w3.org/1999/xhtml"
>   xmlns:ns="http://www.w3.org/1999/xhtml"
> >WriteString("Â¿Cantidad ?");</p
> >
>
> If the external transformation does nothing, and just reproduces its
> input, non-ASCII characters are mangled. The original paragraph:
>
> WriteString("¿Cantidad ?");
>
> is changed into
>
> WriteString("Â¿Cantidad ?");
>
> The output of the external command keeps the original <?xml..>
> declaration, but it seems that XXE ignores it, or at least ignores the
> encoding specification, and pastes the text nodes as if they were
> encoded in the default platform encoding (windows-1252 ~= latin1).

--> There is no bug here.

The run command, which captures the output of your awk command, assumes that whatever is written to stdout uses the default platform encoding.

The value of the %_ variable is the Java string captured by the run command. A Java string, being a sequence of Unicode characters, has no concept of encoding. That is why XXE completely ignores <?xml ... encoding="XXX"?> when it parses a Java string.

We do not intend to change this behavior because we think it is the correct one.

--> You need to write a slightly more complex macro as a workaround.

[1] Your awk script must save its output to a temporary file.

[2] Invoke a "process" command just containing a "read" child element. See http://www.xmlmind.com/xmleditor/_distrib/doc/commands/read.html

    - Specify the absolute filename of the above temporary file as its "file" attribute.
    - Specify "UTF-8" in its "encoding" attribute (yes, you must hardwire an encoding).

[3] The end of the macro remains unchanged:

    <command name="paste" parameter="to %_" />

--> Invoking an XSLT stylesheet wrapped in a process command rather than using awk allows you to do what you want without having to resort to dirty tricks. (Even the free Personal Edition allows you to write such a process command.)
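
The encoding mismatch behind the mangled text can be reproduced outside XXE. The following is only an illustrative Python sketch, not XXE code: it assumes the original listing starts with an inverted question mark (U+00BF), that the external tool writes UTF-8 bytes, and that the platform default is windows-1252 (cp1252). The names ORIGINAL, capture, mangled and restored are made up for this example.

```python
# Illustration only: this is not how XXE is implemented.
# Assumption: the original text begins with U+00BF (inverted question mark).
ORIGINAL = 'WriteString("\u00bfCantidad ?");'


def capture(raw: bytes, assumed_encoding: str) -> str:
    """Turn captured bytes into a string using the encoding the caller assumes."""
    return raw.decode(assumed_encoding)


# What the external tool writes to stdout or to a file: UTF-8 bytes.
raw_output = ORIGINAL.encode("utf-8")

# Decoding with the Windows platform default: the two UTF-8 bytes of U+00BF
# (0xC2 0xBF) become two separate characters, U+00C2 followed by U+00BF.
mangled = capture(raw_output, "cp1252")

# Decoding with the encoding the bytes were actually written in.
restored = capture(raw_output, "utf-8")

# ascii() keeps the printout readable on any console encoding.
print(ascii(mangled))
print(ascii(restored))
```

Once the bytes have been decoded, the resulting string carries no trace of which encoding was used, which is why an encoding declaration inside the text cannot help at that point. Naming the encoding explicitly when the bytes are read, as the "read" element does through its "encoding" attribute, avoids the wrong guess.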

