Thank you very much for your quick response.

Hussein Shafie wrote:
> Manuel Collado wrote:
>> Using XXE Personal Edition 4.1.0 on WindowsXP SP3 Spanish.
>>  ...
>> The XHTML configuration has been customized and now includes a command
>> for processing the selection with an external AWK script:
>>
>>    <command name="awk">
>>      <macro>
>>        <sequence>
>>          <command name="run" parameter='"%C\awk" "%0" "%F"' />
> 
> The run command is expected to print on stdout some bytes. If these
> bytes are used to represent text, then the encoding of these bytes must
> be the native encoding of your platform (windows-1252).
> 
> The problem comes from the fact that awk prints UTF-8 bytes on stdout
> and command "run" thinks these are windows-1252 bytes.
> 
>>          <command name="paste" parameter="to %_" />
> 
> %_ is a Java string (that is, a sequence of 16-bit chars). There is no
> concept of encoding at this stage.
> 
> If %_ contains a string starting with "<?xml version="1.0"
> encoding="UTF-8"?>...", then encoding="UTF-8" is ignored because such
> specification makes no sense in this context.
> 
>>        </sequence>
>>      </macro>
>>    </command>
>> ...
>> My workaround is to externally convert the data from UTF-8 to 
>> ISO-8859-1, but this is just a dirty hack, probably unreliable.
>>
>> Is this a bug? Please tell me if I'm missing something relevant.
> 
> No, it works as expected. You really need to convert UTF-8 to
> windows-1252 to make awk print windows-1252 bytes on stdout.
> ---
> PS: No, there is no way to make command "run" use for "%F" an encoding
> other than UTF-8.

OK. I now know how to proceed. Re-reading the documentation after your
explanations makes things much clearer.
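
For the record, the mismatch explained above can be reproduced outside
XXE. The following is only a minimal sketch in Python, assuming the
platform encoding is windows-1252 (cp1252 in Python) and that the tool
prints UTF-8. The function names are hypothetical and are not part of
XXE.

```python
# Sketch only: reproduces the encoding mismatch outside XXE.
# The function names are hypothetical, not part of XXE.

def read_as_platform(data: bytes, platform_encoding: str = "cp1252") -> str:
    """Decode bytes as platform-encoded text, as command 'run' is said to do."""
    return data.decode(platform_encoding)


def transcode(data: bytes, source: str = "utf-8", target: str = "cp1252") -> bytes:
    """Re-encode the tool's output so that it matches the platform encoding.

    Raises UnicodeEncodeError for characters the target cannot represent.
    """
    return data.decode(source).encode(target)


awk_output = "a\u00f1o".encode("utf-8")              # UTF-8 bytes, as awk prints them
garbled = read_as_platform(awk_output)                # UTF-8 bytes misread as cp1252
repaired = read_as_platform(transcode(awk_output))    # converted first, then read
```

Here 'garbled' holds two wrong characters in place of the single
accented one, while 'repaired' holds the original text. Note that the
conversion fails for any character that windows-1252 cannot represent,
which is why I consider the workaround unreliable.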

But, with due respect, I still think XXE could be improved for easier 
interoperation with external tools.

In the 'run' command, data exported via %F takes one of two forms:
plain text (for a text selection) or an XML fragment (an XML external
entity) otherwise. Retrieving external data via %_ should offer the
same two possibilities: either plain text or an XML fragment.

Always converting the captured output to the internal Java encoding makes
any further parsing of it as XML really inconsistent.

The 'read' element of the process commands provides some support for 
this, by means of the 'encoding' parameter.

But I think a better approach is not to state the expected encoding
explicitly, but to select between plain text and XML when interpreting
the captured data. The 'run' command could have an optional 'XML'
parameter to force the external tool's output to be handled as an XML
external entity, and not just as plain text. Or it could at least honor
the encoding stated in the XML declaration, if any.
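
As a rough illustration of the kind of switch I mean, here is a sketch
in Python. It is not how XXE works. The names 'decode_output', 'as_xml'
and 'platform_encoding' are hypothetical, and the sketch only handles
ASCII-compatible encodings (no UTF-16, no byte order mark).

```python
import re

# Matches an XML declaration at the very start and captures its encoding name.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def decode_output(data: bytes, as_xml: bool = False,
                  platform_encoding: str = "cp1252") -> str:
    """Decode the bytes captured from an external tool.

    Plain text is decoded with the platform encoding. XML is decoded with
    the encoding named in its declaration, or UTF-8 when none is given.
    """
    if not as_xml:
        return data.decode(platform_encoding)
    match = _XML_DECL.match(data)
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    return data.decode(encoding)


# Hypothetical tool output: a UTF-8 encoded XML fragment with a declaration.
sample = '<?xml version="1.0" encoding="UTF-8"?><p>a\u00f1o</p>'.encode("utf-8")
as_fragment = decode_output(sample, as_xml=True)   # accented character preserved
as_text = decode_output(sample)                    # read as cp1252, so garbled
```

With such a switch the external tool would not need to know the
platform encoding at all, as long as its XML output declares its own.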

Anyway, thank you, again, for allowing free use of the excellent XXE 
editor for personal work.

Regards.
-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

