On 7 Mar 2007 at 16:09, Björn Helgason said:

> a is a file in ANSI while at2 is the same file saved in notepad as UTF-8

[...]
> Is there a utility - preferably in J - that can read file a and write it
> out like at2 in UTF-8?

What do you mean by "in ANSI"?

Can you guarantee that "a" is in ISO8859-1? If so the problem is pretty
trivial:
(a) precede each codepoint in the 128-191 range with 194;
(b) precede each codepoint in the 192-255 range with 195 and subtract 64
from it;
(c) prefix the whole with BOM, which in utf-8 is 239,187,191 (as in your
4th example) and in wchar is u:16bfeff

This seems to work:
   conv =. 8 u:(u:16bfeff),u:
   a.i.a=.'<?xml vers °C" Forma'
60 63 120 109 108 32 118 101 114 115 32 194 176 67 34 32 70 111 114 109 97
   a.i.conv a
239 187 191 60 63 120 109 108 32 118 101 114 115 32 195 130 194 176 67 34 32 70 
111 114 109 97
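
For anyone who wants to cross-check those numbers outside J, here is a
minimal Python sketch of the same conversion done by hand on bytes. It
assumes the input really is ISO8859-1: bytes below 128 pass through,
bytes 128-191 get a 194 prefix, and bytes 192-255 get a 195 prefix and
have 64 subtracted. The name latin1_to_utf8 and its bom parameter are
made up for illustration.

```python
BOM = bytes([239, 187, 191])  # UTF-8 encoding of U+FEFF


def latin1_to_utf8(data: bytes, bom: bool = True) -> bytes:
    """Re-encode ISO8859-1 bytes as UTF-8, optionally prefixed with a BOM."""
    out = bytearray(BOM if bom else b"")
    for b in data:
        if b < 128:
            out.append(b)                # ASCII: unchanged
        elif b < 192:
            out += bytes([194, b])       # 128-191: lead byte 194, same byte
        else:
            out += bytes([195, b - 64])  # 192-255: lead byte 195, byte - 64
    return bytes(out)


# Same input bytes as the J session above.
sample = bytes([60, 63, 120, 109, 108, 32, 118, 101, 114, 115, 32,
                194, 176, 67, 34, 32, 70, 111, 114, 109, 97])
result = latin1_to_utf8(sample)

# Cross-check against Python's own codecs.
assert result == BOM + sample.decode("latin-1").encode("utf-8")
print(*result)
```

The printed bytes should agree with the a.i.conv a line: 194 becomes
195 130 and 176 becomes 194 176.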

I leave file input and output to you ...

Note, however, that if "a" contains characters in the 128-159 range (the
Win-1252 fancy quotes, etc) then more work is needed, because Win-1252
maps those bytes to codepoints above 255.
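
To illustrate the extra work, here is a hedged Python sketch that
assumes the file is Windows-1252 rather than ISO8859-1. In that code
page most bytes in 128-159 stand for punctuation whose Unicode
codepoints lie above 255, so they need a lookup table instead of a fixed
two-byte rule. Python's built-in cp1252 codec supplies that table; the
function name cp1252_to_utf8 is made up for illustration.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8 with a leading BOM."""
    # "utf-8-sig" prepends the BOM (239 187 191) when encoding.
    # Bytes that Python's cp1252 table leaves undefined (129, 141,
    # 143, 144, 157) raise UnicodeDecodeError here.
    return data.decode("cp1252").encode("utf-8-sig")


# 147 and 148 are the Win-1252 curly double quotes (U+201C, U+201D).
quoted = bytes([147, 72, 105, 148])
print(*cp1252_to_utf8(quoted))
# prints: 239 187 191 226 128 156 72 105 226 128 157
```

Each curly quote becomes three bytes, which the 194/195 prefix rule
cannot produce.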

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
