On 7 Mar 2007 at 16:09, Björn Helgason said:

> a is a file in ANSI while at2 is the same file saved in notepad as UTF-8
[...]
> Is there a utility - preferably in J - that can read file a and write it
> out like at2 in UTF-8?

What do you mean by "in ANSI"? Can you guarantee that "a" is in ISO8859-1? If so the problem is pretty trivial:

(a) precede each codepoint in the 128-191 range with 194;
(b) precede each codepoint in the 192-255 range with 195 and subtract 64 from it;
(c) prefix the whole with a BOM, which in utf-8 is 239,187,191 (as in your 4th example) and in wchar is u:16bfeff

This seems to work:

   conv =. 8 u:(u:16bfeff),u:
   a.i.a=.'<?xml vers °C" Forma'
60 63 120 109 108 32 118 101 114 115 32 194 176 67 34 32 70 111 114 109 97
   a.i.conv a
239 187 191 60 63 120 109 108 32 118 101 114 115 32 195 130 194 176 67 34 32 70 111 114 109 97

The test string here already holds the two bytes 194 176 (the utf-8 form of the degree sign), so conv encodes each of them separately: 194 becomes 195 130 and 176 becomes 194 176. A real ISO8859-1 file would hold the single byte 176 at that point.

I leave file input and output to you ...

Note, however, that if "a" contains characters in the 128-159 range (the Win-1252 fancy quotes, etc) then more work is needed, because Win-1252 and ISO8859-1 disagree there.
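For anyone who would rather see the byte rule spelled out than rely on 8 u:, here is a minimal sketch. It assumes the input really is ISO8859-1 and a J version that names the argument y (older releases used y.). The rule it implements: bytes below 128 pass through, bytes 128-191 get a 194 lead byte, and bytes 192-255 get a 195 lead byte and are lowered by 64. The names l1toutf8 and convfile are my own, not library verbs, and the file handling via 1!:1 and 1!:2 is only one way to do it.

```j
NB. l1toutf8: hypothetical helper, ISO8859-1 bytes -> utf-8 bytes (no BOM)
l1toutf8 =: 3 : 0
c  =. a. i. y                 NB. byte values 0..255
hi =. c >: 128                NB. 1 where a lead byte is needed
p  =. 194 + c >: 192          NB. lead byte: 194 for 128-191, 195 for 192-255
t  =. c - 64 * c >: 192       NB. trail byte: lowered by 64 for 192-255
a. {~ (, hi ,. 1) # , p ,. t  NB. keep the lead only where hi, always keep the trail
)

NB. convfile: hypothetical helper; x is the output file name, y the input file name
convfile =: 4 : 0
((239 187 191 { a.) , l1toutf8 1!:1 < y) 1!:2 < x
)
```

With those definitions, a. i. l1toutf8 194 176 { a. gives 195 130 194 176, and 'at2' convfile 'a' would read file a and write it out with a utf-8 BOM in front. Anything in the 128-159 range is still treated as ISO8859-1 here, so Win-1252 input needs a mapping step first.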
