I wrote:
> I'd like to be able to take an array of the Unicode datatype and convert it to any reasonable encoding;
Raul answered:
> In the general case, this is not possible.

Understood. In particular, it is not possible to convert Unicode to ASCII if the Unicode contains codepoints above 127 (without losing those codepoints). But I do not envision a need for that.

My use case is that I want to be able to consume characters from any data source, convert them to J's internal unicode datatype (3!:0 = 131072, i.e. UTF-16 / wchar), do some transformation, and then convert them back to the original encoding. So if the original data was ASCII, the output would be ASCII; if the original data was UTF-8, the output would be UTF-8; if the original data was UTF-16 in little-endian order, the output would be UTF-16 in little-endian order; and so on.

The transformations I have in mind are the normal parsing/extracting/reporting manipulations. In particular, I do not imagine adding non-ASCII codepoints during my transformations if they were not already present. Put another way: the inverse function would not stand alone. The goal of the three verbs together is to let me ignore character-encoding issues and just work with textual data as characters, as I'm used to.

I'm aware that even these verbs are insufficient; for example, # text would not be the same as number_of_chars text if the text contained surrogate pairs. But they'd be a good start.

-Dan

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
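[A sketch of the round trip described above, in Python rather than J since it is easier to show self-contained here. The names consume and emit are hypothetical stand-ins for two of the three verbs; the point is only that the original encoding is remembered and reapplied, and that a UTF-16 code-unit count over-counts characters when surrogate pairs are present.]

```python
def consume(data: bytes, encoding: str):
    """Decode raw bytes into an internal character form (a Python str,
    which, like J's 131072 datatype, abstracts over the byte encoding),
    remembering the original encoding for the return trip."""
    return data.decode(encoding), encoding

def emit(text: str, encoding: str) -> bytes:
    """Re-encode the transformed text in the original encoding."""
    return text.encode(encoding)

# Round trip: UTF-16 LE in, UTF-16 LE out (likewise for ASCII, UTF-8, ...).
raw = "abc".encode("utf-16-le")
text, enc = consume(raw, "utf-16-le")
result = emit(text.upper(), enc)        # the "transformation" is upper-casing
assert result == "ABC".encode("utf-16-le")

# Surrogate-pair caveat: a codepoint above U+FFFF takes two UTF-16
# code units, so a code-unit length is not a character count.
s = "\U0001F600"                              # one codepoint outside the BMP
assert len(s.encode("utf-16-le")) // 2 == 2   # two UTF-16 code units
assert len(s) == 1                            # one codepoint
```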
