I wrote:
> I'd like to be able to take an array of the Unicode 
> datatype and convert it to any reasonable encoding;

Raul answered:
> In the general case, this is not possible.

Understood. In particular, it is not possible to convert Unicode to ASCII without losing information if the Unicode contains codepoints above 127. But I do not envision a need for that.

My use case is that I want to be able to consume characters from any data source, convert them to J's internal unicode datatype (3!:0 reports 131072, i.e. UTF-16 / wchar), do some transformation, then convert them back to the original encoding.

So if the original data was ASCII, the output would be ASCII. If the original data was UTF-8, the output would be UTF-8. If the original data was UTF-16 in little-endian order, then the output would be UTF-16 in little-endian order. And so on.
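For the UTF-8 case, a sketch of that round trip using J's built-in u: conversions (7 u: treats a literal as UTF-8 and converts it to the wchar datatype; 8 u: converts wchar back to a UTF-8 literal). The verb names here are hypothetical placeholders, not an existing library:

```j
NB. Sketch only: round-trip UTF-8 text through J's wchar datatype.
toWchar   =: 7&u:      NB. UTF-8 literal -> unicode (wchar)
fromWchar =: 8&u:      NB. unicode (wchar) -> UTF-8 literal
transform =: ]         NB. placeholder for the real manipulation
roundtrip =: fromWchar@transform@toWchar
```

Since ASCII is a subset of UTF-8, ASCII input comes back byte-for-byte unchanged. UTF-16LE input is the harder case: the data arrives as a literal byte stream, so the bytes would have to be paired into wchar items (and the BOM handled) before and after the transformation.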

The transformations I have in mind would be the normal 
parsing/extracting/reporting manipulations. In particular, I do not imagine
adding non-ASCII codepoints during my transformations, if they were not already 
present.

Put another way: the inverse function would not stand alone. The goal of the 3 verbs together is to allow me to ignore character encoding issues and just work with textual data as characters, as I'm used to. I'm aware that even these verbs are insufficient; for example,   # text   wouldn't be the same as   number_of_chars text   if the text contained surrogate pairs.
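To illustrate the surrogate-pair point (a sketch, assuming 7 u: interprets the literal as UTF-8): U+1D11E (musical symbol G clef) is one character, but it lies outside the BMP, so in a 16-bit wchar array it occupies two items.

```j
NB. The four UTF-8 bytes of U+1D11E, taken from the byte alphabet a.
clef =: 7 u: 240 157 132 158 { a.
# clef    NB. counts UTF-16 code units (a surrogate pair), not characters
```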

But they'd be a good start.

-Dan

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
