I suppose my most radical idea would be to store all characters as integers 
corresponding to Unicode code points, converting them to a specific encoding 
(utf-8, utf-16, or utf-32, depending on type) only upon display.
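
To make that concrete, here is a minimal sketch in Python rather than J. It 
is only an illustration, not a proposal for how J would implement it. The 
names to_code_points and encode_for_display are hypothetical. Text is held as 
a plain list of integers, so a reversal works on whole code points, and the 
surrogate pair only comes into existence when the utf-16 bytes are produced.

```python
def to_code_points(text: str) -> list[int]:
    """Store text as one integer per Unicode code point."""
    return [ord(ch) for ch in text]


def encode_for_display(code_points: list[int], encoding: str = "utf-8") -> bytes:
    """Convert the stored integers to a concrete encoding only at output time."""
    return "".join(chr(cp) for cp in code_points).encode(encoding)


# U+1F600 lies outside the Basic Multilingual Plane, so utf-16 needs a
# surrogate pair for it. In the integer representation it is a single atom.
cps = to_code_points("a\U0001F600b")

# Reversing the integers cannot separate the two halves of a surrogate pair,
# because no surrogates exist until encoding.
reversed_cps = cps[::-1]

utf8_bytes = encode_for_display(reversed_cps, "utf-8")
utf16_bytes = encode_for_display(reversed_cps, "utf-16-le")
utf32_bytes = encode_for_display(reversed_cps, "utf-32-le")
```

The cost is one full-width integer per character in memory, whatever the 
eventual output encoding.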

I am not advocating this, but just offer it up for consideration. It would be a 
fundamental change to the way J handles characters and only the implementation 
team could determine whether it is even close to worthwhile.

Cheers, bob

> On Sep 14, 2019, at 6:28 AM, bill lam <[email protected]> wrote:
> 
>> J types are atomic types
> 
> Exactly. J language primitives operate on atomic types and don't care
> about utf8/utf16 encodings at all.
> 
> On Sat, Sep 14, 2019, 8:07 PM Raul Miller <[email protected]> wrote:
> 
>> On Sat, Sep 14, 2019 at 1:50 AM 'robert therriault' via Programming
>> <[email protected]> wrote:
>>> For reversals, the way that I might approach that is to box the utf-16
>>> code units into code points then reverse and unbox. It would involve
>>> overhead, but it would allow the first and second parts of the surrogates
>>> to stay in the correct relationship.
>> 
>> What I think you are getting at, here, is that unicode consortium
>> "types" are not atomic types. They are sequence types.
>> 
>> J types are atomic types.
>> 
>> Boxing lets us represent sequences as atoms.
>> 
>> In other words, I somewhat agree with what you are saying, but also
>> this is an issue that can't be hidden and instead should be
>> documented.
>> 
>> Thanks,
>> 
>> --
>> Raul
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>> 
