I suppose my most radical idea would be to store all characters as integers corresponding to Unicode code points, and to convert them to a specific encoding (utf-8, utf-16, or utf-32, depending on type) only upon display.
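
To make the idea concrete, here is a rough sketch in Python rather than J. It is purely illustrative: the function names are made up, and nothing here reflects how J is or would be implemented. It assumes text is held as a list of code point integers, and it also shows why reversing raw utf-16 code units is a problem while reversing code points is not.

```python
# Illustrative sketch only (Python, not J). All names are hypothetical.

def to_code_points(text):
    """Hold each character as its Unicode code point (a plain integer)."""
    return [ord(ch) for ch in text]

def encode_for_display(code_points, encoding):
    """Convert code points to a concrete encoding only when displaying."""
    return "".join(chr(cp) for cp in code_points).encode(encoding)

def utf16_code_units(text):
    """Return the utf-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

sample = "a\U0001F600b"            # 'a', U+1F600 (outside the BMP), 'b'

points = to_code_points(sample)    # three integers, one per character
reversed_points = points[::-1]     # reversal keeps each character whole

# Reversing raw utf-16 code units instead swaps the two surrogate halves,
# leaving a low surrogate in front of a high surrogate.
reversed_units = utf16_code_units(sample)[::-1]

print(points)
print([hex(u) for u in reversed_units])
print(encode_for_display(reversed_points, "utf-8"))
```

The encoding is chosen only in the call to encode_for_display, so the same list of integers can be shown as utf-8, utf-16, or utf-32 without touching the stored data.
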
I am not advocating this, but just offer it up for consideration. It would be a fundamental change to the way J handles characters and only the implementation team could determine whether it is even close to worthwhile.

Cheers, bob

> On Sep 14, 2019, at 6:28 AM, bill lam <[email protected]> wrote:
>
>> J types are atomic types
>
> exactly. and J language primitives operate on atomic types, doesn't care
> about utf8/utf16 encodings at all.
>
> On Sat, Sep 14, 2019, 8:07 PM Raul Miller <[email protected]> wrote:
>
>> On Sat, Sep 14, 2019 at 1:50 AM 'robert therriault' via Programming
>> <[email protected]> wrote:
>>> For reversals, the way that I might approach that is to box the utf-16
>>> code units into code points then reverse and unbox. It would involve
>>> overhead, but it would allow the first and second parts of the surrogates
>>> to stay in the correct relationship.
>>
>> What I think you are getting at, here, is that unicode consortium
>> "types" are not atomic types. They are sequence types.
>>
>> J types are atomic types.
>>
>> Boxing lets us represent sequences as atoms.
>>
>> In other words, I somewhat agree with what you are saying, but also
>> this is an issue that can't be hidden and instead should be
>> documented.
>>
>> Thanks,
>>
>> --
>> Raul

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
