Dyalog stores code points directly using 1-, 2-, or 4-byte unsigned
integers. The type for a given array has to fit all the characters, and
we try to choose the smallest possible. I'm not sure how good our
surrogate pair handling is, but I think they are supposed to be combined
into single characters on input.
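
To make the two ideas above concrete, here is a rough Python sketch. It is not Dyalog's actual code: the function names `smallest_width` and `combine_surrogates` are made up for illustration, and the 1-byte cutoff of 255 is simply what an unsigned byte can hold. The first function picks the narrowest of 1, 2, or 4 bytes that fits every code point in a string; the second merges UTF-16 surrogate pairs into single code points, which is what I'd expect to happen on input.

```python
def smallest_width(text: str) -> int:
    """Return 1, 2 or 4: bytes per element needed to hold every code point."""
    top = max(map(ord, text), default=0)  # largest code point, 0 if empty
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def combine_surrogates(units):
    """Merge UTF-16 high/low surrogate pairs into single code points.

    `units` is a sequence of 16-bit integers. Unpaired surrogates are
    passed through unchanged (an assumption; other policies are possible).
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Standard UTF-16 decoding of a surrogate pair.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out
```

For example, a plain ASCII string would need 1 byte per element, a string containing an APL glyph such as U+2373 would need 2, and one containing an emoji beyond U+FFFF would need 4.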

Marshall

On Thu, Sep 19, 2019 at 12:18:13AM +0100, Ian Clark wrote:
> Well done, Bob.
> 
> I've read the "differences between revisions" and that's a mean task you've
> completed.
> 
> I have to confess I find the new stuff totally baffling. I wrote the
> original article 2 years ago and I still have the bruises on my forehead :)
> I was ignorant of how J901 supports the newer code pages until I read it on
> this thread.
> 
> Some helpful(?) questions:
> ++ How does Dyalog APL do it?
> ++ How does Swift 5.1 do it?
> ++ How does Python 3.7 do it?
> ++ How does JavaScript do it?
> …All are languages with serious pretensions to manipulating text containing
> UCPs. Maybe over 90% of application code being written in these languages
> does just that, and mostly on webpages. The writer of the Swift manuals
> published by iBooks delights in showing emojis between quotes in code
> samples. Smart stuff – but only a GUI coder or indie publisher would know
> it.
> 
> In my day-to-day programming I have little or no use for any greater
> precision than utf-8 and wide characters (…are we still calling them that?
> – how about mega-wide and giga-wide for the new precisions?) Just about the
> only use I'd have for the newer UCPs is to embed them in a PDF document via
> copy-paste. Nowadays that's more likely to be a layman's review blog than a
> learned paper. In which case I'd be at the mercy of my WP vendor to get it
> right when coding the copy/paste.
> 
> On past form, the omens are not good. From 1999 to the present day, as an
> indie publisher of books with fancy fonts, I watched Microsoft and Adobe
> completely foul up the introduction of utf-8 to their products, notably
> export to PDF. Assuming it won't take them another 20 years to migrate to
> utf-32, I guess I can look forward to running sequential machines on emojis
> in my care home.
> 
> Ian
> 
> On Wed, 18 Sep 2019 at 20:45, 'robert therriault' via Programming <
> [email protected]> wrote:
> 
> > Hi Henry, Bill and Ian
> >
> > I have edited the wiki for the UCP page.
> >
> > The synopsis is that I included some information on how literals and utf-8
> > are related and a section on surrogate pairs. I hope I got most of this
> > right, but if I didn't please make the necessary changes and/or correct me.
> >
> > Ian, I hope that I was able to retain the spirit of what you established
> > with your excellent foundation.
> >
> > https://code.jsoftware.com/wiki/Vocabulary/UnicodeCodePoint
> >
> > Cheers, bob
> >
> > > On Sep 13, 2019, at 10:59 AM, Henry Rich <[email protected]> wrote:
> > >
> > > Detail is great, but put it towards the end of the page if possible.
> >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >