Re: [Jprogramming] Writing help needed: surrogate pairs

'robert therriault' via Programming Thu, 19 Sep 2019 08:54:20 -0700

Looks like Swift 5.0 switched its native string storage from utf-16 to utf-8
in almost all cases. This link provides some insight into their decision.


https://swift.org/blog/utf8-string/
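The difference is easy to see by measuring one string three ways. The Python sketch below is purely illustrative: it is not Swift code and says nothing about Swift's internals. It counts code points, UTF-16 code units and UTF-8 bytes, the last being the form a utf-8-backed string such as Swift 5's stores. `encoding_sizes` is a name made up for this example.

```python
def encoding_sizes(s):
    """Count the same text as code points, UTF-16 code units and UTF-8 bytes."""
    return {
        "code_points": len(s),                           # Python str indexes by code point
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # 2 bytes per UTF-16 unit, no BOM
        "utf8_bytes": len(s.encode("utf-8")),            # 1 to 4 bytes per code point
    }


# ascii() keeps the output printable on any terminal
for text in ["abc", "caf\u00e9", "\u20ac", "\U0001F600"]:
    print(ascii(text), encoding_sizes(text))
```

Only the last string, a single code point above U+FFFF, needs a surrogate pair (two units) in UTF-16, while UTF-8 spends 4 bytes on it and 1 to 3 bytes per code point on the others.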

Cheers, bob

> On Sep 19, 2019, at 12:46 AM, Marshall Lochbaum wrote:
> 
> Dyalog stores code points directly using 1-, 2-, or 4-byte unsigned
> integers. The type for a given array has to fit all the characters, and
> we try to choose the smallest possible. I'm not sure how good our
> surrogate pair handling is, but I think they are supposed to be combined
> into single characters on input.
> 
> Marshall
> 
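A rough Python model of the scheme Marshall describes, under two assumptions made only for this sketch: input arrives as UTF-16 code values, and unpaired surrogates are kept as-is rather than rejected. It is not Dyalog's implementation; `combine_surrogates`, `storage_width` and `pack` are hypothetical names, and the little-endian byte order in `pack` is arbitrary.

```python
def combine_surrogates(units):
    """Merge UTF-16 surrogate pairs in a list of code values into single code points.
    Unpaired surrogates are passed through unchanged (an assumption of this sketch)."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # high surrogate + low surrogate -> one code point in U+10000..U+10FFFF
            out.append(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        else:
            out.append(u)
            i += 1
    return out


def storage_width(code_points):
    """Smallest unsigned width in bytes (1, 2 or 4) that fits every code point."""
    top = max(code_points, default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def pack(code_points):
    """Store every code point at the common width (little-endian, chosen arbitrarily)."""
    width = storage_width(code_points)
    return width, b"".join(cp.to_bytes(width, "little") for cp in code_points)


units = [0x41, 0xD83D, 0xDE00]         # "A" followed by U+1F600 as a surrogate pair
cps = combine_surrogates(units)
print([hex(c) for c in cps])           # ['0x41', '0x1f600']
print(storage_width(cps))              # 4
print(storage_width([0x41, 0x20AC]))   # 2
```

One character outside the BMP is enough to push the whole array to 4 bytes per element, which is the price of keeping every element the same width.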
> On Thu, Sep 19, 2019 at 12:18:13AM +0100, Ian Clark wrote:
>> Well done, Bob.
>> 
>> I've read the "differences between revisions" and that's a mean task you've
>> completed.
>> 
>> I have to confess I find the new stuff totally baffling. I wrote the
>> original article 2 years ago and I still have the bruises on my forehead :)
>> I was ignorant of how J901 supports the newer code pages until I read it on
>> this thread.
>> 
>> Some helpful(?) questions:
>> ++ How does Dyalog APL do it?
>> ++ How does Swift 5.1 do it?
>> ++ How does Python 3.7 do it?
>> ++ How does Javascript do it?
>> …All are languages with serious pretensions to manipulating text containing
>> UCPs. Maybe over 90% of application code being written in these languages
>> does just that, and mostly on webpages. The writer of the Swift manuals
>> published by iBooks delights in showing emojis between quotes in code
>> samples. Smart stuff – but only a GUI coder or indie publisher would know
>> it.
>> 
>> In my day-to-day programming I have little or no use for any greater
>> precision than utf-8 and wide characters (…are we still calling them that?
>> – how about mega-wide and giga-wide for the new precisions?) Just about the
>> only use I'd have for the newer UCPs is to embed them in a PDF document via
>> copy-paste. Nowadays that's more likely to be a layman's review blog than a
>> learned paper. In which case I'd be at the mercy of my WP vendor to get it
>> right when coding the copy/paste.
>> 
>> On past form, the omens are not good. From 1999 to the present day, as an
>> indie publisher of books with fancy fonts, I watched Microsoft and Adobe
>> completely foul up the introduction of utf-8 to their products, notably
>> export to PDF. Assuming it won't take them another 20 years to migrate to
>> utf-32, I guess I can look forward to running sequential machines on emojis
>> in my care home.
>> 
>> Ian
>> 
>> On Wed, 18 Sep 2019 at 20:45, 'robert therriault' via Programming wrote:
>> 
>>> Hi Henry, Bill and Ian
>>> 
>>> I have edited the UCP page on the wiki.
>>> 
>>> In short, I added some information on how literals and utf-8 are
>>> related, plus a section on surrogate pairs. I hope I got most of this
>>> right, but if I didn't, please make the necessary changes and/or correct me.
>>> 
>>> Ian, I hope that I was able to retain the spirit of what you established
>>> with your excellent foundation.
>>> 
>>> https://code.jsoftware.com/wiki/Vocabulary/UnicodeCodePoint
>>> 
>>> Cheers, bob
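For reference, the surrogate-pair arithmetic itself is the standard UTF-16 rule. This Python sketch, with a hypothetical `to_surrogates` helper, splits a supplementary code point into its pair and checks the result against Python's own UTF-16 codec. The last line shows the same character as the 4 utf-8 bytes it occupies when stored as utf-8 text (for example, in a literal holding utf-8). It is not J code and says nothing about how J901 handles this internally.

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return high, low


cp = 0x1F600                                  # GRINNING FACE emoji
high, low = to_surrogates(cp)
print(hex(high), hex(low))                    # 0xd83d 0xde00
print(chr(cp).encode("utf-16-be").hex())      # d83dde00 (same pair, from the codec)
print(list(chr(cp).encode("utf-8")))          # [240, 159, 152, 128]
```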
>>> 
>>>> On Sep 13, 2019, at 10:59 AM, Henry Rich wrote:
>>>> 
>>>> Detail is great, but put it towards the end of the page if possible.
>>> 
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>> 