Re: [Jprogramming] Writing help needed: surrogate pairs

Raul Miller Fri, 20 Sep 2019 06:21:43 -0700

Ugh -- good point.

So string concatenation needs to be a library utility (perhaps homebrewed,
because we're not very good at looking things up...).
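A minimal sketch of the kind of utility described above, written in Python for illustration rather than J. It assumes byte arguments hold UTF-8 and decodes them before joining, instead of widening each byte on its own. The name `concat_text` is hypothetical; it is not an existing J or library function.

```python
def concat_text(*parts):
    """Join str and bytes pieces into a single str.

    Hypothetical helper: bytes arguments are assumed to be UTF-8 and are
    decoded as a whole, rather than mapped one byte per character.
    """
    out = []
    for p in parts:
        if isinstance(p, bytes):
            out.append(p.decode("utf-8"))  # multibyte sequences become one character
        else:
            out.append(p)
    return "".join(out)


print(concat_text("caf", "é".encode("utf-8"), "!"))  # café!
```

The design choice is to decode at the boundary, so the concatenation itself only ever sees text of one kind.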


Thanks,

-- 
Raul

On Thursday, September 19, 2019, bill lam <[email protected]> wrote:

> That will make the J language inconsistent; e.g., the shape of the result
> of concatenation will be unpredictable from the shapes of the left and
> right arguments alone, because it will depend on the content of the arguments.
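A small Python sketch of this concern (an analogy, not J): if concatenation decoded UTF-8 byte arguments, two byte arguments of the same length could give results of different lengths, depending only on their content.

```python
# Two byte strings of the same length (3 bytes each).
ascii_bytes = b"abc"                    # three 1-byte UTF-8 characters
accented = "é".encode("utf-8") + b"x"   # b'\xc3\xa9x': one 2-byte char plus 'x'

wide = "w"  # a one-character text (str) argument

# Decoding-based coercion: the result length depends on the byte content.
r1 = wide + ascii_bytes.decode("utf-8")
r2 = wide + accented.decode("utf-8")

print(len(ascii_bytes), len(accented))  # 3 3
print(len(r1), len(r2))                 # 4 3
```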
>
>
> On Fri, Sep 20, 2019, 12:06 AM Raul Miller <[email protected]> wrote:
>
> > This is an unpleasant issue, but I can imagine that we might
> > eventually need an incompatible change to the J engine for this.
> >
> > There are contexts (like pulling data out of foreign file formats)
> > where we want literals of different sizes which are not unicode
> > literals. But the typical case of concatenating literals involves
> > unicode. And unicode is a type of sequence. So it seems like it would
> > make sense for automatic coercion to use the unicode conversion
> > instead of what's essentially a numeric conversion.
> >
> > This won't be without problems, but it seems to be where we are heading.
> >
> > Thanks,
> >
> > --
> > Raul
> >
> > On Wednesday, September 18, 2019, Don Guinn <[email protected]> wrote:
> > > Regarding the section on mixing text types, that is, byte concatenated
> > > with unicode: it should be mentioned that the conversion from byte to
> > > unicode simply puts a high-order byte of zeros in front of each byte.
> > > This works fine for ASCII characters but is incorrect for any non-ASCII
> > > utf-8 characters, which are encoded as multibyte sequences. One needs to
> > > be careful if such characters are in the literal. A similar problem can
> > > occur between unicode and unicode-32 (for example, when the unicode
> > > literal contains surrogate pairs). If any non-ASCII utf-8 characters are
> > > in a byte literal, the literal must be converted to unicode before
> > > concatenating.
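The following Python sketch (an analogy, not J code) mimics the two conversions described above. Decoding bytes as Latin-1 maps each byte to the code point with the same value, which is equivalent to prefixing a zero high-order byte; decoding as UTF-8 interprets multibyte sequences. The second half shows the analogous unicode to unicode-32 problem, assuming the 2-byte text holds UTF-16 code units: copying each unit separately leaves a surrogate pair as two code points instead of one.

```python
utf8 = "é".encode("utf-8")  # b'\xc3\xa9'

# Zero-extension: each byte becomes its own code point (U+00C3, U+00A9).
widened = [ord(c) for c in utf8.decode("latin-1")]
# Proper UTF-8 decoding: the two bytes form one code point (U+00E9).
decoded = [ord(c) for c in utf8.decode("utf-8")]
print([hex(u) for u in widened])  # ['0xc3', '0xa9']
print([hex(u) for u in decoded])  # ['0xe9']

# 2-byte (UTF-16) to 4-byte: U+1F600 is stored as a surrogate pair.
units = [0xD83D, 0xDE00]
naive32 = list(units)  # each unit widened on its own: two lone surrogates

# Combining the pair per the UTF-16 rules gives the single code point.
hi, lo = units
combined = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
print(hex(combined))  # 0x1f600
```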
> > >
> > > On Wed, Sep 18, 2019 at 5:18 PM Ian Clark <[email protected]> wrote:
> > >
> > > > Well done, Bob.
> > > >
> > > > I've read the "differences between revisions" and that's a mean task
> > > > you've completed.
> > > >
> > > > I have to confess I find the new stuff totally baffling. I wrote the
> > > > original article 2 years ago and I still have the bruises on my
> > > > forehead :) I was ignorant of how J901 supports the newer code pages
> > > > until I read it on this thread.
> > > >
> > > > Some helpful(?) questions:
> > > > ++ How does Dyalog APL do it?
> > > > ++ How does Swift 5.1 do it?
> > > > ++ How does Python 3.7 do it?
> > > > ++ How does Javascript do it?
> > > > …All are languages with serious pretensions to manipulating text
> > > > containing UCPs. Maybe over 90% of application code being written in
> > > > these languages does just that, and mostly on webpages. The writer of
> > > > the Swift manuals published by iBooks delights in showing emojis
> > > > between quotes in code samples. Smart stuff – but only a GUI coder or
> > > > indie publisher would know it.
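For one of the languages asked about, Python 3, a quick check shows part of its approach: bytes and text are separate types that cannot be concatenated directly, and a `str` is a sequence of code points, so a character outside the Basic Multilingual Plane has length 1 even though UTF-16 needs a surrogate pair for it. This sketch only illustrates Python; it says nothing about Dyalog APL, Swift, or JavaScript.

```python
# Python 3 keeps bytes and text as separate types and refuses to mix them.
try:
    "caf" + "é".encode("utf-8")
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False

# A str counts code points, so this emoji is one character even though
# UTF-16 stores it as two code units and UTF-8 as four bytes.
emoji = "\U0001F600"
print(len(emoji))                           # 1
print(len(emoji.encode("utf-16-le")) // 2)  # 2
print(len(emoji.encode("utf-8")))           # 4
```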
> > > >
> > > > In my day-to-day programming I have little or no use for any greater
> > > > precision than utf-8 and wide characters (…are we still calling them
> > > > that? – how about mega-wide and giga-wide for the new precisions?)
> > > > Just about the only use I'd have for the newer UCPs is to embed them
> > > > in a PDF document via copy-paste. Nowadays that's more likely to be a
> > > > layman's review blog than a learned paper. In that case I'd be at the
> > > > mercy of my WP vendor to get it right when coding the copy/paste.
> > > >
> > > > On past form, the omens are not good. From 1999 to the present day,
> > > > as an indie publisher of books with fancy fonts, I watched Microsoft
> > > > and Adobe completely foul up the introduction of utf-8 to their
> > > > products, notably export to PDF. Assuming it won't take them another
> > > > 20 years to migrate to utf-32, I guess I can look forward to running
> > > > sequential machines on emojis in my care home.
> > > >
> > > > Ian
> > > >
> > > > On Wed, 18 Sep 2019 at 20:45, 'robert therriault' via Programming <
> > > > [email protected]> wrote:
> > > >
> > > > > Hi Henry, Bill and Ian
> > > > >
> > > > > I have edited the wiki for the UCP page.
> > > > >
> > > > > The synopsis is that I included some information on how literals
> > > > > and utf-8 are related and a section on surrogate pairs. I hope I got
> > > > > most of this right, but if I didn't, please make the necessary
> > > > > changes and/or correct me.
> > > > >
> > > > > Ian, I hope that I was able to retain the spirit of what you
> > > > > established with your excellent foundation.
> > > > >
> > > > > https://code.jsoftware.com/wiki/Vocabulary/UnicodeCodePoint
> > > > >
> > > > > Cheers, bob
> > > > >
> > > > > > On Sep 13, 2019, at 10:59 AM, Henry Rich <[email protected]> wrote:
> > > > > >
> > > > > > Detail is great, but put it towards the end of the page if possible.
> > > > >
> > > > >
> > > > > ----------------------------------------------------------------------
> > > > > For information about J forums see http://www.jsoftware.com/forums.htm