Re: [Jprogramming] Writing help needed: surrogate pairs

'robert therriault' via Programming Wed, 18 Sep 2019 21:17:30 -0700

Don,

You are quite correct that care must be taken to convert literals holding UTF-8 to 
unicode before concatenating them with unicode arrays.

   [ lit3=:8 u: 3101
ఝ
   3 u: lit3
224 176 157
   [ uni_1=:7 u: 3101
ఝ
   lit3,uni_1
à°ఝ
   $ lit3,uni_1
4
   datatype lit3,uni_1
unicode
   (7 u: lit3),uni_1  NB. Displays properly when first converted
ఝఝ
   $ (7 u: lit3),uni_1
2
   datatype (7 u: lit3),uni_1
unicode
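For readers coming from other languages, the same pitfall can be reproduced outside J. The Python sketch below is only an analogy, not J's implementation of u:. It assumes that promoting a literal to unicode widens each byte to one code point (as Latin-1 decoding does). That assumption matches the à° shown above for the bytes 224 176 157.

```python
# Analogy only: models J's implicit promotion of a byte literal to unicode
# as "widen each byte to one code point" (Latin-1 decoding).

ch = "\u0c1d"                      # U+0C1D, code point 3101
utf8 = ch.encode("utf-8")          # the bytes that 8 u: 3101 holds
print(list(utf8))                  # [224, 176, 157]

# Concatenating without converting first: each byte becomes its own character.
naive = utf8.decode("latin-1") + ch
print(len(naive))                  # 4

# Decoding the UTF-8 bytes to code points first (the role 7 u: plays above).
proper = utf8.decode("utf-8") + ch
print(len(proper), proper == ch * 2)  # 2 True
```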

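So my parting advice is this: when working with unicode in J, keep track of which 
form each array holds before combining arrays. A literal holds UTF-8 bytes, 7 u: yields UTF-16 code units, and 9 u: yields whole code points.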
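To make the three forms concrete, the hypothetical Python helper `views` below prints one string as UTF-8 bytes, UTF-16 code units and code points. It is a language-neutral cross-check of the numbers in the sessions (224 176 157 from 3 u: lit3, the surrogate pair 55357 56832 from 3 u: 7 u: '😀', and 128512 from 3 u: 9 u: '😀'). It is not a model of J internals.

```python
def views(s):
    """Hypothetical helper: one string as UTF-8 bytes, UTF-16 code units, code points."""
    utf8 = list(s.encode("utf-8"))
    raw16 = s.encode("utf-16-be")            # big-endian, no byte-order mark
    utf16 = [int.from_bytes(raw16[i:i + 2], "big") for i in range(0, len(raw16), 2)]
    points = [ord(c) for c in s]
    return utf8, utf16, points

for s in ["\u0c1d", "\U0001F600"]:
    print(views(s))
# ([224, 176, 157], [3101], [3101])
# ([240, 159, 152, 128], [55357, 56832], [128512])
```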

It may be useful for someone knowledgeable to create a lab that shows the 
preferred way of dealing with conversions and encodings. I might take a run at 
it eventually, but if anyone wants to be 'my hero', they could put something 
together sooner.

My current confusion is over the many ways in which the results of 9&u: and 
7&u: depend on the type of their argument:

   9 u: 128512
😀
   9 u: '😀'
😀
   3 u: 9 u: '😀'
128512
   7 u: '😀'
😀
   3 u: 7 u: '😀'
55357 56832
   9 u: 55357 56832
😀

   9 u: 7 u: 55357 56832
😀
   3 u: 9 u: 7 u: 55357 56832
128512
   3 u: 7 u: 55357 56832
55357 56832
   3 u: 9 u: 55357 56832
55357 56832
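The pair 55357 56832 and the single value 128512 denote the same character. The standard UTF-16 surrogate arithmetic that relates them is sketched below in Python, using hypothetical helper names. It is a cross-check on the numbers, not a description of how 7 u: or 9 u: are implemented.

```python
HIGH_BASE, LOW_BASE = 0xD800, 0xDC00   # 55296, 56320

def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    v = cp - 0x10000                    # 20-bit offset into the supplementary planes
    return HIGH_BASE + (v >> 10), LOW_BASE + (v & 0x3FF)

def from_surrogates(hi, lo):
    """Combine a high/low surrogate pair back into one code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((hi - HIGH_BASE) << 10) + (lo - LOW_BASE)

print(to_surrogates(128512))          # (55357, 56832)
print(from_surrogates(55357, 56832))  # 128512
```

In the session above, 3 u: 9 u: 7 u: 55357 56832 produces 128512, which is the combination from_surrogates performs. By contrast, 3 u: 9 u: 55357 56832 leaves the two surrogate values untouched.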

Cheers, bob

> On Sep 18, 2019, at 5:58 PM, Don Guinn <[email protected]> wrote:
> 
> If any utf-8 characters
> are in the literal the literal must be converted to unicode before
> concatenating.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
