Don,
You are quite correct that a literal holding UTF-8 must be converted to
unicode before it is concatenated with unicode.
   [ lit3=:8 u: 3101
ఝ
   3 u: lit3
224 176 157
   [ uni_1=:7 u: 3101
ఝ
   lit3,uni_1
à°ఝ
   $ lit3,uni_1
4
   datatype lit3,uni_1
unicode
   (7 u: lit3),uni_1 NB. Displays properly when first converted
ఝఝ
   $ (7 u: lit3),uni_1
2
   datatype (7 u: lit3),uni_1
unicode
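For anyone who wants to check these numbers outside J, here is a minimal Python sketch of what appears to be happening. It is only an analogy under the assumption that the concatenation promotes each UTF-8 byte to its own character; the variable names are mine and nothing here is J's actual implementation.

```python
# Code point 3101 (U+0C1D), the character used in the session above.
ch = chr(3101)

# Its UTF-8 encoding is three bytes; these match the output of `3 u: lit3`.
utf8_bytes = list(ch.encode("utf-8"))
print(utf8_bytes)  # [224, 176, 157]

# Naive concatenation (assumed model): every byte becomes its own character,
# then the real character is appended. That gives 4 items, not 2.
naive = [chr(b) for b in utf8_bytes] + [ch]
print(len(naive))  # 4

# Decode the bytes first, then concatenate: 2 items, as intended.
decoded = bytes(utf8_bytes).decode("utf-8") + ch
print(len(decoded))  # 2
```

The first two items of the naive result are the characters for 224 and 176, which is why the display starts with "à°"; 157 is a control character and shows nothing.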
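So the parting wisdom that I have is that when you are working with unicode in
J, you should know whether each value is a literal holding UTF-8 bytes or is
already unicode. As shown above, J will concatenate the two without complaint,
and the result has datatype unicode but the wrong length and display.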
It may be useful for someone with knowledge to create a lab that shows the
preferred way of dealing with conversions and encodings. I might take a run at
it eventually, but if anyone wants to be 'my hero', they could put something
together sooner.
My current confusion is over the number of ways that the outputs of 9&u: and
7&u: depend on the type of their argument.
   9 u: 128512
😀
   9 u: '😀'
😀
   3 u: 9 u: '😀'
128512
   7 u: '😀'
😀
   3 u: 7 u: '😀'
55357 56832
   9 u: 55357 56832
😀
   9 u: 7 u: 55357 56832
😀
   3 u: 9 u: 7 u: 55357 56832
128512
   3 u: 7 u: 55357 56832
55357 56832
   3 u: 9 u: 55357 56832
55357 56832
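The pair 55357 56832 looks like the UTF-16 surrogate pair for code point 128512, which would fit 7&u: producing 16-bit units and 9&u: producing whole code points. The Python sketch below only checks that arithmetic; the function names are hypothetical and it says nothing about how J decides when to merge or split the pair.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def from_surrogates(high, low):
    """Recombine a UTF-16 surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


pair = to_surrogates(128512)
print(pair)                    # (55357, 56832)
print(from_surrogates(*pair))  # 128512
```

Read this way, the results above differ in whether the surrogate pair is kept as two numbers or combined into one.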
Cheers, bob
> On Sep 18, 2019, at 5:58 PM, Don Guinn <[email protected]> wrote:
>
> If any utf-8 characters
> are in the literal the literal must be converted to unicode before
> concatenating.