Henry,
How can I turn down such a gracious invitation? :-)
Seriously, I will do my best, having bounced around the unicode and unicode4
aspects of J while putting jig together. I don't claim to be able to match the
clear writing that you and Ian have done, but revision is always welcome, and
that is what wikis are made for.
And I do have two verbs that parse the code points which might be useful as
well.
9 u: 128512
😀
boxutf=: 3 : 0"1 NB. for literals
a=.a: [ t=.3 u: y
while. #t do.
select. s=. 127 191 223 239 I. {. t
case. (0;1) do. t=.}.t [ a=.a,< {.t
case. do. if. 0={:t1=.s{.t do. a=.a,<"0 t1-.0
elseif. 191 < >./ }.t1 do. a=.a,<"0 s{.t1 [ s=.>:@:(1 i.~ 191 < }.) t1
elseif. do. a=.a,< t1 end.
t=. s }.t
end.
end.
}.a
)
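For anyone reading along, the select. above leans on the interval index I. to classify the lead byte. Here is a small sketch of just that one step (the sample bytes are picked only for illustration). The result is the number of bytes the sequence should have, with 0 for plain ASCII and 1 for a stray continuation byte.

```j
   127 191 223 239 I. 65 150 200 230 240
0 1 2 3 4
```

So a lead byte of 240 asks for 4 bytes, which matches the 240 159 152 128 result for 128512 shown next.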
boxutf ": 9 u: 128512 NB. converted to literal
┌───────────────┐
│240 159 152 128│
└───────────────┘
240 159 152 128 { a.
😀
3 !: 0 [240 159 152 128 { a.
2 NB. type literal
boxutf 2{. ": 9 u: 128512 NB. incomplete code breaks into nondisplayable characters
┌───┬───┐
│240│159│
└───┴───┘
240 159 { a.
��
boxuni=: 3 : 0"1 NB. for unicode and unicode4
a=.a: [ t=.3 u: y
while. #t do.
select. 55295 57343 I. {. t
case. (0;2) do. t=. }. t [ a=. a , < {. t
case. do. if. (56320&> +. 57343&<) {: t1=.2 {. t do. t=. }. t [ a=.a , < {. t else. t=.2 }. t [ a=.a , < t1 end.
end.
end.
}.a
)
boxuni 9 u: 128512
┌──────┐
│128512│
└──────┘
boxuni 7 u: 128512
┌───────────┐
│55357 56832│
└───────────┘
7 u: 55357 56832
😀
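In case it helps the wiki page, the arithmetic behind that pair can be sketched in a couple of lines. This is only an illustration using the standard UTF-16 constants (65536, 1024, and the surrogate bases 55296 and 56320), not code from jig: subtract 65536, split what is left into a quotient and remainder by 1024, and add the two bases. The second sentence reverses the process.

```j
   55296 56320 + 0 1024 #: 128512 - 65536
55357 56832
   65536 + 1024 #. 55357 56832 - 55296 56320
128512
```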
7 u: 55357 NB. incomplete code breaks into nondisplayable characters.
���
The real challenge with unicode is that you can get deep into the weeds pretty
fast.
I'll try to come up with something in the next couple of days. Anyone's
suggestions on the best way to approach this are welcome.
Cheers, bob
> On Sep 3, 2019, at 7:04 PM, Henry Rich wrote:
>
> The introductory page for Unicode
>
> https://code.jsoftware.com/wiki/Vocabulary/UnicodeCodePoint
>
> does not discuss 4-byte characters, or the concept of surrogate pairs with
> 2-byte characters.
>
> 4-byte precision is called unicode4 in NuVoc. If someone would add
> discussion of these to the page, they would be a Hero. I'm just saying.
>
> Henry Rich
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm