Henry,

How can I turn down such a gracious invitation? :-)

Seriously, I will do my best, having bounced around the unicode and unicode4
aspects of J while putting jig together. I don't claim to be able to match the
clear writing that you and Ian have done, but revisions are always welcome, and
that is what wikis are made for.

I also have two verbs that parse the code points and might be useful as well:
boxutf for literals (UTF-8) and boxuni for unicode and unicode4.

    9 u: 128512
😀 
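
For anyone following along without J, here is a small Python sketch of where the
numbers in the examples come from: code point 128512 (U+1F600) encoded as UTF-8
bytes and as a UTF-16 surrogate pair. The helper names utf8_bytes, utf16_units
and from_surrogates are only illustrative; they are not part of J or jig, and the
encoder does no validation (surrogates and values above 0x10FFFF are not rejected).

```python
def utf8_bytes(cp: int) -> list[int]:
    """Encode one code point as UTF-8 byte values (no range or surrogate checks)."""
    if cp < 0x80:                      # 1 byte:  0xxxxxxx
        return [cp]
    if cp < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18),         # 4 bytes: 11110xxx then three 10xxxxxx
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]


def utf16_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units, using a surrogate pair above 0xFFFF."""
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000                   # 20 bits: high 10 go to the high surrogate
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def from_surrogates(hi: int, lo: int) -> int:
    """Recombine a high/low surrogate pair into the code point it encodes."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


print(utf8_bytes(128512))              # [240, 159, 152, 128]
print(utf16_units(128512))             # [55357, 56832]
print(from_surrogates(55357, 56832))   # 128512
```

The lead-byte ranges here (below 128, 192-223, 224-239, 240 and up) are the same
thresholds boxutf uses with I. to decide how many bytes belong to a character.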

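boxutf=: 3 : 0"1  NB. for literals (UTF-8 bytes)
a=.a: [ t=.3 u: y                      NB. t: byte values; a: result, seeded with a: and dropped at the end
while. #t do.
 select. s=. 127 191 223 239 I. {. t   NB. s: 0 ASCII, 1 stray continuation byte, 2-4 sequence length
  case. (0;1) do. t=.}.t [ a=.a,< {.t  NB. single byte: box it alone
  case.       do. if. 0={:t1=.s{.t do. a=.a,<"0 t1-.0   NB. sequence cut off at the end (s{. padded with 0): box each byte
                                   elseif. 191 < >./ }.t1 do. a=.a,<"0 s{.t1 [ s=.>:@:(1 i.~ 191 < }.) t1  NB. new lead byte inside the sequence: box the bytes before it
                                   elseif.                do. a=.a,< t1   end.   NB. complete sequence: box it whole
  t=. s }.t
 end.
end.
}.a
)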
    boxutf ": 9 u: 128512  NB. converted to literal
┌───────────────┐
│240 159 152 128│
└───────────────┘
   240 159 152 128 { a. 
😀
   3 !: 0 [240 159 152 128 { a.
2  NB. type literal
    boxutf 2{. ": 9 u: 128512  NB. an incomplete sequence breaks into nondisplayable characters
┌───┬───┐
│240│159│
└───┴───┘
    240 159 { a. 
��
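
For comparison, here is the same splitting rule as boxutf written as a
hypothetical Python function, box_utf8. It is a sketch that mirrors the J verb's
checks rather than a full UTF-8 validator: like boxutf, it treats a byte above
191 inside a sequence as a new lead byte, but it does not reject ASCII bytes in
continuation positions. bisect_left plays the role of I. on the sorted bounds.

```python
from bisect import bisect_left


def box_utf8(data: bytes) -> list[list[int]]:
    """Hypothetical port of boxutf: group UTF-8 byte values per character."""
    t = list(data)
    out: list[list[int]] = []
    while t:
        # Like 127 191 223 239 I. {. t : count of bounds below the lead byte.
        s = bisect_left([127, 191, 223, 239], t[0])
        if s in (0, 1):                       # ASCII byte or stray continuation byte
            out.append([t[0]])
            t = t[1:]
            continue
        t1 = (t[:s] + [0] * s)[:s]            # like s {. t : overtake pads with zeros
        if t1[-1] == 0:                       # sequence cut off at the end of the input
            out.extend([b] for b in t1 if b != 0)
        elif max(t1[1:]) > 191:               # a lead byte where a continuation was expected
            s = 1 + next(i for i, b in enumerate(t1[1:]) if b > 191)
            out.extend([b] for b in t1[:s])   # box the bytes before it one by one
        else:                                 # complete sequence
            out.append(t1)
        t = t[s:]
    return out


print(box_utf8(chr(128512).encode("utf-8")))       # [[240, 159, 152, 128]]
print(box_utf8(chr(128512).encode("utf-8")[:2]))   # [[240], [159]]
```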

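boxuni=: 3 : 0"1  NB. for unicode and unicode4
a=.a: [ t=.3 u: y                      NB. t: code units (unicode) or code points (unicode4)
while. #t do.
 select.  55295 57343 I. {. t          NB. 1 means {.t is in the surrogate range 55296-57343
  case. (0;2) do. t=. }. t [ a=. a , < {. t   NB. not a surrogate: box it alone
  case.       do. if. (56320&> +. 57343&<) {: t1=.2 {. t  do. t=.  }. t [ a=.a , < {. t  NB. next unit is not a low surrogate (56320-57343): box this one alone
                  else. t=.2 }. t [ a=.a , < t1 end.   NB. surrogate pair: box both units together
 end.
end.
}.a
)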
    boxuni_jig_  9 u: 128512
┌──────┐
│128512│
└──────┘
   boxuni_jig_  7 u: 128512
┌───────────┐
│55357 56832│
└───────────┘
    7 u: 55357 56832
😀
    7 u: 55357   NB. a lone high surrogate breaks into nondisplayable characters
���
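
And a hypothetical Python counterpart of boxuni, box_utf16, again only a sketch.
It takes the low-surrogate range as 56320-57343 (0xDC00-0xDFFF) inclusive, and
like boxuni it checks only the second unit, so it does not verify that the first
unit is a high surrogate. Code points above 0xFFFF (the unicode4 case) pass
through as single items.

```python
from bisect import bisect_left


def box_utf16(units: list[int]) -> list[list[int]]:
    """Hypothetical port of boxuni: group UTF-16 code units per character."""
    t = list(units)
    out: list[list[int]] = []
    while t:
        # Like 55295 57343 I. {. t : index 1 means 55296 <= t[0] <= 57343.
        if bisect_left([55295, 57343], t[0]) != 1:
            out.append([t[0]])                # not a surrogate: box it alone
            t = t[1:]
            continue
        t1 = (t[:2] + [0, 0])[:2]             # like 2 {. t : pads with 0 at the end
        if 56320 <= t1[1] <= 57343:           # second unit is a low surrogate
            out.append(t1)                    # keep the pair together
            t = t[2:]
        else:                                 # unpaired surrogate
            out.append([t[0]])
            t = t[1:]
    return out


print(box_utf16([55357, 56832]))   # [[55357, 56832]]
print(box_utf16([128512]))         # [[128512]]
print(box_utf16([55357]))          # [[55357]]
```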

The real challenge with unicode is that you can get deep into the weeds pretty
fast.

I'll try to come up with something in the next couple of days. Anyone's 
suggestions on the best way to approach this are welcome. 

Cheers, bob

> On Sep 3, 2019, at 7:04 PM, Henry Rich wrote:
> 
> The introductory page for Unicode
> 
> https://code.jsoftware.com/wiki/Vocabulary/UnicodeCodePoint
> 
> does not discuss 4-byte characters, or the concept of surrogate pairs with 
> 2-byte characters.
> 
> 4-byte precision is called unicode4 in NuVoc.  If someone would add 
> discussion of these to the page, they would be a Hero.  I'm just saying.
> 
> Henry Rich
> 
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
