Re: [Jprogramming] Writing help needed: surrogate pairs

'robert therriault' via Programming Wed, 04 Sep 2019 08:23:29 -0700

Yeah, one of my challenges is to go back and understand the choices that I made 
when developing these verbs. The simple ways can decode complete sequences, but 
I needed something that would decode incomplete sequences consistent with the 
way that J did it. Once I had that working I moved on, but now I may need to go 
back and really understand what I was doing in that process.
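
For anyone following along, the classification step that the quoted boxutf leans on may be easier to see in isolation. This is only a minimal sketch: it assumes nothing beyond the primitive I. (interval index) and the thresholds 127 191 223 239 taken from boxutf below. The name kind is hypothetical and is not part of jig.

```j
NB. kind: classify each byte value by its role in UTF-8 (hypothetical helper)
NB.   0 = ASCII, 1 = continuation byte, 2 3 4 = lead byte of a 2-, 3- or 4-byte sequence
NB.   caveat: values above 247 are not valid lead bytes but also land in class 4
kind=: 127 191 223 239&I.

   kind 65 240 159 152 128
0 4 1 1 1
   kind 240 159            NB. lead byte promises 4 bytes, only 2 present: incomplete
4 1
```

A complete sequence is a lead byte of kind n followed by n-1 bytes of kind 1. Anything shorter is the incomplete case, which the quoted session shows being boxed one byte at a time.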


At one point I had tried sequential machines, but I found that they were a lot 
slower than the explicit version that I came up with and even more opaque.
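
On the surrogate pair side, the arithmetic linking 128512 to the pair 55357 56832 in the quoted session is short enough to spell out. This is a rough sketch under the standard UTF-16 rules. The names pair2cp and cp2pair are made up for illustration and are not jig verbs, and neither does any range checking.

```j
NB. pair2cp: combine a high and low surrogate into one code point (hypothetical)
pair2cp=: 3 : 0
'hi lo'=. y
65536 + (1024 * hi - 55296) + lo - 56320
)

NB. cp2pair: split a code point above 65535 into its surrogate pair (hypothetical)
cp2pair=: 3 : 0
v=. y - 65536
(55296 + <. v % 1024) , 56320 + 1024 | v
)

   pair2cp 55357 56832
128512
   cp2pair 128512
55357 56832
```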

Cheers, bob

> On Sep 4, 2019, at 8:16 AM, Raul Miller wrote:
> 
> Interesting...
> 
> I tried for a shorter version:
> 
>   bu=: [:(a.i.8&u:)&.>9&u:
> 
> But, of course, this rejects incomplete unicode sequences with a domain 
> error...
> 
> Thanks,
> 
> -- 
> Raul
> 
> 
> On Tue, Sep 3, 2019 at 11:46 PM 'robert therriault' via Programming wrote:
>> 
>> Henry,
>> 
>> How can I turn down such a gracious invitation? :-)
>> 
>> Seriously, I will do my best, having bounced around the unicode and 
>> unicode4 aspects of J while putting jig together. I don't claim to be able 
>> to match the clear writing that you and Ian have done, but revision is always 
>> welcome, and that is what wikis are made for.
>> 
>> And I do have two verbs that parse the code points which might be useful as 
>> well.
>> 
>>    9 u: 128512
>> 😀
>> 
>> boxutf=: 3 : 0"1  NB. for literals
>> a=.a: [ t=.3 u: y
>> while. #t do.
>> select. s=. 127 191 223 239 I. {. t
>>  case. (0;1) do. t=.}.t [ a=.a,< {.t
>>  case.       do. if. 0={:t1=.s{.t do. a=.a,<"0 t1-.0
>>                  elseif. 191 < >./ }.t1 do. a=.a,<"0 s{.t1 [ s=.>:@:(1 i.~ 191 < }.) t1
>>                  elseif.                do. a=.a,< t1   end. t=. s }.t
>> end.
>> end.
>> }.a
>> )
>>    boxutf ": 9 u: 128512  NB. converted to literal
>> ┌───────────────┐
>> │240 159 152 128│
>> └───────────────┘
>>   240 159 152 128 { a.
>> 😀
>>   3 !: 0 [240 159 152 128 { a.
>> 2  NB. type literal
>>    boxutf 2{. ": 9 u: 128512  NB. incomplete code breaks into nondisplayable characters
>> ┌───┬───┐
>> │240│159│
>> └───┴───┘
>>    240 159 { a.
>> ��
>> 
>> boxuni=: 3 : 0"1  NB. for unicode and unicode4
>> a=.a: [ t=.3 u: y
>> while. #t do.
>> select.  55295 57343 I. {. t
>>  case. (0;2) do. t=. }. t [ a=. a , < {. t
>>  case.       do. if. (56320&> +. 57343&<:) {: t1=.2 {. t  do. t=.  }. t [ a=.a , < {. t else. t=.2 }. t [ a=.a , < t1 end.
>> end.
>> end.
>> }.a
>> )
>>    boxuni_jig_  9 u: 128512
>> ┌──────┐
>> │128512│
>> └──────┘
>>   boxuni_jig_  7 u: 128512
>> ┌───────────┐
>> │55357 56832│
>> └───────────┘
>>    7 u: 55357 56832
>> 😀
>>    7 u: 55357   NB. incomplete code breaks into nondisplayable characters.
>> ���
>> 
>> The real challenge with unicode is that you can get deep into the weeds 
>> pretty fast.
>> 
>> I'll try to come up with something in the next couple of days. Anyone's 
>> suggestions on the best way to approach this are welcome.
>> 
>> Cheers, bob
>> 
>>> On Sep 3, 2019, at 7:04 PM, Henry Rich wrote:
>>> 
>>> The introductory page for Unicode
>>> 
>>> https://code.jsoftware.com/wiki/Vocabulary/UnicodeCodePoint
>>> 
>>> does not discuss 4-byte characters, or the concept of surrogate pairs with 
>>> 2-byte characters.
>>> 
>>> 4-byte precision is called unicode4 in NuVoc.  If someone would add 
>>> discussion of these to the page, they would be a Hero.  I'm just saying.
>>> 
>>> Henry Rich
>>> 
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>> 