From Wikipedia:

Number    Bits for    First       Last        Byte 1    Byte 2    Byte 3    Byte 4
of bytes  code point  code point  code point
1         7           U+0000      U+007F      0xxxxxxx
2         11          U+0080      U+07FF      110xxxxx  10xxxxxx
3         16          U+0800      U+FFFF      1110xxxx  10xxxxxx  10xxxxxx
4         21          U+10000     U+10FFFF    11110xxx  10xxxxxx  10xxxxxx  10xxxxxx

So the string can be cut with dyadic <;.1, using a mask that is the OR (+./)
of the byte 1 ranges of each of the 4 cases.
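
A minimal sketch of that idea, assuming the input is well-formed UTF-8 so that
its first byte is always a leading byte. The names lead and revutf8 are
hypothetical, chosen only for this illustration. The ranges are the byte 1
patterns from the table: 0-127, 192-223, 224-239 and 240-247.

```j
NB. lead: boolean mask, 1 where a byte starts a character (hypothetical name)
lead =: 3 : 0
b =. a. i. y                                      NB. byte values 0..255
NB. rows of the table test one range each; +./ ORs the 4 rows together
+./ (0 192 224 240 <:/ b) *. 127 223 239 247 >:/ b
)

NB. revutf8: box each character's bytes, reverse the boxes, raze
revutf8 =: 3 : '; |. (lead y) <;.1 y'

revutf8 'añb'                                     NB. ñ is 2 bytes, kept in order
```

Adjacent multi-byte characters are handled because every character starts its
own box. This only keeps each code point's bytes together; it does nothing
special for combining characters, and bytes before the first leading byte of a
malformed string would be dropped by the cut.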

On Sun, Sep 15, 2019, 8:45 AM bill lam <[email protected]> wrote:

> Yes, this situation needs to be handled. The value of the leading byte of a
> multi-byte UTF-8 sequence tells its length: 2, 3 or 4 bytes.
>
> On Sun, Sep 15, 2019, 8:39 AM Raul Miller <[email protected]> wrote:
>
>> And if there's several adjacent characters with representation in that
>> range?
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Saturday, September 14, 2019, bill lam <[email protected]> wrote:
>>
>> > That's the easiest way.
>> >
>> > A harder way, without conversion to Unicode, is to keep bytes higher than
>> > 127 in sequence during reversing, but beware of contiguous multi-byte
>> > characters. This needs a study of the UTF-8 encoding RFC or whatnot.
>> >
>> > On Fri, Sep 13, 2019, 6:30 AM Henry Rich <[email protected]> wrote:
>> >
>> > > Raul said what I was thinking.
>> > >
>> > > The point is, a literal string is UTF-8 (i.e. bytes), NOT Unicode.  When
>> > > you reverse the bytes in a UTF-8 string you get garbage.  You need to
>> > > convert to Unicode first, as Raul showed.
>> > >
>> > > Henry Rich
>> > >
>> > > On 9/12/2019 4:54 PM, Mark Linton wrote:
>> > > > I’ve used
>> > > > a.
>> > > > and
>> > > > u: i. 255
>> > > > to create a list of characters (ASCII alphabet).
>> > > >
>> > > > Is there a way to change character-sets or code-pages or fonts to change
>> > > > the letters that are displayed when a J sentence is executed?
>> > > >
>> > > > Would it be a setting in the terminal or a verb in J?
>> > > >
>> > > > I’ve also noticed that |. has problems reversing two-byte UTF-8
>> > > > characters.  What is the preferred method for reversing such characters
>> > > > in a string?
>> > >
>> > >
>> > >
>> > > ----------------------------------------------------------------------
>> > > For information about J forums see http://www.jsoftware.com/forums.htm
>> > >
