Re: [Jprogramming] Unicode (UTF8) string deconstruction

bill lam Thu, 16 Jun 2016 16:27:43 -0700

The internal representation of a utf8 array is no different from that of a
regular character array; utf8 applies only at the external interface. If you
want to manipulate unicode within J, you should use the wide character data
type (131072), as suggested by Don.
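
A minimal sketch of the difference, not taken from the posts in this thread. It assumes the script is saved as UTF-8, so the literal below arrives as 7 bytes. The verb name utf8cut is hypothetical; it only cuts at bytes that are not UTF-8 continuation bytes and does not validate the sequences.

```j
NB. Sketch only; assumes this script is saved as UTF-8.
s =. 'ఝaఝ'
# s              NB. 7 (bytes, not characters)
3!:0 s           NB. 2 (literal)
a. i. s          NB. 224 176 157 97 224 176 157

w =. 7 u: s      NB. decode UTF-8 into wide characters
# w              NB. 3
3!:0 w           NB. 131072 (unicode)
3 u: w           NB. 3101 97 3101 (code points)
s -: 8 u: w      NB. 1 (8 u: encodes back to UTF-8)

NB. utf8cut is a hypothetical name, not a library verb.
NB. It boxes the bytes of a non-empty literal list, starting a new box
NB. at every byte that is not a continuation byte (bit pattern 10xxxxxx).
utf8cut =: 3 : 0
 b =. a. i. y                  NB. byte values 0..255
 m =. 128 ~: 192 (17 b.) b     NB. 1 at ASCII and lead bytes
 m =. 1 (0}) m                 NB. always start a box at the first byte
 m <;.1 y
)

#@> utf8cut s          NB. 3 1 3
#@> utf8cut 1 }. s     NB. 2 1 3 (leading orphan bytes share one box)
```

For a table such as 2 6 $ s, the verb could be applied per row with <@utf8cut"1. A sequence broken across rows then shows up as separate fragments, although trailing fragments such as 224 176 stay together in one box rather than being split into single bytes.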
On Jun 17, 2016 2:33 AM, "robert therriault" <[email protected]> wrote:


> You are quite right Don,
>
> I should change the request to displaying unicode in UTF8 I suppose.
> Converting to unicode as you have done also allows manipulation of
> characters within arrays, but I am looking for ways to show the results when
> reshaping breaks UTF8 representation.
>
> Do you have a way to take a literal array in UTF8 and box the encodings
> for each character?
>
> I have seen your posts in the past and they have helped as I work through
> this process. Thank you.
>
> One of the ways that I am looking at dealing with the width issue is to
> have the character display use a smaller font so that some of the
> unicode display width issues can be resolved.
>
> Cheers, bob
>
> > On Jun 16, 2016, at 11:25 AM, Don Guinn <[email protected]> wrote:
> >
> > You are not dealing with unicode. You have UTF8.
> >
> >    ]s=.  7 u: 'ఝ' ,'a','ఝ' NB. s is converted to unicode.
> > ఝaఝ
> >    $s
> > 3
> >    <"0 s
> > +---+-+---+
> > |ఝ|a|ఝ|
> > +---+-+---+
> >
> >
> > But the display is still messed up because the display first converts the
> > unicode to UTF8, then does a byte count to determine how many boxing
> > characters to put around the data. There is a further problem, as many
> > unicode/UTF8 characters beyond ASCII are proportional. Notice how wide the
> > first and last characters are compared to the "a".
> >
> > On Thu, Jun 16, 2016 at 12:08 PM, robert therriault <[email protected]> wrote:
> >
> >> I am in the process of extending some of the type and shape visualizations
> >> that I have done in the past [0] into the realm of unicode.
> >>
> >> If you look through the archives of these message lists you will find that
> >> unicode can be quite confounding, but my question is relatively simple.
> >>
> >> I would like to take
> >>
> >>    [s=.  2 6 $ 'ఝ' ,'a','ఝ'  NB. � results from 224 176 157 being broken across dimensions
> >> ఝa��
> >> �ఝa�
> >>   [encode=. a. i. s       NB. shape of 2 6 refers to the encoding numbers not the number of characters displayed
> >> 224 176 157  97 224 176
> >> 157 224 176 157  97 224
> >>
> >> and convert encode to a form where the encoding for each character is in
> >> its own box. Of course, this would be a verb that can work with any
> >> literal array, not just the example given.
> >>
> >> [r=. 2 4 $ 224 176 157 ; 97 ; 224 ; 176 ; 157 ; 224 176 157 ; 97 ; 224
> >> ┌───────────┬───────────┬───┬───┐
> >> │224 176 157│97         │224│176│
> >> ├───────────┼───────────┼───┼───┤
> >> │157        │224 176 157│97 │224│
> >> └───────────┴───────────┴───┴───┘
> >>
> >> which could be converted back to
> >>
> >>    {&a.  each r
> >> ┌───┬───┬─┬─┐
> >> │ఝ│a  │�│�│
> >> ├───┼───┼─┼─┤
> >> │�  │ఝ│a│�│
> >> └───┴───┴─┴─┘
> >>
> >> With this in place it may be possible to have the literal view of unicode
> >> display a little more consistently.
> >>
> >>
> >> Any suggestions would be welcome.
> >>
> >> Cheers, bob
> >>
> >> [0] Video of Enhanced display of literals
> >> https://www.youtube.com/watch?v=BzjfJjGb5cs
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
