Re: [Jprogramming] Unicode (UTF8) string deconstruction

robert therriault Thu, 16 Jun 2016 11:34:09 -0700

You are quite right Don,

I should change the request to displaying unicode in UTF8 I suppose. Converting 
to unicode as you have done also allows manipulation of characters within 
arrays, but I am looking for ways to show the results when reshaping breaks 
the UTF8 representation.


Do you have a way to take a literal array in UTF8 and box the encodings for 
each character?
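
To pin down the grouping I have in mind, here is a rough sketch. It is written 
in Python rather than J, only to make the logic explicit, and it is not meant 
as the way to do it in J. The names seq_len, box_row and reshape are made up 
for illustration. It assumes that a byte run counts as a character only when 
the lead byte and all of its continuation bytes sit in the same row; anything 
else stays as a single byte. It does not check for overlong or otherwise 
invalid encodings.

```python
def seq_len(lead):
    """Expected UTF-8 sequence length for a lead byte; 0 if not a lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte (0x80-0xBF) or invalid lead byte


def box_row(row):
    """Group one row of byte values into complete UTF-8 characters.

    Bytes that do not form a complete sequence within the row are
    returned one per group, so broken characters stay visible.
    """
    groups, i = [], 0
    while i < len(row):
        n = seq_len(row[i])
        chunk = row[i:i + n]
        # complete only if all n bytes are present and the tail bytes
        # are continuation bytes
        ok = n > 0 and len(chunk) == n and all(0x80 <= b <= 0xBF for b in chunk[1:])
        if ok:
            groups.append(list(chunk))
            i += n
        else:
            groups.append([row[i]])
            i += 1
    return groups


def reshape(data, rows, cols):
    """Cyclic reshape of a flat list, mimicking J's dyadic $."""
    return [[data[(r * cols + c) % len(data)] for c in range(cols)]
            for r in range(rows)]


data = list("ఝaఝ".encode("utf-8"))  # 224 176 157 97 224 176 157
boxed = [box_row(row) for row in reshape(data, 2, 6)]
for row in boxed:
    print(row)
```

For the 2 6 example from my first message this gives the groups 
224 176 157 ; 97 ; 224 ; 176 for the first row and 
157 ; 224 176 157 ; 97 ; 224 for the second.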

I have seen your posts in the past and they have helped as I work through this 
process. Thank you.

One of the ways that I am looking at dealing with the width issue is to have 
the character display use a smaller font so that some of the unicode 
display width issues can be resolved.

Cheers, bob

> On Jun 16, 2016, at 11:25 AM, Don Guinn <[email protected]> wrote:
> 
> You are not dealing with unicode. You have UTF8.
> 
>   ]s=.  7 u: 'ఝ' ,'a','ఝ' NB. s is converted to unicode.
> ఝaఝ
>   $s
> 3
>   <"0 s
> +---+-+---+
> |ఝ|a|ఝ|
> +---+-+---+
> 
> 
> But the display is still messed up because it first converts the unicode
> to UTF8, then does a byte count to determine how many boxing characters to
> put around the data. There is a further problem: many unicode/UTF8
> characters beyond ASCII are displayed with proportional width. Notice how
> wide the first and last characters are compared to the "a".
> 
> On Thu, Jun 16, 2016 at 12:08 PM, robert therriault <[email protected]>
> wrote:
> 
>> I am in the process of extending some of the type and shape visualizations
>> that I have done in the past [0] into the realm of unicode.
>> 
>> If you look through the archives of these message lists you will find that
>> unicode can be quite confounding, but my question is relatively simple.
>> 
>> I would like to take
>> 
>>    [s=.  2 6 $ 'ఝ' ,'a','ఝ'  NB. � results from 224 176 157 being broken across dimensions
>> ఝa��
>> �ఝa�
>>    [encode=. a. i. s       NB. shape of 2 6 refers to the encoding numbers, not the number of characters displayed
>> 224 176 157  97 224 176
>> 157 224 176 157  97 224
>> 224 176 157  97 224 176
>> 157 224 176 157  97 224
>> 
>> and convert encode to a form where the encoding for each character is in
>> its own box. Of course, this would be a verb that can work with any
>> literal array, not just the example given.
>> 
>> [r=. 2 4 $ 224 176 157 ; 97 ; 224 ; 176 ; 157 ; 224 176 157 ; 97 ; 224
>> ┌───────────┬───────────┬───┬───┐
>> │224 176 157│97         │224│176│
>> ├───────────┼───────────┼───┼───┤
>> │157        │224 176 157│97 │224│
>> └───────────┴───────────┴───┴───┘
>> 
>> which could be converted back to
>> 
>>    {&a.  each r
>> ┌───┬───┬─┬─┐
>> │ఝ│a  │�│�│
>> ├───┼───┼─┼─┤
>> │�  │ఝ│a│�│
>> └───┴───┴─┴─┘
>> 
>> With this in place it may be possible to have the literal view of unicode
>> display a little more consistently.
>> 
>> 
>> Any suggestions would be welcome.
>> 
>> Cheers, bob
>> 
>> [0] Video of Enhanced display of literals
>> https://www.youtube.com/watch?v=BzjfJjGb5cs
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
