[Jprogramming] Unicode string deconstruction

robert therriault Thu, 16 Jun 2016 11:09:06 -0700

I am in the process of extending some of the type and shape visualizations that 
I have done in the past [0] into the realm of Unicode.


If you look through the archives of these mailing lists, you will find that 
Unicode can be quite confounding, but my question is relatively simple.

I would like to take 

    [s=.  2 6 $ 'ఝ' ,'a','ఝ'  NB. � results from 224 176 157 being broken across dimensions
ఝa��
�ఝa�
    [encode=. a. i. s       NB. shape of 2 6 refers to the encoding numbers, not the number of characters displayed
224 176 157  97 224 176
157 224 176 157  97 224

and convert encode to a form where the encoding for each character is in its 
own box. Of course, this would be a verb that can work with any literal array, 
not just the example given.

 [r=. 2 4 $ 224 176 157 ; 97 ; 224 ; 176 ; 157 ; 224 176 157 ; 97 ; 224
┌───────────┬───────────┬───┬───┐
│224 176 157│97         │224│176│
├───────────┼───────────┼───┼───┤
│157        │224 176 157│97 │224│
└───────────┴───────────┴───┴───┘

which could be converted back to 

    {&a.  each r
┌───┬───┬─┬─┐
│ఝ│a  │�│�│
├───┼───┼─┼─┤
│�  │ఝ│a│�│
└───┴───┴─┴─┘
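
As a starting point, here is a minimal sketch of one way such a verb might be 
written. The names utf8row and utf8box are made up for illustration. It assumes 
that a lead byte counts as a character only when the row still holds all of the 
continuation bytes (128 to 191) it asks for; every other byte gets a box of its 
own. It does not check for overlong encodings or surrogates, and rows that 
produce fewer boxes than others are padded with empty boxes.

```j
NB. utf8row (hypothetical name): box the bytes of ONE row per UTF-8 character.
utf8row =: 3 : 0
  b    =. a. i. y                                          NB. byte values 0..255
  cont =. (b >: 128) *. b < 192                            NB. continuation bytes 10xxxxxx
  need =. (b < 248) * (b >: 192) + (b >: 224) + b >: 240   NB. continuation bytes each lead byte expects
  c1   =. 1 |.!.0 cont                                     NB. next byte is a continuation
  c2   =. c1 *. 2 |.!.0 cont                               NB. next two are continuations
  c3   =. c2 *. 3 |.!.0 cont                               NB. next three are continuations
  ok   =. (need > 0) *. need <: c1 + c2 + c3               NB. lead bytes with a complete sequence in this row
  tail =. (_1 |.!.0 ok) +. (_2 |.!.0 ok *. need >: 2) +. _3 |.!.0 ok *. need >: 3
  (-. tail) <;.1 b                                         NB. start a new box at every byte that is not a tail
)

NB. utf8box (hypothetical name): apply row by row to any literal array.
utf8box =: utf8row"1
```

For the s defined above, utf8box s should give the same grouping as r (the 
single bytes end up as one-element lists rather than atoms), so 
{&a. each utf8box s should reproduce the display shown above.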

With this in place, it may be possible to have the literal view of Unicode 
display a little more consistently.


Any suggestions would be welcome.

Cheers, bob

[0] Video of Enhanced display of literals 
https://www.youtube.com/watch?v=BzjfJjGb5cs
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
