The problem is that this is invalid UTF-8, I think:
{. 2 6 $ 'ఝ' ,'a','ఝ'
ఝa��
(8 <@(a.i.u:)("0) 7 u: ]) every 2 # < 'ఝ' ,'a','ఝ'
┌───────────┬──┬───────────┐
│224 176 157│97│224 176 157│
├───────────┼──┼───────────┤
│224 176 157│97│224 176 157│
└───────────┴──┴───────────┘
but in the reshaped array:
3 u: 8 u:("1) 2 6 $ 'ఝ' ,'a','ఝ'
224 176 157 97 224 176
157 224 176 157 97 224
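The first row cuts off the last UTF-8 character before its final byte (157), leaving the incomplete sequence 224 176.
The second row has the same problem: it starts with the orphaned 157 and ends with a lone lead byte 224.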
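A quick way to see where each row is cut is to flag the UTF-8 continuation bytes (bit pattern 10xxxxxx, values 128 to 191). This is a minimal sketch assuming the literal holds UTF-8 bytes; `cont` is a hypothetical helper name, not a J primitive.

```j
NB. Sketch only: cont is a hypothetical helper, not part of the J library.
NB. 1 marks a UTF-8 continuation byte (128-191); 0 marks an ASCII or lead byte.
cont =: (127 < ]) *. 192 > ]
s =: 2 6 $ 'ఝ' ,'a','ఝ'
cont a. i. s
NB. 0 1 1 0 0 1   row 1: the final 224 has only one of its two continuation bytes
NB. 1 0 1 1 0 0   row 2: begins with the orphaned 157 and ends with a lone 224
```

A row whose first flag is 1 begins in the middle of a character, and a lead byte near the end of a row without enough 1s after it has been truncated.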
----- Original Message -----
From: robert therriault <[email protected]>
To: [email protected]
Sent: Thursday, June 16, 2016 2:52 PM
Subject: Re: [Jprogramming] Unicode (UTF8) string deconstruction
Thanks Pascal,
Using the original example
[s=. 2 6 $ 'ఝ' ,'a','ఝ'
ఝa��
�ఝa�
8 <@(a.i.u:)("0) 7 u: s NB. Arrays need to be dealt with as rank 1
|rank error
| 8<@(a.i.u:)("0)7 u:s
8 <@(a.i.u:)("0) 7 u:"1 s NB. Issues still arise with the partial encodings
|domain error
| 8<@(a.i.u:)("0)7 u:"1 s
8 <@(a.i.u:)("0) 7 u:"1 {. {: s NB. Issue with the invalid encoding that J displays as �
|domain error
| 8<@(a.i.u:)("0)7 u:"1{.{:s
I think the challenge is the partial encodings. The J IDE displays them,
but 7 u: gives domain errors, and even using :: to catch the errors I haven't
found a clean way around the issue.
Cheers, bob
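One way to sidestep the 7 u: domain errors is not to decode at all: cut each row's bytes at every non-continuation byte, then split any group whose length does not match what its lead byte promises into single bytes. This is a minimal sketch assuming UTF-8 input; all verb names below are hypothetical. It only checks group lengths, not full UTF-8 validity (it does not reject overlong forms or the bytes 192, 193 and 245 to 255).

```j
NB. Sketch, assuming the literal holds UTF-8 bytes. All names are hypothetical.
cont    =: (127 < ]) *. 192 > ]          NB. 1 where a byte is a continuation byte (128-191)
starts  =: (0 = i.@#) +. -.@cont         NB. a group starts at byte 0 (so a stray continuation byte is kept) and at every non-continuation byte
groups  =: starts <;.1 ]                 NB. cut the byte list into lead-plus-continuation groups
need    =: 1 + +/@(191 223 239 < {.)     NB. length implied by the group's first byte: 1, 2, 3 or 4
fix     =: <`(<"0)@.(need ~: #)          NB. complete group: one box; otherwise: one box per byte
utf8box =: [: ; fix&.>@groups@(a.&i.)    NB. literal list -> list of boxed byte groups

s =: 2 6 $ 'ఝ' ,'a','ఝ'
utf8box"1 s    NB. one row of boxes per row of s; shorter rows are padded with empty boxes
```

Traced by hand on the 2 6 example, this is intended to give the same grouping as the r array in the original request further down (224 176 157 | 97 | 224 | 176 and 157 | 224 176 157 | 97 | 224), except that whole single-byte groups come back as one-item lists rather than scalars. Because only byte values are compared, 7 u: never sees a partial sequence, so no :: error handling is needed.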
> On Jun 16, 2016, at 11:37 AM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
>
> 8 <@(a.i.u:)("0) 7 u: 'ఝ' ,'a','ఝ'
> ┌───────────┬──┬───────────┐
> │224 176 157│97│224 176 157│
> └───────────┴──┴───────────┘
>
> ----- Original Message -----
> From: robert therriault <[email protected]>
> To: [email protected]
> Sent: Thursday, June 16, 2016 2:33 PM
> Subject: Re: [Jprogramming] Unicode (UTF8) string deconstruction
>
> You are quite right Don,
>
> I suppose I should change the request to displaying Unicode encoded as UTF-8.
> Converting to Unicode as you have done also allows manipulation of characters
> within arrays, but I am looking for ways to show the results when reshaping
> breaks the UTF-8 encoding of a character.
>
> Do you have a way to take a literal array in UTF8 and box the encodings for
> each character?
>
> I have seen your posts in the past and they have helped as I work through
> this process. Thank you.
>
> One of the ways that I am looking at dealing with the width issue is to have
> the character display rendered in a smaller font so that some of the Unicode
> width issues can be resolved.
>
> Cheers, bob
>
>> On Jun 16, 2016, at 11:25 AM, Don Guinn <[email protected]> wrote:
>>
>> You are not dealing with unicode. You have UTF8.
>>
>> ]s=. 7 u: 'ఝ' ,'a','ఝ' NB. s is converted to unicode.
>> ఝaఝ
>> $s
>> 3
>> <"0 s
>> +---+-+---+
>> |ఝ|a|ఝ|
>> +---+-+---+
>>
>> But the display is still messed up, because the display first converts the
>> Unicode to UTF-8 and then does a byte count to determine how many boxing
>> characters to put around the data. There is a further problem: many
>> Unicode characters beyond ASCII are proportional, so they do not line up
>> with borders sized by byte count. Notice how wide the first and last
>> characters are compared to the "a".
>>
>> On Thu, Jun 16, 2016 at 12:08 PM, robert therriault <[email protected]>
>> wrote:
>>
>>> I am in the process of extending some of the type and shape visualizations
>>> that I have done in the past [0] into the realm of unicode.
>>>
>>> If you look through the archives of these message lists you will find that
>>> unicode can be quite confounding, but my question is relatively simple.
>>>
>>> I would like to take
>>>
>>> [s=. 2 6 $ 'ఝ' ,'a','ఝ' NB. � results from 224 176 157 being broken across dimensions
>>> ఝa��
>>> �ఝa�
>>> [encode=. a. i. s NB. shape of 2 6 refers to the encoding numbers, not the number of characters displayed
>>> 224 176 157 97 224 176
>>> 157 224 176 157 97 224
>>>
>>> and convert encode to a form where the encoding for each character is in
>>> its own box. Of course, this would be a verb that can work with any
>>> literal array, not just the example given.
>>>
>>> [r=. 2 4 $ 224 176 157 ; 97 ; 224 ; 176 ; 157 ; 224 176 157 ; 97 ; 224
>>> ┌───────────┬───────────┬───┬───┐
>>> │224 176 157│97 │224│176│
>>> ├───────────┼───────────┼───┼───┤
>>> │157 │224 176 157│97 │224│
>>> └───────────┴───────────┴───┴───┘
>>>
>>> which could be converted back to
>>>
>>> {&a. each r
>>> ┌───┬───┬─┬─┐
>>> │ఝ│a │�│�│
>>> ├───┼───┼─┼─┤
>>> │� │ఝ│a│�│
>>> └───┴───┴─┴─┘
>>>
>>> With this in place it may be possible to have the literal view of Unicode
>>> display a little more consistently.
>>>
>>>
>>> Any suggestions would be welcome.
>>>
>>> Cheers, bob
>>>
>>> [0] Video of Enhanced display of literals
>>> https://www.youtube.com/watch?v=BzjfJjGb5cs
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm