Yes, there are certainly illegal UTF-8 characters in 8 6$ 'ఝ' ,'a','ఝ', but
what I am attempting is to reveal the illegal characters for what they are,
along the lines of the shape and type display that I built earlier using SVG.
Once I have that information in a format that lets me separate the illegal
characters from the legal ones, and lets a viewer see the details by hovering
over a character, the reasons why 8 6$ 'ఝ' ,'a','ఝ' looks the way it does in
the J display become more apparent.

Also, being able to distinguish between the 1-, 2-, 3-, and 4-byte UTF-8
representations may allow a bit more consistency in the way that the boxed
versions of these characters display.
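
A minimal sketch of that distinction, assuming it is enough to look at each
byte value on its own. The name utf8class is hypothetical and not part of
anything posted in this thread. It does not check that sequences are complete,
and it treats 192, 193 and 245 to 247 as lead bytes even though they never
appear in valid UTF-8.

```j
NB. Hypothetical helper: classify each byte of a literal array by its
NB. UTF-8 role, judged from the byte value alone.
NB.   1     = ASCII (0..127)
NB.   0     = continuation byte (128..191)
NB.   2 3 4 = lead byte of a 2-, 3- or 4-byte sequence
NB.   _1    = byte that cannot occur in UTF-8 (248..255)
utf8class =: 3 : 0
 n =. +/ 128 192 224 240 248 <:/ a. i. y   NB. count of thresholds at or below each byte
 n { 1 0 2 3 4 _1
)

NB. utf8class 2 6 $ 'ఝ','a','ఝ'
NB. 3 0 0 1 3 0
NB. 0 3 0 0 1 3
```

A 0 with no 3 ahead of it in the same row, or a 3 with fewer than two 0s
after it, marks the places where the reshape broke a character.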

It remains to be seen how far I get with this, but being able to show the
representation framework of a UTF-8 array is a step. :-)

Cheers, bob

> On Jun 17, 2016, at 5:13 PM, bill lam <[email protected]> wrote:
> 
> But your s contains illegal utf8 characters.
> 
> isutf8=: 1:@(7&u:) ::0:
> 
>   isutf8 'ఝ' ,'a','ఝ'
> 1
>   isutf8"1[ 8 6$ 'ఝ' ,'a','ఝ'
> 0 0 0 1 0 0 0 0
> 
>   isutf8"1[ 8 7$ 'ఝ' ,'a','ఝ'
> 1 1 1 1 1 1 1 1
> 
> Since the string of 3 wide characters is 7 bytes in utf8
>  a.i.'ఝ' ,'a','ఝ'
> 224 176 157 97 224 176 157
> 8 6 $ .... is not what you would expect. Perhaps you meant
> 
>   [s=: 8 6 $ 7 u: 'ఝ' ,'a','ఝ'
> ఝaఝఝaఝ
> ఝaఝఝaఝ
> ఝaఝఝaఝ
> ఝaఝఝaఝ
> ఝaఝఝaఝ
> ఝaఝఝaఝ
> ఝaఝఝaఝ
> ఝaఝఝaఝ
> On Jun 18, 2016 7:30 AM, "robert therriault" <[email protected]> wrote:
> 
>> Thanks for all the suggestions everyone.
>> 
>> In the end I took a more explicit approach than I normally would, but it
>> seems to work.
>> 
>> I am not sure if this is useful for Henry, but it is one approach.
>> 
>>    [s=.  8 6 $ 'ఝ' ,'a','ఝ'
>> ఝa��
>> �ఝa�
>> ��ఝa
>> ఝఝ
>> aఝ��
>> �aఝ�
>> ��aఝ
>> ఝa��
>>   boxutf  s
>> ┌───────────┬───────────┬───────────┬───────────┐
>> │224 176 157│97         │224        │176        │
>> ├───────────┼───────────┼───────────┼───────────┤
>> │157        │224 176 157│97         │224        │
>> ├───────────┼───────────┼───────────┼───────────┤
>> │176        │157        │224 176 157│97         │
>> ├───────────┼───────────┼───────────┼───────────┤
>> │224 176 157│224 176 157│           │           │
>> ├───────────┼───────────┼───────────┼───────────┤
>> │97         │224 176 157│224        │176        │
>> ├───────────┼───────────┼───────────┼───────────┤
>> │157        │97         │224 176 157│224        │
>> ├───────────┼───────────┼───────────┼───────────┤
>> │176        │157        │97         │224 176 157│
>> ├───────────┼───────────┼───────────┼───────────┤
>> │224 176 157│97         │224        │176        │
>> └───────────┴───────────┴───────────┴───────────┘
>>   {&a. each boxutf  s
>> ┌───┬───┬───┬───┐
>> │ఝ│a  │�  │�  │
>> ├───┼───┼───┼───┤
>> │�  │ఝ│a  │�  │
>> ├───┼───┼───┼───┤
>> │�  │�  │ఝ│a  │
>> ├───┼───┼───┼───┤
>> │ఝ│ఝ│   │   │
>> ├───┼───┼───┼───┤
>> │a  │ఝ│�  │�  │
>> ├───┼───┼───┼───┤
>> │�  │a  │ఝ│�  │
>> ├───┼───┼───┼───┤
>> │�  │�  │a  │ఝ│
>> ├───┼───┼───┼───┤
>> │ఝ│a  │�  │�  │
>> └───┴───┴───┴───┘
>>   boxutf
>> }:@utf@(3&u:)@":
>>   utf
>> 3 : 0"1
>> if. y-:'' do. return. end.
>> try.  ((utf@:((1<.#)}.]));~((3 u: ":)@: (7 u: a.{~ (1<.#) {. ]))) y
>>   catch. try. ((utf@:((2<.#)}.]));~((3 u: ":)@: (7 u: a.{~ (2<.#) {. ]))) y
>>     catch. try. ((utf@:((3<.#)}.]));~((3 u: ":)@: (7 u: a.{~ (3<.#) {. ]))) y
>>       catch. try.  ((utf@:((4<.#)}.]));~((3 u: ":)@: (7 u: a.{~ (4<.#) {. ]))) y
>>                       catch. ({. ; utf@}.) y
>>                       end.
>>       end.
>>     end.
>>   end.
>> )
>> 
>> Row by row I am just grabbing up to 4 UTF8 numbers and boxing them.
>> Whenever the numbers are valid I box them and move on with the remaining
>> part of the row.
>> 
>> I am sure others will find a more elegant approach, but this seems to work.
>> 
>> Cheers, bob
>> 
>>> On Jun 16, 2016, at 4:27 PM, bill lam <[email protected]> wrote:
>>> 
>>> The internal representation of a utf8 array is no different from a
>>> regular character array; utf8 only applies to the external interface.
>>> If you want to manipulate unicode within J, you should use the wide
>>> character data type (131072) as suggested by Don.
>>> On Jun 17, 2016 2:33 AM, "robert therriault" <[email protected]> wrote:
>>> 
>>>> You are quite right Don,
>>>> 
>>>> I should change the request to displaying unicode in UTF8 I suppose.
>>>> Converting to unicode as you have done also allows manipulation of
>>>> characters within arrays, but I am looking for ways to show the results
>>>> when reshaping breaks the UTF8 representation.
>>>> 
>>>> Do you have a way to take a literal array in UTF8 and box the encodings
>>>> for each character?
>>>> 
>>>> I have seen your posts in the past and they have helped as I work
>>>> through this process. Thank you.
>>>> 
>>>> One of the ways that I am looking at dealing with the width issue is to
>>>> have the character display render in a smaller font so that some of the
>>>> unicode display width issues can be resolved.
>>>> 
>>>> Cheers, bob
>>>> 
>>>>> On Jun 16, 2016, at 11:25 AM, Don Guinn <[email protected]> wrote:
>>>>> 
>>>>> You are not dealing with unicode. You have UTF8.
>>>>> 
>>>>> ]s=.  7 u: 'ఝ' ,'a','ఝ' NB. s is converted to unicode.
>>>>> ఝaఝ
>>>>>    $s
>>>>> 3
>>>>> <"0 s
>>>>> +---+-+---+
>>>>> |ఝ|a|ఝ|
>>>>> +---+-+---+
>>>>> 
>>>>> 
>>>>> But the display still is messed up because the display first converts
>>>>> the unicode to UTF8. It then does a byte count to determine how many
>>>>> boxing characters to put around the data. But there is still a problem
>>>>> as many unicode/UTF8 characters beyond ASCII are proportional. Notice
>>>>> how wide the first and last characters are compared to the "a".
>>>>> 
>>>>> On Thu, Jun 16, 2016 at 12:08 PM, robert therriault <[email protected]> wrote:
>>>>> 
>>>>>> I am in the process of extending some of the type and shape
>>>>>> visualizations that I have done in the past [0] into the realm of
>>>>>> unicode.
>>>>>>
>>>>>> If you look through the archives of these message lists you will find
>>>>>> that unicode can be quite confounding, but my question is relatively
>>>>>> simple.
>>>>>> 
>>>>>> I would like to take
>>>>>> 
>>>>>>  [s=.  2 6 $ 'ఝ' ,'a','ఝ'  NB. � results from 224 176 157 being broken across dimensions
>>>>>> ఝa��
>>>>>> �ఝa�
>>>>>> [encode=. a. i. s       NB. shape of 2 6 refers to the encoding numbers not the number of characters displayed
>>>>>> 224 176 157  97 224 176
>>>>>> 157 224 176 157  97 224
>>>>>> 
>>>>>> and convert encode to a form where the encoding for each character is
>>>>>> in its own box. Of course, this would be a verb that can work with any
>>>>>> literal array, not just the example given.
>>>>>> 
>>>>>> [r=. 2 4 $ 224 176 157 ; 97 ; 224 ; 176 ; 157 ; 224 176 157 ; 97 ; 224
>>>>>> ┌───────────┬───────────┬───┬───┐
>>>>>> │224 176 157│97         │224│176│
>>>>>> ├───────────┼───────────┼───┼───┤
>>>>>> │157        │224 176 157│97 │224│
>>>>>> └───────────┴───────────┴───┴───┘
>>>>>> 
>>>>>> which could be converted back to
>>>>>> 
>>>>>>  {&a.  each r
>>>>>> ┌───┬───┬─┬─┐
>>>>>> │ఝ│a  │�│�│
>>>>>> ├───┼───┼─┼─┤
>>>>>> │�  │ఝ│a│�│
>>>>>> └───┴───┴─┴─┘
>>>>>> 
>>>>>> With this in place it may be possible to have the literal view of
>>>>>> unicode display a little more consistently.
>>>>>> 
>>>>>> 
>>>>>> Any suggestions would be welcome.
>>>>>> 
>>>>>> Cheers, bob
>>>>>> 
>>>>>> [0] Video of Enhanced display of literals
>>>>>> https://www.youtube.com/watch?v=BzjfJjGb5cs
>>>>>> ----------------------------------------------------------------------
>>>>>> For information about J forums see http://www.jsoftware.com/forums.htm