But your s contains invalid UTF-8 byte sequences.

   isutf8=: 1:@(7&u:) ::0:
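
For anyone not used to the adverse conjunction (::), here is a rough explicit spelling of the same test. It is only a sketch: the name isutf8x is made up for illustration, and it assumes, as the results in this thread suggest, that 7 u: signals an error when its argument is not well-formed UTF-8.

```j
NB. isutf8x is a hypothetical name, not part of the original post.
NB. Assumes 7 u: signals an error on bytes that are not well-formed UTF-8.
isutf8x=: 3 : 0
try.
  7 u: y    NB. attempt the conversion; an error jumps to catch.
  1
catch.
  0
end.
)
```

It should agree with isutf8 on the examples in this message; the tacit form packs the same try/catch logic into u ::v.
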
   isutf8 'ఝ' ,'a','ఝ'
1
   isutf8"1[ 8 6$ 'ఝ' ,'a','ఝ'
0 0 0 1 0 0 0 0
   isutf8"1[ 8 7$ 'ఝ' ,'a','ఝ'
1 1 1 1 1 1 1 1

Since the three-character string is 7 bytes in UTF-8

   a.i.'ఝ' ,'a','ఝ'
224 176 157 97 224 176 157

8 6 $ .... is not what you would expect. Perhaps you meant

   [s=: 8 6 $ 7 u: 'ఝ' ,'a','ఝ'
ఝaఝఝaఝ
ఝaఝఝaఝ
ఝaఝఝaఝ
ఝaఝఝaఝ
ఝaఝఝaఝ
ఝaఝఝaఝ
ఝaఝఝaఝ
ఝaఝఝaఝ

On Jun 18, 2016 7:30 AM, "robert therriault" <[email protected]> wrote:

> Thanks for all the suggestions everyone.
>
> In the end I took a more explicit approach than I normally would, but it
> seems to work.
>
> I am not sure if this is useful for Henry, but it is one approach.
>
>    [s=. 8 6 $ 'ఝ' ,'a','ఝ'
> ఝa��
> �ఝa�
> ��ఝa
> ఝఝ
> aఝ��
> �aఝ�
> ��aఝ
> ఝa��
>    boxutf s
> ┌───────────┬───────────┬───────────┬───────────┐
> │224 176 157│97         │224        │176        │
> ├───────────┼───────────┼───────────┼───────────┤
> │157        │224 176 157│97         │224        │
> ├───────────┼───────────┼───────────┼───────────┤
> │176        │157        │224 176 157│97         │
> ├───────────┼───────────┼───────────┼───────────┤
> │224 176 157│224 176 157│           │           │
> ├───────────┼───────────┼───────────┼───────────┤
> │97         │224 176 157│224        │176        │
> ├───────────┼───────────┼───────────┼───────────┤
> │157        │97         │224 176 157│224        │
> ├───────────┼───────────┼───────────┼───────────┤
> │176        │157        │97         │224 176 157│
> ├───────────┼───────────┼───────────┼───────────┤
> │224 176 157│97         │224        │176        │
> └───────────┴───────────┴───────────┴───────────┘
>    {&a. each boxutf s
> ┌───┬───┬───┬───┐
> │ఝ│a  │�  │�  │
> ├───┼───┼───┼───┤
> │�  │ఝ│a  │�  │
> ├───┼───┼───┼───┤
> │�  │�  │ఝ│a  │
> ├───┼───┼───┼───┤
> │ఝ│ఝ│   │   │
> ├───┼───┼───┼───┤
> │a  │ఝ│�  │�  │
> ├───┼───┼───┼───┤
> │�  │a  │ఝ│�  │
> ├───┼───┼───┼───┤
> │�  │�  │a  │ఝ│
> ├───┼───┼───┼───┤
> │ఝ│a  │�  │�  │
> └───┴───┴───┴───┘
>    boxutf
> }:@utf@(3&u:)@":
>    utf
> 3 : 0"1
> if. y-:'' do. return. end.
> try. ((utf@:((1<.#)}.]));~((3 u: ":)@: (7 u: a.{~ (1<.#) {. ]))) y
> catch. try. ((utf@:((2<.#)}.]));~((3 u: ":)@: (7 u: a.{~ (2<.#) {. ]))) y
> catch. try. ((utf@:((3<.#)}.]));~((3 u: ":)@: (7 u: a.{~ (3<.#) {. ]))) y
> catch. try. ((utf@:((4<.#)}.]));~((3 u: ":)@: (7 u: a.{~ (4<.#) {. ]))) y
> catch. ({. ; utf@}.) y
> end.
> end.
> end.
> end.
> )
>
> Row by row I am just grabbing up to 4 UTF8 numbers and boxing them.
> Whenever the numbers are valid I box them and move on with the remaining
> part of the row.
>
> I am sure others will find a more elegant approach, but this seems to work.
>
> Cheers, bob
>
> > On Jun 16, 2016, at 4:27 PM, bill lam <[email protected]> wrote:
> >
> > The internal representation of a UTF-8 array is no different from a
> > regular character array; UTF-8 applies only to the external interface.
> > If you want to manipulate unicode within J, you should use the wide
> > character data type (131072) as suggested by Don.
> >
> > On Jun 17, 2016 2:33 AM, "robert therriault" <[email protected]> wrote:
> >
> >> You are quite right Don,
> >>
> >> I should change the request to displaying unicode in UTF8 I suppose.
> >> Converting to unicode as you have done also allows manipulation of
> >> characters within arrays, but I am looking for ways to show the results
> >> when reshaping breaks UTF8 representation.
> >>
> >> Do you have a way to take a literal array in UTF8 and box the encodings
> >> for each character?
> >>
> >> I have seen your posts in the past and they have helped as I work through
> >> this process. Thank you.
> >>
> >> One of the ways that I am looking at dealing with the width issue is to
> >> have the character display use a smaller font so that some of the
> >> unicode display width issues can be resolved.
> >>
> >> Cheers, bob
> >>
> >>> On Jun 16, 2016, at 11:25 AM, Don Guinn <[email protected]> wrote:
> >>>
> >>> You are not dealing with unicode. You have UTF8.
> >>>
> >>>    ]s=. 7 u: 'ఝ' ,'a','ఝ' NB. s is converted to unicode.
> >>> ఝaఝ
> >>>    $s
> >>> 3
> >>>    <"0 s
> >>> +---+-+---+
> >>> |ఝ|a|ఝ|
> >>> +---+-+---+
> >>>
> >>> But the display still is messed up because the display first converts
> >>> the unicode to UTF8. Then does a byte count to determine how many boxing
> >>> characters to put around the data. But there is still a problem as many
> >>> unicode/UTF8 characters beyond ASCII are proportional. Notice how wide
> >>> the first and last characters are compared to the "a".
> >>>
> >>> On Thu, Jun 16, 2016 at 12:08 PM, robert therriault <[email protected]>
> >>> wrote:
> >>>
> >>>> I am in the process of extending some of the type and shape
> >>>> visualizations that I have done in the past [0] into the realm of
> >>>> unicode.
> >>>>
> >>>> If you look through the archives of these message lists you will find
> >>>> that unicode can be quite confounding, but my question is relatively
> >>>> simple.
> >>>>
> >>>> I would like to take
> >>>>
> >>>>    [s=. 2 6 $ 'ఝ' ,'a','ఝ' NB. � results from 224 176 157 being broken across dimensions
> >>>> ఝa��
> >>>> �ఝa�
> >>>>    [encode=. a. i. s NB. shape of 2 6 refers to the encoding numbers not the number of characters displayed
> >>>> 224 176 157  97 224 176
> >>>> 157 224 176 157  97 224
> >>>>
> >>>> and convert encode to a form where the encoding for each character is
> >>>> in its own box. Of course, this would be a verb that can work with any
> >>>> literal array not just the example given.
> >>>>
> >>>>    [r=. 2 4 $ 224 176 157 ; 97 ; 224 ; 176 ; 157 ; 224 176 157 ; 97 ; 224
> >>>> ┌───────────┬───────────┬───┬───┐
> >>>> │224 176 157│97         │224│176│
> >>>> ├───────────┼───────────┼───┼───┤
> >>>> │157        │224 176 157│97 │224│
> >>>> └───────────┴───────────┴───┴───┘
> >>>>
> >>>> which could be converted back to
> >>>>
> >>>>    {&a. each r
> >>>> ┌───┬───┬─┬─┐
> >>>> │ఝ│a  │�│�│
> >>>> ├───┼───┼─┼─┤
> >>>> │�  │ఝ│a│�│
> >>>> └───┴───┴─┴─┘
> >>>>
> >>>> With this in place it may be possible to have the literal view of
> >>>> unicode display a little more consistently
> >>>>
> >>>> Any suggestions would be welcome.
> >>>>
> >>>> Cheers, bob
> >>>>
> >>>> [0] Video of Enhanced display of literals
> >>>> https://www.youtube.com/watch?v=BzjfJjGb5cs
> >>>> ----------------------------------------------------------------------
> >>>> For information about J forums see http://www.jsoftware.com/forums.htm
