Oops, I should have said that I replace a.i. with 3 u: because 3 u: works just like a.i. for literal and UTF8 but does not fail with unicode. It actually works as one would expect.

On Jun 16, 2016 1:45 PM, "Don Guinn" <[email protected]> wrote:
> If I have to do much processing of UTF8 I convert to unicode first. Do my
> processing, then convert back to UTF8. That way I can use primitives, and
> definitions originally intended for literal will most likely work as
> expected. I have started using 3&u: instead of a. for converting to numeric
> as it works for literal, UTF8 and unicode. I just find it easier to deal
> with a character always counting as one thing instead of usually one, but
> often two or three things.
>
> On Thu, Jun 16, 2016 at 12:52 PM, robert therriault <[email protected]>
> wrote:
>
>> Thanks Pascal,
>>
>> Using the original example
>>
>>    [s=. 2 6 $ 'ఝ' ,'a','ఝ'
>> ఝa��
>> �ఝa�
>>
>>    8 <@(a.i.u:)("0) 7 u: s NB. Arrays need to be dealt with as rank 1
>> |rank error
>> |   8<@(a.i.u:)("0)7 u:s
>>    8 <@(a.i.u:)("0) 7 u:"1 s NB. Issues still arise with the partial encodings
>> |domain error
>> |   8<@(a.i.u:)("0)7 u:"1 s
>>    8 <@(a.i.u:)("0) 7 u:"1 {. {: s NB. Issue with the non valid encoding that J displays as �
>> |domain error
>> |   8<@(a.i.u:)("0)7 u:"1{.{:s
>>
>> I think that the challenge is the partial encodings. The J IDE displays
>> these, but the 7 u: gives errors and even using :: for error exceptions I
>> haven't found a nice way around the issues.
>>
>> Cheers, bob
>>
>> > On Jun 16, 2016, at 11:37 AM, 'Pascal Jasmin' via Programming
>> > <[email protected]> wrote:
>> >
>> >    8 <@(a.i.u:)("0) 7 u: 'ఝ' ,'a','ఝ'
>> > ┌───────────┬──┬───────────┐
>> > │224 176 157│97│224 176 157│
>> > └───────────┴──┴───────────┘
>> >
>> > ----- Original Message -----
>> > From: robert therriault <[email protected]>
>> > To: [email protected]
>> > Sent: Thursday, June 16, 2016 2:33 PM
>> > Subject: Re: [Jprogramming] Unicode (UTF8) string deconstruction
>> >
>> > You are quite right Don,
>> >
>> > I should change the request to displaying unicode in UTF8 I suppose.
>> > Converting to unicode as you have done also allows manipulation of
>> > characters within arrays, but I am looking for ways to show the results
>> > when reshaping breaks UTF8 representation.
>> >
>> > Do you have a way to take a literal array in UTF8 and box the encodings
>> > for each character?
>> >
>> > I have seen your posts in the past and they have helped as I work
>> > through this process. Thank you.
>> >
>> > One of the ways that I am looking at dealing with the width issue is to
>> > have the character display in a smaller font so that some of the
>> > unicode display width issues can be resolved.
>> >
>> > Cheers, bob
>> >
>> >> On Jun 16, 2016, at 11:25 AM, Don Guinn <[email protected]> wrote:
>> >>
>> >> You are not dealing with unicode. You have UTF8.
>> >>
>> >>    ]s=. 7 u: 'ఝ' ,'a','ఝ' NB. s is converted to unicode.
>> >> ఝaఝ
>> >>    $s
>> >> 3
>> >>    <"0 s
>> >> +---+-+---+
>> >> |ఝ|a|ఝ|
>> >> +---+-+---+
>> >>
>> >> But the display still is messed up because the display first converts
>> >> the unicode to UTF8. Then does a byte count to determine how many boxing
>> >> characters to put around the data. But there is still a problem as many
>> >> unicode/UTF8 characters beyond ASCII are proportional. Notice how wide
>> >> the first and last characters are compared to the "a".
>> >>
>> >> On Thu, Jun 16, 2016 at 12:08 PM, robert therriault <[email protected]>
>> >> wrote:
>> >>
>> >>> I am in the process of extending some of the type and shape
>> >>> visualizations that I have done in the past [0] into the realm of
>> >>> unicode.
>> >>>
>> >>> If you look through the archives of these message lists you will find
>> >>> that unicode can be quite confounding, but my question is relatively
>> >>> simple.
>> >>>
>> >>> I would like to take
>> >>>
>> >>>    [s=. 2 6 $ 'ఝ' ,'a','ఝ' NB. � results from 224 176 157 being broken across dimensions
>> >>> ఝa��
>> >>> �ఝa�
>> >>>    [encode=. a. i. s NB. shape of 2 6 refers to the encoding numbers not the number of characters displayed
>> >>> 224 176 157  97 224 176
>> >>> 157 224 176 157  97 224
>> >>>
>> >>> and convert encode to a form where the encoding for each character is
>> >>> in its own box. Of course, this would be a verb that can work with any
>> >>> literal array not just the example given.
>> >>>
>> >>>    [r=. 2 4 $ 224 176 157 ; 97 ; 224 ; 176 ; 157 ; 224 176 157 ; 97 ; 224
>> >>> ┌───────────┬───────────┬───┬───┐
>> >>> │224 176 157│97         │224│176│
>> >>> ├───────────┼───────────┼───┼───┤
>> >>> │157        │224 176 157│97 │224│
>> >>> └───────────┴───────────┴───┴───┘
>> >>>
>> >>> which could be converted back to
>> >>>
>> >>>    {&a. each r
>> >>> ┌───┬───┬─┬─┐
>> >>> │ఝ│a  │�│�│
>> >>> ├───┼───┼─┼─┤
>> >>> │�  │ఝ│a│�│
>> >>> └───┴───┴─┴─┘
>> >>>
>> >>> With this in place it may be possible to have the literal view of
>> >>> unicode display a little more consistently
>> >>>
>> >>> Any suggestions would be welcome.
>> >>>
>> >>> Cheers, bob
>> >>>
>> >>> [0] Video of Enhanced display of literals
>> >>> https://www.youtube.com/watch?v=BzjfJjGb5cs
>> >>> ----------------------------------------------------------------------
>> >>> For information about J forums see http://www.jsoftware.com/forums.htm
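
For the original question (box the byte values of each character, leaving the bytes of a broken sequence in boxes of their own), one possible approach is sketched here. It is a sketch under assumptions, not a tested library routine: utf8boxes is a hypothetical name, y is assumed to be a rank-1 literal (not unicode), and only the lead-byte ranges and the 128..191 continuation range are checked, so overlong or otherwise ill-formed sequences are not rejected. It uses 3 u: rather than a. i. to get the byte values, as suggested in the thread.

```j
NB. utf8boxes: hypothetical helper, not part of the J library.
NB. y is a rank-1 literal holding UTF-8 bytes; the result is a list of boxes,
NB. one box of byte values per complete character, one box per stray byte.
utf8boxes =: 3 : 0
bytes =. 3 u: y                  NB. byte values 0..255 for a literal argument
cnt   =. # bytes
res   =. 0 $ a:                  NB. empty list of boxes
pos   =. 0
while. pos < cnt do.
  lead =. pos { bytes
  NB. expected sequence length from the lead byte (1 for ASCII and stray bytes)
  len =. 1 + (lead >: 192) + (lead >: 224) + (lead >: 240)
  if. cnt < pos + len do.
    len =. 1                     NB. sequence cut off by the end of the row
  else.
    tail =. (pos + 1 + i. len - 1) { bytes
    NB. every trailing byte must be a continuation byte, 128..191
    if. -. *./ (tail >: 128) *. tail < 192 do. len =. 1 end.
  end.
  res =. res , < (pos + i. len) { bytes
  pos =. pos + len
end.
res
)

NB. same bytes as 2 6 $ 'ఝ' ,'a','ఝ' from the thread
s =. 2 6 $ 224 176 157 97 224 176 157 { a.
r =. utf8boxes"1 s               NB. apply to each row separately
{&a. each r                      NB. back to characters, one box per character
```

Applied to the 2 6 example, each row should split into the same four groups as the r array in the original question. Rows that produce different numbers of boxes would be padded with empty boxes by the rank-1 application, so a display verb built on this would need to allow for that.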
