Re: [Jprogramming] Unicode (UTF8) string deconstruction

Don Guinn Thu, 16 Jun 2016 13:27:12 -0700

Oops, I should have said I replace a.i. with 3 u: as 3 u: works just like
a.i. for literal and UTF8 but does not fail with unicode. It actually works
as one would expect.
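
To illustrate, here is a small session sketch, assuming a J front end that
reads its input as UTF8; the names s and w are made up for this example.
3 u: gives byte values for a UTF8 literal and code points for unicode, and
8 u: converts unicode back to UTF8.

```j
   s =. 'ఝ','a','ఝ'   NB. UTF8 literal: 3 + 1 + 3 bytes
   #s
7
   3 u: s              NB. same numbers a. i. s gives for a literal
224 176 157 97 224 176 157
   w =. 7 u: s         NB. convert to unicode: one item per character
   #w
3
   3 u: w              NB. code points rather than bytes
3101 97 3101
   s -: 8 u: w         NB. converting back gives the original UTF8
1
```
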
On Jun 16, 2016 1:45 PM, "Don Guinn" <[email protected]> wrote:


> If I have to do much processing of UTF8 I convert to unicode first. Do my
> processing, then convert back to UTF8. That way primitives and
> definitions originally intended for literal will most likely work as
> expected. I have started using 3&u: instead of a. for converting to numeric
> as it works for literal, UTF8 and unicode. I just find it easier to deal
> with a character always counting as one thing instead of usually one, but
> often two or three things.
>
> On Thu, Jun 16, 2016 at 12:52 PM, robert therriault <[email protected]
> > wrote:
>
>> Thanks Pascal,
>>
>> Using the original example
>>
>>  [s=.  2 6 $ 'ఝ' ,'a','ఝ'
>> ఝa��
>> �ఝa�
>>
>>     8 <@(a.i.u:)("0) 7 u: s  NB. Arrays need to be dealt with as rank 1
>> |rank error
>> |   8<@(a.i.u:)("0)7     u:s
>>     8 <@(a.i.u:)("0) 7 u:"1 s  NB. Issues still arise with the partial encodings
>> |domain error
>> |   8<@(a.i.u:)("0)7     u:"1 s
>>     8 <@(a.i.u:)("0) 7 u:"1 {. {: s  NB. Issue with the non valid encoding that J displays as �
>> |domain error
>> |   8<@(a.i.u:)("0)7     u:"1{.{:s
>>
>> I think that the challenge is the partial encodings. The J IDE displays
>> these, but the 7 u: gives errors and even using :: for error exceptions I
>> haven't found a nice way around the issues.
>>
>> Cheers, bob
>>
>> > On Jun 16, 2016, at 11:37 AM, 'Pascal Jasmin' via Programming <
>> [email protected]> wrote:
>> >
>> > 8 <@(a.i.u:)("0) 7 u: 'ఝ' ,'a','ఝ'
>> > ┌───────────┬──┬───────────┐
>> > │224 176 157│97│224 176 157│
>> > └───────────┴──┴───────────┘
>> >
>> >
>> >
>> >
>> > ----- Original Message -----
>> > From: robert therriault <[email protected]>
>> > To: [email protected]
>> > Sent: Thursday, June 16, 2016 2:33 PM
>> > Subject: Re: [Jprogramming] Unicode (UTF8) string deconstruction
>> >
>> > You are quite right Don,
>> >
>> > I should change the request to displaying unicode in UTF8 I suppose.
>> > Converting to unicode as you have done also allows manipulation of
>> > characters within arrays, but I am looking for ways to show the results
>> > when reshaping breaks UTF8 representation.
>> >
>> > Do you have a way to take a literal array in UTF8 and box the encodings
>> > for each character?
>> >
>> > I have seen your posts in the past and they have helped as I work
>> > through this process. Thank you.
>> >
>> > One of the ways that I am looking at dealing with the width issue is to
>> > have the character display use a smaller font so that some of the
>> > unicode display width issues can be resolved.
>> >
>> > Cheers, bob
>> >
>> >> On Jun 16, 2016, at 11:25 AM, Don Guinn <[email protected]> wrote:
>> >>
>> >> You are not dealing with unicode. You have UTF8.
>> >>
>> >>    ]s=.  7 u: 'ఝ' ,'a','ఝ' NB. s is converted to unicode.
>> >> ఝaఝ
>> >>    $s
>> >> 3
>> >>    <"0 s
>> >> +---+-+---+
>> >> |ఝ|a|ఝ|
>> >> +---+-+---+
>> >>
>> >>
>> >> But the display still is messed up because the display first converts
>> >> the unicode to UTF8. Then does a byte count to determine how many boxing
>> >> characters to put around the data. But there is still a problem as many
>> >> unicode/UTF8 characters beyond ASCII are proportional. Notice how wide
>> >> the first and last characters are compared to the "a".
>> >>
>> >> On Thu, Jun 16, 2016 at 12:08 PM, robert therriault <
>> [email protected]>
>> >> wrote:
>> >>
>> >>> I am in the process of extending some of the type and shape
>> >>> visualizations that I have done in the past [0] into the realm of
>> >>> unicode.
>> >>>
>> >>> If you look through the archives of these message lists you will find
>> >>> that unicode can be quite confounding, but my question is relatively
>> >>> simple.
>> >>>
>> >>> I would like to take
>> >>>
>> >>>    [s=.  2 6 $ 'ఝ' ,'a','ఝ'  NB. � results from 224 176 157 being broken across dimensions
>> >>> ఝa��
>> >>> �ఝa�
>> >>>    [encode=. a. i. s       NB. shape of 2 6 refers to the encoding numbers, not the number of characters displayed
>> >>> 224 176 157  97 224 176
>> >>> 157 224 176 157  97 224
>> >>>
>> >>> and convert encode to a form where the encoding for each character is
>> >>> in its own box. Of course, this would be a verb that can work with any
>> >>> literal array, not just the example given.
>> >>>
>> >>> [r=. 2 4 $ 224 176 157 ; 97 ; 224 ; 176 ; 157 ; 224 176 157 ; 97 ; 224
>> >>> ┌───────────┬───────────┬───┬───┐
>> >>> │224 176 157│97         │224│176│
>> >>> ├───────────┼───────────┼───┼───┤
>> >>> │157        │224 176 157│97 │224│
>> >>> └───────────┴───────────┴───┴───┘
>> >>>
>> >>> which could be converted back to
>> >>>
>> >>>   {&a.  each r
>> >>> ┌───┬───┬─┬─┐
>> >>> │ఝ│a  │�│�│
>> >>> ├───┼───┼─┼─┤
>> >>> │�  │ఝ│a│�│
>> >>> └───┴───┴─┴─┘
>> >>>
>> >>> With this in place it may be possible to have the literal view of
>> >>> unicode display a little more consistently.
>> >>>
>> >>>
>> >>> Any suggestions would be welcome.
>> >>>
>> >>> Cheers, bob
>> >>>
>> >>> [0] Video of Enhanced display of literals
>> >>> https://www.youtube.com/watch?v=BzjfJjGb5cs
>> >>> ----------------------------------------------------------------------
>> >>> For information about J forums see
>> http://www.jsoftware.com/forums.htm
