Re: [Jprogramming] RFC: unicode

bill lam Sat, 19 Mar 2022 23:10:27 -0700

If ": and u: were implemented using the foreign conjunction, J would be
purer.
The original J dictionary said nothing about unicode at all. How to handle
unicode in ": is implementation dependent.


https://www.jsoftware.com/docs/help807/dictionary/d602.htm

On Sun, Mar 20, 2022 at 1:56 PM Raul Miller <[email protected]> wrote:

> I think a point has been lost here (partially because of hasty
> statements I made, where I was not considering all of the details of
> how ": works) on why getting rid of u: would not change anything about
> the initial example in this thread:
>
>    #x=: 8 u: 97 243 98
> 4
>    datatype x
> literal
>    #z=: 10 u:  97 195 179 98
> 4
>    datatype z
> unicode4
>    datatype x,z
> unicode4
>    #x,z
> 8
>
> When displayed, x is displayed as utf-8. This is largely due to
> properties of the host environment and the operating system. Here, x
> is treated as an array of unicode octets.
>
> When we combine x and z into an array, x is not treated as an array of
> octets. It is, instead, treated as a utf-32 sequence. Discarding u:
> would not change this, because u: was not involved in that operation.
>
> Most likely, the operation you were looking for was something like
>
>    #x,&":z
> 10
>
> or
>
>    #x,&(8 u: ]) z
> 10
>
> Here, we are not treating x as a utf-32 array -- we are instead first
> representing z as utf-8.
>
> And, again, discarding u: would not change this aspect of J (except to
> cause an error for the x,&(8 u: ]) z example).
>
> Thanks,
>
> --
> Raul
>
> On Sun, Mar 20, 2022 at 1:10 AM Raul Miller <[email protected]> wrote:
> >
> > On Sat, Mar 19, 2022 at 8:34 PM Elijah Stone <[email protected]> wrote:
> > > I think a deprecation period would probably be a good idea.
> >
> > I think we would need to complete the preceding steps before we
> > attempted such a thing.
> >
> > Deprecation based on something which has not been implemented is bad
> > news.
> >
> > > Per the dictionary:
> > >
> > > > ": converts literal2 and literal4 to U8 encoded 1-byte char
> >
> > Yes, I realized that after I hit send on that message.
> >
> > > Not specified is whether literal2 is interpreted as ucs-2 or utf-16.
> > > Experimentally, it is utf-16.
> >
> > It's my understanding that ucs-2 is a subset of utf-16.
> >
> > > >   ; verb each sequence
> > >
> > > I don't understand the significance of this.
> >
> > Generally speaking, when you are working with text, you are working
> > with arbitrary length sequences. So, boxing intermediate results and
> > razing the boxes is a frequently used idiom.
> >
> >    ;(# ":)each 1 2 3
> > 122333
> >
> > > > Generally speaking, if you want an unambiguous representation of your
> > > > data, you should use something like {{ 5!:5<'y' }} rather than ":
> > >
> > > I don't need unambiguous.  I'll take non-obfuscatory.  And, as
> > > mentioned, the behaviour of ": here is inconsistent with other
> > > primitives.
> >
> > Every primitive is in some sense "inconsistent" with other primitives,
> > because every primitive accomplishes something different.
> >
> > The ": primitive is about formatting text for display. That is going
> > to have to be different from an operation like addition.
> >
> > > > it is not being displayed correctly.
> > >
> > > The display seems correct to me.
> >
> > Ah, that was my browser / email client messing up.
> >
> > Thanks,
> >
> > --
> > Raul
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
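
The counts in the quoted session (8 for x,z and 10 for x,&":z) can be
checked outside J. The following is a minimal Python sketch that models
the behaviour described above; it is not J code and does not claim to
show how J implements it. The names x_bytes, z_points, promoted and
joined are made up for illustration. It assumes, as the quoted message
says, that x,z promotes each octet of x to one unicode4 element without
decoding, and that ": encodes z as utf-8.

```python
# Illustrative model only; all names here are hypothetical, not J API.
x_bytes = "a\u00f3b".encode("utf-8")   # like x: 4 utf-8 octets 97 195 179 98
z_points = [97, 195, 179, 98]          # like z: 4 code points (unicode4)

# x,z : each octet of x becomes one code point as-is (no utf-8 decoding)
promoted = list(x_bytes) + z_points

# x,&":z : z is encoded as utf-8 first, then the octets are joined
z_bytes = "".join(map(chr, z_points)).encode("utf-8")
joined = x_bytes + z_bytes

print(len(promoted))   # 8
print(len(joined))     # 10
print(list(z_bytes))   # [97, 195, 131, 194, 179, 98]
```

In this model the code points 195 and 179 in z each take two octets in
utf-8, which accounts for the 6 octets contributed by z and the total
of 10.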