If ": and u: were implemented using a foreign conjunction, J would be purer. The original J dictionary said nothing about Unicode at all. How ": handles Unicode is implementation dependent.
https://www.jsoftware.com/docs/help807/dictionary/d602.htm

On Sun, Mar 20, 2022 at 1:56 PM Raul Miller <[email protected]> wrote:

> I think a point has been lost here (partially because of hasty
> statements I made, where I was not considering all of the details of
> how ": works) on why getting rid of u: would not change anything about
> the initial example in this thread:
>
>    #x=: 8 u: 97 243 98
> 4
>    datatype x
> literal
>    #z=: 10 u: 97 195 179 98
> 4
>    datatype z
> unicode4
>    datatype x,z
> unicode4
>    #x,z
> 8
>
> When displayed, x is displayed as utf-8. This is largely due to
> properties of the host environment and the operating system. Here, x
> is treated as an array of unicode octets.
>
> When we combine x and z into an array, x is not treated as an array of
> octets. It is, instead treated as a utf-32 sequence. Discarding u:
> would not change this, because u: was not involved in that operation.
>
> Most likely, the operation you were looking for was something like
>
>    #x,&":z
> 10
>
> or
>
>    #x,&(8 u: ]) z
> 10
>
> Here, we are not treating x as a utf-32 array -- we are instead first
> representing z as utf-8.
>
> And, again, discarding u: would not change this aspect of J (except to
> cause an error for the x,&(8 u: ]) z example).
>
> Thanks,
>
> --
> Raul
>
> On Sun, Mar 20, 2022 at 1:10 AM Raul Miller <[email protected]> wrote:
> >
> > On Sat, Mar 19, 2022 at 8:34 PM Elijah Stone <[email protected]> wrote:
> > > I think a deprecation period would probably be a good idea.
> >
> > I think we would need to complete the preceding steps before we
> > attempted such a thing.
> >
> > Deprecation based on something which has not been implemented is bad news.
> >
> > > Per the dictionary:
> > >
> > > > ": converts literal2 and literal4 to U8 encoded 1-byte char
> >
> > Yes, I realized that after I hit send on that message.
> >
> > > Not specified is whether literal2 is interpreted as ucs-2 or utf-16.
> > > Experimentally, it is utf-16.
> >
> > It's my understanding that ucs-2 is a subset of utf-16.
> >
> > > > ; verb each sequence
> > >
> > > I don't understand the significance of this.
> >
> > Generally speaking, when you are working with text, you are working
> > with arbitrary length sequences. So, boxing intermediate results and
> > razing the boxes is a frequently used idiom.
> >
> >    ;(# ":)each 1 2 3
> > 122333
> >
> > > > Generally speaking, if you want an unambiguous representation of your
> > > > data, you should use something like {{ 5!:5<'y' }} rather than ":
> > >
> > > I don't need unambiguous. I'll take non-obfuscatory. And, as mentioned,
> > > the behaviour of ": here is inconsistent with other primitives.
> >
> > Every primitive is in some sense "inconsistent" with other primitives,
> > because every primitive accomplishes something different.
> >
> > The ": primitive is about formatting text for display. That is going
> > to have to be different from an operation like addition.
> >
> > > > it is not being displayed correctly.
> > >
> > > The display seems correct to me.
> >
> > Ah, that was my browser / email client messing up.
> >
> > Thanks,
> >
> > --
> > Raul
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
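
The lengths reported in the quoted session (4, 4, 8 and 10) can be cross-checked outside J. What follows is a minimal sketch in Python, not J, and it rests on assumptions read off the session output: that 8 u: encodes a list of code points as UTF-8 octets, that 10 u: turns each integer into one 4-byte character, and that ": re-encodes 4-byte characters as UTF-8. The helper to_utf8 and the variable names are hypothetical and only mirror the names used in the thread.

```python
# Cross-check of the lengths in the quoted J session, written in Python.
# Assumptions (not J itself): 8 u: yields UTF-8 octets, 10 u: yields one
# code point per integer, and ": re-encodes code points as UTF-8.

def to_utf8(code_points):
    """Return the UTF-8 octets for a list of code points, as integers."""
    text = "".join(chr(c) for c in code_points)
    return list(text.encode("utf-8"))

x = to_utf8([97, 243, 98])   # stands in for  8 u: 97 243 98
z = [97, 195, 179, 98]       # stands in for  10 u: 97 195 179 98

# x,z : every octet of x is promoted to its own code point, nothing is decoded.
joined = x + z

# x,&":z : z is re-encoded as UTF-8 first, then the octets are joined.
joined_utf8 = x + to_utf8(z)

print(len(x), len(z), len(joined), len(joined_utf8))
```

The octets of x happen to be the same four integers used to build z, which is why joining them directly gives 8 items, while re-encoding z first expands 195 and 179 to two octets each and gives 10.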
