I agree that it is wrong to assume that byte-precision y in (": y) is UTF-8. It should simply produce y. I will fix this for the next beta (when I get back from vacation).
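
For readers following along, here is a minimal sketch of why the assumption matters. It is an illustration, not the interpreter's actual display code. The names y and w and the sample bytes are my own; the bytes 195 169 happen to be the UTF-8 encoding of U+00E9, but nothing in the data itself says they are text rather than binary.

```j
NB. Illustrative sketch only; y and w are hypothetical names.
y =: 195 169 { a.     NB. two bytes of byte-precision (literal) data
w =: u: 'a'           NB. one wide (unicode) character

3!:0 y                NB. datatype of y (literal)
3!:0 w                NB. datatype of w (unicode)

# u: y                NB. monadic u: promotes byte by byte, so the tally is kept
3 u: u: y             NB. code points after byte-wise promotion
# 7 u: y              NB. 7&u: decodes the bytes as UTF-8 instead
3 u: 7 u: y           NB. code point(s) after UTF-8 decoding

(w #@, y) = w +&# y   NB. does the tally of the append equal the sum of tallies?
# w ,&(7&u:) y        NB. tally if the append decoded UTF-8 first
```

The byte-wise promotion keeps one character per byte and can be reversed, whereas the UTF-8 decoding merges the two bytes into a single character, so the two readings of the same data disagree on its length. Exact results may differ between J releases.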
In boxed data, byte-precision y must be assumed to be UTF-8, as now.

Henry Rich

On Sun, Jul 10, 2016 at 9:16 PM, bill lam <bbill....@gmail.com> wrote:
> unicode support was introduced in J5, and u: for promoting
> literal to ucs was done at that time. utf8 support was added in
> J6. Other than the C front-end and the foreign file interface, which
> assume utf8 encoding, the J engine itself did not assume any encoding
> for literal. In fact literal may hold binary data.
>
> IMO assuming literal to be utf8 encoded is wrong.
>
> u: for promoting literal to ucs is sensible; at least it allows
> round-trip conversion.
>
> Sun, 10 Jul 2016, Don Guinn wrote:
> > I confess I have never read the whole unicode standard.
> >
> > And yes. I am proposing that when mixing char and wide in a primitive like
> > Append (,) that the char be converted to wide as done in 7&u: , only it
> > should unconditionally convert to wide as it has to be wide to match the
> > other argument. However, I feel that the current standard of converting
> > with u: monadic should not be allowed at all. It should be an error, period.
> > In the current world one never really can predict when some data may appear
> > with UTF-8 characters unexpectedly. This would force manual conversion,
> > ensuring that the proper conversion from char to wide as required by the
> > application is done. Otherwise testing with only ASCII char would not catch
> > the possible error.
> >
> > It seems to me that automatic conversion from char to wide assuming UTF-8 is
> > a proper choice now. It is possible that one could run into a need to leave
> > the conversion as it is now, but where would that data come from? And it
> > would really be a pain to view given that J is so insistent on treating char
> > as UTF-8 when displaying.
> >
> > J automatically converts integer (64 bit) into float when it can cause a
> > loss of accuracy and we accept that. How is this different?
> >
> > On Sun, Jul 10, 2016 at 4:54 PM, Raul Miller <rauldmil...@gmail.com> wrote:
> >
> > > On Sun, Jul 10, 2016 at 6:14 PM, Don Guinn <dongu...@gmail.com> wrote:
> > > > I am not suggesting any change in the way char is handled except when
> > > > combining with wide. So programs not using wide would not be affected. Wide
> > > > is different from char as it is only Unicode. It has no other use. So any
> > > > time wide and char are mixed the char bytes must be Unicode points. So
> > > > I looked at what U+80 through U+FF are. Some are control codes, which I
> > > > don't understand, and the rest are the Latin-1 Supplement. There are many
> > > > useful symbols in this range. But how would they be entered?
> > >
> > > I think what you are proposing is that J should be changed so that x
> > > #@,y does not always match x+&# y.
> > >
> > > And, also, I think that you are proposing that x,y should throw a
> > > domain error when one argument is type 131072 and the other is type 2
> > > and the type 2 argument is not valid UTF-8?
> > >
> > > In other words, I think you are proposing append works like this:
> > >
> > > append=: dyad define
> > > if. 131074 = x +&(3!:0) y do. x ,&(7&u:) y else. x, y end.
> > > )
> > >
> > > in place of current behavior, which is more like this:
> > >
> > > append=: dyad define
> > > if. 131074 = x +&(3!:0) y do. x ,&u: y else. x, y end.
> > > )
> > >
> > > But, also, I think that you are also proposing that we currently do
> > > not adopt other parts of the unicode standard, such as many of those
> > > listed at http://unicode.org/reports/?
> > >
> > > Do you feel that this accurately reflects your current point of view?
> > >
> > > Thanks,
> > >
> > > --
> > > Raul
> > > ----------------------------------------------------------------------
> > > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3