Re: [Jsource] Problems dealing with UTF-8

bill lam Sun, 10 Jul 2016 18:16:25 -0700

Unicode support was introduced in J5, and u: for promoting
literal to UCS was added at that time. UTF-8 support was added in
J6. Apart from the C front end and the foreign file interface,
which assume UTF-8 encoding, the J engine itself did not assume
any encoding for literal. In fact, a literal may hold binary data.


IMO, assuming that a literal is UTF-8 encoded is wrong.

Using u: to promote literal to UCS is sensible; at least it allows
round-trip conversion.
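
The round-trip point can be illustrated outside J. The following is a
minimal Python sketch, not J code: it assumes Python's latin-1 codec as a
stand-in for byte-wise promotion (each byte becomes the code point with the
same value, as monadic u: does for literal) and Python's utf-8 codec as a
stand-in for a UTF-8-aware conversion such as 7&u: . The function names are
hypothetical, and the failure mode shown for invalid input is Python's, not
necessarily J's.

```python
def promote_bytewise(data: bytes) -> str:
    """Widen each byte to the code point with the same value (0..255)."""
    return data.decode("latin-1")


def demote_bytewise(text: str) -> bytes:
    """Inverse of promote_bytewise; valid only for code points below 256."""
    return text.encode("latin-1")


def promote_as_utf8(data: bytes) -> str:
    """Interpret the bytes as UTF-8; raises UnicodeDecodeError if invalid."""
    return data.decode("utf-8")


# Arbitrary binary data survives a byte-wise round trip unchanged.
binary = bytes(range(256))
roundtrip_ok = demote_bytewise(promote_bytewise(binary)) == binary

# The UTF-8 encoding of U+00E9 is two bytes: C3 A9.
e_acute = "\u00e9".encode("utf-8")
bytewise_len = len(promote_bytewise(e_acute))  # two code points, one per byte
utf8_len = len(promote_as_utf8(e_acute))       # one code point

# Bytes that are not valid UTF-8 cannot be decoded as UTF-8 at all.
try:
    promote_as_utf8(b"\xff\xfe")
    utf8_rejects_binary = False
except UnicodeDecodeError:
    utf8_rejects_binary = True
```

Byte-wise promotion preserves the item count and is lossless for any input,
which is what makes it safe when a literal holds binary data. The UTF-8
interpretation changes the count and is only defined for valid UTF-8.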

Sun, 10 Jul 2016, Don Guinn wrote:
> I confess I have never read the whole unicode standard.
> 
> And yes, I am proposing that when char and wide are mixed in a primitive like
> Append (,), the char be converted to wide as is done by 7&u: , except that it
> should unconditionally convert to wide, since it has to be wide to match the
> other argument. However, I feel that the current standard of converting
> with monadic u: should not be allowed at all. It should be an error, period.
> In the current world one can never really predict when some data may appear
> with UTF-8 characters unexpectedly. This would force manual conversion,
> ensuring that the proper conversion from char to wide as required by the
> application is done. Otherwise testing with only ASCII char would not catch
> the possible error.
> 
> It seems to me that automatic conversion from char to wide assuming UTF-8 is
> the proper choice now. It is possible that one could run into a need to leave
> the conversion as it is now, but where would that data come from? And it
> would really be a pain to view, given that J is so insistent on treating char
> as UTF-8 when displaying.
> 
> J automatically converts integer (64 bit) into float when it can cause a
> loss of accuracy and we accept that. How is this different?
> 
> On Sun, Jul 10, 2016 at 4:54 PM, Raul Miller <rauldmil...@gmail.com> wrote:
> 
> > On Sun, Jul 10, 2016 at 6:14 PM, Don Guinn <dongu...@gmail.com> wrote:
> > > I am not suggesting any change in the way char is handled except when
> > > combining with wide. So programs not using wide would not be affected.
> > > Wide is different from char as it is only Unicode. It has no other use.
> > > So any time wide and char are mixed, the char bytes must be Unicode
> > > points. So I looked at what U+80 through U+FF are: some control codes,
> > > which I don't understand, and the Latin-1 Supplement. There are many
> > > useful symbols in this range. But how would they be entered?
> >
> > I think what you are proposing is that J should be changed so that x
> > #@,y does not always match x+&# y.
> >
> > And, also, I think that you are proposing that x,y should throw a
> > domain error when one argument is type 131072 and the other is type 2
> > and the type 2 argument is not valid UTF-8?
> >
> > In other words, I think you are proposing append works like this:
> >
> > append=: dyad define
> >   if. 131074 = x +&(3!:0) y do. x ,&(7&u:) y else. x, y end.
> > )
> >
> > in place of current behavior, which is more like this:
> >
> > append=: dyad define
> >   if. 131074 = x +&(3!:0) y do. x ,&u: y else. x, y end.
> > )
> >
> > But I think that you are also proposing that we currently do
> > not adopt other parts of the unicode standard, such as many of those
> > listed at http://unicode.org/reports/?
> >
> > Do you feel that this accurately reflects your current point of view?
> >
> > Thanks,
> >
> > --
> > Raul
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >

-- 
regards,
