Re: [Jsource] Problems dealing with UTF-8

Don Guinn Sun, 10 Jul 2016 16:43:20 -0700

I confess I have never read the whole Unicode standard.

And yes, I am proposing that when char and wide are mixed in a primitive
like Append (,), the char be converted to wide as 7&u: does, except that it
should convert to wide unconditionally, since it has to be wide to match
the other argument. However, I feel that the current behavior of converting
as monadic u: does should not be allowed at all. It should be an error,
period. In the current world one can never really predict when data
containing UTF-8 characters may appear unexpectedly. Making it an error
would force manual conversion, ensuring that the conversion from char to
wide that the application requires is the one actually done. Otherwise,
testing with only ASCII char would not catch the possible error.
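
A small sketch of why ASCII-only tests miss the problem, written in Python
rather than J. It assumes that the byte-wise widening of monadic u: behaves
like a Latin-1 decode (each byte becomes the code point of the same value)
and that 7&u: behaves like a UTF-8 decode. The function names are
hypothetical, chosen only for illustration.

```python
def widen_bytewise(data: bytes) -> str:
    # Assumed analogue of monadic u: on char:
    # each byte becomes the code point with the same value.
    return data.decode("latin-1")


def widen_utf8(data: bytes) -> str:
    # Assumed analogue of 7&u: on char: bytes are read as UTF-8.
    # Invalid UTF-8 raises UnicodeDecodeError.
    return data.decode("utf-8")


ascii_bytes = b"abc"
utf8_bytes = "\u00e9".encode("utf-8")  # e-acute, two bytes: 0xC3 0xA9

# ASCII input: both conversions give the same result.
print(widen_bytewise(ascii_bytes) == widen_utf8(ascii_bytes))  # True

# Non-ASCII input: the results differ in content and in length.
print(widen_bytewise(utf8_bytes) == widen_utf8(utf8_bytes))  # False
print(len(widen_bytewise(utf8_bytes)), len(widen_utf8(utf8_bytes)))  # 2 1
```

A test suite that only feeds in ASCII exercises the first case and never
sees the second.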


It seems to me that automatic conversion from char to wide assuming UTF-8
is a proper choice now. It is possible that one could run into a need to
leave the conversion as it is now, but where would that data come from? And
it would really be a pain to view, given that J is so insistent on treating
char as UTF-8 when displaying.

J automatically converts integer (64 bit) into float even when that can
cause a loss of accuracy, and we accept that. How is this different?
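
The integer-to-float loss can be seen with a short Python sketch. It assumes
the float is an IEEE 754 double, as Python's float is; the variable names
are illustrative only.

```python
# 2**53 + 1 fits in a 64-bit integer but has no exact double representation.
big = 2**53 + 1

as_float = float(big)      # silently rounds to the nearest double
round_trip = int(as_float)

print(big)                 # 9007199254740993
print(round_trip)          # 9007199254740992
print(big == round_trip)   # False
```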

On Sun, Jul 10, 2016 at 4:54 PM, Raul Miller <rauldmil...@gmail.com> wrote:

> On Sun, Jul 10, 2016 at 6:14 PM, Don Guinn <dongu...@gmail.com> wrote:
> > I am not suggesting any change in the way char is handled except when
> > combining with wide. So programs not using wide would not be affected.
> > Wide is different from char as it is only Unicode. It has no other use.
> > So any time wide and char are mixed the char bytes must be Unicode code
> > points. So I looked at what U+80 through U+FF are: some control codes,
> > which I don't understand, and the Latin-1 Supplement. There are many
> > useful symbols in this range. But how would they be entered?
>
> I think what you are proposing is that J should be changed so that x
> #@,y does not always match x+&# y.
>
> And, also, I think that you are proposing that x,y should throw a
> domain error when one argument is type 131072 and the other is type 2
> and the type 2 argument is not valid UTF-8?
>
> In other words, I think you are proposing append works like this:
>
> append=: dyad define
>   if. 131074 = x +&(3!:0) y do. x ,&(7&u:) y else. x, y end.
> )
>
> in place of current behavior, which is more like this:
>
> append=: dyad define
>   if. 131074 = x +&(3!:0) y do. x ,&u: y else. x, y end.
> )
>
> But, also, I think that you are proposing that we do not currently
> adopt other parts of the Unicode standard, such as many of those
> listed at http://unicode.org/reports/?
>
> Do you feel that this accurately reflects your current point of view?
>
> Thanks,
>
> --
> Raul
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>