On Sun, Jul 10, 2016 at 7:42 PM, Don Guinn <dongu...@gmail.com> wrote:
> I confess I have never read the whole unicode standard.
>
> And yes. I am proposing that when mixing char and wide in a primitive like
> Append (,), the char be converted to wide as is done by 7&u:, only it
> should unconditionally convert to wide, as it has to be wide to match the
> other argument.

Can you give me an example where this would give a different result from:

append=: dyad define
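  NB. 3!:0 gives the datatype code: 2 for an 8-bit literal, 131072 for
  NB. 2-byte unicode, so a sum of 131074 means one argument is char and
  NB. the other is wide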
  if. 131074 = x +&(3!:0) y do. x ,&(7&u:) y else. x, y end.
)

For that matter, is there some reason you would not want to use
,&(7&u:) if you are mixing utf-16 and utf-8 characters?
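
For concreteness, here is roughly how I would expect that to play out in
a session (the 'café' text, the variable names, and the byte values below
are made up purely for illustration; 195 169 is the UTF-8 encoding of é):

   utf8 =. 'caf', 195 169 { a.    NB. 'café' as UTF-8 bytes
   wide =. 7 u: utf8              NB. the same text as 2-byte (wide) characters
   3!:0 utf8
2
   3!:0 wide
131072
   3!:0 utf8 ,&(7&u:) wide        NB. both sides promoted to wide before appending
131072
   # utf8 ,&(7&u:) wide
8

As far as I can tell, the append definition above gives the same result
on that pair, which is why I am asking where a difference would show up.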

> However, I feel that the current standard of converting
> with u: monadic should not be allowed at all. It should be an error, period.

Why is that?

Is this because that is the only use you have? Is this because you
believe this would break no existing code? Or is this because you
believe that no one should ever use a 16-bit literal for non-unicode
data in J? (For example, when dealing with binary files representing
music, or for representing pixels?)
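
To sketch what I mean by that last case (the values here are arbitrary,
just to illustrate holding non-text 16-bit data in a 2-byte literal):

   samples =. u: 0 255 1000 40000   NB. arbitrary 16-bit values as 2-byte characters
   3!:0 samples
131072
   3 u: samples                     NB. the values are still recoverable as integers
0 255 1000 40000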

> In the current world one never really can predict when some data may appear
> with UTF-8 characters unexpectedly. This would force manual conversion,
> ensuring that the proper conversion from char to wide as required by the
> application is done. Otherwise testing with only ASCII char would not catch
> the possible error.

I feel that you have not encountered enough problems with "almost
utf-8 data", or "utf-8 data mixed in with other binary data in a file",
if you are saying stuff like this.

For that matter, if by "manual conversion" you mean using 7&u: then I
do not see why this should be a problem.

> It seems to me that having automatic conversion from char to wide assume
> UTF-8 is a proper choice now. It is possible that one could run into a need
> to leave the conversion as it is now, but where would that data come from?

A file, most likely. Or a network stream.

> And it would really be a pain to view, given that J is so insistent on
> treating char as UTF-8 when displaying.

Usually you convert such data to numbers (possibly hexadecimal) when
you want to inspect it. But to get there, you expect J to behave in a
transparent and predictable fashion.
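
Something along these lines, for example (a sketch, reusing the made-up
'café' bytes from above):

   bytes =. 'caf', 195 169 { a.   NB. char data that happens to contain UTF-8
   a. i. bytes                    NB. inspect it as raw byte values
99 97 102 195 169
   3 u: 7 u: bytes                NB. or, after conversion, as code points
99 97 102 233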

> J automatically converts integer (64 bit) into float when it can cause a
> loss of accuracy and we accept that. How is this different?

This conversion changes the shape of the data.
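
A quick illustration of what I mean (again with made-up values):

   # 2 3 4 , 1.5                  NB. int to float promotion: still one item per item
4
   bytes =. 'caf', 195 169 { a.   NB. 'café' as 5 UTF-8 bytes
   # bytes
5
   # 7 u: bytes                   NB. the same text as wide characters: only 4 items
4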

Thanks,

-- 
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
