For example, the standard seems to require that 5 3 $ y throw a
domain error for some values of y.

More generally, full compliance with the standard would result in a
programming language which is not J.
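
For concreteness, here is roughly how the two conversions at issue differ
on a char vector holding the UTF-8 bytes of one non-ASCII character (a
sketch from memory; the u: left arguments are worth double-checking):

   c =: 8 u: 7 u: 233     NB. char vector: the 2-byte UTF-8 encoding of U+00E9
   # u: c                 NB. monadic u: makes one code point per byte: 2
   # 7 u: c               NB. 7&u: decodes the bytes as UTF-8: 1

Monadic u: treats each char byte as its own code point, while 7&u: decodes
the bytes as UTF-8; mixing char with wide has to pick one of these
implicitly, which is what this thread is about.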

-- 
Raul

On Sun, Jul 10, 2016 at 8:41 PM, Don Guinn <dongu...@gmail.com> wrote:
> The reports are pretty big. What part of the standard does my proposal
> violate?
>
> On Sun, Jul 10, 2016 at 5:42 PM, Don Guinn <dongu...@gmail.com> wrote:
>
>> I confess I have never read the whole unicode standard.
>>
>> And yes. I am proposing that when mixing char and wide in a primitive like
>> Append (,), the char argument should be converted to wide as is done by
>> 7&u: , except that the conversion should be unconditional, since the
>> argument has to be wide to match the other one. However, I feel that the
>> current behavior of converting with monadic u: should not be allowed at
>> all. It should be an error, period. In the current world one can never
>> really predict when some data may unexpectedly contain UTF-8 characters.
>> This would force a manual conversion, ensuring that the conversion from
>> char to wide is done as the application requires. Otherwise, testing with
>> only ASCII chars would not catch the possible error.
>>
>> It seems to me that automatic conversion from char to wide should assume
>> UTF-8; that is the proper choice now. It is possible that one could run
>> into a need to leave the conversion as it is now, but where would that
>> data come from? And it would really be a pain to view, given that J is so
>> insistent on treating char as UTF-8 when displaying.
>>
>> J automatically converts integer (64-bit) into float even when that can
>> cause a loss of accuracy, and we accept that. How is this different?
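>>
>> For example (if I remember the overflow behavior correctly):
>>
>>    3!:0 ] 9223372036854775806 + 1   NB. 4: still a 64-bit integer
>>    3!:0 ] 9223372036854775807 + 1   NB. 8: overflow promotes to float
>>
>> The sum in the second line can no longer be represented exactly, yet J
>> converts it silently rather than signaling an error.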
>>
>> On Sun, Jul 10, 2016 at 4:54 PM, Raul Miller <rauldmil...@gmail.com>
>> wrote:
>>
>>> On Sun, Jul 10, 2016 at 6:14 PM, Don Guinn <dongu...@gmail.com> wrote:
>>> > I am not suggesting any change in the way char is handled except when
>>> > combining with wide. So programs not using wide would not be affected.
>>> > Wide is different from char in that it is only Unicode. It has no other
>>> > use. So any time wide and char are mixed, the char bytes must be
>>> > Unicode code points. So I looked at what U+80 through U+FF are: some
>>> > control codes, which I don't understand, and the Latin-1 Supplement.
>>> > There are many useful symbols in this range. But how would they be
>>> > entered?
>>>
>>> I think what you are proposing is that J should be changed so that
>>> x #@, y does not always match x +&# y.
>>>
>>> And, also, I think that you are proposing that x,y should throw a
>>> domain error when one argument is type 131072 and the other is type 2
>>> and the type 2 argument is not valid UTF-8?
>>>
>>> In other words, I think you are proposing append works like this:
>>>
>>> append=: dyad define
>>>   NB. 131074 = 2 (char) + 131072 (wide): exactly one argument is wide
>>>   if. 131074 = x +&(3!:0) y do. x ,&(7&u:) y else. x, y end.
>>> )
>>>
>>> in place of current behavior, which is more like this:
>>>
>>> append=: dyad define
>>>   NB. monadic u: maps each char byte to the code point with that value
>>>   if. 131074 = x +&(3!:0) y do. x ,&u: y else. x, y end.
>>> )
>>>
>>> But, also, I think you are proposing that we not adopt other parts of
>>> the unicode standard, such as many of those listed at
>>> http://unicode.org/reports/?
>>>
>>> Do you feel that this accurately reflects your current point of view?
>>>
>>> Thanks,
>>>
>>> --
>>> Raul
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>
>>
>>