What you are seeing here is an oddity in the Unicode specification.

   a.i.8 u: 7 u: 'þ'
195 190

7 u: gives you the UTF-16 character representation, and 8 u: gives you
the UTF-8 character representation.

It just happens that the character value in the UTF-16 representation
of thorn (þ) is less than 256, so it looks as though it should index a.
directly. But a. holds 256 one-byte literals, and in UTF-8 thorn takes
two of them (195 190). If you do not make a careful distinction between
"literals" and "characters" you can confuse yourself by expecting the
wrong thing here.
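
The same distinction shows up outside J. As a rough illustration only
(this is Python, not J, and the variable names are my own), the sketch
below compares the character value of thorn with its UTF-8 and UTF-16
encodings, and checks whether byte 254 is valid UTF-8 on its own:

```python
# Illustration in Python (not J): one character, several representations.
thorn = "þ"  # U+00FE, LATIN SMALL LETTER THORN

# Character value: the 254 that a. i. 7 u: 'þ' reports in the session quoted below.
code_point = ord(thorn)

# UTF-8 needs two bytes for this character: the 195 190 shown above.
utf8_bytes = list(thorn.encode("utf-8"))

# UTF-16 needs a single 16-bit unit, shown here as little-endian bytes.
utf16_bytes = list(thorn.encode("utf-16-le"))

# A lone byte 254 is not valid UTF-8, so it cannot be decoded as text.
try:
    bytes([254]).decode("utf-8")
    lone_byte_is_valid = True
except UnicodeDecodeError:
    lone_byte_is_valid = False

print(code_point)          # 254
print(utf8_bytes)          # [195, 190]
print(utf16_bytes)         # [254, 0]
print(lone_byte_is_valid)  # False
```

So 254 is both the character value of thorn and the value of one
particular byte, but that byte by itself is not the UTF-8 encoding of
thorn.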

Thanks,

-- 
Raul

On Thu, Feb 27, 2014 at 7:03 AM, Björn Helgason <gos...@gmail.com> wrote:
> There are a lot of strange things happening regarding national characters.
>
> þ is within the 256 chars but behaves strangely regarding a.
>
>    7 u: 'þ'
> þ
>    a. i. 7 u: 'þ'
> 254
>    254 { a.
> �
>    7 u: 254 { a.
> |domain error
> |   7     u:254{a.
>    3 u: 254 { a.
> 254
>    'þ' = 254 { a.
> 0 0
>    (7 u: 'þ') = 254 { a.
> 1
>
>
> -
> Björn Helgason
> gsm:6985532
> skype:gosiminn
> On 26.2.2014 14:36, "Raul Miller" <rauldmil...@gmail.com> wrote:
>
>> a. is just 256 literal characters, it is a noun.
>>
>> I expect u: might have been what you were thinking about? It's a verb.
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Wed, Feb 26, 2014 at 2:19 AM, Björn Helgason <gos...@gmail.com> wrote:
>> > Actually I want a. back as it was.
>> >
>> > Giving me two or three numbers is wrong and is confusing at best.
>> >
>> > It should return the digital number for Unicode and only one number per
>> > char.
>> >
>> > a. is the atomic vector and this way the atomic has grown to include all
>> > of Unicode.
>> >
>> > -
>> > Björn Helgason
>> > gsm:6985532
>> > skype:gosiminn
>> > On 25.2.2014 16:10, "Björn Helgason" <gos...@gmail.com> wrote:
>> >
>> >> a. and especially i. a. - looking up character indexes used to be useful.
>> >>
>> >> It is not as easy anymore.
>> >>
>> >> The national chars are often not in there with a single number.
>> >>
>> >> Sometimes two or three.
>> >>
>> >> Files that are read in also sometimes come with Unicode markings.
>> >>
>> >> -
>> >> Björn Helgason
>> >> gsm:6985532
>> >> skype:gosiminn
>> >> On 25.2.2014 14:03, "Don Guinn" <dongu...@gmail.com> wrote:
>> >>
>> >>> I tried that a while back. I extended the table for ;: so that the
>> >>> bytes in _128{.a. were treated as letters, which made all multi-byte
>> >>> UTF-8 sequences be treated as alphas. Statements were broken into
>> >>> tokens properly. But then I found that the interpreter used the top
>> >>> half of a. internally. I mentioned that in the forum a while back
>> >>> when someone noticed that some character in there acted weird. Roger
>> >>> said that could be changed if needed. Might be easy for Roger to
>> >>> change that but it didn't look so easy to me.
>> >>>
>> >>> I looked at the tables for Unicode (wide characters) and in the form of
>> >>> UTF-8 and couldn't see any easy way to distinguish the category of a
>> >>> character. Those that one would consider an alpha were mixed in with
>> >>> graphics and controls. APL characters were not grouped together but
>> >>> scattered all over the place.
>> >>>
>> >>> Trying it out to see how it would work shouldn't be too difficult,
>> >>> but there are a lot of questions to answer before making it a
>> >>> production tool.
>> >>>
>> >>> On Mon, Feb 24, 2014 at 10:11 PM, bill lam <bbill....@gmail.com> wrote:
>> >>>
>> >>> > This seems simpler. The first thing to do is build a prototype
>> >>> > implementation, and then we can see what other problems are out
>> >>> > there.
>> >>> >
>> >>> > Mon, 24 Feb 2014, Don Guinn wrote:
>> >>> > > A middle ground might be to allow some Unicode (UTF-8) characters
>> >>> > > to be considered letters like a-z,A-Z. Then one could use APL iota
>> >>> > > as a name for something like i. . In addition, it would allow
>> >>> > > non-English languages not to be restricted to ASCII characters for
>> >>> > > names. Greek letters in mathematics could be used as names, making
>> >>> > > statements look a little more like traditional mathematics. It
>> >>> > > would be simpler to allow all Unicode characters to be considered
>> >>> > > letters, but that might lead to other problems.
>> >>> > >
>> >>> > > ----------------------------------------------------------------------
>> >>> > > For information about J forums see http://www.jsoftware.com/forums.htm
>> >>> >
>> >>> > --
>> >>> > regards,
>> >>> > ====================================================
>> >>> > GPG key 1024D/4434BAB3 2008-08-24
>> >>> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> >>> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
>> >>> >
