What you are seeing here is an oddness in the Unicode specification.

   a. i. 8 u: 7 u: 'þ'
195 190
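The numbers in this session, and in the expressions from Björn's message quoted below, are plain encoding arithmetic. The following is a minimal Python sketch of that arithmetic. It is not J code and does not model `a.` or `u:` themselves; the comments only map each line to the J expression it corresponds to. It assumes the session stores the literal 'þ' as UTF-8, which is what the two-byte result 195 190 suggests. The function names are hypothetical, chosen for illustration.

```python
# Encoding arithmetic behind the J results in this thread (sketch only).
# Assumption: the J literal 'þ' is held as UTF-8 bytes.

THORN = "\u00fe"  # þ, LATIN SMALL LETTER THORN


def utf8_bytes(text):
    """Byte values of text encoded as UTF-8 (cf. 8 u: and J literals)."""
    return list(text.encode("utf-8"))


def utf16_units(text):
    """16-bit UTF-16 code units of text (cf. 7 u:)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def decode_lone_byte(value):
    """Decode one byte as UTF-8, substituting U+FFFD if it is not valid."""
    return bytes([value]).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(utf8_bytes(THORN))    # cf. a. i. 8 u: 7 u: 'þ'
    print(utf16_units(THORN))   # cf. a. i. 7 u: 'þ'
    # 0xFE never occurs in valid UTF-8, so a lone byte 254 cannot be shown
    # as a character (cf. 254 { a. displaying a replacement character).
    print(hex(ord(decode_lone_byte(254))))
    # A two-byte literal compared element-wise with one byte (cf. 0 0).
    print([b == 254 for b in utf8_bytes(THORN)])
    # The single UTF-16 unit does equal 254 (cf. the result 1).
    print(utf16_units(THORN) == [254])
```

U+00FE needs eight bits, so UTF-8 spends two bytes on it (0xC3 0xBE = 195 190), while UTF-16 stores it in a single 16-bit unit whose value, 254, happens to be below 256.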
7 u: gives you the UTF-16 character representation, and 8 u: gives you the UTF-8 character representation. It just happens that the UTF-16 value of thorn (þ) is less than 256. But if you do not make a careful distinction between "literals" (bytes) and "characters", you can confuse yourself by expecting the wrong thing here.

Thanks,

-- 
Raul

On Thu, Feb 27, 2014 at 7:03 AM, Björn Helgason <gos...@gmail.com> wrote:
> There are a lot of strange things happening regarding national characters.
>
> þ is within the 256 chars but behaves strangely regarding a.
>
>    7 u: 'þ'
> þ
>    a. i. 7 u: 'þ'
> 254
>    254 { a.
> �
>    7 u: 254 { a.
> |domain error
> |   7 u:254{a.
>    3 u: 254 { a.
> 254
>    'þ' = 254 { a.
> 0 0
>    (7 u: 'þ') = 254 { a.
> 1
>
> -
> Björn Helgason
> gsm:6985532
> skype:gosiminn
> On 26.2.2014 14:36, "Raul Miller" <rauldmil...@gmail.com> wrote:
>
>> a. is just 256 literal characters; it is a noun.
>>
>> I expect u: might have been what you were thinking about? It's a verb.
>>
>> Thanks,
>>
>> --
>> Raul
>>
>> On Wed, Feb 26, 2014 at 2:19 AM, Björn Helgason <gos...@gmail.com> wrote:
>> > Actually I want a. back as it was.
>> >
>> > Giving me two or three numbers is wrong and is confusing at best.
>> >
>> > It should return the digital number for Unicode and only one number per
>> > char.
>> >
>> > a. is the atomic vector, and this way the atomic vector has grown to
>> > include all of Unicode.
>> >
>> > -
>> > Björn Helgason
>> > On 25.2.2014 16:10, "Björn Helgason" <gos...@gmail.com> wrote:
>> >
>> >> a. and especially i. a. - looking up char indexes used to be useful.
>> >>
>> >> It is not as easy anymore.
>> >>
>> >> The national chars are often not in there with a single number.
>> >>
>> >> Sometimes two or three.
>> >>
>> >> Reading files also sometimes with Unicode markings.
>> >>
>> >> -
>> >> Björn Helgason
>> >> On 25.2.2014 14:03, "Don Guinn" <dongu...@gmail.com> wrote:
>> >>
>> >>> I tried that a while back. I extended the table for ;: to treat the
>> >>> bytes in _128{.a as letters, which made all multi-byte UTF-8 characters
>> >>> count as alphas. Statements were broken into tokens properly. But then I
>> >>> found that the interpreter used the top half of a. internally. I
>> >>> mentioned that in the forum a while back when someone noticed that some
>> >>> character in there acted weird. Roger said that could be changed if
>> >>> needed. It might be easy for Roger to change that, but it didn't look so
>> >>> easy to me.
>> >>>
>> >>> I looked at the tables for Unicode (wide characters) and in the form of
>> >>> UTF-8 and couldn't see any easy way to distinguish the category of a
>> >>> character. Those that one would consider an alpha were mixed in with
>> >>> graphics and controls. APL characters were not grouped together but
>> >>> scattered all over the place.
>> >>>
>> >>> Trying it out to see what happens shouldn't be too difficult, but there
>> >>> are a lot of questions to answer before making it a production tool.
>> >>>
>> >>> On Mon, Feb 24, 2014 at 10:11 PM, bill lam <bbill....@gmail.com> wrote:
>> >>>
>> >>> > This seems simpler. The first thing to do is build a prototype
>> >>> > implementation, and then we can see what other problems are out there.
>> >>> >
>> >>> > Mon, 24 Feb 2014, Don Guinn wrote:
>> >>> > > A middle ground might be to allow some Unicode (UTF-8) characters to
>> >>> > > be considered letters like a-z, A-Z. Then one could name APL iota to
>> >>> > > something like i. . In addition, it would allow non-English languages
>> >>> > > not to be restricted to ASCII characters for names.
>> >>> > > Greek letters in mathematics could be used as names, making
>> >>> > > statements look a little more like traditional mathematics. It would
>> >>> > > be simpler to allow all Unicode characters to be considered letters,
>> >>> > > but that might lead to other problems.
>> >>> > > ----------------------------------------------------------------------
>> >>> > > For information about J forums see http://www.jsoftware.com/forums.htm
>> >>> >
>> >>> > --
>> >>> > regards,
>> >>> > ====================================================
>> >>> > GPG key 1024D/4434BAB3 2008-08-24
>> >>> > gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
>> >>> > gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
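Don's prototype and his proposal can be compared with a small Python sketch. `high_byte_rule` is a hypothetical stand-in for the modified `;:` table he describes: if bytes 128-255 count as letters, every non-ASCII character is accepted, because every byte of a multi-byte UTF-8 sequence is at least 128. `category_rule` is an alternative that was not proposed in the thread: it uses the Unicode General Category property (exposed by Python's `unicodedata.category`) and accepts only the letter categories Lu, Ll, Lt, Lm and Lo. Both names are illustrative and are not part of J.

```python
import unicodedata


def high_byte_rule(ch):
    """Hypothetical model of the modified ;: table: bytes 128-255 are letters.

    ASCII characters keep their usual classification. For a non-ASCII
    character, every byte of its UTF-8 encoding is >= 128, so the whole
    character is accepted as a letter.
    """
    if ch.isascii():
        return ch.isalpha()
    return all(b >= 128 for b in ch.encode("utf-8"))


def category_rule(ch):
    """Alternative (not from the thread): accept only Unicode letter
    categories, i.e. general categories starting with 'L'."""
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    # ASCII letter, thorn, Greek alpha, Cyrillic De, APL iota, <=, plus sign
    for ch in ["a", "þ", "α", "Д", "⍳", "≤", "+"]:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} "
              f"high-byte={high_byte_rule(ch)} category={category_rule(ch)}")
```

The two rules disagree on non-letter symbols. The category rule keeps þ, Greek and Cyrillic letters but rejects ⍳ (category So) and ≤ (category Sm), so on its own it would not allow naming APL iota as Don suggests; the high-byte rule accepts all of them, which is the "all Unicode characters are letters" option he flags as possibly causing other problems.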