Re: [9fans] Character case mappings

2013-06-25 Thread Daode
erik quanstrom <quans...@quanstro.net> wrote:
 | uuh, ok, 9atom seems to have seen a lot of progress compared to
 | what i have yet looked at.
 |
 |just a few tables.  and a bit of time spent applying them.  ;-) 
 |if you have plan 9 installed and can 
 |
 |  nflag=-n
 |  srv $nflag -q tcp!atom.9atom.org atom
 |  mount $nflag /srv/atom /n/atom atom

Unfortunately not yet; but i have the distribution since
yesterday.  (The git(1) pack is 121 MB.  And what i've seen before
belonged to go, yet i wrote Plan9 since it seemed to have a common
origin.)

 |then the tables, &c. are in /n/atom/plan9/sys/src/libc/port.
 |the awk code to generate them, and the supporting functions
 |are in /n/atom/plan9/sys/src/cmd/runetype.
 |
 |a particularly nifty (if straightforward) application is grep -I, which is
 |like grep -i, but translates its input with tolowerrune(tobaserune(r))
 |rather than tolower(c).  also straightforward is rune/case, which is
 |like tr 'A-Z' 'a-z', except generalized for unicode.
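
The folding described above (strip the rune to its base form, then lowercase it) can be sketched in portable C.  This is only an illustration under stated assumptions: base_rune, lower_rune, fold_rune and fold_equal are hypothetical stand-ins for tobaserune and tolowerrune, and the two tiny tables are made up for the example rather than taken from 9atom, where the real tables are generated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature tables of (key, value) pairs; not the 9atom data. */
static const uint32_t base_tab[] = {
	0x00c0, 0x0041,	/* À -> A */
	0x00c9, 0x0045,	/* É -> E */
	0x00e0, 0x0061,	/* à -> a */
	0x00e9, 0x0065,	/* é -> e */
};

static const uint32_t lower_tab[] = {
	0x0391, 0x03b1,	/* Α -> α */
	0x0416, 0x0436,	/* Ж -> ж */
};

#define NPAIRS(t) (sizeof(t)/sizeof((t)[0])/2)

/* Linear scan for brevity; returns r unchanged when it has no mapping. */
static uint32_t
lookup(uint32_t r, const uint32_t *t, size_t npairs)
{
	size_t i;

	for(i = 0; i < npairs; i++)
		if(t[2*i] == r)
			return t[2*i+1];
	return r;
}

/* Stand-in for tobaserune: drop the combining mark of a precombined rune. */
uint32_t
base_rune(uint32_t r)
{
	return lookup(r, base_tab, NPAIRS(base_tab));
}

/* Stand-in for tolowerrune: ASCII directly, everything else via the table. */
uint32_t
lower_rune(uint32_t r)
{
	if(r >= 'A' && r <= 'Z')
		return r + ('a' - 'A');
	return lookup(r, lower_tab, NPAIRS(lower_tab));
}

/* The composition used for matching: lowercase of the base form. */
uint32_t
fold_rune(uint32_t r)
{
	return lower_rune(base_rune(r));
}

/* Compare n runes of a and b after folding; 1 if all match, else 0. */
int
fold_equal(const uint32_t *a, const uint32_t *b, size_t n)
{
	size_t i;

	for(i = 0; i < n; i++)
		if(fold_rune(a[i]) != fold_rune(b[i]))
			return 0;
	return 1;
}
```

With these stand-ins, É, é, E and e all fold to the same rune, which is the effect that makes a grep -I style match ignore both case and accents.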

May be worth taking a deeper look into a system that works for
non-english.

Btw. i thought i was so smart due to my Ctx objects for bracket
expressions, format string conversions etc. -- and even said so --
only to find out that on Plan9 there existed something rather
similar years before!  Pretty awkward.

 |see also,
 |http://www.9atom.org/magic/man2html/1/rune
 |http://www.9atom.org/magic/man2html/2/isalpharune
 |http://www.9atom.org/magic/man2html/2/runeclass

yea yea, maybe: i'm not familiar with something that just works,
i've been using BSD for such a long time.
Looking into upas doesn't make me much happier, either.  Sigh.

 |- erik

--steffen



Re: [9fans] Character case mappings

2013-06-24 Thread erik quanstrom
> My S-CText (on sourceforge DOT net SLASH p SLASH s-ctext SLASH
> code SLASH) tests all 0x10FFFF code points correctly with the
> above.  Now when i look at the sys/src/libc/port/runetype.c (of
> plan9front) then i think this one is generated, but i cannot find
> the creating script or program, which would be of interest to me.
> And maybe Plan9 would be interested to see the above patched into
> that, at some later time?
> Thank you and ciao,

that's close to the approach taken, except since one needs
a fresh table for each sorting if one hopes to do a binary search,
simple tables of (various width) integers were made.  it was also
noted that bursting the tables at the junction of the basic and
extended planes was possible in many cases.

for example, for decompositions, if r is a precombined form
and r is in the basic plane, then for r = r' + c, r' and c are both
in the basic plane.  thus we can burst this table, and put
basic plane mappings (1000 of them) in a more compact table
that doesn't use vlongs.  the extended plane table is tiny
(18 entries).  it's only worth using a binary search for symmetry.

static
uint	__decompose2[] =
{
	0x00c0, 0x00410300,  /* À - A 0300 */
	[... 998 entries skipped ... ]
	0xfb4e, 0x05e405bf,  /* פֿ - פ 05bf */
};

static
uvlong	__decompose264[] =
{
	0x1109a,0x11099110baull, /* ႚ - ႙ + 110ba */
	[... 16 entries skipped ...]
	0x1d1c0,0x1d1bc1d16full, /* 퇀 - 톼 + 1d16f */
};

static uint*
bsearch32(uint c, uint *t, int n, int ne)
{
	uint *p;
	int m;

	/* t holds n entries of ne words each, sorted on the first word */
	while(n > 1) {
		m = n/2;
		p = t + m*ne;
		if(c >= p[0]) {
			t = p;
			n = n-m;
		} else
			n = m;
	}
	if(n && c == t[0])
		return t;
	return 0;
}

[bsearch64 omitted]

int
runedecompose(Rune a, Rune *d)
{
	uint *p;
	uvlong *q;

	if(a <= 0xffff){
		p = bsearch32(a, __decompose2, nelem(__decompose2)/2, 2);
		if(p){
			d[0] = p[1] >> 16;
			d[1] = p[1] & 0xffff;
			return 0;
		}
	}else{
		q = bsearch64(a, __decompose264, nelem(__decompose264)/2, 2);
		if(q){
			d[0] = q[1] >> 32;
			d[1] = q[1] & 0xffffffff;
			return 0;
		}
	}
	return -1;
}

all the other rune tables work this way.  there is one
table per property.  having a structure doesn't fit the
current programming interface, nor usage.
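
for anyone who wants to try the scheme outside plan 9, here is a minimal portable sketch of the same idea: one sorted table of (key, packed value) pairs per plane, searched with a strided binary search.  it is not the 9atom source.  the names (decomp16, decomp32, search32, search64, decompose) are made up for the example, the tables hold only a handful of entries, and the extended-plane entries are assumed to carry the base rune in the high 32 bits and the combining rune in the low 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, basic plane: key, then (base << 16 | combining). */
static const uint32_t decomp16[] = {
	0x00c0, 0x00410300,	/* À = A + U+0300 */
	0x00c9, 0x00450301,	/* É = E + U+0301 */
	0x00e9, 0x00650301,	/* é = e + U+0301 */
	0x00f1, 0x006e0303,	/* ñ = n + U+0303 */
};

/* Assumed layout, extended plane: key, then (base << 32 | combining). */
static const uint64_t decomp32[] = {
	0x1109a, 0x00011099000110baULL,	/* U+1109A = U+11099 + U+110BA */
	0x1109c, 0x0001109b000110baULL,	/* U+1109C = U+1109B + U+110BA */
	0x110ab, 0x000110a5000110baULL,	/* U+110AB = U+110A5 + U+110BA */
};

/* Binary search over n entries of stride words each, sorted on word 0. */
static const uint32_t*
search32(uint32_t c, const uint32_t *t, size_t n, size_t stride)
{
	while(n > 1){
		size_t m = n/2;
		const uint32_t *p = t + m*stride;

		if(c >= p[0]){
			t = p;
			n -= m;
		}else
			n = m;
	}
	if(n && c == t[0])
		return t;
	return NULL;
}

/* Same search over 64-bit entries. */
static const uint64_t*
search64(uint64_t c, const uint64_t *t, size_t n, size_t stride)
{
	while(n > 1){
		size_t m = n/2;
		const uint64_t *p = t + m*stride;

		if(c >= p[0]){
			t = p;
			n -= m;
		}else
			n = m;
	}
	if(n && c == t[0])
		return t;
	return NULL;
}

/* Store the two-rune decomposition of r in d; 0 on success, -1 if none. */
int
decompose(uint32_t r, uint32_t d[2])
{
	if(r <= 0xffff){
		const uint32_t *p;

		p = search32(r, decomp16, sizeof decomp16/sizeof decomp16[0]/2, 2);
		if(p == NULL)
			return -1;
		d[0] = p[1] >> 16;
		d[1] = p[1] & 0xffff;
	}else{
		const uint64_t *q;

		q = search64(r, decomp32, sizeof decomp32/sizeof decomp32[0]/2, 2);
		if(q == NULL)
			return -1;
		d[0] = (uint32_t)(q[1] >> 32);
		d[1] = (uint32_t)(q[1] & 0xffffffff);
	}
	return 0;
}
```

calling decompose(0x00e9, d) fills d with 0x0065 and 0x0301; a rune with no entry, such as 'A', returns -1 and leaves d untouched.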

- erik