On Sun, Jun 23, 2019 at 09:47:18PM +0200, Michael von der Heide wrote:
> It works (hunspell) for me with words like "prüfen" or "Straße". Flex
> generates an 8-bit scanner. UTF-8 should work. Would you mind testing it?

sorry - when you said "code points", I had in mind Unicode.  Applying
the term to UTF-8 sequences doesn't seem entirely correct, though I'm
aware people use the two interchangeably.

(not to argue, but a string isn't a point)

lex/flex will allow ranges, and hexadecimal's standard (hence "lex" too):

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/lex.html

> --
> Michael von der Heide
>
>
> Thomas Dickey <[email protected]> schrieb am So., 23. Juni 2019, 21:24:
>
> > On Sun, Jun 23, 2019 at 07:42:26PM +0200, Michael von der Heide wrote:
> > > Would it be possible to include UTF-8 code points to check words
> > containing
> > > umlauts?
> > >
> > > WORD ([a-zA-Z]|\xc3[\x80-\xbf])+

for reference, that's the UTF-8 encoding for the Unicode codepoints 192-255:

 192: 192 0300 0xc0 text "\300" utf8 \303\200
 255: 255 0377 0xff text "\377" utf8 \303\277

and

0303: 195 0303 0xc3 text "\303" utf8 \303\203
0200: 128 0200 0x80 text "\200" utf8 \302\200
0277: 191 0277 0xbf text "\277" utf8 \302\277

Possibly clearer (ispell on my Debian8 works with this):

diff -u -r1.59 filters/spellflt.l
--- filters/spellflt.l	2013/12/02 01:32:53	1.59
+++ filters/spellflt.l	2019/06/23 20:28:42
@@ -157,7 +157,10 @@
 %}
 
-WORD		[[:alpha:]]([[:alnum:]])*
+ALPHA		[[:alpha:]]
+UMLAUT		\xc3[\x80-\xbf]
+LETTER		({ALPHA}|{UMLAUT})+
+WORD		{LETTER}({LETTER}|[[:digit:]])*
 
 %%

> > > WORD ([a-zA-Z]|\xc3[\x80-\xbf])+
> >
> > lex/flex doesn't do that :-(
> >
> > They use small (256-entry) tables for the character types.
> >
> > I've seen a (long ago) patch to use big tables (which I've read
> > doesn't work well).
> >
> > on my (too-long) to-do list, I have an idea which could be developed,
> > to provide the feature using character-classes.  That is, flex could
> > be modified (perhaps a month's work...)

-- 
Thomas E. Dickey <[email protected]>
https://invisible-island.net
ftp://ftp.invisible-island.net
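
For anyone who wants to check the byte values quoted in this thread without building the filter, here is a rough Python sketch. It is not part of vile or of the patch; the names `encoded`, `lead_bytes`, `trail_bytes` and `words` are made up for illustration, and Python's `re` module only stands in for the 8-bit scanner that flex generates. It encodes the code points 192-255 and then applies a byte-level analogue of the quoted WORD pattern.

```python
import re

# Encode every code point in U+00C0..U+00FF (decimal 192-255) as UTF-8.
encoded = [chr(cp).encode("utf-8") for cp in range(0xC0, 0x100)]

# Each result is two bytes: a lead byte, then a continuation byte.
lead_bytes = {b[0] for b in encoded}
trail_bytes = sorted(b[1] for b in encoded)

# Byte-level analogue of the quoted pattern ([a-zA-Z]|\xc3[\x80-\xbf])+
# (an illustration only; the real filter is a flex scanner, not Python).
WORD = re.compile(rb"(?:[a-zA-Z]|\xc3[\x80-\xbf])+")


def words(text):
    """Return the WORD matches found in the UTF-8 bytes of text."""
    return [m.decode("utf-8") for m in WORD.findall(text.encode("utf-8"))]


print(sorted(hex(b) for b in lead_bytes))         # the only lead byte used
print(hex(trail_bytes[0]), hex(trail_bytes[-1]))  # continuation byte range
print(words("prüfen Straße x1"))
```

The single lead byte 0xc3 and the continuation range 0x80-0xbf are what make the two-byte pattern `\xc3[\x80-\xbf]` enough for this block of code points while staying within a 256-entry character table.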
_______________________________________________
vile mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/vile
