On Sun, Jun 23, 2019 at 09:47:18PM +0200, Michael von der Heide wrote:
> It works (hunspell) for me with words like "prüfen" or "Straße". Flex
> generates an 8-bit scanner. UTF-8 should work. Would you mind testing it?

sorry - when you said "code points", I had in mind Unicode.

Applying the term to UTF-8 sequences doesn't seem entirely correct,
though I'm aware people use the two interchangeably.  (not to argue,
but a string isn't a point)

lex/flex will allow ranges, and hexadecimal escapes are standard (hence usable in "lex" too):

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/lex.html

> --
> Michael von der Heide
> 
> 
> Thomas Dickey <[email protected]> wrote on Sun, 23 June 2019, 21:24:
> 
> > On Sun, Jun 23, 2019 at 07:42:26PM +0200, Michael von der Heide wrote:
> > > Would it be possible to include UTF-8 code points to check words
> > > containing umlauts?
> > >
> > > WORD          ([a-zA-Z]|\xc3[\x80-\xbf])+

for reference, that's the UTF-8 encoding for the Unicode codepoints 192-255:

192: 192 0300 0xc0 text "\300" utf8 \303\200
255: 255 0377 0xff text "\377" utf8 \303\277

and

0303: 195 0303 0xc3 text "\303" utf8 \303\203
0200: 128 0200 0x80 text "\200" utf8 \302\200
0277: 191 0277 0xbf text "\277" utf8 \302\277
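
For anyone who wants to reproduce those numbers, here is a minimal sketch in Python (not part of vile; the helper names utf8_octal and utf8_hex are made up for illustration). It encodes single code points and prints the bytes as octal and hex escapes:

```python
def utf8_octal(code_point):
    """Return the UTF-8 bytes of one code point as octal escapes."""
    return "".join(f"\\{b:03o}" for b in chr(code_point).encode("utf-8"))


def utf8_hex(code_point):
    """Return the UTF-8 bytes of one code point as hex escapes."""
    return "".join(f"\\x{b:02x}" for b in chr(code_point).encode("utf-8"))


if __name__ == "__main__":
    # The same five values as in the listing: decimal, octal, hex, UTF-8.
    for cp in (0xC0, 0xFF, 0xC3, 0x80, 0xBF):
        print(f"{cp:3d} {cp:04o} 0x{cp:02x} utf8 {utf8_octal(cp)} ({utf8_hex(cp)})")
```

Every code point from 192 through 255 encodes to the lead byte 0xc3 followed by one byte in 0x80-0xbf, which is why the two-byte pattern covers exactly that block.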

Possibly clearer (ispell on my Debian8 works with this):

diff -u -r1.59 filters/spellflt.l
--- filters/spellflt.l  2013/12/02 01:32:53     1.59
+++ filters/spellflt.l  2019/06/23 20:28:42
@@ -157,7 +157,10 @@
 
 %}
 
-WORD           [[:alpha:]]([[:alnum:]])*
+ALPHA          [[:alpha:]]
+UMLAUT         \xc3[\x80-\xbf]
+LETTER         ({ALPHA}|{UMLAUT})+
+WORD           {LETTER}({LETTER}|[[:digit:]])*
 
 %%
 

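To try the byte-level idea outside of flex, here is a rough sketch using Python's re module on UTF-8 bytes. It is only an approximation of the patch: {ALPHA} is assumed to be [A-Za-z] (as in the "C" locale), and the names LETTER, WORD and words are chosen for this example rather than taken from spellflt.l.

```python
import re

# Assumption: [[:alpha:]] is approximated as [A-Za-z] ("C" locale).
# The second alternative is the two-byte UTF-8 form of U+00C0..U+00FF.
LETTER = rb"(?:[A-Za-z]|\xc3[\x80-\xbf])"
WORD = re.compile(LETTER + rb"(?:" + LETTER + rb"|[0-9])*")


def words(text):
    """Return the words found by matching on the UTF-8 bytes of text."""
    data = text.encode("utf-8")
    return [m.group().decode("utf-8") for m in WORD.finditer(data)]


if __name__ == "__main__":
    print(words("prüfen Straße x2"))
```

The matcher only ever sees 8-bit values, the same way an 8-bit scanner does. One caveat with the range: 0xc3 0x97 and 0xc3 0xb7 are the multiplication and division signs, so they are accepted as letters too.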
> > > WORD          ([a-zA-Z]|\xc3[\x80-\xbf])+

> >
> > lex/flex doesn't do that :-(
> >
> > They use small (256-entry) tables for the character types.
> >
> > I've seen a (long ago) patch to use big tables (which I've read
> > doesn't work well).
> >
> > on my (too-long) to-do list, I have an idea which could be developed,
> > to provide the feature using character-classes.  That is, flex could
> > be modified (perhaps a month's work...)

-- 
Thomas E. Dickey <[email protected]>
https://invisible-island.net
ftp://ftp.invisible-island.net

_______________________________________________
vile mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/vile