On Fri, Jul 25, 2008 at 5:23 AM, Marcus Boerger <[EMAIL PROTECTED]> wrote:
> Hello Gregg,
> The testing would be the harder work. What is required is to have re2c read
> chars from the input stream into ints rather than into chars (bytes as you
> called it). However you can already do so if you provide the layer doing so
> and just pass along the int array. Anyway, at this point re2c development

Does anybody have some sample specs to work with Unicode input? I'm having trouble getting -u to work.

Summary: we have two encodings, one for the re2c spec and one for the input to the generated scanner. I'm stuck using cygwin for the time being, so I use emacs to write my files in utf-8, then use iconv to convert to UTF-16 or UTF-32. My YYCTYPE is unsigned int.

Only utf-8 specs work. With utf-16 or utf-32 specs re2c runs to completion but generates humongous C files that provoke zillions of "warning: null character(s) ignored". I'm not sure if compilation would work; it was taking so long I killed it.

With a utf-8 encoded spec, I can either use utf-8 encoded chars in my regexes (e.g. كتاب, lègére, etc.) or \u notation (e.g. \u0628 = ب).

With utf-8 encoded chars in a utf-8 spec, the re2c command works, but re2c -u results in a re2c segfault. (I think I can make a usable scanner with this method, but I want to understand how "proper" unicode support works.)

With \u encoded chars in a utf-8 spec, the re2c command produces "Illegal unicode character, out of range" for e.g. "\u0628", but re2c -u works.

A utf-8 spec, without -u, with utf-8 encoded chars produces a scanner that recognizes utf-8 encoded input, but only by recognizing byte codes, not characters (in the unicode sense). It reads utf-16 and utf-32 input but doesn't recognize the (non-ascii) chars.

A utf-8 spec, with -u, with \u encoded chars produces a scanner that does not recognize non-ascii input regardless of input encoding. However, it does seem to recognize ascii regexes.

So I'm not understanding something. I'm also confused about the difference between -w and -u. Any help would be greatly appreciated.
Also, I'd be happy to write up some documentation if somebody can get me started with an example or two.

-Gregg