On Fri, Jul 25, 2008 at 5:23 AM, Marcus Boerger <[EMAIL PROTECTED]> wrote:
> Hello Gregg,

> The testing would be the harder work. What is required is to have re2c read
> chars from the input stream into ints rather than into chars (bytes as you
> called it). However you can already do so if you provide the layer doing so
> and just pass along the int array. Anyway, at this point re2c development

Does anybody have sample specs that work with Unicode input?  I'm
having trouble getting -u to work.

Summary: we have two encodings, one for the re2c spec and one for the
input to the generated scanner.  I'm stuck using cygwin for the time
being, so I use emacs to write my files in utf-8, then use iconv to
convert to UTF-16 or UTF-32.

My YYCTYPE is unsigned int.

Only utf-8 specs work.  With utf-16 or utf-32 specs re2c runs to
completion but generates humongous C files that provoke zillions of
"warning: null character(s) ignored".  I'm not sure whether compilation
would work; it was taking so long that I killed it.

With a utf-8 encoded spec, I can either use utf-8 encoded chars in my
regexes (e.g. كتاب, lègére, etc.) or \u notation (e.g. \u0628 = ب).

With utf-8 encoded chars in a utf-8 spec, the re2c command works, but re2c
-u results in a re2c segfault.  (I think I can make a usable scanner
with this method, but I want to understand how "proper" unicode
support works.)

With \u encoded chars in a utf-8 spec, the re2c command produces
"Illegal unicode character, out of range" for e.g. "\u0628", but re2c -u works.

A utf-8 spec, without -u, with utf-8 encoded chars produces a
scanner that recognizes utf-8 encoded input, but only by recognizing
byte codes, not characters (in the unicode sense).  It reads utf-16
and utf-32 input but doesn't recognize the (non-ascii) chars.
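
To make the "byte codes, not characters" point concrete, here is a
minimal sketch of how a single code point turns into UTF-8 bytes.  The
function name utf8_encode is my own and is not part of re2c.  For
U+0628 (ب) it yields the two bytes 0xD8 0xA8, which is presumably what
a scanner built without -u ends up matching one byte at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of re2c.
 * Encode one Unicode code point as UTF-8 into out.
 * Returns the number of bytes written (1..4), or 0 if cp is a
 * surrogate or lies above U+10FFFF. */
size_t utf8_encode(unsigned int cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* out of Unicode range */
}
```

Calling utf8_encode(0x0628, buf) fills buf with 0xD8 0xA8 and returns 2,
so a regex containing the literal ب in a utf-8 spec is, at the byte
level, a two-byte sequence rather than one character.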

A utf-8 spec, with -u, with \u encoded chars produces a scanner that
does not recognize non-ascii input regardless of input encoding.
However, it does seem to recognize ascii regexes.
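
One guess is that a -u scanner wants one code point per YYCTYPE element,
which would be the "layer" Marcus describes above.  Below is a minimal
sketch of such a layer, assuming the input file is utf-8 and YYCTYPE is
unsigned int.  The function name utf8_to_codepoints is hypothetical and
not part of re2c, and I have not confirmed that this is the intended
usage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of re2c.
 * Decode the UTF-8 bytes src[0..len) into code points in dst[0..cap).
 * Returns the number of code points written, or (size_t)-1 if the input
 * is malformed or dst is too small.  For brevity this sketch does not
 * reject overlong encodings or surrogates. */
size_t utf8_to_codepoints(const unsigned char *src, size_t len,
                          unsigned int *dst, size_t cap)
{
    size_t i = 0, n = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
        unsigned char b = src[i++];
        unsigned int cp;
        size_t extra;                     /* continuation bytes expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;           /* invalid lead byte */

        if (extra > len - i)
            return (size_t)-1;            /* truncated sequence */
        while (extra--) {
            unsigned char c = src[i++];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;        /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        if (n == cap)
            return (size_t)-1;            /* output buffer full */
        dst[n++] = cp;
    }
    return n;
}
```

The idea would be to read the whole file into a byte buffer, decode it
with this function, and point YYCURSOR at the resulting unsigned int
array, so that a rule written as "\u0628" is compared against the value
0x0628 rather than against the bytes 0xD8 0xA8.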

So I'm not understanding something.  I'm also confused about the
difference between -w and -u.

Any help would be greatly appreciated.  Also, I'd be happy to write up
some documentation if somebody can get me started with an example or
two.

-Gregg
_______________________________________________
Re2c-general mailing list
Re2c-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/re2c-general
