Dear friends who have been using Ragel for more than a year,

I bet Kevin and I are facing a similar problem that you have all faced, namely that as a software project matures, common ground between its founding users and new users erodes. Fresh code examples keep interest alive and prevent people from re-inventing the wheel. Please do speak up!

How about an authoritative Ruby code example for Ragel Guide 6.7 section 4.2.4 (Longest-Match Kleene Star)? It's "useful when writing simple tokenizers"... that sounds like a great way to bridge the gap.

Since all the code examples are in C, it's not clear what you would use in Ruby instead of ts and te.

Best,
Seamus

On 6/13/11 12:42 PM, Kevin T. Ryan wrote:
Hey -

Just started using the library myself.  Easiest way to think about it
(at least, it was for me) is that you are defining the machine in the
section you noted below from the guide.  Until you initialize and
execute it, it doesn't "do anything".  Thus, in some part of your
script you need:

%% write data; # sets up all the static data needed by the tokenizer

Then (somewhere else in all likelihood), you need to initialize and
execute the machine.  So, for example:

#include <stdio.h>   /* fprintf */
#include <string.h>  /* strlen */

int main(int argc, char* argv[]) {
     int cs; // you can use this to check the status of the machine
     char* p = "Your text to tokenize";
     char* pe = p + strlen(p);

     %% write init;
     %% write exec; # this will execute the machine given the input provided by 'p'

     if (cs == <machine_name>_error)
         fprintf(stderr, "Error\n");
     return 0;
}

What might action A look like? How does it use p, pe, etc.? Ditto for B.

Maybe action 'A' is used to print a match when it ends (the '%' in
front of the A makes it a leaving action, which runs as the machine
leaves the match).  For example:

action A { printf("Found alpha\n"); }
action B { printf("Found int\n"); }

If you need to print out the total string, you might combine it with a
'mark' action.  Eg:

action mark { mark = p; /* mark now needs to be declared as a char* in 'main' */ }
<  as before>
( lower ( lower | digit )* ) >mark %A |

And do the same for the integer portion of the machine.  You could
then change your print function to do something like:

printf("Found alpha: %.*s\n", (int)(p - mark), mark); // print out the alpha found
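
To make the mark idea concrete without needing Ragel installed, here is a minimal hand-written C sketch of what the mark and leaving actions amount to: remember where a token starts, then slice [mark, p) when it ends. This is not Ragel-generated code. The function name scan_tokens, the output buffer, and the "alpha:"/"int:" labels are assumptions made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hand-written stand-in for the generated machine (hypothetical helper).
 * Scans `input` and appends "alpha:<text>\n" or "int:<text>\n" to `out`
 * for each token. Returns the number of tokens written. */
static int scan_tokens(const char *input, char *out, size_t out_size)
{
    assert(input != NULL && out != NULL);

    const char *p = input;                  /* current position */
    const char *pe = input + strlen(input); /* end of input */
    const char *mark = NULL;                /* start of current token */
    size_t used = 0;
    int count = 0;

    if (out_size > 0)
        out[0] = '\0';

    while (p < pe) {
        const char *kind = NULL;

        if (islower((unsigned char)*p)) {
            mark = p; /* plays the role of the entering action (>mark) */
            while (p < pe && (islower((unsigned char)*p) ||
                              isdigit((unsigned char)*p)))
                p++;
            kind = "alpha";
        } else if (isdigit((unsigned char)*p)) {
            mark = p;
            while (p < pe && isdigit((unsigned char)*p))
                p++;
            kind = "int";
        } else {
            p++; /* skip separators such as spaces */
            continue;
        }

        /* Plays the role of the leaving action (%A or %B):
         * the token is the half-open range [mark, p). */
        int n = snprintf(out + used, out_size - used, "%s:%.*s\n",
                         kind, (int)(p - mark), mark);
        if (n < 0 || (size_t)n >= out_size - used)
            break; /* output buffer full: stop early */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling scan_tokens("abc 123 x9", buf, sizeof buf) with a char buf[128] fills buf with one line per token. The cast to int matters because the '*' precision in "%.*s" expects an int, while p - mark is a ptrdiff_t.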

PS. I think this would address a big question for
ragel/parsing/lexing/tokenizing newbies, namely, how would an **expert**
implement a **simple** tokenizer?

You may also want to look at machines that are 'special' for lexing
(viz., machine := |* *|;).  BTW, I'm very new to this myself - so
hopefully I didn't screw anything up too much!
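
For anyone wondering what ts and te buy you in a scanner, here is a rough hand-written C sketch of longest-match selection: try every pattern at the current position, keep the longest match, and record its bounds as [ts, te). It only imitates the idea and is not what Ragel generates. The three patterns (word, int, float), the matcher functions, and next_token are all assumptions invented for this example, and breaking ties in favor of the earlier pattern is a choice made in this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A matcher returns the length of the match starting at s, or 0. */
typedef size_t (*matcher_fn)(const char *s, const char *end);

/* digit+ */
static size_t match_int(const char *s, const char *end)
{
    const char *q = s;
    while (q < end && isdigit((unsigned char)*q))
        q++;
    return (size_t)(q - s);
}

/* digit+ '.' digit+ */
static size_t match_float(const char *s, const char *end)
{
    size_t whole = match_int(s, end);
    if (whole == 0)
        return 0;
    const char *q = s + whole;
    if (q >= end || *q != '.')
        return 0;
    size_t frac = match_int(q + 1, end);
    return frac == 0 ? 0 : whole + 1 + frac;
}

/* lower ( lower | digit )* */
static size_t match_word(const char *s, const char *end)
{
    const char *q = s;
    if (q >= end || !islower((unsigned char)*q))
        return 0;
    while (q < end && (islower((unsigned char)*q) ||
                       isdigit((unsigned char)*q)))
        q++;
    return (size_t)(q - s);
}

/* Finds the next token at or after *pp. On success sets *ts and *te to
 * the token bounds, advances *pp to *te, and returns the pattern index
 * (0 = word, 1 = int, 2 = float). Returns -1 when the input is used up. */
static int next_token(const char **pp, const char *pe,
                      const char **ts, const char **te)
{
    static const matcher_fn patterns[] = { match_word, match_int, match_float };
    const char *p = *pp;

    while (p < pe) {
        int best = -1;
        size_t best_len = 0;
        for (int i = 0; i < 3; i++) {
            size_t len = patterns[i](p, pe);
            if (len > best_len) { /* strict '>' keeps the earlier pattern on ties */
                best_len = len;
                best = i;
            }
        }
        if (best >= 0) {
            *ts = p;
            *te = p + best_len;
            *pp = *te;
            return best;
        }
        p++; /* nothing matched here: skip one character */
    }
    *pp = pe;
    return -1;
}
```

On an input like "x1 42 3.14", both the int and float matchers succeed at "3.14", and the longer float match is the one reported. After each call the token text is the (te - ts) characters starting at ts.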

PS - I'm actually trying to write up a tutorial which I'll share with
the list for feedback once it's done.  I think I have a much better
grasp of what's going on now, but I think writing it out would
actually help my understanding too.

Good luck,

ktr

_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users

--
Seamus Abshere
123 N Blount St Apt 403
Madison, WI 53703
1 (201) 566-0130
