In the Ruby code generator you use ts and te as well, except that they
are offsets against 'data' instead of pointers. Aside from that, the
assumptions and use cases are all the same.
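To make the pointer/offset difference concrete, here is a minimal sketch in plain C (not Ragel output). The helper names copy_token_ptr and copy_token_off are hypothetical and exist only for this illustration. It assumes te refers to one past the last character of the token in both styles.

```c
#include <stddef.h>
#include <string.h>

/* C style: ts and te are pointers into the input buffer.
 * Copies the token [ts, te) into out, NUL-terminates it, and returns
 * the number of characters copied (truncated to fit outsz). */
static size_t copy_token_ptr(const char *ts, const char *te,
                             char *out, size_t outsz)
{
    size_t len = (size_t)(te - ts);
    if (outsz == 0)
        return 0;
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, ts, len);
    out[len] = '\0';
    return len;
}

/* Ruby style, written in C for comparison: ts and te are integer
 * offsets against 'data', so the token starts at data + ts. */
static size_t copy_token_off(const char *data, size_t ts, size_t te,
                             char *out, size_t outsz)
{
    return copy_token_ptr(data + ts, data + te, out, outsz);
}
```

With data = "abc 123", copy_token_off(data, 4, 7, buf, sizeof buf) and copy_token_ptr(data + 4, data + 7, buf, sizeof buf) both leave "123" in buf. In Ruby the corresponding slice would be something like data[ts...te].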
I would like to use only C in the manual. Ragel supports a number of
languages, but it was originally designed for C and I would like the
manual to reflect that.
On 11-06-13 04:09 PM, Seamus Abshere wrote:
Dear friends who have been using ragel for more than a year,
I bet Kevin and I are facing a similar problem that you have all faced,
namely that as a software project matures, common ground between its
founding users and new users erodes. Fresh code examples keep interest
alive and prevent people from re-inventing the wheel. Please do speak up!
How about an authoritative Ruby code example for Ragel Guide 6.7 section
4.2.4 (Longest-Match Kleene Star)? It's "useful when writing simple
tokenizers"... that sounds like a great way to bridge the gap.
Since all the code examples are in C, it's not clear what you would use
in Ruby instead of ts and te.
Best,
Seamus
On 6/13/11 12:42 PM, Kevin T. Ryan wrote:
Hey -
Just started using the library myself. Easiest way to think about it
(at least, it was for me) is that you are defining the machine in the
section you noted below from the guide. Until you initialize and
execute it, it doesn't "do anything". Thus, in some part of your
script you need:
%% write data; # sets up all the static data needed by the tokenizer
Then (somewhere else in all likelihood), you need to initialize and
execute the machine. So, for example:
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
    int cs; // you can use this to check the status of the machine
    char* p = "Your text to tokenize";
    char* pe = p + strlen(p);
    %% write init;
    %% write exec; # executes the machine on the input between 'p' and 'pe'
    if (cs == <machine_name>_error)
        fprintf(stderr, "Error\n");
    return 0;
}
What might action A look like? How does it use p, pe, etc.? Ditto for B.
Maybe action 'A' is used to print a match when it ends (the '%' in
front of the A makes it a leaving action, so it runs when the machine
moves past the end of the match). For example:
action A { printf("Found alpha\n"); }
action B { printf("Found int\n"); }
If you need to print out the total string, you might combine it with a
'mark' action. Eg:
# 'mark' now needs to be set up in the 'main' function as a char*
action mark { mark = p; }
<as before>
lower ( lower | digit )* >mark %A |
And do the same for the integer portion of the machine. You could
then change your print function to do something like:
printf("Found alpha: %.*s\n", (int)(p - mark), mark); // print out the alpha found
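To show what the mark and leaving actions amount to at run time, here is a hand-written sketch in plain C. It is not what Ragel generates, only an imitation of the same bookkeeping. The function name scan_tokens is hypothetical, digit+ is assumed for the integer portion, and skipping every other character is an assumption made for the sketch.

```c
#include <ctype.h>
#include <stdio.h>

/* Hand-written stand-in for the generated machine. Scans [p, pe),
 * prints every lower(lower|digit)* run and every digit+ run to 'out',
 * and returns the number of tokens found. */
static int scan_tokens(const char *p, const char *pe, FILE *out)
{
    int count = 0;
    while (p < pe) {
        /* Entering action 'mark': remember where the token starts. */
        const char *mark = p;
        if (islower((unsigned char)*p)) {
            while (p < pe && (islower((unsigned char)*p) ||
                              isdigit((unsigned char)*p)))
                p++;
            /* Leaving action 'A': p is now one past the match. */
            fprintf(out, "Found alpha: %.*s\n", (int)(p - mark), mark);
            count++;
        } else if (isdigit((unsigned char)*p)) {
            while (p < pe && isdigit((unsigned char)*p))
                p++;
            /* Leaving action 'B'. */
            fprintf(out, "Found int: %.*s\n", (int)(p - mark), mark);
            count++;
        } else {
            p++; /* assumption: anything else is a separator */
        }
    }
    return count;
}
```

Calling scan_tokens(text, text + strlen(text), stdout) with text = "abc 123" prints "Found alpha: abc" and then "Found int: 123". The cast to int matters because the '*' precision in "%.*s" expects an int, while p - mark is a ptrdiff_t.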
PS. I think this would address a big question for
ragel/parsing/lexing/tokenizing newbies, namely, how would an **expert**
implement a **simple** tokenizer?
You may also want to look at machines that are 'special' for lexing
(viz., machine := |* *|;). BTW, I'm very new to this myself - so
hopefully I didn't screw anything up too much!
PS - I'm actually trying to write up a tutorial which I'll share with
the list for feedback once it's done. I think I have a much better
grasp of what's going on now, but I think writing it out would
actually help my understanding too.
Good luck,
ktr
_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users
--
Dr. Adrian D. Thurston
http://www.complang.org/thurston/