Re: [ragel-users] Maintaining char & line counts in a scanner

Adrian Thurston Fri, 23 Apr 2010 11:31:47 -0700

Hi Joe,

There are a few approaches to this problem. The simplest approach is to simply count newlines in the matched text in every match action. The downside to this is that you are passing over everything twice.
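A minimal sketch of that first approach in C, assuming the scanner's usual `ts` and `te` pointers delimit the matched text. The `struct position` type and the `advance_position` helper are hypothetical names made up for illustration, not part of Ragel:

```c
#include <stddef.h>

/* Hypothetical position record; the names are illustrative only. */
struct position {
    int line;   /* 1-based line number */
    int col;    /* 0-based offset from the start of the current line */
};

/* Advance pos over the matched text [ts, te). This loop is the second
 * pass over the token that this approach costs. */
static void advance_position(struct position *pos,
                             const char *ts, const char *te)
{
    for (const char *p = ts; p < te; p++) {
        if (*p == '\n') {
            pos->line++;
            pos->col = 0;
        } else {
            pos->col++;
        }
    }
}
```

To use it, copy the current position into the token struct first, then call `advance_position(&pos, ts, te)` at the end of every match action. That has to include the actions for whitespace and comments that produce no token, otherwise the counts drift.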

If a second pass over each token is something you'd like to avoid, then you can go down the sub-scanner road. Basically, any pattern that can contain a newline, such as multi-line comments, or strings, can be implemented with a sub-scanner. In the main scanner you write a pattern for whatever sequence of characters takes you into comments, for example, then jump into a separate scanner for comments. You end up with broken down comments though, as opposed to a whole match of a comment.
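The control flow of the sub-scanner idea can be mimicked by hand in plain C. The sketch below is not Ragel output and not how Ragel implements scanners; it only mirrors the two-mode structure, and `scan_result`, `scan_modes` and the mode names are all hypothetical. It counts newlines in a single pass, and it also counts how many separate pieces the comment body is broken into:

```c
#include <stddef.h>

/* Hypothetical result record for this illustration. */
struct scan_result {
    int newlines;        /* newlines seen anywhere in the input */
    int comment_pieces;  /* one-character matches made inside comments */
};

enum mode { MODE_MAIN, MODE_COMMENT };

static struct scan_result scan_modes(const char *buf, size_t len)
{
    struct scan_result r = {0, 0};
    enum mode mode = MODE_MAIN;
    size_t i = 0;

    while (i < len) {
        if (mode == MODE_MAIN) {
            if (i + 1 < len && buf[i] == '/' && buf[i + 1] == '*') {
                mode = MODE_COMMENT;    /* jump into the sub-scanner */
                i += 2;
                continue;
            }
        } else {
            if (i + 1 < len && buf[i] == '*' && buf[i + 1] == '/') {
                mode = MODE_MAIN;       /* return to the main scanner */
                i += 2;
                continue;
            }
            r.comment_pieces++;         /* the comment arrives in pieces */
        }
        if (buf[i] == '\n')
            r.newlines++;               /* counted once, as it is consumed */
        i++;
    }
    return r;
}
```

For an input such as `"a /* x\ny */ b\n"`, the comment body is seen as five one-character pieces rather than one whole match, which is the trade-off described above.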

A third approach is to write patterns that count newlines as they go. This is my favourite approach. The only worry is backtracking. If your scanner patterns backtrack over newlines, then you've got double counting happening. With a well-designed scanner, this isn't normally a problem though. Try something like this:


counter = ( any | '\n' @inc )*;
comment = ( '/*' any* :>> '*/' ) & counter;

Or embed the counting deep:

comment = ( '/*' ( any | '\n' @inc )* :>> '*/' );

-Adrian

Hi All,

I'm using ragel as a scanner to tokenise input for parsing of a database query 
language. I'd like to maintain a line number and character offset in the struct 
that represents a matched token but I'm having a little difficulty.


My idea would be to have two expressions - one that matches a newline and one 
that matches any other character. Clearly there would be an associated action 
with these expressions to maintain variables for the line and char count. 
Currently I have various expressions, some of which can potentially match 
multiple newlines (think multi-line comments), and some of which consume dead 
input (whitespace). I have played around keeping a tally of the counts on each 
successful match of a token (outside of the machine exec), but as in some cases 
I am discarding input completely within the state machine and not creating a 
token, it becomes difficult to track.... ideally, I'd like to keep it all 
within the machine, but can't see the best way to proceed.

Any help or pointers would be much appreciated.

Cheers,
-Joe
_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users

