Re: [ragel-users] Maintaining char & line counts in a scanner

Joe Wildish Sun, 25 Apr 2010 08:18:46 -0700

Hi Adrian,

Thanks for the response. I agree that the third approach you mention is the 
most elegant. Having said that, I've just finished implementing the 
counters with a second pass (i.e. the first approach).

The reason is that I actually want to record the starting position (line and 
column) *and* the ending position of each token. I therefore played around 
with an entry action that recorded the current position and a final action 
that wrote the starting and ending positions into the token struct. However, 
I started to confuse myself over how backtracking might come into play, so I 
opted to take the whole counting malarkey outside of the machine :) I may well 
revisit this, but I have since moved on to the CFG parser, as the second-pass 
approach does work OK. It's just not as "nice" as keeping it all 
self-contained in the machine.
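
For reference, the second-pass counting described above can be sketched in C 
roughly as follows. This is a minimal sketch under assumptions, not the code 
from the scanner discussed here: the names Pos, TokenSpan, advance_pos and 
record_token are hypothetical, lines and columns are assumed to be 1-based, 
and '\n' is assumed to be the only line terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical position record: 1-based line and column. */
typedef struct {
    int line;
    int col;
} Pos;

/* Hypothetical token record holding both ends of a match. */
typedef struct {
    Pos start; /* position of the first character of the token */
    Pos end;   /* position just past the last character of the token */
} TokenSpan;

/* Advance *pos over text[0..len), treating '\n' as the only line break. */
void advance_pos(Pos *pos, const char *text, size_t len)
{
    assert(pos != NULL && (text != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n') {
            pos->line++;
            pos->col = 1;
        } else {
            pos->col++;
        }
    }
}

/* Second pass over one matched token: remember where it started,
 * walk its text once more, and remember where it ended. */
TokenSpan record_token(Pos *cur, const char *tok, size_t len)
{
    TokenSpan span;
    span.start = *cur;
    advance_pos(cur, tok, len);
    span.end = *cur;
    return span;
}
```

In a Ragel scanner this would be called once per token from the match action, 
with the token bounds ts and te, along the lines of 
record_token(&cur, ts, te - ts). Because it only ever sees text that has 
already been accepted as a token, backtracking inside the machine cannot 
affect the counts.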

A quick question, though: regarding your examples below, are you suggesting 
that using the intersection means the backtracking won't occur? 

Many thanks,

-Joe

On 23 Apr 2010, at 19:31, Adrian Thurston wrote:

> Hi Joe,
> 
> There are a few approaches to this problem. The simplest is to count 
> newlines in the matched text in every match action. The downside is that 
> you pass over everything twice.
> 
> If a second pass over each token is something you'd like to avoid, then you 
> can go down the sub-scanner road. Basically, any pattern that can contain a 
> newline, such as multi-line comments, or strings, can be implemented with a 
> sub-scanner. In the main scanner you write a pattern for whatever sequence of 
> characters takes you into comments, for example, then jump into a separate 
> scanner for comments. You end up with broken-down comments, though, as 
> opposed to a single match of the whole comment.
> 
> A third approach is to write patterns that count newlines as they go. This is 
> my favourite approach. The only worry is backtracking. If your scanner 
> patterns backtrack over newlines, then you've got double counting happening. 
> With a well-designed scanner, this isn't normally a problem though. Try 
> something like this:
> 
> counter = ( any | '\n' @inc )*;
> comment = ( '/*' any* :>> '*/' ) & counter;
> 
> Or embed the counting deep:
> 
> comment = '/*' ( any | '\n' @inc )* :>> '*/';
> 
> -Adrian
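
As a rough illustration of the third approach quoted above (counting newlines 
in the same pass that matches the pattern), here is a hand-written C stand-in 
for the comment pattern. It is not Ragel output and not code from this 
thread: the name match_comment is hypothetical, and holding the count in a 
local until the closing delimiter is seen is an assumption added here as one 
way to avoid double counting when a match is abandoned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-pass matcher for a C-style comment at the start of
 * p[0..len). Returns the comment length including both delimiters, or 0
 * if no complete comment starts at p. Newlines are counted while
 * scanning, but *lines is only updated when the match succeeds. */
size_t match_comment(const char *p, size_t len, int *lines)
{
    assert(p != NULL && lines != NULL);
    if (len < 4 || p[0] != '/' || p[1] != '*')
        return 0;

    int seen = 0; /* newlines seen so far in this attempt */
    for (size_t i = 2; i + 1 < len; i++) {
        if (p[i] == '\n') {
            seen++;
        } else if (p[i] == '*' && p[i + 1] == '/') {
            *lines += seen; /* commit the count on success only */
            return i + 2;
        }
    }
    return 0; /* unterminated: nothing is committed */
}
```

An action such as @inc in a Ragel machine fires as soon as its transition is 
taken, so this deferred commit is a different design from the patterns 
quoted above; it is shown only to make the double-counting concern concrete.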

