Hi Adrian,

Thanks for the response. I agree that the third approach you mention is the most elegant. Having said all that, I've just finished implementing the counters with a second pass (i.e. the first approach).

The reason is that I actually want to record the starting positions (line & column) *and* the ending positions for each token. I therefore played around with having an entering action that recorded the current positions, and a final action that then wrote the starting and ending positions into the token struct. However, I started to get confused about how the backtracking might come into play, so I opted to take the whole counting malarkey outside of the machine :)

I may well revisit this, but I have since moved on to the CFG parser, as the second-pass approach actually does work OK. It's just not as "nice" as keeping it all self-contained in the machine.

A quick question though: regarding your examples below, are you suggesting that the use of the intersection means that the backtracking won't occur?

Many thanks,
-Joe

On 23 Apr 2010, at 19:31, Adrian Thurston wrote:

> Hi Joe,
>
> There are a few approaches to this problem. The simplest approach is to
> simply count newlines in the matched text in every match action. The downside
> to this is that you are passing over everything twice.
>
> If a second pass over each token is something you'd like to avoid, then you
> can go down the sub-scanner road. Basically, any pattern that can contain a
> newline, such as multi-line comments, or strings, can be implemented with a
> sub-scanner. In the main scanner you write a pattern for whatever sequence of
> characters takes you into comments, for example, then jump into a separate
> scanner for comments. You end up with broken down comments though, as opposed
> to a whole match of a comment.
>
> A third approach is to write patterns that count newlines as they go. This is
> my favourite approach. The only worry is backtracking. If your scanner
> patterns backtrack over newlines, then you've got double counting happening.
> With a well-designed scanner, this isn't normally a problem though. Try
> something like this:
>
> counter = ( any | '\n' @inc )*;
> comment = ( '/*' any* :>> '*/' ) & counter;
>
> Or embed the counting deep:
>
> comment = ( '/*' ( any | '\n' @inc )* :>> '*/' ) & counter;
>
> -Adrian
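
For anyone reading along, here is a minimal sketch of what the second-pass approach might look like in a C host program. It is not the code from my lexer: the `Position` and `TokenSpan` types and the `advance_position` and `record_token` helpers are hypothetical names made up for illustration. The `ts` and `te` parameters are assumed to mirror the token start/end pointers a Ragel scanner maintains, and lines and columns are assumed to be 1-based.

```c
#include <stddef.h>

/* Hypothetical position type: 1-based line and column (an assumption). */
typedef struct {
    int line;
    int col;
} Position;

/* Hypothetical token record holding both ends of a match. */
typedef struct {
    Position start; /* position of the first character of the token */
    Position end;   /* position just past the last character */
} TokenSpan;

/* Second pass over the matched text [ts, te): step past each character,
 * starting a new line after every '\n'. */
static Position advance_position(Position pos, const char *ts, const char *te)
{
    for (const char *p = ts; p < te; p++) {
        if (*p == '\n') {
            pos.line++;
            pos.col = 1;
        } else {
            pos.col++;
        }
    }
    return pos;
}

/* Intended to be called from a match action: records where the token
 * starts and ends, then moves the running position to the token's end. */
static TokenSpan record_token(Position *cur, const char *ts, const char *te)
{
    TokenSpan span;
    span.start = *cur;
    span.end = advance_position(*cur, ts, te);
    *cur = span.end;
    return span;
}
```

Because the counting only runs on text the scanner has already committed to, backtracking inside the machine cannot cause double counting; the cost is that every token is walked twice.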
