Feature Requests item #1540845, was opened at 2006-08-15 20:47
Message generated for change (Comment added) made by helly
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=616203&aid=1540845&group_id=96864

Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Justin Mason (jmason)
>Assigned to: Marcus Börger (helly)
Summary: RFE: way for scanner to report subsumed tokens

Initial Comment:
hi --

Looking at re2c for SpamAssassin -- it's improved a lot
since the last time I checked ;)  nice work!

one thing, though.  it would be really great if re2c
could track subsumed tokens.  For example:

/*!re2c
        "foo"     {return "FOO";}
        "food"    {return "FOOD";}
  [\000-\377]     { return NULL; }
*/


Assume the input string is "food", and that the caller
is smart enough to track the YYCURSOR state and call
the scanner repeatedly until it receives NULL.  The
scanner should then return "FOO" on the first call,
"FOOD" on the second call, and NULL on the third call.

Instead, the longest matching token is used: return
"FOOD" on the first call, then NULL on the second call.
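
To make the difference concrete, here is a minimal sketch in
plain C. It is not re2c output, and it assumes literal tokens
only (no regexps). The token table and the names longest_match
and next_match are hypothetical. longest_match mimics the
current behaviour; next_match shows the requested one, where the
caller keeps a small piece of state and calls again from the
same position until it gets NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token table: literal strings only, not regexps. */
struct token { const char *text; const char *name; };
static const struct token tokens[] = {
    { "foo",  "FOO"  },
    { "food", "FOOD" },
};
#define NTOKENS (sizeof tokens / sizeof tokens[0])

/* Current behaviour: return the name of the longest token that
 * matches at cursor, or NULL if none matches. */
static const char *longest_match(const char *cursor)
{
    const char *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < NTOKENS; i++) {
        size_t len = strlen(tokens[i].text);
        if (len > best_len && strncmp(cursor, tokens[i].text, len) == 0) {
            best = tokens[i].name;
            best_len = len;
        }
    }
    return best;
}

/* Requested behaviour: each call returns the shortest matching token
 * that is longer than *min_len, then raises *min_len to its length.
 * Start with *min_len == 0; NULL means no more tokens at cursor. */
static const char *next_match(const char *cursor, size_t *min_len)
{
    const char *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < NTOKENS; i++) {
        size_t len = strlen(tokens[i].text);
        if (len > *min_len && (best == NULL || len < best_len)
            && strncmp(cursor, tokens[i].text, len) == 0) {
            best = tokens[i].name;
            best_len = len;
        }
    }
    if (best != NULL)
        *min_len = best_len;
    return best;
}
```

With the input "food" and a size_t state starting at 0,
successive next_match calls yield "FOO", then "FOOD", then NULL,
while longest_match yields only "FOOD".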

most re2c users could write their token tables to
automatically return *both* "FOO" and "FOOD" on the
first call -- and initially I was doing this.  however,
in my usage, the tokens are derived from SpamAssassin
rules, so I can't always know if one is subsumed by
another... and determining this programmatically in
advance would require rewriting most of re2c ;)

Instead, I've been changing my calling code to not
support full regexp semantics in the input to re2c. 
This is obviously defeating much of the point, so I'd
love to fix that...

Are there any plans to implement this?

cheers,

--j.

----------------------------------------------------------------------

>Comment By: Marcus Börger (helly)
Date: 2006-08-15 22:53

Message:
re2c is designed in a way that requires the most complex 
rule first. In this case that means the "FOOD" rule needs 
to come before the "FOO" rule. Then when re2c reads "FOO" 
you get the token and the story ends. However, you can 
write some code that handles the code generated by re2c 
to do what you want. That is why re2c was built with a 
focus on extreme flexibility.
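
One way such handling code might look is sketched below in plain
C. This is an assumption about what a wrapper could do, not a
documented re2c recipe. scan is a hand-written stand-in for the
generated scanner, and scan_all is a hypothetical wrapper that
rescans the same start position with the input limit pulled back
one byte short of the previous match, so that shorter tokens
surface.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a generated scanner (hypothetical, hand-written):
 * matches over [cursor, limit) and stores the match length in *len.
 * Rules are listed longest first, so the first hit is the longest. */
static const char *scan(const char *cursor, const char *limit, size_t *len)
{
    static const struct { const char *text, *name; } rules[] = {
        { "food", "FOOD" },
        { "foo",  "FOO"  },
    };
    size_t avail = (size_t)(limit - cursor);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = strlen(rules[i].text);
        if (n <= avail && memcmp(cursor, rules[i].text, n) == 0) {
            *len = n;
            return rules[i].name;
        }
    }
    *len = 0;
    return NULL;
}

/* Handling code: take the longest match, then rescan with the limit
 * moved to one byte before the end of that match. Stores up to max
 * names in out, longest first, and returns how many were found. */
static size_t scan_all(const char *cursor, const char *limit,
                       const char **out, size_t max)
{
    size_t count = 0, len;
    const char *name;
    while (count < max && (name = scan(cursor, limit, &len)) != NULL) {
        out[count++] = name;
        limit = cursor + len - 1;   /* exclude the match just reported */
    }
    return count;
}
```

For the input "food" this reports "FOOD" and then "FOO". The cost
is one extra scan per reported token, and whether a real
generated scanner can be re-run against a shortened limit this
way depends on how its input macros are defined.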

----------------------------------------------------------------------
