[re2c-general] HowTo: 'greedy' regex accepts first of several matches?

Lynn Allan Tue, 20 Feb 2007 06:46:25 -0800

<alert comment="not that familiar with regex and rusty with re2c" />


I'm trying to write a scanner that does the equivalent of 'greedily' 
detecting html tag-pairs, including situations with several of the 
same tag-pair in the string. An example:
normal-a <b>bold-b </b> normal-c <b>bold-d </b> normal-e

I've tried a variety of combinations that are something like:
/*!re2c
"<b>".+?"</b>" { code goes here; }
[\000-\377] { code goes here; }
*/

This sort of works, but I haven't been able to figure out how to get 
it to be "greedy". With a "source string" like the previous one, I want 
it to "accept" after "consuming" <b>bold-b </b> ... but the scanner 
keeps on going.

When I step thru the generated code, I see:
yyaccept = 1;
when it has "consumed" <b>bold-b </b>, but it keeps going and also 
reaches:
yyaccept = 1;
after <b>bold-d </b>.

I want it to stop/accept after <b>bold-b </b> so the length will be 14 
rather than 38.
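
To make the two lengths concrete, here is a minimal plain-C sketch (not re2c, and not generated scanner code) that measures both on the example string. The helper names first_pair_length and longest_pair_length are hypothetical, made up for this illustration, and the sketch assumes the tags are not nested.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the run from the first "<b>" up to and
   including the NEAREST "</b>" after it, or 0 if no such pair exists.
   This is the result being asked for. */
size_t first_pair_length(const char *s)
{
    assert(s != NULL);
    const char *open = strstr(s, "<b>");
    if (open == NULL)
        return 0;
    const char *close = strstr(open + 3, "</b>");
    if (close == NULL)
        return 0;
    return (size_t)(close + 4 - open);
}

/* Hypothetical helper: length of the run from the first "<b>" up to and
   including the LAST "</b>" in the string, or 0 if no such pair exists.
   This mirrors what the scanner is currently doing. */
size_t longest_pair_length(const char *s)
{
    assert(s != NULL);
    const char *open = strstr(s, "<b>");
    if (open == NULL)
        return 0;
    const char *last = NULL;
    /* Walk over every "</b>" after the opening tag, remembering the last. */
    for (const char *p = strstr(open + 3, "</b>"); p != NULL;
         p = strstr(p + 4, "</b>"))
        last = p;
    if (last == NULL)
        return 0;
    return (size_t)(last + 4 - open);
}
```

On the example string above, first_pair_length should give 14 (the span <b>bold-b </b>) and longest_pair_length should give 38 (everything from the first <b> through the second </b>).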

Can this be done? Am I doing something wrong or leaving something out?

In the comments for the "strip comments" example, I saw information 
about "multiple scanner blocks" and also "trailing contexts". Do these 
apply?

Is there sample code that demonstrates "best practices" for detecting 
and removing html tags? Seems like that would be a good use of re2c. 
Even better would be a sample that demonstrated "best practices" for 
using re2c to replace html tags with something else.

Thanks



