<alert comment="not that familiar with regex and rusty with re2c" />
I'm trying to write a scanner that does the equivalent of 'greedily'
detecting html tag-pairs, including situations with several of the
same tag-pair in the string. An example:
normal-a <b>bold-b </b> normal-c <b>bold-d </b> normal-e
I've tried a variety of combinations that are something like:
/*!re2c
"<b>".+?"</b>" { code goes here; }
[\000-\377] { code goes here; }
*/
This sort of works, but I haven't been able to figure out how to get
it to be "greedy". With a "source string" like the previous, I want it
to "accept" after "consuming" <b>bold-b </b> .... but the scanner keeps
on going.
When I step through the generated code, I see:
yyaccept = 1;
when it has "consumed" <b>bold-b </b>, but it keeps going and also
reaches:
yyaccept = 1;
after <b>bold-d </b>.
I want it to stop/accept after <b>bold-b </b> so the length will be 14
rather than 38.
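To make the target behaviour concrete, here is a minimal sketch in plain C
(hand-written, not re2c output) of the match I am after: start at the first
"<b>" and stop at the nearest "</b>". The helper name first_bold_span is made
up for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of re2c: locate the first "<b>...</b>"
 * pair in s, ending at the NEAREST "</b>" (shortest match).
 * Returns a pointer to the opening "<b>" and stores the length of the
 * whole pair in *len, or returns NULL if no complete pair exists. */
static const char *first_bold_span(const char *s, size_t *len)
{
    const char *open = strstr(s, "<b>");
    if (open == NULL)
        return NULL;

    /* Search for the closing tag only after the opening tag. */
    const char *close = strstr(open + 3, "</b>");
    if (close == NULL)
        return NULL;

    *len = (size_t)(close - open) + 4;  /* include the 4 bytes of "</b>" */
    return open;
}
```

Called on the example string above, this should report a span that starts at
the first <b> and has length 14, which is the result I would like the
re2c-generated scanner to produce.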
Can this be done? Am I doing something wrong or leaving something out?
In the comments for the "strip comments" example, I saw information
about "multiple scanner blocks" and also "trailing contexts". Do these
apply?
Is there sample code that demonstrates "best practices" for detecting
and removing html tags? Seems like that would be a good use of re2c.
Even better would be a sample that demonstrated "best practices" for
using re2c to replace html tags with something else.
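As a rough illustration of the kind of replacement I mean, here is a
hand-written plain C sketch (again not re2c, and not claimed to be best
practice). The function name replace_bold_tags and its parameters are
hypothetical; it only handles the literal tags "<b>" and "</b>".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity cap, including the
 * terminating NUL), replacing each "<b>" with open_rep and each "</b>"
 * with close_rep. Returns 0 on success, -1 if dst is too small. */
static int replace_bold_tags(const char *src, char *dst, size_t cap,
                             const char *open_rep, const char *close_rep)
{
    size_t used = 0;

    while (*src != '\0') {
        const char *piece = src;  /* default: copy one ordinary character */
        size_t piece_len = 1;
        size_t skip = 1;

        if (strncmp(src, "<b>", 3) == 0) {
            piece = open_rep;
            piece_len = strlen(open_rep);
            skip = 3;
        } else if (strncmp(src, "</b>", 4) == 0) {
            piece = close_rep;
            piece_len = strlen(close_rep);
            skip = 4;
        }

        /* Keep one byte in reserve for the terminating NUL. */
        if (used + piece_len + 1 > cap)
            return -1;

        memcpy(dst + used, piece, piece_len);
        used += piece_len;
        src += skip;
    }

    if (cap == 0)
        return -1;
    dst[used] = '\0';
    return 0;
}
```

For example, calling it with "[" and "]" as the two replacements would turn
<b>bold-b </b> into [bold-b ]. What I am hoping for is the re2c way of
expressing the same thing.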
Thanks
_______________________________________________
Re2c-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/re2c-general