>Do I understand all that correctly and does anybody see if there is any
>way this could be re-factored and simplified?

Mostly fine, but depending on how well you know/trust the input, there are some things that could be improved.

As a quick[ish] general comment about regex and HTML/XML: parsing markup is not something regular expressions were designed for. Whilst they can work for relatively simple things like this, once you get into invalid/non-standard HTML, or even just nested tags, regexes rapidly increase in complexity.

For fiddling with HTML/XML, it's a good idea to consider:
- Functions like xmlParse or htmlParse (Railo) to create CF objects.
- XPath
- jQuery selectors (Sizzle)

Anyway, now to give my opinion on what you've done. :)

>< - match the opening angle bracket '<' character.

Yep.

>(/?) - match an optional forward slash character '/' and put result in
>back reference 1.

Yep, although unless you're using this backreference for something specific, I would probably do something more like this:

    </a>|<a ...>

which I think is clearer and more precise (but doesn't have the backref, which you may be using).

>a - match the 'a' character.

Yep. You may want to make your expression case-insensitive if you also want to match uppercase 'A' tags.

>([^>]*?(?=target|>)) - match the minimum zero or more characters until
>either 'target' or '>' and put result in back reference 2.

Correct description, but not necessarily what you want - consider:

    <a href="www.target.com" class="something" target="something"...>

Here the lookahead stops at the 'target' inside the URL. The optional group 3 then matches nothing, and the real target attribute ends up in back reference 4. To avoid that, you could do:

    ([^>]*?(?=\starget="|>))

where \s is whitespace (space/newline/tab). Together with the =" this makes it much more likely that what you're matching really is the attribute.

Also, I think you know, but for the reference of anyone else reading: the lookahead (?=...) part is a zero-width match that captures nothing itself - it doesn't put the 'target' text into the 2nd back reference. (Does that explanation make sense?)

>( *target="[^"]*" *)? - match an optional 'target="..." with zero or
>more non-double quote characters between the double quotes and put in
>back reference 3.

Yes, with any number of spaces matched either side. In HTML you can technically have line breaks and tabs between attributes, so whilst this might work for specific input, in general it's a good idea to use \s as above. Again, with general HTML you can also have whitespace around the equals sign. If you know your input is consistently formatted this is overkill, but as an example:

    (\s*target\s*=\s*"[^"]*"\s*)?

And that's still not considering single quotes, unquoted values, etc.

>([^>]*) - match zero or more non-angle bracket characters that maybe
>left after all the above an put into back reference 4.

Yep.

Hopefully that is helpful? Let me know if there's anything I've not explained properly.
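
For anyone who wants to see the difference between the plain 'target' lookahead and the \starget=" version, here is a rough sketch. It is written in Python rather than CFML purely so it can be run stand-alone, and it assumes the full expression is the four commented pieces joined together. The sample tag (with its "_blank" value and id attribute) and the names ORIGINAL, REVISED and groups are made up for illustration.

```python
import re

# The expression as described piece by piece in the quoted text
# (assumed to be the four commented parts joined together).
ORIGINAL = re.compile(r'<(/?)a([^>]*?(?=target|>))( *target="[^"]*" *)?([^>]*)>')

# Same expression with the suggestions applied: \s in the lookahead,
# whitespace allowed around '=', and case-insensitive matching.
REVISED = re.compile(
    r'<(/?)a([^>]*?(?=\starget="|>))(\s*target\s*=\s*"[^"]*"\s*)?([^>]*)>',
    re.IGNORECASE,
)

# Hypothetical sample tag: 'target' appears in the URL and as an attribute.
SAMPLE = '<a href="www.target.com" class="something" target="_blank" id="x">'


def groups(pattern, tag):
    """Return the four back references for tag, or None if it does not match."""
    match = pattern.match(tag)
    return match.groups() if match else None


if __name__ == "__main__":
    print(groups(ORIGINAL, SAMPLE))
    print(groups(REVISED, SAMPLE))
```

With the original expression, back reference 2 ends at 'www.', group 3 is empty and the real attribute lands in group 4. With the revised one, group 3 holds the target attribute. Note that Python reports an unmatched optional group as None; CF would give you an empty string there.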
