Re: [PHP] Re: strip comments from HTML?
On Thu, May 06, 2004 at 11:48:36PM -0400, Paul Chvostek wrote:
> On Thu, May 06, 2004 at 07:11:55PM +, Curt Zirzow wrote:
> > > $text="one <!--bleh\nblarg -> two\n";
> > > print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "", $text);
> >
> > Because you're missing a -
> >
> > $text="one <!--bleh\nblarg --> two\n";
>
> /me applies mallet to head
>
> % php -r '$text="one <!--bleh\nblarg --> two\n"; print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "", $text);'
> one two
>
> whee, it works! :)

you're still missing things like ! START -... don't know how you can get around that with ereg.

also preg_replace('/<!--.*?-->/s', ...) is much faster. :-)

- rob

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
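A minimal sketch of Rob's comparison, assuming a current PHP where ereg_replace() no longer exists (it was removed in PHP 7.0), so Paul's POSIX-style pattern is run through preg_replace() instead. The sample strings are hypothetical. On inputs where every `<!--` is properly closed, both patterns should give the same result. The nested optional quantifiers in the first pattern can backtrack heavily in a backtracking engine like PCRE when a `<!--` is never closed, so these samples deliberately use only closed comments.

```php
<?php
// Paul's POSIX-style pattern, ported to PCRE (ereg_replace() was removed in PHP 7.0).
// It only uses negated character classes, so it can span newlines without /s.
$posixStyle = '/<!--([^-][^-]?[^>]?)*-->/';

// Rob's suggestion: a lazy .*? plus the s modifier so "." also matches "\n".
$lazy = '/<!--.*?-->/s';

// Hypothetical samples; every comment here is properly closed.
$samples = [
    "one <!--bleh\nblarg --> two\n",
    "a <!-- x --> b <!-- y\nz --> c",
];

foreach ($samples as $html) {
    $viaPosix = preg_replace($posixStyle, '', $html);
    $viaLazy  = preg_replace($lazy, '', $html);
    // Both approaches should agree on well-formed comments.
    var_dump($viaPosix === $viaLazy);
}
```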
Re: [PHP] Re: strip comments from HTML?
On Thu, May 06, 2004 at 07:11:55PM +, Curt Zirzow wrote:
> > $text="one <!--bleh\nblarg -> two\n";
> > print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "", $text);
>
> Because you're missing a -
>
> $text="one <!--bleh\nblarg --> two\n";

/me applies mallet to head

% php -r '$text="one <!--bleh\nblarg --> two\n"; print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "", $text);'
one two

whee, it works! :)

-- 
Paul Chvostek [EMAIL PROTECTED]
it.canada    http://www.it.ca/
Free PHP web hosting!    http://www.it.ca/web/
Re: [PHP] Re: strip comments from HTML?
Petr, et al --

...and then Petr U. said...
%
% On Thu, 6 May 2004 11:57:45 -0400
% David T-G [EMAIL PROTECTED] wrote:
%
% > Am I missing something painfully obvious?
%
% From http://www.php.net/manual/en/pcre.pattern.syntax.php
%
% However, if a quantifier is followed by a question mark, then it ceases to be
% greedy, and instead matches the minimum number of times possible, so the
% pattern /\*.*?\*/ does the right thing with the C comments.

Gee, I guess I was :-)  Thanks!

%
% --
% Petr U.

HAND

:-D
-- 
David T-G
[EMAIL PROTECTED]
http://justpickone.org/davidtg/    Shpx gur Pbzzhavpngvbaf Qrprapl Npg!
Re: [PHP] Re: strip comments from HTML?
Michael, et al --

...and then Michael Sims said...
%
% David T-G wrote:
% > Am I missing something painfully obvious?
%
% www.perldoc.com appears to be unavailable at the moment, but if you have perldoc
% installed, here's an excerpt from the perlre man page:

I do, but I looked in the PHP manual and didn't see this behavior.  [Petr, as you'll note, has rectified that matter.]  I know that some of the PHP PCRE implementation strays from Perl's, so I just stuck with that.

Thanks

HAND

:-D
-- 
David T-G
[EMAIL PROTECTED]
http://justpickone.org/davidtg/    Shpx gur Pbzzhavpngvbaf Qrprapl Npg!
Re: [PHP] Re: strip comments from HTML?
Thanks to everyone who's replied... appears to be quite a tricky one!!

$text = preg_replace('/<!--.*-->/su','',$text);
Did not work (was too greedy, matched multiple comments)

$text = preg_replace('/<!--.*?-->/','',$text);
Did not work (needed multiple lines)

$text = preg_replace('/<!--.*?-->/su','',$text);
Does work so far, fingers crossed.

Thanks again to John, Paul, Rob, Tom, et al.

---
Justin French
http://indent.com.au
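Justin's three attempts can be reproduced side by side. This is a minimal sketch on made-up HTML containing two comments, the second of which spans a line break. As John notes, the lowercase u flag is not "ungreedy" (it switches PCRE into UTF-8 mode), so it has no effect on greediness here; what actually fixes the problem is the combination of the lazy `.*?` and the s modifier.

```php
<?php
// Hypothetical sample: two comments, the second spanning a newline.
$html = "keep1 <!-- a --> keep2 <!-- b\nc --> keep3";

// 1. Greedy .* with /s: one match runs from the first <!-- to the LAST -->,
//    so "keep2" is swallowed as well. Lowercase u (UTF-8 mode) changes nothing.
$greedy = preg_replace('/<!--.*-->/su', '', $html);

// 2. Lazy .*? without /s: "." cannot cross the newline, so the second,
//    multi-line comment is left in place.
$lazyNoS = preg_replace('/<!--.*?-->/', '', $html);

// 3. Lazy .*? with /s: each comment is matched separately, newlines included.
$lazyS = preg_replace('/<!--.*?-->/su', '', $html);

var_dump($greedy, $lazyNoS, $lazyS);
```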
Re: [PHP] Re: strip comments from HTML?
Justin French wrote:
> $text = preg_replace('/<!--.*-->/su','',$text);
> Did not work (was too greedy, matched multiple comments)

Just for the record, it should be a capital 'U' for ungreedy. Lowercase 'u' is something else. :)

-- 
---John Holmes...

Amazon Wishlist: www.amazon.com/o/registry/3BEXC84AB3A5E/

php|architect: The Magazine for PHP Professionals    www.phparch.com
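A short sketch of John's point, using a hypothetical sample string: with a capital U the plain `.*` becomes lazy and behaves like `.*?`, whereas lowercase u (PCRE's UTF-8 mode) leaves it greedy.

```php
<?php
// Hypothetical sample with two comments on one line.
$html = "keep1 <!-- a --> keep2 <!-- b --> keep3";

$withU     = preg_replace('/<!--.*-->/sU', '', $html); // ungreedy via the U flag
$withLazy  = preg_replace('/<!--.*?-->/s', '', $html); // ungreedy via ?
$withLower = preg_replace('/<!--.*-->/su', '', $html); // u is UTF-8 mode: still greedy

var_dump($withU === $withLazy); // bool(true)
```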
Re: [PHP] Re: strip comments from HTML?
On Thu, May 06, 2004 at 11:10:17AM -0400, Paul Chvostek wrote:
> On Thu, May 06, 2004 at 03:02:16PM +1000, Justin French wrote:
> > This isn't working:
> > $text = preg_replace('/<!--(.*)-->/','',$text);
> > Can someone advise what characters I need to escape, or whatever to get it going?
>
> It's not a matter of escaping. You're matching too much with the .*.
>
> If you're sure you won't have any right-point-brackets inside comments, you can use something like:
>
> $text = ereg_replace("<!--[^>]*-->","",$text);
>
> Accurately matching comments in an extended regular expression is tricky though. The only thing you can really *negate* in an ereg is a range, not an atom. And the close of the comment can't be represented as a range, since it's multiple characters.
>
> Not to say it can't be done. I just can't think of how at the moment.

you can make the .* less greedy...

$text = preg_replace('/<!--.*?-->/', '', $text);

- rob
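Paul's "no > inside comments" approach, sketched with preg_replace() because ereg_replace() is gone from PHP 7.0 onward. The helper name strip_simple_comments() and the sample strings are made up for this example. It shows both sides of his caveat: a negated class like `[^>]` happily crosses newlines, but a comment that contains markup (and therefore a `>`) is not matched at all.

```php
<?php
// Hypothetical helper: Paul's [^>]* idea, written for preg_replace().
function strip_simple_comments(string $html): string
{
    // [^>]* cannot pass a ">", so the match must reach "-->" before any ">".
    return preg_replace('/<!--[^>]*-->/', '', $html);
}

// Works when comments contain no ">", even across a newline...
$ok = strip_simple_comments("a <!-- note\nhere --> b");

// ...but a comment holding markup such as <b> is left untouched, because
// [^>]* stops at the first ">" and "-->" cannot match there.
$bad = strip_simple_comments("a <!-- <b>old</b> --> b");
```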
Re: [PHP] Re: strip comments from HTML?
From: Rob Ellis [EMAIL PROTECTED]

> you can make the .* less greedy...
>
> $text = preg_replace('/<!--.*?-->/', '', $text);

You still need an 's' modifier if you want to match multi-line comments. The dot character won't match newlines unless you use an 's' modifier.

---John Holmes...
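John's point in code, as a small sketch on a hypothetical comment that spans two lines. Besides the s modifier after the closing delimiter, PCRE also accepts the same option inline as `(?s)` at the start of the pattern.

```php
<?php
// Hypothetical sample: one comment broken across two lines.
$html = "x <!-- line one\nline two --> y";

// "." stops at "\n", so the comment is left alone.
$noFlag = preg_replace('/<!--.*?-->/', '', $html);

// The s modifier lets "." match "\n" ...
$sFlag = preg_replace('/<!--.*?-->/s', '', $html);

// ... and the inline option (?s) does the same from inside the pattern.
$inline = preg_replace('/(?s)<!--.*?-->/', '', $html);
```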
Re: [PHP] Re: strip comments from HTML?
Rob, et al --

...and then Rob Ellis said...
% ...
% you can make the .* less greedy...
%
% $text = preg_replace('/<!--.*?-->/', '', $text);

How does that make it ungreedy?  The only thing I see relating question marks and greediness is that a question mark makes part of a regexp with a /U modifier (otherwise not greedy) back into a greedy portion.

Am I missing something painfully obvious?

%
% - rob

TIA

HAND

:-D
-- 
David T-G
[EMAIL PROTECTED]
http://justpickone.org/davidtg/    Shpx gur Pbzzhavpngvbaf Qrprapl Npg!
RE: [PHP] Re: strip comments from HTML?
David T-G wrote:
> % you can make the .* less greedy...
> %
> % $text = preg_replace('/<!--.*?-->/', '', $text);
>
> How does that make it ungreedy?  The only thing I see relating question marks and greediness is that a question mark makes part of a regexp with a /U modifier (otherwise not greedy) back into a greedy portion.
>
> Am I missing something painfully obvious?

www.perldoc.com appears to be unavailable at the moment, but if you have perldoc installed, here's an excerpt from the perlre man page:

    By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?".

There are other useful examples in that man page.

The U modifier is something that PHP added; it is not part of Perl's regex syntax. It basically reverses the greedy tendency, so that ALL of the quantifiers in a particular regex are ungreedy, and the ? makes them greedy. Without the U, the normal (and Perl-compatible) behavior of ? following a quantifier is to make it ungreedy.

HTH
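To see the inversion Michael describes, the same `.*?` pattern can be run with and without U. This is a sketch on a made-up sample string. For the record, U corresponds to the PCRE library's ungreedy option, which PHP exposes as a pattern modifier; Perl itself has no such flag.

```php
<?php
// Hypothetical sample with two comments on one line.
$html = "keep1 <!-- a --> keep2 <!-- b --> keep3";

// Default (Perl-compatible): ".*?" is lazy, so each comment is removed separately.
$lazy = preg_replace('/<!--.*?-->/s', '', $html);

// With U the default flips: ".*?" is now greedy and spans both comments.
$greedyUnderU = preg_replace('/<!--.*?-->/sU', '', $html);
```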
Re: [PHP] Re: strip comments from HTML?
On Thu, 6 May 2004 11:57:45 -0400
David T-G [EMAIL PROTECTED] wrote:

> Am I missing something painfully obvious?

From http://www.php.net/manual/en/pcre.pattern.syntax.php

  However, if a quantifier is followed by a question mark, then it ceases to be greedy, and instead matches the minimum number of times possible, so the pattern /\*.*?\*/ does the right thing with the C comments.

-- 
Petr U.
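The manual's C-comment example, applied in PHP as a sketch. Two adjustments are assumptions of this example rather than part of the quoted manual text: `#` is used as the delimiter so the slashes in `/\*.*?\*/` need no escaping, and the s modifier is added so block comments spanning lines are removed too. The pattern knows nothing about string literals, so `/* ... */` inside a C string would also be stripped.

```php
<?php
// Hypothetical C fragment with one single-line and one multi-line comment.
$c = "int a; /* one */ int b; /* two\n lines */ int c;";

// Lazy .*? stops at the first "*/"; /s lets "." cross the newline.
$stripped = preg_replace('#/\*.*?\*/#s', '', $c);
```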
Re: [PHP] Re: strip comments from HTML?
On Thu, May 06, 2004 at 11:26:34AM -0400, Rob Ellis wrote:
> > $text = ereg_replace("<!--[^>]*-->","",$text);
>
> you can make the .* less greedy...
>
> $text = preg_replace('/<!--.*?-->/', '', $text);

Interesting to know.  My preg-foo is limited; I came at PHP from a background of awk and sed, so when I regexp, I'm a little more traditional about it.

Interestingly, from a shell:

$ text='one <!-- bleh --> two\nthree <!-- blarg -->four\n'
$ printf "$text" | sed -E 's/<!--([^-][^-]?[^>]?)*-->//g'
one two
three four

which is the same behaviour as PHP.  But that still doesn't cover multi-line.  PHP's ereg support is supposed to, but doesn't work with this particular substitution:

$text="one <!--bleh\nblarg -> two\n";
print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "", $text);

returns

one <!--bleh
blarg -> two

But we know it really does support multiline, because:

$text=bb\nbb; print ereg_replace([^ac],,$text);

returns

So ... this is interesting, and perhaps I'll investigate it further if the spirit moves me.  ;-)

-- 
Paul Chvostek [EMAIL PROTECTED]
it.canada    http://www.it.ca/
Free PHP web hosting!    http://www.it.ca/web/
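Paul's multiline observation has a PCRE counterpart that is easy to check: a negated character class such as `[^>]` or `[^-]` matches a newline with no flag at all, while `.` does not unless s is set. That is why his `[^...]`-based pattern is able to span lines. This sketch uses preg_match() rather than ereg (removed in PHP 7.0), so it says nothing definitive about ereg's own behavior; the subject string is hypothetical.

```php
<?php
$subject = "a\nb";

// A negated class matches "\n" by default ...
$negated = preg_match('/a[^>]b/', $subject); // 1
// ... but "." does not ...
$dot = preg_match('/a.b/', $subject);        // 0
// ... until the s modifier is added.
$dotAll = preg_match('/a.b/s', $subject);    // 1
```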
Re: [PHP] Re: strip comments from HTML?
On Thu, May 06, 2004 at 12:47:10PM -0400, Paul Chvostek wrote:
> On Thu, May 06, 2004 at 11:26:34AM -0400, Rob Ellis wrote:
> > > $text = ereg_replace("<!--[^>]*-->","",$text);
> >
> > you can make the .* less greedy...
> >
> > $text = preg_replace('/<!--.*?-->/', '', $text);
>
> Interesting to know.  My preg-foo is limited; I came at PHP from a background of awk and sed, so when I regexp, I'm a little more traditional about it.
>
> Interestingly, from a shell:
>
> $ text='one <!-- bleh --> two\nthree <!-- blarg -->four\n'
> $ printf "$text" | sed -E 's/<!--([^-][^-]?[^>]?)*-->//g'
> one two
> three four
>
> which is the same behaviour as PHP.  But that still doesn't cover multi-line.  PHP's ereg support is supposed to, but doesn't work with this particular substitution:
>
> $text="one <!--bleh\nblarg -> two\n";
> print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "", $text);
>
> returns
>
> one <!--bleh
> blarg -> two
>
> But we know it really does support multiline, because:
>
> $text=bb\nbb; print ereg_replace([^ac],,$text);
>
> returns
>
> So ... this is interesting, and perhaps I'll investigate it further if the spirit moves me.  ;-)

right, to strip multi-line comments with preg_replace you need /s

$text = preg_replace('/<!--.*?-->/s', '', $text);

- rob
Re: [PHP] Re: strip comments from HTML?
* Thus wrote Paul Chvostek ([EMAIL PROTECTED]):
> On Thu, May 06, 2004 at 11:26:34AM -0400, Rob Ellis wrote:
> > > $text = ereg_replace("<!--[^>]*-->","",$text);
> >
> > you can make the .* less greedy...
> >
> > $text = preg_replace('/<!--.*?-->/', '', $text);
>
> Interestingly, from a shell:
>
> $ text='one <!-- bleh --> two\nthree <!-- blarg -->four\n'
> $ printf "$text" | sed -E 's/<!--([^-][^-]?[^>]?)*-->//g'
> one two
> three four
>
> which is the same behaviour as PHP.  But that still doesn't cover multi-line.  PHP's ereg support is supposed to, but doesn't work with this particular substitution:
>
> $text="one <!--bleh\nblarg -> two\n";
> print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "", $text);

Because you're missing a -

$text="one <!--bleh\nblarg --> two\n";
                           ^^^

Curt
-- 
I used to think I was indecisive, but now I'm not so sure.
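Curt's fix in code form, as a sketch. It assumes preg_replace() with the lazy /s pattern from elsewhere in the thread rather than Paul's ereg pattern, since ereg_replace() no longer exists in PHP 7.0 and later. With either pattern, `->` is not a comment terminator, so the first string has nothing to match.

```php
<?php
$pattern = '/<!--.*?-->/s';

// Paul's original test string: "->" does not close the comment, so it is kept.
$typo = preg_replace($pattern, '', "one <!--bleh\nblarg -> two\n");

// Curt's corrected string: the "-->" terminator lets the comment be removed.
$fixed = preg_replace($pattern, '', "one <!--bleh\nblarg --> two\n");
```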