Re: [PHP] Re: strip comments from HTML?

2004-05-07 Thread Rob Ellis
On Thu, May 06, 2004 at 11:48:36PM -0400, Paul Chvostek wrote:
> On Thu, May 06, 2004 at 07:11:55PM +, Curt Zirzow wrote:
> > >
> > > $text="one <!--bleh\nblarg -> two\n";
> > > print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "",$text);
> >
> > Because you're missing a -
> > $text="one <!--bleh\nblarg --> two\n";
>
> /me applies mallet to head
>
>  % php -r '$text="one <!--bleh\nblarg --> two\n"; print
> ereg_replace("<!--([^-][^-]?[^>]?)*-->", "",$text);'
>  one  two
>
> whee, it works!  :)
 

you're still missing things like ! START -...
don't know how you can get around that with ereg.

also preg_replace('/<!--.*?-->/s', ...) is much faster. :-)

- rob

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread Paul Chvostek
On Thu, May 06, 2004 at 07:11:55PM +, Curt Zirzow wrote:
  
> >  $text="one <!--bleh\nblarg -> two\n";
> >  print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "",$text);
>
> Because you're missing a -
> $text="one <!--bleh\nblarg --> two\n";

/me applies mallet to head

 % php -r '$text="one <!--bleh\nblarg --> two\n"; print
ereg_replace("<!--([^-][^-]?[^>]?)*-->", "",$text);'
 one  two

whee, it works!  :)

-- 
  Paul Chvostek            [EMAIL PROTECTED]
  it.canada                http://www.it.ca/
  Free PHP web hosting!    http://www.it.ca/web/




Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread David T-G
Petr, et al --

...and then Petr U. said...
% 
% On Thu, 6 May 2004 11:57:45 -0400
% David T-G [EMAIL PROTECTED] wrote:
% 
%   Am I missing something painfully obvious?
% 
% From http://www.php.net/manual/en/pcre.pattern.syntax.php 
% 
% However, if a quantifier is followed by a question mark, then it ceases to be
% greedy, and instead matches the minimum number of times possible, so the
% pattern /\*.*?\*/  does the right thing with the C comments.

Gee, I guess I was :-)  Thanks!


% 
% --
% Petr U.


HAND

:-D
-- 
David T-G
[EMAIL PROTECTED]
http://justpickone.org/davidtg/  Shpx gur Pbzzhavpngvbaf Qrprapl Npg!





Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread David T-G
Michael, et al --

...and then Michael Sims said...
% 
% David T-G wrote:
% 
%  Am I missing something painfully obvious?
% 
% www.perldoc.com appears to be unavailable at the moment, but if you have perldoc
% installed, here's an excerpt from the perlre man page:

I do, but I looked in the php manual and didn't see this behavior.
[Petr, as you'll note, has rectified that matter.]  I know that some
of the PHP PCRE implementation strays from Perl's, so I just stuck
with that.


Thanks & HAND

:-D
-- 
David T-G
[EMAIL PROTECTED]
http://justpickone.org/davidtg/  Shpx gur Pbzzhavpngvbaf Qrprapl Npg!





Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread Justin French
Thanks to everyone who's replied... appears to be quite a tricky one!!

$text = preg_replace('/<!--.*-->/su','',$text);
Did not work (was too greedy, matched multiple comments)
$text = preg_replace('/<!--.*?-->/','',$text);
Did not work (needed multiple lines)
$text = preg_replace('/<!--.*?-->/su','',$text);
Does work so far, fingers crossed.
Thanks again to John, Paul, Rob, Tom, et al.
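
For anyone reading this in the archive, here is a minimal sketch of the three attempts side by side. The sample string is made up for illustration, and the `u` flag is left off since it plays no part in greediness.

```php
<?php
// Hypothetical input: two comments, the second spanning two lines.
$html = "one <!-- a --> two <!-- b\nc --> three\n";

// Greedy .* with /s: runs from the first <!-- to the last -->.
$greedy = preg_replace('/<!--.*-->/s', '', $html);

// Lazy .*? without /s: the dot stops at the newline, so the
// two-line comment is left in place.
$noDotAll = preg_replace('/<!--.*?-->/', '', $html);

// Lazy .*? with /s: each comment is removed on its own.
$lazy = preg_replace('/<!--.*?-->/s', '', $html);

var_dump($greedy);    // "one  three\n"
var_dump($noDotAll);  // "one  two <!-- b\nc --> three\n"
var_dump($lazy);      // "one  two  three\n"
```

Only the last form removes both comments and nothing else.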

---
Justin French
http://indent.com.au


Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread John W. Holmes
Justin French wrote:

> $text = preg_replace('/<!--.*-->/su','',$text);
> Did not work (was too greedy, matched multiple comments)

Just for the record, it should be a capital 'U' for ungreedy. Lowercase
'u' is something else. :)
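
A small sketch of the difference, using a made-up input string. Lowercase `u` switches the pattern and subject to UTF-8 handling and leaves the quantifier greedy, while uppercase `U` inverts greediness.

```php
<?php
$html = "a <!-- x --> b <!-- y --> c";  // hypothetical input

// Lowercase u: UTF-8 mode only, .* is still greedy.
$withLowerU = preg_replace('/<!--.*-->/su', '', $html);

// Uppercase U: greediness inverted, so a plain .* behaves lazily.
$withUpperU = preg_replace('/<!--.*-->/sU', '', $html);

var_dump($withLowerU);  // "a  c"
var_dump($withUpperU);  // "a  b  c"
```

This would suggest that the `/su` version that worked did so because of the `?` after `.*`, not because of the `u`.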

--
---John Holmes...
Amazon Wishlist: www.amazon.com/o/registry/3BEXC84AB3A5E/

php|architect: The Magazine for PHP Professionals  www.phparch.com



Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread Rob Ellis
On Thu, May 06, 2004 at 11:10:17AM -0400, Paul Chvostek wrote:
> On Thu, May 06, 2004 at 03:02:16PM +1000, Justin French wrote:
> >
> > This isn't working:
> > $text = preg_replace('/<!--(.*)-->/','',$text);
> >
> > Can someone advise what characters I need to escape, or whatever to get
> > it going?
>
> It's not a matter of escaping.  You're matching too much with the .*.
>
> If you're sure you won't have any right-point-brackets inside comments,
> you can use something like:
>
> $text = ereg_replace("<!--[^>]*-->","",$text);
>
> Accurately matching comments in an extended regular expression is tricky
> though.  The only thing you can really *negate* in an ereg is a range,
> not an atom.  And the close of the comment can't be represented as a
> range, since it's multiple characters.
>
> Not to say it can't be done.  I just can't think of how at the moment.
 

you can make the .* less greedy...

  $text = preg_replace('/<!--.*?-->/', '', $text);

- rob




Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread John W. Holmes
From: Rob Ellis [EMAIL PROTECTED]
> you can make the .* less greedy...
>
>   $text = preg_replace('/<!--.*?-->/', '', $text);

You still need an 's' modifier if you want to match multi-line comments. The
dot character won't match newlines unless you use an 's' modifier.
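
A minimal sketch of the point about the dot and newlines, with a made-up two-line comment as the subject:

```php
<?php
$subject = "<!--line one\nline two-->";  // hypothetical two-line comment

// Without /s the dot will not cross the newline, so nothing matches.
$plain = preg_match('/<!--.*?-->/', $subject);

// With /s (PCRE_DOTALL) the dot matches the newline as well.
$dotAll = preg_match('/<!--.*?-->/s', $subject);

var_dump($plain);   // int(0)
var_dump($dotAll);  // int(1)
```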

---John Holmes...




Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread David T-G
Rob, et al --

...and then Rob Ellis said...
% 
...
% you can make the .* less greedy...
% 
%   $text = preg_replace('/<!--.*?-->/', '', $text);

How does that make it ungreedy?  The only thing I see relating question
marks and greediness is that a question mark makes part of a regexp with
a /U modifier (otherwise not greedy) back into a greedy portion.

Am I missing something painfully obvious?


% 
% - rob


TIA & HAND

:-D
-- 
David T-G
[EMAIL PROTECTED]
http://justpickone.org/davidtg/  Shpx gur Pbzzhavpngvbaf Qrprapl Npg!





RE: [PHP] Re: strip comments from HTML?

2004-05-06 Thread Michael Sims
David T-G wrote:
> % you can make the .* less greedy...
> %
> %   $text = preg_replace('/<!--.*?-->/', '', $text);
>
> How does that make it ungreedy?  The only thing I see relating
> question marks and greediness is that a question mark makes part of a
> regexp with a /U modifier (otherwise not greedy) back into a greedy
> portion.
>
> Am I missing something painfully obvious?

www.perldoc.com appears to be unavailable at the moment, but if you have perldoc
installed, here's an excerpt from the perlre man page:

   By default, a quantified subpattern is "greedy", that is,
   it will match as many times as possible (given a particular
   starting location) while still allowing the rest of
   the pattern to match.  If you want it to match the minimum
   number of times possible, follow the quantifier with a
   "?".

There are other useful examples in that man page.  The U modifier is something
that PHP added, it is not part of perl's regex syntax.  It basically reverses the
greedy tendency, so that ALL of the quantifiers in a particular regex are ungreedy,
and the ? makes them greedy.  Without the U, the normal (and perl compatible)
behavior of ? following a quantifier is to make it ungreedy.
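
The four combinations can be sketched as below. The subject string and the `firstMatch` helper are made up for illustration. As far as I know the `U` flag maps to the PCRE library's PCRE_UNGREEDY option, which the PHP manual describes as not compatible with Perl.

```php
<?php
$subject = '[a][b][c]';  // hypothetical input

// Hypothetical helper: returns the first full match, or '' if none.
function firstMatch(string $pattern, string $subject): string
{
    preg_match($pattern, $subject, $m);
    return $m[0] ?? '';
}

$results = [
    'greedy'   => firstMatch('/\[.*\]/',   $subject), // [a][b][c]
    'lazy'     => firstMatch('/\[.*?\]/',  $subject), // [a]
    'U alone'  => firstMatch('/\[.*\]/U',  $subject), // [a]
    'U with ?' => firstMatch('/\[.*?\]/U', $subject), // [a][b][c]
];
print_r($results);
```

With `U` set the meaning of the trailing `?` flips: plain quantifiers become lazy and `?` makes them greedy again.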

HTH




Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread Petr U.
On Thu, 6 May 2004 11:57:45 -0400
David T-G [EMAIL PROTECTED] wrote:

  
> Am I missing something painfully obvious?

From http://www.php.net/manual/en/pcre.pattern.syntax.php 

However, if a quantifier is followed by a question mark, then it ceases to be
greedy, and instead matches the minimum number of times possible, so the
pattern /\*.*?\*/  does the right thing with the C comments.
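
A short sketch of that C-comment pattern in PHP. The sample source string is made up, and `#` is used as the delimiter so the slashes in the pattern need no escaping.

```php
<?php
// Hypothetical C source with one single-line and one two-line comment.
$code = "int a; /* one */ int b; /* two\n   lines */ int c;";

// Lazy .*? stops at the first closing */; /s lets the dot cross
// the newline inside the second comment.
$stripped = preg_replace('#/\*.*?\*/#s', '', $code);

var_dump($stripped);  // "int a;  int b;  int c;"
```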

--
Petr U.




Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread Paul Chvostek
On Thu, May 06, 2004 at 11:26:34AM -0400, Rob Ellis wrote:

> > $text = ereg_replace("<!--[^>]*-->","",$text);
>
> you can make the .* less greedy...
>
>   $text = preg_replace('/<!--.*?-->/', '', $text);

Interesting to know.  My preg-foo is limited; I came at PHP from a
background of awk and sed, so when I regexp, I'm a little more
traditional about it.

Interestingly, from a shell:

 $ text='one <!-- bleh --> two\nthree <!-- blarg -->four\n'
 $ printf "$text" | sed -E 's/<!--([^-][^-]?[^>]?)*-->//g'
 one  two
 three four

which is the same behaviour as PHP.  But that still doesn't cover
multi-line.  PHP's ereg support is supposed to, but doesn't work with
this particular substitution:

 $text="one <!--bleh\nblarg -> two\n";
 print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "",$text);

returns

 one <!--bleh
 blarg -> two

But we know it really does support multiline, because:

 $text="bb\nbb";
 print ereg_replace("[^ac]","",$text);

returns

 

So ... this is interesting, and perhaps I'll investigate it further if
the spirit moves me.  ;-)
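
The negated-class observation can be sketched as below. It uses preg_replace rather than ereg_replace, on the assumption of a current PHP where the ereg functions are no longer available (they were removed in PHP 7), and compares the class against a plain dot.

```php
<?php
$text = "bb\nbb";

// A negated class matches any character not listed, newline included.
$classResult = preg_replace('/[^ac]/', '', $text);

// The dot, by contrast, skips newlines unless /s is set.
$dotResult = preg_replace('/./', '', $text);

var_dump($classResult);  // ""
var_dump($dotResult);    // "\n"
```

So a pattern built only from negated classes can span lines without any modifier, which is why the class-based comment pattern is not limited to one line.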

-- 
  Paul Chvostek            [EMAIL PROTECTED]
  it.canada                http://www.it.ca/
  Free PHP web hosting!    http://www.it.ca/web/




Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread Rob Ellis
On Thu, May 06, 2004 at 12:47:10PM -0400, Paul Chvostek wrote:
> On Thu, May 06, 2004 at 11:26:34AM -0400, Rob Ellis wrote:
> >
> > > $text = ereg_replace("<!--[^>]*-->","",$text);
> >
> > you can make the .* less greedy...
> >
> >   $text = preg_replace('/<!--.*?-->/', '', $text);
>
> Interesting to know.  My preg-foo is limited; I came at PHP from a
> background of awk and sed, so when I regexp, I'm a little more
> traditional about it.
>
> Interestingly, from a shell:
>
>  $ text='one <!-- bleh --> two\nthree <!-- blarg -->four\n'
>  $ printf "$text" | sed -E 's/<!--([^-][^-]?[^>]?)*-->//g'
>  one  two
>  three four
>
> which is the same behaviour as PHP.  But that still doesn't cover
> multi-line.  PHP's ereg support is supposed to, but doesn't work with
> this particular substitution:
>
>  $text="one <!--bleh\nblarg -> two\n";
>  print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "",$text);
>
> returns
>
>  one <!--bleh
>  blarg -> two
>
> But we know it really does support multiline, because:
>
>  $text="bb\nbb";
>  print ereg_replace("[^ac]","",$text);
>
> returns
>
>
>
> So ... this is interesting, and perhaps I'll investigate it further if
> the spirit moves me.  ;-)

right, to strip multi-line comments with preg_replace you need /s

   $text = preg_replace('/<!--.*?-->/s', '', $text);

- rob




Re: [PHP] Re: strip comments from HTML?

2004-05-06 Thread Curt Zirzow
* Thus wrote Paul Chvostek ([EMAIL PROTECTED]):
> On Thu, May 06, 2004 at 11:26:34AM -0400, Rob Ellis wrote:
> >
> > > $text = ereg_replace("<!--[^>]*-->","",$text);
> >
> > you can make the .* less greedy...
> >
> >   $text = preg_replace('/<!--.*?-->/', '', $text);
>
> Interestingly, from a shell:
>
>  $ text='one <!-- bleh --> two\nthree <!-- blarg -->four\n'
>  $ printf "$text" | sed -E 's/<!--([^-][^-]?[^>]?)*-->//g'
>  one  two
>  three four
>
> which is the same behaviour as PHP.  But that still doesn't cover
> multi-line.  PHP's ereg support is supposed to, but doesn't work with
> this particular substitution:
>
>  $text="one <!--bleh\nblarg -> two\n";
>  print ereg_replace("<!--([^-][^-]?[^>]?)*-->", "",$text);

Because you're missing a -
$text="one <!--bleh\nblarg --> two\n";
                           ^^^
Curt
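
A rough sketch of the same check with both inputs. It runs Paul's character-class pattern through preg_replace instead of ereg_replace, on the assumption of a current PHP where the ereg functions are gone.

```php
<?php
// Paul's pattern, wrapped in PCRE delimiters.
$pattern = '/<!--([^-][^-]?[^>]?)*-->/';

$broken = "one <!--bleh\nblarg -> two\n";   // only "->": the comment never closes
$fixed  = "one <!--bleh\nblarg --> two\n";  // proper "-->" terminator

$brokenResult = preg_replace($pattern, '', $broken);
$fixedResult  = preg_replace($pattern, '', $fixed);

var_dump($brokenResult === $broken);  // bool(true): nothing was replaced
var_dump($fixedResult);               // "one  two\n"
```

The pattern needs a literal `-->` to finish, so the first string is returned unchanged.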
-- 
I used to think I was indecisive, but now I'm not so sure.
