-----Original Message-----
From: Andreas Perstinger [mailto:andiper...@gmail.com]
Sent: Tuesday, May 28, 2013 11:10 PM
To: php-general@lists.php.net
Subject: Re: [PHP] need some regex help to strip out // comments but not http:// urls
On 28.05.2013 23:17, Daevid Vincent wrote:
I want to remove all comments of the // variety, HOWEVER I don't want to
remove URLs...
You need a negative lookbehind assertion
( http://www.php.net/manual/en/regexp.reference.assertions.php ).
(?<!http:)// will match // only if it isn't preceded by http:.
Bye, Andreas
This worked like a CHAMP Andreas my friend! You are a regex guru!
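For readers comparing the two assertion types: a lookbehind is written (?<!...) and tests the text before the current position, while (?!...) is a lookahead and tests the text after it. A minimal sketch of the difference, using a made-up sample line that is not from the thread:

```php
<?php
// Hypothetical sample line, not taken from the thread.
$line = 'var u = "http://foo.com"; // trailing comment';

// Negative lookbehind: match "//" only when the text just before it is not "http:".
$behind = preg_replace('@(?<!http:)//.*$@', '', $line);

// Negative lookahead: (?!http:) inspects the text that follows the current position.
// At a "//" that text is "//...", which never starts with "http:", so the assertion
// always passes and the "//" inside the URL is matched first.
$ahead = preg_replace('@(?!http:)//.*$@', '', $line);

echo $behind, "\n"; // the URL is kept, the trailing comment is removed
echo $ahead, "\n";  // everything from the URL's "//" onward is removed
```

With the lookahead form the URL is cut off after "http:", which is why the "<" matters.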
-----Original Message-----
From: Sean Greenslade [mailto:zootboys...@gmail.com]
Sent: Wednesday, May 29, 2013 10:28 AM
Also, (I haven't tested it, but) I don't think that example you gave
would work. Without any sort of quoting around the http://, I would
assume the JS interpreter would take that double slash as a comment
starter. Do tell me if I'm wrong, though.
You're wrong Sean. :-p
This regex works in all cases listed in my example target string.
\s*(?<!:)//.*?$
Or in my actual compress() method:
$sBlob = preg_replace('@\s*(?<!:)//.*?$@m','',$sBlob);
Target test case with intentional traps:
// another comment here
<iframe src="http://foo.com">
function bookmarksite(title,url){
if (window.sidebar) // firefox
window.sidebar.addPanel(title, url, "");
else if(window.opera && window.print){ // opera
var elem = document.createElement('a');
elem.setAttribute('href',url);
elem.setAttribute('title',title);
elem.setAttribute('rel','sidebar');
elem.click();
}
else if(document.all)// ie
window.external.AddFavorite(url, title);
}
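A hedged sketch of the trailing-comment pattern run against a shortened version of the test case. The sample lines and the protocol-relative URL at the end are assumptions added for illustration; only the pattern itself, written with the (?<!:) lookbehind, comes from the thread.

```php
<?php
// Shortened test case; the quoting on the iframe line is assumed.
$sBlob = implode("\n", array(
    '// another comment here',
    '<iframe src="http://foo.com">',
    'if (window.sidebar) // firefox',
    'else if(document.all)// ie',
));

// Strip "//" comments unless the "//" is directly preceded by a colon.
$out = preg_replace('@\s*(?<!:)//.*?$@m', '', $sBlob);
echo $out, "\n";

// Hypothetical trap: a protocol-relative URL has no colon before "//",
// so the lookbehind does not protect it.
$trap = '<script src="//cdn.example.com/x.js"></script>';
$trapOut = preg_replace('@\s*(?<!:)//.*?$@m', '', $trap);
echo $trapOut, "\n";
```

The URL on the iframe line survives because its // follows a colon, while the last call shows that a // inside a string without a preceding colon is still treated as a comment.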
And for those interested here is the whole method...
public function compress($sBlob)
{
//remove C style /* */ blocks as well as PHPDoc /** */ blocks
$sBlob = preg_replace('@/\*(.*?)\*/@s','',$sBlob);
//$sBlob = preg_replace("/\*[^*]*\*+(?:[^*/][^*]*\*+)*/s",'',$sBlob);
//$sBlob = preg_replace("/\\*(?:.|[\\n\\r])*?\\*/s",'',$sBlob);
//remove // or # style comments at the start of a line; possibly redundant with next preg_replace
$sBlob = preg_replace('@^\s*((^\s*(#+|//+)\s*.+?$\n)+)@m','',$sBlob);
//remove // style comments that might be tagged onto valid code lines.
//we don't try for # style as that's risky and not widely used
// @see http://www.php.net/manual/en/regexp.reference.assertions.php
$sBlob = preg_replace('@\s*(?<!:)//.*?$@m','',$sBlob);
if (in_array($this->_file_name_suffix, array('html','htm')))
{
//remove <!-- --> blocks
$sBlob = preg_replace('/<!--[^\[](.*?)-->/s','',$sBlob);
//if Tidy is enabled...
//if (!extension_loaded('tidy')) dl( ((PHP_SHLIB_SUFFIX === 'dll') ? 'php_' : '') . 'tidy.' . PHP_SHLIB_SUFFIX);
if (FALSE && extension_loaded('tidy'))
{
//use Tidy to clean up the rest. There may be some redundancy with the above, but it shouldn't hurt
//See all parameters available here: http://tidy.sourceforge.net/docs/quickref.html
$tconfig = array(
'clean' => true,
'hide-comments' => true,
'hide-endtags' => true,
'drop-proprietary-attributes' => true,
'join-classes' => true,
'join-styles' => true,
'quote-marks' => false,
'fix-uri' => false,
'numeric-entities' => true,
'preserve-entities' => true,
'doctype' => 'omit',
'tab-size' => 1,
'wrap' => 0,
'wrap-php' => false,
'char-encoding' => 'raw',
'input-encoding' => 'raw',
'output-encoding' => 'raw',
'ascii-chars' => true,
'newline' => 'LF',
'tidy-mark' => false,
'quiet' => true,
'show-errors' => ($this->_debug ? 6 : 0),
'show-warnings' => $this->_debug,
);
if ($this->_log_messages) $tconfig['error-file'] =