RE: [PHP] [SOLVED] need some regex help to strip out // comments but not http:// urls

2013-05-29 Thread Daevid Vincent


 -Original Message-
 From: Andreas Perstinger [mailto:andiper...@gmail.com]
 Sent: Tuesday, May 28, 2013 11:10 PM
 To: php-general@lists.php.net
 Subject: Re: [PHP] need some regex help to strip out // comments but not
 http:// urls
 
 On 28.05.2013 23:17, Daevid Vincent wrote:
  I want to remove all comments of the // variety, HOWEVER I don't want to
  remove URLs...
 
 
 You need a negative look behind assertion
 ( http://www.php.net/manual/en/regexp.reference.assertions.php ).
 
 (?<!http:)// will match // only if it isn't preceded by http:.
 
 Bye, Andreas

This worked like a CHAMP Andreas my friend! You are a regex guru!
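
A minimal sketch of why the assertion has to be a lookbehind, assuming a made-up sample line (only the lookbehind pattern comes from Andreas's message; the function names and the sample are hypothetical). A lookbehind inspects the characters before the `//`, while a lookahead inspects the characters after it, so only the lookbehind form leaves the URL alone:

```php
<?php
// Pattern from the thread: strip from // to end of line, unless "http:"
// sits directly before the slashes.
function strip_with_lookbehind($line)
{
    return preg_replace('@(?<!http:)//.*$@', '', $line);
}

// Same pattern with a lookahead, shown only for contrast. It asks whether
// "http:" FOLLOWS the current position, which is never true at a //,
// so the URL is cut as well.
function strip_with_lookahead($line)
{
    return preg_replace('@(?!http:)//.*$@', '', $line);
}

// Hypothetical sample line, not taken from the thread.
$line = 'var u = "http://example.com"; // trailing comment';

var_dump(strip_with_lookbehind($line)); // URL kept, trailing comment removed
var_dump(strip_with_lookahead($line));  // everything after "http:" removed
```

Note that the lookbehind only protects a `//` that directly follows the listed prefix; any other `//` on the line is still treated as a comment start.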

 -Original Message-
 From: Sean Greenslade [mailto:zootboys...@gmail.com]
 Sent: Wednesday, May 29, 2013 10:28 AM

 Also, (I haven't tested it, but) I don't think that example you gave
 would work. Without any sort of quoting around the "http://",
 I would assume the JS interpreter would take that double slash as a
 comment starter. Do tell me if I'm wrong, though.

You're wrong Sean. :-p

This regex works in all cases listed in my example target string.

\s*(?<!:)//.*?$

Or in my actual compress() method:

$sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);

Target test case with intentional traps:

// another comment here
iframe src=http://foo.com;
function bookmarksite(title,url){
if (window.sidebar) // firefox
window.sidebar.addPanel(title, url, "");
else if(window.opera && window.print){ // opera
var elem = document.createElement('a');
elem.setAttribute('href',url);
elem.setAttribute('title',title);
elem.setAttribute('rel','sidebar');
elem.click();
} 
else if(document.all)// ie
window.external.AddFavorite(url, title);
}
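
A small harness for checking the claim against the test case, as a sketch only. The sample is a reduced copy of the target string above; the markup and quoting on the iframe line are assumed, since the archived copy of the message lost them, and `$sClean` is a hypothetical variable name:

```php
<?php
// Reduced copy of the target test case. The iframe line's angle brackets
// and quotes are an assumption, not recovered from the original message.
$sBlob = <<<'EOT'
// another comment here
<iframe src="http://foo.com">
function bookmarksite(title,url){
if (window.sidebar) // firefox
window.sidebar.addPanel(title, url, "");
else if(document.all)// ie
window.external.AddFavorite(url, title);
}
EOT;

// Pattern from the thread: optional leading whitespace, then // not
// preceded by a colon, then the rest of the line. The m flag lets $
// match at every line end instead of only at the end of the string.
$sClean = preg_replace('@\s*(?<!:)//.*?$@m', '', $sBlob);

echo $sClean, "\n";
```

The three `//` comments are removed while `http://foo.com` survives, because the colon in front of the URL's slashes makes the lookbehind fail there.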


And for those interested here is the whole method...

public function compress($sBlob)
{
    //remove C style /* */ blocks as well as PHPDoc /** */ blocks
    $sBlob = preg_replace("@/\*(.*?)\*/@s",'',$sBlob);
    //$sBlob = preg_replace("/\*[^*]*\*+(?:[^*/][^*]*\*+)*/s",'',$sBlob);
    //$sBlob = preg_replace("/\\*(?:.|[\\n\\r])*?\\*/s",'',$sBlob);

    //remove // or # style comments at the start of a line; possibly redundant with next preg_replace
    $sBlob = preg_replace("@^\s*((^\s*(#+|//+)\s*.+?$\n)+)@m",'',$sBlob);
    //remove // style comments that might be tagged onto valid code lines. we don't try for # style as that's risky and not widely used
    // @see http://www.php.net/manual/en/regexp.reference.assertions.php
    $sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);

    if (in_array($this->_file_name_suffix, array('html','htm')))
    {
        //remove <!-- --> blocks
        $sBlob = preg_replace("/<!--[^\[](.*?)-->/s",'',$sBlob);

        //if Tidy is enabled...
        //if (!extension_loaded('tidy')) dl( ((PHP_SHLIB_SUFFIX === 'dll') ? 'php_' : '') . 'tidy.' . PHP_SHLIB_SUFFIX);
        if (FALSE && extension_loaded('tidy'))
        {
            //use Tidy to clean up the rest. There may be some redundancy with the above, but it shouldn't hurt
            //See all parameters available here: http://tidy.sourceforge.net/docs/quickref.html
            $tconfig = array(
                'clean' => true,
                'hide-comments' => true,
                'hide-endtags' => true,
                'drop-proprietary-attributes' => true,
                'join-classes' => true,
                'join-styles' => true,
                'quote-marks' => false,
                'fix-uri' => false,
                'numeric-entities' => true,
                'preserve-entities' => true,
                'doctype' => 'omit',
                'tab-size' => 1,
                'wrap' => 0,
                'wrap-php' => false,
                'char-encoding' => 'raw',
                'input-encoding' => 'raw',
                'output-encoding' => 'raw',
                'ascii-chars' => true,
                'newline' => 'LF',
                'tidy-mark' => false,
                'quiet' => true,
                'show-errors' => ($this->_debug ? 6 : 0),
                'show-warnings' => $this->_debug,
            );

            if ($this->_log_messages) $tconfig['error-file'] =

Re: [PHP] [SOLVED] need some regex help to strip out // comments but not http:// urls

2013-05-29 Thread Sean Greenslade
On Wed, May 29, 2013 at 4:26 PM, Daevid Vincent dae...@daevid.com wrote:


 -Original Message-
 From: Sean Greenslade [mailto:zootboys...@gmail.com]
 Sent: Wednesday, May 29, 2013 10:28 AM

 Also, (I haven't tested it, but) I don't think that example you gave
 would work. Without any sort of quoting around the "http://",
 I would assume the JS interpreter would take that double slash as a
 comment starter. Do tell me if I'm wrong, though.

 You're wrong Sean. :-p

Glad to hear it. I knew I shouldn't have opened my mouth. =P
(In all seriousness, I realize that I mis-read that code earlier. I
think I was still reeling from the suggestion of doing arbitrary
string replacements in files.)


 This regex works in all cases listed in my example target string.

 \s*(?<!:)//.*?$

 Or in my actual compress() method:

 $sBlob = preg_replace("@\s*(?<!:)//.*?$@m",'',$sBlob);

 Target test case with intentional traps:

 // another comment here
 iframe src=http://foo.com;
 function bookmarksite(title,url){
 if (window.sidebar) // firefox
 window.sidebar.addPanel(title, url, "");
 else if(window.opera && window.print){ // opera
 var elem = document.createElement('a');
 elem.setAttribute('href',url);
 elem.setAttribute('title',title);
 elem.setAttribute('rel','sidebar');
 elem.click();
 }
 else if(document.all)// ie
 window.external.AddFavorite(url, title);
 }


And if that's the only case you're concerned about, I suppose that
regex will do just fine. Just always keep an eye out for double
slashes elsewhere. My concern would be something within a quoted
string. If that happens, no regex will save you. As I mentioned
before, regexes aren't smart enough to understand whether they're
inside or outside matching quotes. Thus, a line like this may get
eaten by your regex:
document.getElementById("textField").innerHTML = "Lol slashes // are // fun";

The JS parser sees that the double slashes are inside a string, but
your regex won't. Just something to be aware of, especially because
it's something that might not show up right away.
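
To make the failure concrete, here is a sketch that runs the thread's pattern over that line, followed by one possible mitigation that was not proposed in the thread: match quoted strings first and put them back untouched, so a `//` inside a simple string literal is never reached. The function names are hypothetical, and the second pattern is an assumption of this sketch. It only recognizes plain single- and double-quoted literals, so it is still not a JavaScript parser (regex literals, template strings and multi-line constructs can fool it):

```php
<?php
// The pattern from the thread, applied as-is.
function strip_naive($sBlob)
{
    return preg_replace('@\s*(?<!:)//.*?$@m', '', $sBlob);
}

// Hypothetical variant: the first alternative consumes a whole quoted
// string (honoring backslash escapes) and the callback returns it
// unchanged; only the second alternative, the comment, is dropped.
function strip_quote_aware($sBlob)
{
    $pattern = '@("(?:\\\\.|[^"\\\\])*"|\'(?:\\\\.|[^\'\\\\])*\')|\s*(?<!:)//.*?$@m';
    return preg_replace_callback($pattern, function ($m) {
        // Group 1 is set and non-empty only when a string literal matched.
        return (isset($m[1]) && $m[1] !== '') ? $m[0] : '';
    }, $sBlob);
}

$line = 'document.getElementById("textField").innerHTML = "Lol slashes // are // fun";';

var_dump(strip_naive($line));       // cut off inside the string literal
var_dump(strip_quote_aware($line)); // line left intact
```

The naive version truncates the line right after `"Lol slashes`, which is exactly the silent breakage described above.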

-- 
--Zootboy

Sent from some sort of computing device.
