It depends on what YOU want to allow in the way of basic HTML... on some parts
of our sites we allow <B><I><A>, on other parts we don't allow <A>.

The reason I mention this is that javascript can exist inside an <A>, which
is an issue Lief will have to look at.

<A HREF="javascript:self.close();">bye bye</a> is a pretty evil piece of
code :)

It gets worse:

Javascript can be placed inside MANY objects using event handlers like

onmouseover
onmouseout
onclick

... not just within <SCRIPT> tags.

So you can't even opt to remove any links that begin with "javascript:",
because this won't strip any of the events above.

All of these will close the window, in one way or another on JS enabled
browsers (perhaps not all of them, but does it matter?):

<A HREF="javascript:self.close();">bye bye</a>
<A HREF="#" onmouseover="javascript:self.close();" >bye bye</a>
<A HREF="#" onmouseout="javascript:self.close();" >bye bye</a>
<A HREF="#" onclick="javascript:self.close();" >bye bye</a>

But it gets worse :)

<DIV onmouseover="javascript:self.close();">close</DIV>
<P onmouseover="javascript:self.close();">close</P>

Both achieve the same thing, so the above events are not tied to <A> tags.

This means that any allowed tag could be used maliciously with an event like
onmouseover to cause havoc on any site.

So we could strip out ALL of the above events (and the many more that
exist), but then we'd be taking away the ability for these events to work in
our favour on CSS and DHTML projects... so that's a choice you'd have to
make.

An ideal solution would be a function that looks for a list of events
(like above, but more of them) in a string, and strips them out if they
begin with 'javascript:'.  If <A> is one of your allowed HTML tags when
using strip_tags(), it would also have to strip out any HREF which begins
with 'javascript:', or perhaps strip out the entire <A> tag.


<DIV onmouseover="javascript:self.close();">close</DIV>
would become
<DIV >close</DIV>

<A HREF="#" onmouseover="javascript:self.close();">bye bye</a>
would become
<A HREF="#" >bye bye</a>

<A HREF="javascript:self.close();">bye bye</a>
would become
<A>bye bye</a> or bye bye
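
Here is a rough sketch of the kind of function described above, not a
finished filter. The name strip_js_attributes() is made up for illustration,
and it uses preg_replace_callback() with a closure, so it assumes PHP 5.3 or
later. It strips every on* handler whatever its value, since a handler runs
as script with or without the 'javascript:' prefix, and it drops any HREF
that starts with 'javascript:'.

```php
<?php
// Hypothetical helper (not part of PHP): removes inline event handlers and
// javascript: links from a fragment of user-supplied HTML.
function strip_js_attributes($html)
{
    // Only look inside tags, so ordinary text between tags is left alone.
    return preg_replace_callback('/<[^>]*>/', function ($m) {
        $tag = $m[0];

        // Remove every on* attribute (onclick, onmouseover, ...), whether
        // its value is double-quoted, single-quoted or unquoted.
        $tag = preg_replace(
            '/\s+on[a-z]+\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
            '',
            $tag
        );

        // Remove any HREF whose value starts with "javascript:".
        $tag = preg_replace(
            '/\s+href\s*=\s*("\s*javascript:[^"]*"|\'\s*javascript:[^\']*\'|javascript:[^\s>]+)/i',
            '',
            $tag
        );

        return $tag;
    }, $html);
}

echo strip_js_attributes('<DIV onmouseover="javascript:self.close();">close</DIV>'), "\n";
echo strip_js_attributes('<A HREF="#" onmouseover="javascript:self.close();">bye bye</a>'), "\n";
echo strip_js_attributes('<A HREF="javascript:self.close();">bye bye</a>'), "\n";
```

It will be fooled by a ">" inside an attribute value and by entity-encoded
spellings of 'javascript:', so treat it as a starting point only.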

This sounds like a big job -- way out of my league, from both a logic and
regular expression point of view, but a worthwhile cause indeed!!!

I had a quick search for javascript on, and found limited stuff of

there was this code on

// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.

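$search = array ("'<script[^>]*?>.*?</script>'si",  // Strip out javascript
                 "'<[\/\!]*?[^<>]*?>'si",           // Strip out html tags
                 "'([\r\n])[\s]+'",                 // Strip out white space
                 "'&(quot|#34);'i");                // Replace html entities

$replace = array ("",                               // javascript: drop it
                  " ",                              // html tags: leave a space
                  "\\1",                            // white space: keep one line break (assumed)
                  "\"");                            // &quot; becomes " (assumed)

$text = preg_replace ($search, $replace, $document);

// Numeric entities: the 'e' modifier ("evaluate as php") was removed in
// PHP 7, so a callback does the same job here.
$text = preg_replace_callback ("'&#(\d+);'",
                               function ($m) { return chr ((int) $m[1]); },
                               $text);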

On I found this

"This function does not modify any attributes on the tags that you allow
using allowable_tags, including the style and onmouseover attributes that a
mischievous user may abuse when posting text that will be shown to other
users."

There are also a number of interesting comments regarding these events.
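
The warning quoted above about strip_tags() is easy to check for yourself.
This is just a small test case: the input string is made up, and '<p>' is
the allowable_tags argument.

```php
<?php
// Made-up user input: an allowed <p> carrying an event handler, plus a <b>
// that is not in the allowed list.
$input = '<p onmouseover="self.close();">hello</p> <b>world</b>';

// Allow only <p>. The <b> tags are removed, but the onmouseover attribute
// on the allowed <p> is left exactly as the user typed it.
$output = strip_tags($input, '<p>');

echo $output, "\n";
```

So allowing even one harmless-looking tag lets the event handler through
unless the attributes are cleaned separately.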

Perhaps the solution is a regular expression, or perhaps it's to do with XML
parsing, which I've not had any experience with.
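
For what the parsing route might look like, here is a sketch only. It
assumes PHP's DOM extension (DOMDocument, DOMXPath) is installed and that
the input is plain ASCII. The function name dom_strip_js() and the wrapper
id "wrap" are invented for the example.

```php
<?php
// Hypothetical helper: parse the fragment, then drop on* attributes and
// javascript: links from every element.
function dom_strip_js($html)
{
    $doc = new DOMDocument();
    // Wrap the fragment so it can be pulled back out after parsing.
    // The @ silences warnings about sloppy markup.
    @$doc->loadHTML('<div id="wrap">' . $html . '</div>');

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*') as $node) {
        // Collect the names first, then remove, so the attribute list
        // is not modified while it is being walked.
        $names = array();
        foreach ($node->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            $value     = $node->getAttribute($name);
            $isHandler = stripos($name, 'on') === 0;
            $isJsLink  = strtolower($name) === 'href'
                      && preg_match('/^\s*javascript:/i', $value);
            if ($isHandler || $isJsLink) {
                $node->removeAttribute($name);
            }
        }
    }

    // Serialise only the children of the wrapper, not the whole document.
    $out  = '';
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo dom_strip_js('<DIV onmouseover="javascript:self.close();">close</DIV>'), "\n";
echo dom_strip_js('<A HREF="javascript:self.close();">bye bye</a>'), "\n";
```

The parser does the job of finding tags and attributes, so the quoting
problems of the regular expression approach mostly go away. The price is
that the markup comes back normalised (lower-case tag names, for instance).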

Sorry for the long post!!!

Justin French
Creative Director

PHP General Mailing List