OK, I'm going to focus on the problem at hand, the XSS filter:
I am using a 'blacklist', because my users need to enter as much (X)HTML
as I can possibly allow them.
So, the tags I'm initially NOT allowing are:
<applet> <script> <embed> <object> <server> <frame> <iframe> <frameset>
<html> <body>
I'm removing all javascript event attributes ( onclick="alert('xss');" )
Removing all javascript escaped quotes: \' and \"
In any tag left that has a link in it (src|href|action), I'm making sure
it is NOT relative and NOT to my server: <a> <img> <ilayer> <form>
Any 'target' attributes I'm changing to target='_blank', although I
still think there is a security flaw here: a popup window could try
to run code on the originating page.
I will be checking CSS urls.
Also, these dangerous strings:
javascript:
java \n script:
String.fromCharCode(x) //mainly for js quotes or parenthesis
charCodeAt
eval(
Well, that is to start anyways.
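The link rule and the dangerous-string check above could be sketched roughly like this. It is only a minimal illustration under assumptions: the function names and OWN_HOST are hypothetical stand-ins, and the string list is just the one given above.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-in for "my server".
OWN_HOST = "example.com"

# The strings listed above, lowercased for comparison.
DANGEROUS = ("javascript:", "string.fromcharcode", "charcodeat", "eval(")


def is_allowed_link(url, own_host=OWN_HOST):
    """True only for absolute http(s) URLs that point at another host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False  # malformed URL, reject rather than guess
    if parts.scheme.lower() not in ("http", "https"):
        return False  # relative URLs and javascript: both end up here
    host = (parts.hostname or "").lower()
    return bool(host) and host != own_host.lower()


def has_dangerous_string(text):
    """Look for the listed strings after removing whitespace and lowercasing."""
    squeezed = re.sub(r"\s+", "", text).lower()
    return any(s in squeezed for s in DANGEROUS)
```

Removing whitespace before matching is what catches the "java \n script:" variant. A plain substring match like this will also flag harmless text (for example a word ending in "eval" followed by a parenthesis), so it errs on the strict side.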
-Joe
Dale Newfield wrote:
There are two discussions here that are getting conflated: WHEN to
"clean" and HOW to clean. I have yet to find a good, comprehensive
way to do the latter (more below), but right here I'm responding to
the former.
Christopher Schultz wrote:
If you /are/ capturing text you will be using that /can/ contain HTML
markup, then cleaning it as it comes in is still a mistake. Let's say
you have a bug in your cleansing code. In that case, bad stuff gets into
your database where it's hard to root out and fix.
If that data is hard to find, then you haven't cleanly defined your DB
schema.
WHEN to do the cleaning is not a question of security and
maintainability, but a question of amortizing clock cycles to try to
get responses out to browsers as quickly as possible. There is no
reason to clean the same piece of text with the same algorithm more
than once, so why not do it on the input side? If you find a bug in
your cleansing code, then once you change it, re-run it ONCE on all
the potentially dangerous text blocks. Those should map directly to
columns in your DB. If you can't look at your DB schema and tell me
which columns are displayed without escaping their contents, then
something is wrong.
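As a rough sketch of the "clean once on input, re-run ONCE after a fix" idea: everything here is an assumption for illustration, including the table layout, the column names, the version counter, and the choice to keep the raw text alongside the cleaned copy so the filter can be re-applied. The clean() function is only a placeholder for a real filter.

```python
import sqlite3

FILTER_VERSION = 2  # bump whenever the cleansing code or its backing data changes


def clean(text):
    # Placeholder for the real filter; this one only neutralises angle brackets.
    return text.replace("<", "&lt;").replace(">", "&gt;")


def make_db():
    # Hypothetical schema: one column of raw input, one of cleaned output.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE comment (id INTEGER PRIMARY KEY, raw_body TEXT, "
        "clean_body TEXT, filter_version INTEGER)"
    )
    return db


def save_comment(db, raw):
    # Clean once, on the way in, and record which filter version did it.
    db.execute(
        "INSERT INTO comment (raw_body, clean_body, filter_version) VALUES (?, ?, ?)",
        (raw, clean(raw), FILTER_VERSION),
    )


def reclean_stale(db):
    # After a filter fix, re-run it once over rows cleaned by an older version.
    rows = db.execute(
        "SELECT id, raw_body FROM comment WHERE filter_version < ?",
        (FILTER_VERSION,),
    ).fetchall()
    for row_id, raw in rows:
        db.execute(
            "UPDATE comment SET clean_body = ?, filter_version = ? WHERE id = ?",
            (clean(raw), FILTER_VERSION, row_id),
        )
    return len(rows)
```

Pages would then display clean_body directly, and reclean_stale() only has work to do right after the filter changes.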
I agree with Leon: cleaning input is not usually a good idea. Cleaning
output is where the real money is -- from a security and maintainability
standpoint.
I'd be happy to change my mind if you can suggest any other reason
to redo that work more often than the filtering module (or the data
that backs it) changes.
The acknowledgment that said algorithm also needs backing data leads
us right back to the question of HOW.
I believe all filtering efforts will eventually come down to "What
tags/attributes are OK?" (among other critical questions, such as "What
values for attributes are OK?"). If you're stuck in the "what
tags/attributes are NOT OK" world, then we need a different
discussion: white lists vs. black lists.
So, does anyone have a good list of "safe" tags/attributes that should
be allowed through (assuming the attribute values also pass muster)?
For example, here are my (woefully incomplete) lists (plus a crossover
table (allowed_xhtml_tag_attribute_map) not shown linking allowable
combinations of the two):
allowed_xhtml_tag: a b blockquote br cite del div em font h1 h2 h3 h4
h5 h6 i img ins li ol p pre span strong sub sup table td th tr u ul
allowed_xhtml_attribute: alt border cite class color href name src
style title
For example, I already know I need to add caption and tbody to the
first table, but I've been delaying more by-hand tweaks in hopes of
finding a more systematic way to fill the tables. I've yet to find
it. Any suggestions?
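To make the table idea concrete, here is a minimal sketch of a filter driven by such lists, built on Python's standard html.parser. The tag list is the one above; the tag-to-attribute mapping is a hypothetical excerpt, since the real crossover table is not shown, and the class and function names are made up for illustration. It does not check attribute values and does not repair unbalanced tags, so both would still need handling elsewhere.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = set(
    "a b blockquote br cite del div em font h1 h2 h3 h4 h5 h6 i img ins li "
    "ol p pre span strong sub sup table td th tr u ul".split()
)

# Hypothetical excerpt of the tag/attribute crossover table.
ALLOWED_ATTRS = {
    "a": {"href", "name", "title"},
    "img": {"src", "alt", "border", "title"},
    "blockquote": {"cite"},
}

VOID_TAGS = {"br", "img"}  # written as self-closing, never given an end tag


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still emitted
        allowed = ALLOWED_ATTRS.get(tag, ())
        kept = [(k, v) for k, v in attrs if k in allowed and v is not None]
        rendered = "".join(f' {k}="{escape(v, quote=True)}"' for k, v in kept)
        self.out.append(f"<{tag}{rendered}{' /' if tag in VOID_TAGS else ''}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # All text is re-escaped, including text from inside dropped tags.
        self.out.append(escape(data, quote=False))


def filter_markup(text):
    parser = WhitelistFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

With this sketch, filter_markup('<b onclick="x()">hi</b>') keeps the tag but drops the event attribute, because "b" has no entry in the attribute mapping.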
-Dale Newfield
[EMAIL PROTECTED]
P.S.: the "tagsoup parse" suggestion is also good because it
guarantees that anything you do reflect back to users is valid XHTML
(and so won't screw up other parts of your page with illegally
nested/unbalanced tags).
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]