'Twas brillig, and Jochem Maas at 14/03/10 23:56 did gyre and gimble:
> On 3/14/10 11:45 AM, Ashley Sheridan wrote:
>> On Sun, 2010-03-14 at 12:25 +0100, Rene Veerman wrote:
>>
>>> On Sun, Mar 14, 2010 at 12:24 PM, Rene Veerman <rene7...@gmail.com> wrote:
>>>>
>>>> I'd love to have a copy of whatever function you use to filter out bad
>>>> HTML/js/flash for use cases where users are allowed to enter html.
>>>> I'm aware of strip_tags() "allowed tags" param, but haven't got a good list
>>>> for it.
>>>>
>>>
>>> oh, and even <img> tags can be used for cookie-stuffing on many browsers..
>>>
>>
>>
>> Yes, and you call strip_tags() before the data goes to the browser for
>> display, not before it gets inserted into the database. Essentially, you
>> need to keep as much original information as possible.
>
> I disagree with both you. I'm like that :)
>
> let's assume we're not talking about data that is allowed to contain HTML,
> in such cases I would do a strip_tags() on the incoming data then compare
> the output of strip_tags() to the original input ... if they don't match then
> I would log the problem and refuse to input the data at all.
>
> using strip_tags() on a piece of data everytime you output it if you know
> that it shouldn't contain any in the first is a waste of resources ... this
> does assume that you can trust the data source ... which in the case of a
> database that you control should be the case.

I used to think like that too, but I've relatively recently changed my
position.

While it's not as extreme an example, I used to keep data in the
database *after* I had processed it with htmlspecialchars() (not quite
the same as strip_tags(), but the principle is the same).

The issue I had was that over time, I've found the need to output to
other formats - e.g. spreadsheets, plain text emails, PDFs etc. In those
cases the pre-encoded format is a pain and I have to call
html_entity_decode() to reverse the htmlspecialchars() I did in the
first place.

This is a royal pain in the bum and it's really ugly in the code:
you have to remember what format the data is in so that you can process
it appropriately at the right points.

Nowadays I work rather differently and always escape at the point of
output (this does not exclude filtering at the point of input too, but I
do not keep things encoded any longer - I keep it raw). Any halfway
decently designed caching layer will prevent any major impact from
escaping at the point of output anyway.

Now you could argue that encoding at the save point and reversing the
encoding when needed is still a better approach and I won't argue too
heavily, but for the sake of my sanity I'm much happier working the way
I do now. The view layers are very clearly escaping everything that
needs escaping and no "is it or is it not already escaped" logic leaks
into this layer.

(I appreciate strip_tags() and htmlspecialchars() are not the same and
my general usage may not apply to a pure strip_tags() usage.)

> at any rate, strip_tags() doesn't belong in an 'anti-sql-injection' routine as
> it has nothing to do with sql injection at all.

Indeed, it's more about XSS and CSRF rather than SQL injection.

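For what it's worth, here is a minimal sketch of the "keep it raw, escape
at the point of output" approach described above, combined with the
optional input check Jochem mentions. The function names
(contains_markup, for_html, for_plain_text) are made up for illustration
and are not from any library, and the scalar type declarations assume
PHP 7 or later. Treat it as a rough outline rather than a complete
filtering strategy.

```php
<?php
// Hypothetical helpers for illustration only; the names are assumptions.

// Optional input filter: report whether strip_tags() would change the
// value, so the caller can log the problem and refuse the input.
function contains_markup(string $input): bool
{
    return strip_tags($input) !== $input;
}

// HTML view: escape the raw value at the moment it is written out.
function for_html(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// Plain-text output (e.g. an email body): the raw value is used as is,
// with no html_entity_decode() round trip needed.
function for_plain_text(string $raw): string
{
    return $raw;
}

// The value as it would be kept in the database: raw, never pre-encoded.
$stored = 'Fish & Chips';

echo for_html($stored), "\n";        // Fish &amp; Chips
echo for_plain_text($stored), "\n";  // Fish & Chips
```

Because the stored value is never encoded, each output format only has
to apply its own escaping, and the view code never has to guess whether
the data has already been escaped.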
Col

-- 
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
  Tribalogic Limited [http://www.tribalogic.net/]
Open Source:
  Mandriva Linux Contributor [http://www.mandriva.com/]
  PulseAudio Hacker [http://www.pulseaudio.org/]
  Trac Hacker [http://trac.edgewall.org/]