Alan Kennedy wrote:
> [Niall]
>
>> If you are using an editor that gives you
>> HTML you are relying on it for all the escaping, you can't escape
>> yourself since you would then lose the formatting.
>> What worries me about this is that you would have to be very sure that
>> your input is actually coming from the editor and
>> not just someone sending in their own crafted POST request.
>>
>> Is this a valid concern or am I just being paranoid?
>>
>
> It is a valid concern, you are being justifiably paranoid ;-)
>
> Best way to deal with this situation is to turn the HTML into xhtml,
> and sanitize that, i.e. strip the <script> tags, etc, yourself.
>
> I wrote a post in comp.lang.python a few years ago about doing exactly
> this, using SAX.
>
> http://groups.google.com/group/comp.lang.python/msg/4886938cd7fd3732
>
> Read the entire thread, there are a few versions of the code.
>
> Alan.

Ah thanks :)

So the main advantage of going this route (parsing the XHTML and removing
anything not on a node/attribute whitelist) is that you only have to do
the parsing once, on the input side, and you can trust the content after
that. With the intermediary markup, by contrast, you never trust the
stored content and have to do the cleaning and parsing every time.
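To check I have understood the whitelist idea, here is a rough sketch of what I think it looks like with SAX. This is not Alan's code from the linked thread, just my own guess at the approach. The names ALLOWED, DROP_WITH_CONTENT, SAFE_URL_PREFIXES, WhitelistHandler and sanitize are all made up for illustration, and the whitelist itself is only an example. It assumes the input is already well-formed XHTML with a single root element.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr

# Hypothetical whitelist: tag name -> set of attributes allowed on it.
ALLOWED = {
    "div": set(), "p": set(), "b": set(), "i": set(),
    "em": set(), "strong": set(),
    "ul": set(), "ol": set(), "li": set(),
    "a": {"href"},
}
# Elements that are dropped together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}
# Only links starting with one of these schemes are kept (an assumption).
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


class WhitelistHandler(xml.sax.ContentHandler):
    """Re-emit only whitelisted elements and attributes."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def startElement(self, name, attrs):
        if self.skip_depth or name in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if name not in ALLOWED:
            return  # drop the unknown tag but keep its text
        parts = [name]
        for key, value in attrs.items():
            if key not in ALLOWED[name]:
                continue  # e.g. onclick, style
            if key == "href" and not value.strip().lower().startswith(
                    SAFE_URL_PREFIXES):
                continue  # e.g. javascript: URLs
            parts.append(f"{key}={quoteattr(value)}")
        self.out.append("<" + " ".join(parts) + ">")

    def endElement(self, name):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        if name in ALLOWED:
            self.out.append(f"</{name}>")

    def characters(self, content):
        if not self.skip_depth:
            self.out.append(escape(content))  # re-escape &, <, >


def sanitize(xhtml):
    """Return xhtml with everything outside the whitelist removed.

    Raises xml.sax.SAXParseException if the input is not well-formed.
    """
    handler = WhitelistHandler()
    xml.sax.parseString(xhtml.encode("utf-8"), handler)
    return "".join(handler.out)
```

So the POST handler would call sanitize() once on the submitted body, reject the request if it raises, and store only the returned string. One caveat: the Python docs warn that the xml modules are not secure against maliciously constructed data, so a real version would need to guard against that as well.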
Niall
