Re: html sanitizers

2007-07-13 Thread Derek Anderson
wow, this lib is great. thanks. heh, love that feeling that i've been wasting my life coding scrapers with regexes up until now... :)

patrick k. wrote:
> it's easy to write a customized sanitizer using beautifulsoup.
> http://www.crummy.com/software/BeautifulSoup/
>
> 1) place beautifulsoup.p

Re: html sanitizers

2007-07-13 Thread Derek Anderson
well, but sometimes you want them to be able to enter HTML: style items, simple links, etc...

[EMAIL PROTECTED] wrote:
> Yes, it is much safer to reject rather than sanitize. If bad tags are
> detected, then reject the input out of hand. If you don't, your
> sanitizer could be turned against you

Re: html sanitizers

2007-07-13 Thread [EMAIL PROTECTED]
Yes, it is much safer to reject rather than sanitize. If bad tags are detected, then reject the input out of hand. If you don't, your sanitizer could be turned against you and end up changing slightly dangerous tags into really dangerous tags. What happens here <scr<script>ipt> when a sanitizer is set to remov
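A minimal sketch of the point above, using only Python's standard library. The whitelist, the helper names and the use of `html.parser` are illustrative assumptions, not code from this thread. `naive_strip_script` shows how deleting `<script>` in a single pass turns input such as `<scr<script>ipt>` back into a working tag, while `is_acceptable` takes the reject-don't-repair route and refuses any input containing a tag, attribute or link scheme that is not whitelisted.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical whitelist, loosely following the "I/B/A href/U" example in this thread.
ALLOWED_TAGS = {"b", "i", "u", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative link


def naive_strip_script(text: str) -> str:
    """The unsafe approach: delete literal <script>/</script> tags in a single pass."""
    return re.sub(r"</?script>", "", text, flags=re.IGNORECASE)


def href_is_safe(value: Optional[str]) -> bool:
    """Accept only relative links or http/https/mailto URLs."""
    # Strip whitespace and control characters that browsers may ignore inside a scheme.
    cleaned = "".join(ch for ch in (value or "") if ch.isprintable() and not ch.isspace())
    try:
        return urlsplit(cleaned).scheme in SAFE_SCHEMES
    except ValueError:  # e.g. an unbalanced IPv6 host such as "http://[::1"
        return False


class WhitelistChecker(HTMLParser):
    """Clears self.ok as soon as anything outside the whitelist is seen."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.ok = False
        allowed = ALLOWED_ATTRS.get(tag, set())
        for name, value in attrs:
            if name not in allowed or (name == "href" and not href_is_safe(value)):
                self.ok = False

    def handle_endtag(self, tag):
        if tag not in ALLOWED_TAGS:
            self.ok = False

    # Comments, doctypes and processing instructions are never expected from users.
    def handle_comment(self, data):
        self.ok = False

    def handle_decl(self, decl):
        self.ok = False

    def handle_pi(self, data):
        self.ok = False

    def unknown_decl(self, data):
        self.ok = False


def is_acceptable(text: str) -> bool:
    """Reject rather than repair: True only if all markup is whitelisted."""
    checker = WhitelistChecker()
    checker.feed(text)
    checker.close()
    return checker.ok
```

The checker never rewrites anything, so there is no output for an attacker to steer; the trade-off is that users get an error instead of a cleaned-up version of their post.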

Re: html sanitizers

2007-07-13 Thread Horst Gutmann
Brett Parker wrote:
> On Fri, Jul 13, 2007 at 11:48:50AM +0100, Nic James Ferrier wrote:
>> Brett Parker <[EMAIL PROTECTED]> writes:
>>
>>> On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
Derek Anderson <[EMAIL PROTECTED]> writes:
> hey all,
>
> could anyon

Re: html sanitizers

2007-07-13 Thread Brett Parker
On Fri, Jul 13, 2007 at 11:48:50AM +0100, Nic James Ferrier wrote:
>
> Brett Parker <[EMAIL PROTECTED]> writes:
>
> > On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
> >>
> >> Derek Anderson <[EMAIL PROTECTED]> writes:
> >>
> >> > hey all,
> >> >
> >> > could anyone point me

Re: html sanitizers

2007-07-13 Thread Nic James Ferrier
Brett Parker <[EMAIL PROTECTED]> writes:

> On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
>>
>> Derek Anderson <[EMAIL PROTECTED]> writes:
>>
>> > hey all,
>> >
>> > could anyone point me to a python html sanitizer implementation (or
>> > example)? i don't mean to strip al

Re: html sanitizers

2007-07-13 Thread Brett Parker
On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
>
> Derek Anderson <[EMAIL PROTECTED]> writes:
>
> > hey all,
> >
> > could anyone point me to a python html sanitizer implementation (or
> > example)? i don't mean to strip all html, just tags and attributes not
> > on a whit

Re: html sanitizers

2007-07-13 Thread Nic James Ferrier
Derek Anderson <[EMAIL PROTECTED]> writes:

> hey all,
>
> could anyone point me to a python html sanitizer implementation (or
> example)? i don't mean to strip all html, just tags and attributes not
> on a whitelist, such as I/B/A href/U/etc.

I use libxml2/libxslt, something like:

doc = li
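Python's standard library has no libxml2/libxslt bindings, so the following is only a rough stand-in for the tree-based idea, swapping in `xml.etree.ElementTree`: parse the fragment into a tree, then rebuild it keeping only whitelisted elements and attributes, while the text of every other element is kept but escaped. That is similar in spirit to an XSLT stylesheet that matches only whitelisted elements and lets the built-in template rules copy the remaining text. It assumes the input is well-formed XHTML; anything else (including HTML entities like `&nbsp;`) raises `ET.ParseError`, which a caller can treat as a rejection. The whitelist and the function names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import urlsplit

# Hypothetical whitelist; XML is case-sensitive, so <B> is unwrapped, not kept.
ALLOWED_TAGS = {"b", "i", "u", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative link


def href_is_safe(value: Optional[str]) -> bool:
    """Accept only relative links or http/https/mailto URLs."""
    cleaned = "".join(ch for ch in (value or "") if ch.isprintable() and not ch.isspace())
    try:
        return urlsplit(cleaned).scheme in SAFE_SCHEMES
    except ValueError:
        return False


def _render(node: ET.Element, out: List[str]) -> None:
    """Copy whitelisted elements; unwrap all others but keep their (escaped) text."""
    keep = node.tag in ALLOWED_TAGS
    if keep:
        attrs = "".join(
            f' {name}="{html.escape(value, quote=True)}"'
            for name, value in node.attrib.items()
            if name in ALLOWED_ATTRS.get(node.tag, set())
            and (name != "href" or href_is_safe(value))
        )
        out.append(f"<{node.tag}{attrs}>")
    if node.text:
        out.append(html.escape(node.text, quote=False))
    for child in node:
        _render(child, out)
        if child.tail:  # text that follows the child inside this node
            out.append(html.escape(child.tail, quote=False))
    if keep:
        out.append(f"</{node.tag}>")


def sanitize_xhtml(fragment: str) -> str:
    """Parse a well-formed fragment into a tree, then rebuild it from the whitelist."""
    # The dummy root allows several top-level nodes; it is not whitelisted itself,
    # so only its contents are emitted. Raises ET.ParseError on malformed input.
    root = ET.fromstring(f"<root>{fragment}</root>")
    out: List[str] = []
    _render(root, out)
    return "".join(out)
```

Unlike libxml2's HTML parser, ElementTree does not repair sloppy markup, so this variant effectively sanitizes well-formed input and rejects everything else.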

Re: html sanitizers

2007-07-13 Thread patrick k.
it's easy to write a customized sanitizer using beautifulsoup.
http://www.crummy.com/software/BeautifulSoup/

1) place beautifulsoup.py somewhere in your pythonpath
2) build your sanitizer and save it somewhere on your pythonpath

in my case it's called eatMe and looks like this:
http://dpaste.com/

html sanitizers

2007-07-13 Thread Derek Anderson
hey all,

could anyone point me to a python html sanitizer implementation (or example)? i don't mean to strip all html, just tags and attributes not on a whitelist, such as I/B/A href/U/etc.

thanks,
derek
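The question can be answered with nothing but the standard library. The sketch below assumes Python 3 and its `html.parser` module (it is not the BeautifulSoup or libxml2 approach suggested in the replies); the whitelist mirrors the I/B/A href/U example, and all names are hypothetical. It rebuilds the output from parsed tokens instead of editing the raw string, so every text node and attribute value is re-escaped. Disallowed tags are dropped but their text is kept (except inside `<script>`/`<style>`), disallowed attributes and `href` values whose scheme is not http, https or mailto are removed (relative links are kept), and any tags left open are closed at the end.

```python
import html
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urlsplit

# Hypothetical whitelist modelled on the "I/B/A href/U" example in the question.
ALLOWED_TAGS = {"i", "b", "u", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative link
DROP_CONTENT = {"script", "style"}  # drop the text inside these, not just the tags


def href_is_safe(value: Optional[str]) -> bool:
    """Accept only relative links or http/https/mailto URLs."""
    cleaned = "".join(ch for ch in (value or "") if ch.isprintable() and not ch.isspace())
    try:
        return urlsplit(cleaned).scheme in SAFE_SCHEMES
    except ValueError:
        return False


class WhitelistSanitizer(HTMLParser):
    """Rebuilds the input from parsed tokens, keeping only whitelisted markup."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []
        self.open_tags: List[str] = []  # whitelisted tags emitted but not yet closed
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # disallowed tag: drop it, keep whatever text follows
        kept = [
            f' {name}="{html.escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS.get(tag, set())
            and (name != "href" or href_is_safe(value))
        ]
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return  # stray or non-whitelisted end tag
        # Close everything opened after this tag too, so the output stays balanced.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))

    # Comments, doctypes and processing instructions fall through to HTMLParser's
    # no-op default handlers, so they never reach the output.

    def result(self) -> str:
        self.close()  # flush any buffered text
        while self.open_tags:  # close tags the author left open
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(text)
    return parser.result()
```

For example, `sanitize('<b>bold</b> <i onclick="evil()">it</i>')` returns `'<b>bold</b> <i>it</i>'`. Because the output is regenerated rather than filtered in place, a string like `<scr<script>ipt>` cannot be reassembled into a live tag: whatever the parser sees as a tag is either whitelisted or dropped, and all remaining text is escaped.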