Re: Quarantining crap HTML?

2013-05-23 Thread Philip Skinner
On 05/22/2013 07:53 PM, David Dorward wrote: On 22 May 2013, at 16:29, DAVID HODGKINSON wrote: On 21 May 2013, at 13:14, Philip Skinner wrote: You can specify the content of an iframe using a javascript call in the src: Upon sleeping on it, this was the direction I was headed in. The pro

Re: Quarantining crap HTML?

2013-05-22 Thread Th. J. van Hoesel
On 22 May 2013, at 19:30, Dirk Koopman wrote: > On 22/05/13 16:29, DAVID HODGKINSON wrote: >> >> Upon sleeping on it, this was the direction I was headed in. >> >> The problem is the HTML is user-generated and we know where that >> leads. >> > > Carefully constructed, e

Re: Quarantining crap HTML?

2013-05-22 Thread David Dorward
On 22 May 2013, at 16:29, DAVID HODGKINSON wrote: On 21 May 2013, at 13:14, Philip Skinner wrote: You can specify the content of an iframe using a javascript call in the src: Upon sleeping on it, this was the direction I was headed in. The problem is the HTML is user-generated and we know

Re: Quarantining crap HTML?

2013-05-22 Thread Dirk Koopman
On 22/05/13 16:29, DAVID HODGKINSON wrote: Upon sleeping on it, this was the direction I was headed in. The problem is the HTML is user-generated and we know where that leads. Carefully constructed, efficient and well tested code?

Re: Quarantining crap HTML?

2013-05-22 Thread DAVID HODGKINSON
Upon sleeping on it, this was the direction I was headed in. The problem is the HTML is user-generated and we know where that leads. On 21 May 2013, at 13:14, Philip Skinner wrote: > You can specify the content of an iframe using a javascript call in the src: > > > > On 05/21/2013 01:57

Re: Quarantining crap HTML?

2013-05-22 Thread DAVID HODGKINSON
On 21 May 2013, at 13:08, Dave Cross wrote: > http://www.catb.org/esr/faqs/smart-questions.html Wow, you can put links in email. Amazing!

Re: Quarantining crap HTML?

2013-05-21 Thread David Cantrell
On Tue, May 21, 2013 at 12:31:54PM +0100, Dave Hodgkinson wrote: > In keeping with the spirit of the list, this isn't directly a perl question > but it might be part of the solution. > > I'm picking up HTML from another site, and that HTML is pretty crappy. > > Is there any way of quarantining it

Re: Quarantining crap HTML?

2013-05-21 Thread Ben Vinnerd
What if it contains \ ? :) Seriously though, I'd assumed that OP (Dave) didn't want to make any changes to the HTML he'd taken from the other website - although I may be wrong. On 21 May 2013 14:06, Philip Skinner wrote: > \ > > > On 05/21/2013 02:28 PM, Ben Vinnerd wrote: > >> What if the HT

Re: Quarantining crap HTML?

2013-05-21 Thread Denny
On Tue, 2013-05-21 at 13:00 +, dave.lamb...@gmail.com wrote: > I did a thing about 10 years ago using HTML::TreeBuilder to > remove elements and attributes which aren't on a whitelist. There's a module for that. Well, several actually, but I settled on HTML::Restrict when I was looking at th
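The whitelist approach mentioned in this message (strip every element and attribute that is not explicitly allowed) can be sketched roughly as follows. This is only an illustration in Python's standard library, not the HTML::TreeBuilder or HTML::Restrict code discussed in the thread; the `ALLOWED` table, the `DROP_CONTENT` set and the `sanitize` helper are hypothetical names chosen for the example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attributes allowed on that tag.
ALLOWED = {
    "p": set(),
    "b": set(),
    "i": set(),
    "em": set(),
    "strong": set(),
    "a": {"href"},
}
# Elements whose contents are discarded entirely, not just their tags.
DROP_CONTENT = {"script", "style"}


class WhitelistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the tag, keep any text inside it
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Only plain http(s) links survive; javascript: and friends do not.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth and tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text with everything outside the whitelist removed."""
    parser = WhitelistFilter()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

Note that this sketch only filters; it does not balance unclosed tags, so a stray `<b>` in the input would still leak into the surrounding page. A tree-building parser, as in the modules named above, would be needed for that.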

Re: Quarantining crap HTML?

2013-05-21 Thread Philip Skinner
\ On 05/21/2013 02:28 PM, Ben Vinnerd wrote: What if the HTML contains single or double quotes? On 21 May 2013 13:14, Philip Skinner wrote: You can specify the content of an iframe using a javascript call in the src: On 05/21/2013 01:57 PM, Ben Vinnerd wrote: You could try putting it

Re: Quarantining crap HTML?

2013-05-21 Thread dave . lambley
Subject: Quarantining crap HTML? Sent: 21 May 2013 12:31 In keeping with the spirit of the list, this isn't directly a perl question but it might be part of the solution. I'm picking up HTML from another site, and that HTML is pretty crappy. Is there any way of quarantining it so it doesn't

Re: Quarantining crap HTML?

2013-05-21 Thread Ruud H.G. van Tol
On 21/05/2013 13:31, Dave Hodgkinson wrote: In keeping with the spirit of the list, this isn't directly a perl question but it might be part of the solution. I'm picking up HTML from another site, and that HTML is pretty crappy. Is there any way of quarantining it so it doesn't bugger up the r

Re: Quarantining crap HTML?

2013-05-21 Thread Ben Vinnerd
What if the HTML contains single or double quotes? On 21 May 2013 13:14, Philip Skinner wrote: > You can specify the content of an iframe using a javascript call in the > src: > > > > > On 05/21/2013 01:57 PM, Ben Vinnerd wrote: > >> You could try putting it in an iframe (which doesn't support inline h
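One way to sidestep the quoting problem raised here is to pass the fetched markup to the iframe as an entity-escaped attribute value instead of a JavaScript string. The sketch below swaps in the HTML5 `srcdoc` and `sandbox` attributes, which nobody in the thread proposed, and assumes the target browsers support them; `quarantine_iframe` and its `title` parameter are hypothetical names for illustration.

```python
from html import escape


def quarantine_iframe(untrusted_html, title="Quarantined content"):
    """Wrap untrusted HTML in a sandboxed iframe via the srcdoc attribute."""
    # escape(..., quote=True) turns &, <, >, " and ' into entities, so quotes
    # in the untrusted markup cannot end the attribute value early.
    srcdoc = escape(untrusted_html, quote=True)
    # An empty sandbox attribute applies every sandbox restriction
    # (scripts and form submission are blocked, among others).
    return (
        f'<iframe sandbox="" title="{escape(title, quote=True)}" '
        f'srcdoc="{srcdoc}"></iframe>'
    )
```

Because the browser parses the framed markup as a separate document, unclosed tags or stray styles inside it should not affect the host page, which is the quarantine effect the original question asks about.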

Re: Quarantining crap HTML?

2013-05-21 Thread Dave Cross
Quoting Dave Hodgkinson: In keeping with the spirit of the list, this isn't directly a perl question but it might be part of the solution. I'm picking up HTML from another site, and that HTML is pretty crappy. Is there any way of quarantining it so it doesn't bugger up the rest of the page?

Re: Quarantining crap HTML?

2013-05-21 Thread Philip Skinner
You can specify the content of an iframe using a javascript call in the src: On 05/21/2013 01:57 PM, Ben Vinnerd wrote: You could try putting it in an iframe (which doesn't support inline html, so you'd have to load it with src="/path/to/buggered_html_loader") On 21 May 2013 12:31, Dave Hodgkinson

Re: Quarantining crap HTML?

2013-05-21 Thread Ben Vinnerd
You could try putting it in an iframe (which doesn't support inline html, so you'd have to load it with src="/path/to/buggered_html_loader") On 21 May 2013 12:31, Dave Hodgkinson wrote: > In keeping with the spirit of the list, this isn't directly a perl question > but it might be part of the solution.

Re: Quarantining crap HTML?

2013-05-21 Thread Joel Bernstein
OK, so assuming (you didn't mention but it's sort of implied if you squint a bit) you mean you're inserting it into another HTML page, I wonder: a) in what way is it crappy? b) how are you inserting it? c) what are you trying to avoid? dodgy formatting? broken formatting? malicious code execution?

Re: Quarantining crap HTML?

2013-05-21 Thread Jérôme Étévé
What about parsing it with a lax XHTML parser and rendering it? On 21 May 2013 12:31, Dave Hodgkinson wrote: > In keeping with the spirit of the list, this isn't directly a perl question > but it might be part of the solution. > > I'm picking up HTML from another site, and that HTML is pretty cr

Quarantining crap HTML?

2013-05-21 Thread Dave Hodgkinson
In keeping with the spirit of the list, this isn't directly a perl question but it might be part of the solution. I'm picking up HTML from another site, and that HTML is pretty crappy. Is there any way of quarantining it so it doesn't bugger up the rest of the page?