On 12/01/2010 10:24, Henrik K wrote:

>>>> Presently it renders them as plain text. I'm fully aware of the
>>>> potential problems with it. Ideally I'd like to be able to render
>>>> those parts as HTML, but I need to be 100% sure that I've stripped
>>>> out anything dangerous (including embedded remote content by
>>>> default) first. It's on the "ToDo List" page.
>>>
>>> Nice job Mike! :)
>>>
>>> I wrestled with that same issue when I added direct viewing of HTML
>>> content to my offline analysis/FP-pipeline/MassChecks tool.
>>>
>>> Originally, I was using an ActiveX wrapper around IE, which (of
>>> course) made me nervous.  I added some VERY simple, crude tag
>>> stripping (script, iframe, style), but was never happy with it.
>>> I ended up switching to an open source HTML rendering component
>>> which :) lacked support for all the scary stuff.
>>>
>>> Whatever you decide to do, please do post more about it, and q'pla!
>>
>> I shall. There are a multitude of modules on cpan for fixing up html and  
>> stripping out tags. I just need to find time to test them. I've got to  
>> figure out how to "cleanse" the CSS as well. Eg, you can execute  
>> javascript from CSS with stuff like:  
>> background:url("javascript:someFunction();")
> 
> IMO whatever you do, there will always be some hole to be found. Your only
> safe option is to render the HTML into image and display that. It will also
> be always consistent and not depend on browser version.

That was a good suggestion and something I hadn't considered. I've
updated Spamalyser to generate PDFs from HTML parts using the WebKit
rendering engine and Qt, so the HTML should look the same as in any
WebKit-based user agent. From my tests so far, it's an accurate
representation of what you see in your email client. It handles remote
content like images and CSS fine, as well as parts attached to the email
with Content-ID headers and referenced by cid: URIs. Here's a prime
example:
http://spamalyser.com/v/jfv3iz0l/mime#part_1.2
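
To illustrate the cid: lookup in principle, here is a minimal sketch
using Python's standard email module. It is not Spamalyser's code; the
function names index_parts_by_cid and resolve_cid are made up for this
example, and it assumes the message has already been parsed.

```python
from email import message_from_string
from urllib.parse import unquote


def index_parts_by_cid(msg):
    """Map each Content-ID (angle brackets removed) to its MIME part."""
    parts = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            parts[cid.strip().strip("<>")] = part
    return parts


def resolve_cid(url, parts):
    """Return the decoded bytes of the part a cid: URL points at.

    Returns None if the URL is not a cid: URL or no part matches.
    """
    if not url.lower().startswith("cid:"):
        return None
    # The Content-ID may be percent-encoded inside the URL.
    part = parts.get(unquote(url[4:]))
    if part is None:
        return None
    return part.get_payload(decode=True)
```

A renderer would build the index once per message and then call
resolve_cid for each src or url() value that starts with "cid:",
falling back to a normal fetch (or blocking it) for anything else.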

PDF is better than an image because it allows you to maintain the links
in the document. A PNG "thumbnail" generated from the PDF is displayed
alongside text/html parts. Clicking that preview image takes you to the
PDF.

I've also tweaked some of the styling so the headers are easier to read.

I've also set up a Mailman-based mailing list, linked from
http://spamalyser.com/, so if anyone wants to discuss Spamalyser
further, the discussion should probably move there. Any further
announcements will happen there, not here.

-- 
Mike Cardwell    : UK based IT Consultant, LAMP developer, Linux admin
Cardwell IT Ltd. : UK Company - http://cardwellit.com/       #06920226
Technical Blog   : Tech Blog  - https://secure.grepular.com/blog/
Spamalyser       : Spam Tool  - http://spamalyser.com/
