Cyrille Bonnet wrote:
> Daniel Dekany wrote:
>> BTW, anybody has found a solution for fixing HTML copy-pasted from
>> Microsoft Word (mostly 2000/XP)? Lot of users has MS Word, and the
>> HTML pasted from it is a CSS killer mess. I tried mxTidy but it
>> didn't improved substantially the HTML. So how do you guys do it? I
>> have looked after solutions for Epoz, but didn't found any. But I
>> don't stick to Epoz... if there is a solution already for Kupu (is
>> Kupu already recommended over Epoz anyway?). Certainly the solution
>> would be an Epoz post-tidy Python script, but I didn't found any for
>> Word tidying. (However, the ideal would be if the HTML is tidied
>> right on the client when it pastes it in -- thus user would really
>> get what it sees, i.e. the HTML wouldn't be changed when he saves it.
>> That effect is really evil.)
>>
> As Shane pointed out, there is a tidy up in Kupu. However, in my
> experience, it is not a very good tidy up (if I remember correctly, a
> lot of tags are still there after the tidy up).
>

Unfortunately there is a fine line between tidying up the cruft pasted
from Word, and not stripping out things which might actually have been
entered legitimately. I think Kupu does this pretty well (but then I'm
a bit biased), but without any way to detect that the user is pasting
from Word I don't see how much more could be stripped.

So far as I know, the only things that don't really get stripped from
pasted Word text are the mso class names. These can be blacklisted
manually, but I never got round to producing a definitive blacklist.
One of my thoughts is to provide a separate 'clean this up' button
which would apply a more aggressive tidy-up than the one applied on
save.

I also agree that applying the tidy only on save is bad, but there
isn't a cross-browser way to detect a paste, and running the cleanup on
a large document every time you cut or paste one word wouldn't be nice
either. Suggestions for improvements are most welcome.

P.S. Pasting bad HTML isn't the only problem: some Microsoft
applications supply RTF on the clipboard but not HTML, and it turns out
that if you paste RTF into IE it generates seriously invalid HTML with
a totally weird and corrupted DOM. That is another area where I think
the cleanup code finally does a passable job, but not yet a perfect
one.
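
For anyone who wants to experiment with the mso class-name blacklist
mentioned above, here is a rough server-side sketch in Python. It is
not Kupu's cleanup code. The function name `strip_word_classes` is made
up for illustration, and the rule "any class name starting with Mso" is
only an assumption standing in for the definitive blacklist that does
not exist yet. It uses a regular expression rather than a real HTML
parser, so it would also rewrite matching text that happens to appear
outside a tag.

```python
import re

# Assumption: Word's class names start with "Mso" (e.g. "MsoNormal").
# A real blacklist might need to be an explicit list of names instead.
WORD_CLASS = re.compile(r"^mso", re.IGNORECASE)

# A class attribute whose value is double-quoted, single-quoted, or
# unquoted (browsers may serialise simple values without quotes).
CLASS_ATTR = re.compile(
    r"""\s+class\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))""",
    re.IGNORECASE,
)


def strip_word_classes(html):
    """Remove Word-style class names and keep all other class names."""

    def fix(match):
        # Exactly one of the three groups matched; take its value.
        value = next(g for g in match.groups() if g is not None)
        kept = [name for name in value.split() if not WORD_CLASS.match(name)]
        if not kept:
            return ""  # nothing left, so drop the whole attribute
        # Surviving names are written back with double quotes.
        return ' class="%s"' % " ".join(kept)

    return CLASS_ATTR.sub(fix, html)


print(strip_word_classes('<p class="MsoNormal intro">Hello</p>'))
# prints: <p class="intro">Hello</p>
print(strip_word_classes("<P class=MsoNormal>Hello</P>"))
# prints: <P>Hello</P>
```

Something like this could run as a post-save step, or behind a separate
'clean this up' button, since it only touches class attributes and
leaves all other markup alone.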