I agree with you, Duncan: the tidy-up cannot be much more aggressive by default, and Kupu probably does the best possible job there.

Now, the "Clean this up" button is a good idea, I think. Have you started on this? I am happy to help if you do develop that feature.

Also, another option for users that need to convert a lot of Word documents is, of course, WebDAV + PortalTransform.

Cheers

Cyrille

Duncan Booth wrote:
Cyrille Bonnet wrote:


Daniel Dekany wrote:

BTW, has anybody found a solution for fixing HTML copy-pasted from
Microsoft Word (mostly 2000/XP)? A lot of users have MS Word, and the
HTML pasted from it is a CSS-killing mess. I tried mxTidy, but it
didn't substantially improve the HTML. So how do you guys do it? I
have looked for solutions for Epoz, but didn't find any. But I'm
not tied to Epoz, if there is already a solution for Kupu (is
Kupu already recommended over Epoz anyway?). Certainly the solution
would be an Epoz post-tidy Python script, but I didn't find any for
Word tidying. (However, the ideal would be for the HTML to be tidied
right on the client when it is pasted in, so that the user would really
get what they see, i.e. the HTML wouldn't be changed when they save it.
That effect is really evil.)



As Shane pointed out, there is a tidy up in Kupu. However, in my experience, it is not a very good tidy up (if I remember correctly, a lot of tags are still there after the tidy up).



Unfortunately there is a fine line between tidying up the cruft pasted from Word, and not stripping out things which might actually have been entered legitimately. I think Kupu does this pretty well (but then I'm a bit biased), but without any way to detect that the user is pasting from Word I don't see how much more could be stripped.


So far as I know, the only things which don't really get stripped from the pasted Word text are the mso class names. These can be manually blacklisted, but I never got round to producing a definitive blacklist.
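
For illustration only, here is a rough sketch of what blacklisting the mso class names could look like as a server-side Python post-tidy step. This is not Kupu's cleanup code; the function name strip_mso_classes and the rule "drop any class token starting with Mso" are assumptions made for the example, and a regex like this will also touch matching text outside tags, so a real implementation would want a proper HTML parser.

```python
import re

# Assumed blacklist rule: any class token starting with "mso" (case-insensitive).
MSO_CLASS = re.compile(r"^mso", re.IGNORECASE)

# Matches a class attribute with a double-quoted, single-quoted, or unquoted
# value (Word often emits the unquoted form, e.g. class=MsoNormal).
CLASS_ATTR = re.compile(
    r"""\s+class\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def strip_mso_classes(html):
    """Remove blacklisted class names; drop the attribute if none remain."""

    def fix(match):
        value = match.group(1) or match.group(2) or match.group(3) or ""
        kept = [name for name in value.split() if not MSO_CLASS.match(name)]
        if not kept:
            return ""  # nothing legitimate left, so remove the attribute
        return ' class="%s"' % " ".join(kept)

    return CLASS_ATTR.sub(fix, html)


if __name__ == "__main__":
    print(strip_mso_classes('<p class=MsoNormal>Hello</p>'))
    print(strip_mso_classes('<p class="MsoBodyText intro">Hello</p>'))
```

Class names that do not match the blacklist are kept, so anything the user entered legitimately survives the pass.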

One of my thoughts is to provide a separate 'clean this up' button which would apply a more aggressive tidy-up than the one when saving. Also, I agree that only applying the tidy on save is bad, but there isn't a cross-browser way to detect a paste, and applying the cleanup on a large document every time you cut/paste one word wouldn't be nice either.


Suggestions for improvements are most welcome.

P.S. It isn't just pasting bad HTML which is a problem: some Microsoft applications supply RTF on the clipboard but not HTML and it turns out that if you paste RTF into IE it generates seriously invalid HTML with a totally weird and corrupted DOM. That is another area where I think the cleanup code finally does a passable job but not yet a perfect one.

_______________________________________________
Zope maillist - Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists - http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope-dev )


