Brian Sutherland <> added the comment:

I also was bitten by this. Attached is the patch I am using; it includes and
expands on the originally posted patches, using dbaty's "more complex than it
should be" method to keep the doctype out of HTML that didn't already have it.

Using regexes does start to seem like a nice idea after looking at all the
gymnastics one has to go through with lxml.
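For comparison, here is a rough sketch of what the regex route might look like.
This is not the attached patch. The helper names are hypothetical, and it
assumes the DOCTYPE, if present, is the very first thing in the markup (no
leading comments or XML declaration). `serialized` stands for whatever the
HTML library produced after round-tripping `original`.

```python
import re

# Matches a leading <!DOCTYPE ...> declaration, case-insensitively, plus any
# surrounding whitespace. Assumption: it is the first construct in the text.
_DOCTYPE_RE = re.compile(r"^\s*<!DOCTYPE[^>]*>\s*", re.IGNORECASE)


def has_doctype(html):
    """Return True if the markup starts with a DOCTYPE declaration."""
    return _DOCTYPE_RE.match(html) is not None


def restore_doctype_state(original, serialized):
    """Drop the DOCTYPE from `serialized` if `original` had none (hypothetical helper)."""
    if has_doctype(original):
        # The input had a doctype, so keep the serializer's output unchanged.
        return serialized
    # Remove only the leading declaration; `^` anchors at the start of the string.
    return _DOCTYPE_RE.sub("", serialized, count=1)
```

It is simpler than working around the parser, but it is also more fragile:
anything unusual before the declaration will defeat the pattern.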

