The problem is that, as I recall from looking into this a couple of years ago, there's a huge amount of malformed HTML out there, so lots and lots of real-world input will fail in TAG(). Perhaps the default behavior should be to pass junk through without choking.
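A minimal sketch of that pass-through idea, assuming the bracket-stripping behavior suggested in the quoted thread is what we want for anything that doesn't look like a real tag. `lenient_flatten`, `TAG_RE` and `ANGLE_RE` are hypothetical names for illustration; they are not part of web2py, and the rule for what counts as a "plausible" tag is an assumption.

```python
import re

# Hypothetical helper for illustration only; this is not part of web2py.
# Assumption: a "plausible" tag is <name ...>, </name> or <name/>, where name
# starts with a letter and contains only letters and digits.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(\s+[^<>]*)?/?>")
# Any run of the form <...> with no nested angle brackets inside it.
ANGLE_RE = re.compile(r"<([^<>]*)>")


def lenient_flatten(text):
    """Keep plausible tags; strip '<' and '>' from every other <...> run."""
    def fix(match):
        if TAG_RE.fullmatch(match.group(0)):
            return match.group(0)  # looks like a real tag: pass it through
        return match.group(1)      # malformed: keep only the inner text
    return ANGLE_RE.sub(fix, text)


print(lenient_flatten("test email <user@example.com>"))
print(lenient_flatten('<a href="x">link</a><br/>'))
```

The first call should print `test email user@example.com` and the second should leave the markup unchanged. The heuristic is deliberately crude: plain-text comparisons such as `3 < 5 > 2` would also lose their brackets, and a lone `<` with no closing `>` is left alone.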
In the meantime, which do you recommend: Beautiful Soup or HTMLParser? I've used HTMLParser in the past.

On Sep 13, 12:08 am, mdipierro <[email protected]> wrote:
> Good idea but complex to do. Eventually it will be done...
>
> On Sep 12, 10:46 pm, weheh <[email protected]> wrote:
> > For instance, I think this will get ticketed by TAG(x).flatten():
> >
> > x='test email <[email protected]>'
> >
> > How about treating it like Mozilla, and stripping off the '<' and '>'?
> >
> > I'm sure there are other cases. Hopefully, not a zillion.
> >
> > On Sep 12, 11:32 pm, weheh <[email protected]> wrote:
> > >
> > > I'm thinking it would be useful if TAG(...).flatten() had a flag it
> > > could set, something like _assert=False, that would cause it *not* to
> > > assert when it finds a malformed tag. In this mode, it would skip over
> > > the malformed tag until seeing the next tag close "/>". It would
> > > inject an error message in the text where it cut out the malformed
> > > tag. What do you think?
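On the HTMLParser side of the question, here is a minimal sketch using Python 3's standard-library `html.parser.HTMLParser`, which tolerates unclosed tags and does not require well-formed input. `TextExtractor` and `extract_text` are hypothetical names; this only pulls out the text and does not rebuild TAG() objects. Beautiful Soup is a third-party package and may be the better fit if you need a repaired tree rather than plain text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text content of a document, ignoring all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.parts)


print(extract_text("<p>Hello <b>world</b> &amp; friends</p>"))
print(extract_text("<p>unclosed <b>tags <i>here"))
```

Neither call should raise, even though the second input never closes its tags. Note that an address in angle brackets, like the one in the quoted example, may be read as a start tag and dropped from the extracted text, so a parser alone may not give the Mozilla-like behavior suggested above.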

