Malthe Borch <mbo...@gmail.com> added the comment:

Perhaps we can use ``lxml`` to extract the locations (start and end offsets in the source string) of the ``<img>`` tags and then simply apply regex matching to those ranges.
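A rough sketch of that idea follows. It is not taken from the Repoze code base. As far as I know, ``lxml`` elements expose a ``sourceline`` but not character offsets, so this sketch swaps in the standard library's ``html.parser`` to find the tag ranges. The names ``img_spans``, ``rewrite_img_src`` and ``SRC_RE`` are made up for illustration.

```python
import re
from html.parser import HTMLParser


class _ImgLocator(HTMLParser):
    """Collect (start, end) character offsets of every <img> start tag."""

    def __init__(self, source):
        super().__init__()
        # Offset of the first character of each line, used to turn the
        # parser's (line, column) position into an index into `source`.
        self.line_starts = [0] + [m.end() for m in re.finditer(r"\n", source)]
        self.spans = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        if tag == "img":
            line, col = self.getpos()  # line is 1-based, col is 0-based
            start = self.line_starts[line - 1] + col
            end = start + len(self.get_starttag_text())
            self.spans.append((start, end))


def img_spans(source):
    """Return the source ranges covered by <img> tags, in document order."""
    locator = _ImgLocator(source)
    locator.feed(source)
    locator.close()
    return locator.spans


# Hypothetical pattern: a quoted src attribute (but not e.g. data-src).
SRC_RE = re.compile(r"""(?<![\w-])(src\s*=\s*)(["'])(.*?)\2""",
                    re.IGNORECASE | re.DOTALL)


def rewrite_img_src(source, rewrite):
    """Apply `rewrite` to each <img> src value; all other text is untouched."""
    out, last = [], 0
    for start, end in img_spans(source):
        out.append(source[last:start])
        out.append(SRC_RE.sub(
            lambda m: m.group(1) + m.group(2) + rewrite(m.group(3)) + m.group(2),
            source[start:end],
            count=1,
        ))
        last = end
    out.append(source[last:])
    return "".join(out)
```

Calling ``rewrite_img_src(html, lambda url: "/static/" + url)`` should only touch the ``src`` values inside real ``<img>`` tags. Text that merely looks like one, such as an ``<img>`` inside an HTML comment, is skipped because the parser never reports it as a start tag. Unquoted attribute values are not handled by this pattern.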
This way, the original document isn't changed, and we avoid the pitfalls of a heuristic approach.

__________________________________
Repoze Bugs <b...@bugs.repoze.org>
<http://bugs.repoze.org/issue103>
__________________________________