Patches item #1462498, was opened at 2006-04-01 00:56
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1462498&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Rares Vernica (rvernica)
Assigned to: Nobody/Anonymous (nobody)
Summary: bug #1452246 and patch #1087808; sgmllib entities
Initial Comment:
Patch for bug #1452246 htmllib doesn't properly
substitute entities
Continuation of patch #1087808 sgmllib.SGMLParser does
not unescape attribute values; patch
Substitute entities in attribute values
import htmllib
import formatter
import StringIO
s = StringIO.StringIO()
p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter(s)))
p.feed('<img alt="&lt;&gt;&amp;">')
print s.getvalue()
will now print '<>&' instead of '&lt;&gt;&amp;'.
The patch modifies module sgmllib, class SGMLParser,
method parse_starttag. In this method, the entities are
substituted in the attribute values. The substitutions
are based on the existing property SGMLParser.entitydefs.
For parsing it uses the regular expression entityref.
Regarding the differences between this patch and patch
#1087808:
- use self.entitydefs to determine the set of entity
names that are supported;
- unknown entity references are left alone;
- the regular expression entityref is used to find
references;
- a documentation patch is not needed as the method
is internal.
Regarding the fact that the semicolon after the entity name
is not mandatory in SGML: given the way entityref is defined,
"&lt " will become "< ", while "&lt" will stay "&lt",
regardless of being an attribute value.
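For illustration only, here is a rough standalone sketch of the
substitution described above. It is not the code from the patch: the
pattern is a variant modeled on sgmllib's entityref, the names
ENTITYREF, ENTITYDEFS and substitute_entities are hypothetical, and it
is written for Python 3 so that it runs without sgmllib.

```python
import re

# Assumed pattern, modeled on sgmllib's entityref: an entity name that is
# terminated either by ';' (consumed) or by any other non-alphanumeric
# character (looked at, but left in place).
ENTITYREF = re.compile(r'&([a-zA-Z][-.a-zA-Z0-9]*)(?:;|(?=[^a-zA-Z0-9]))')

# Assumed table standing in for SGMLParser.entitydefs.
ENTITYDEFS = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

def substitute_entities(value, entitydefs=ENTITYDEFS):
    """Replace known entity references in an attribute value."""
    def replace(match):
        name = match.group(1)
        # Unknown entity references are left alone.
        return entitydefs.get(name, match.group(0))
    return ENTITYREF.sub(replace, value)

print(substitute_entities('&lt;&gt;&amp;'))
```

With this sketch, a reference followed by a space is replaced, while a
reference at the very end of the value with no terminator is kept as
written.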
The patch also adds test cases in module
test/test_sgmllib.py, class SGMLParserTestCase, method
test_attr_values. In that method, the proper
substitution is tested.
Ray
----------------------------------------------------------------------
>Comment By: Georg Brandl (gbrandl)
Date: 2006-04-01 08:35
Message:
Changed your patch a bit (only allowing entityrefs ending
with ';' and recognizing charrefs), added more tests and
docs and committed as rev. 43532.
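
A minimal sketch of what the stricter behaviour described in this
comment could look like, assuming decimal character references of the
form "&#65;". This is not the code committed in rev. 43532: the pattern,
the names REFERENCE, ENTITYDEFS and unescape_attribute, and the handling
of out-of-range numbers are all assumptions, and the sketch targets
Python 3 so that it runs without sgmllib.

```python
import re

# Hypothetical pattern: a named entity reference or a decimal character
# reference, each of which must end with ';' to be recognized.
REFERENCE = re.compile(r'&(?:([a-zA-Z][-.a-zA-Z0-9]*)|#([0-9]+));')

# Assumed table standing in for SGMLParser.entitydefs.
ENTITYDEFS = {'lt': '<', 'gt': '>', 'amp': '&', 'quot': '"', 'apos': "'"}

def unescape_attribute(value, entitydefs=ENTITYDEFS):
    """Replace ';'-terminated entity and character references."""
    def replace(match):
        name, number = match.group(1), match.group(2)
        if name is not None:
            # Unknown entity names are left untouched.
            return entitydefs.get(name, match.group(0))
        try:
            return chr(int(number))
        except (ValueError, OverflowError):
            # Assumption: numbers outside the valid range stay as written.
            return match.group(0)
    return REFERENCE.sub(replace, value)

print(unescape_attribute('&lt;&#65;&gt;'))
```

Under these assumptions a reference without the trailing ';' is no
longer substituted.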
----------------------------------------------------------------------