Magnus Lie Hetland wrote:

> According to The Sgmlop Module Handbook [1], the handle_entityref()
> callback is called for "malformed character entities". What does that
> mean, exactly? What is a malformed character entity? I've tried
> mis-spelling them (e.g., dropping the semicolon), but then they're
> (quite naturally) treated as text/data, with handle_data(). I've tried
> to use a number that is too great, or (equivalently, it turns out) to
> use names instead of numbers, such as &#foo;. In these cases, I only
> get an exception, because the number is too high...
>
> So -- how can I produce a malformed character entity?

with sgmlop 1.1, the following script

    import sgmlop

    class entity_handler:
        # the hook receives the text between & and ;
        def handle_entityref(self, entityref):
            print "ENTITY", repr(entityref)

    parser = sgmlop.XMLParser()
    parser.register(entity_handler())
    parser.feed("&-10;&/()=?;")

prints:

    ENTITY '-10'
    ENTITY '/()=?'

> And another thing... For the case where a numeric reference is too
> high (i.e. it can't be translated into a Unicode character) -- is it
> possible to ignore it (or replace it, as with encode/decode)?

if you don't do anything, it is ignored.

if you specify a handle_charref hook, the part between &# and ; is
passed to that method.

if you have a handle_entityref hook, but no handle_charref, the part
between & and ; is passed to handle_entityref.

</F>
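
the fallback order for an untranslatable numeric reference can be sketched in plain python. this is only an illustration under stated assumptions: it does not import or call sgmlop, the names dispatch_charref, CharrefHandler, EntityrefHandler and NoHooks are made up for the example, and it assumes that "the part between & and ;" includes the leading # when the reference was written as &#...;.

```python
def dispatch_charref(handler, body):
    """Route the reference '&#' + body + ';' to whichever hook exists.

    Hypothetical helper that mirrors the fallback order described in
    the reply; it is not sgmlop's actual implementation.
    """
    if hasattr(handler, "handle_charref"):
        # a charref hook gets the part between '&#' and ';'
        return handler.handle_charref(body)
    if hasattr(handler, "handle_entityref"):
        # only an entityref hook: it gets the part between '&' and ';'
        # (assumed here to include the leading '#')
        return handler.handle_entityref("#" + body)
    # no hook at all: the reference is ignored
    return None


class CharrefHandler:
    def handle_charref(self, charref):
        return ("CHARREF", charref)


class EntityrefHandler:
    def handle_entityref(self, entityref):
        return ("ENTITY", entityref)


class NoHooks:
    pass
```

with these toy handlers, dispatch_charref(CharrefHandler(), "foo") goes to handle_charref, the same call with EntityrefHandler() falls back to handle_entityref, and NoHooks() yields nothing, which is the "ignored" case.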