Re: [Digester] HTML entity decoding?

Paul Libbrecht Wed, 15 Apr 2009 15:24:42 -0700

Hello Otis,

For the second form, you'll need to hook in a DTD: a DTD declaration in your document header pointing to a DTD that defines these entities. I am no expert in Digester, but I believe that this is the only way to do so, at least according to the XML specs.


Here's a text pointing to such a DTD:
  
http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities

Note that a validating parser will certainly grumble about all sorts of undeclared elements when opening the file. This is OK: it is, indeed, a validation error, but it does not prevent parsing.

However, you do get the entity expansion.
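To illustrate the DTD approach, here is a minimal sketch. It is not Digester code: it uses the JDK's own DOM parser as a stand-in, and it declares the single entity in an internal DTD subset instead of pointing at the external XHTML entity DTD. The class name `EntityDeclDemo` and the method `parseName` are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDeclDemo {
    // Assumption: an internal DTD subset declaring only the one entity
    // discussed in this thread, instead of a full external entity DTD.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE name [\n"
      + "  <!ENTITY uuml \"&#252;\">\n"
      + "]>\n"
      + "<name>Gr&uuml;ber</name>";

    // Parses the document and returns the text of the root element.
    static String parseName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Non-validating: the DTD is read for entity declarations only,
        // so undeclared elements are not reported.
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseName(XML));
    }
}
```

With the entity declared, the parser expands `&uuml;` itself, so `parseName` should return the text with the real "ü" character. Digester sits on top of a SAX parser, so the same declaration should have the same effect there, but that is untested here.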

Note that with the first form, where the entity is *escaped* inside a CDATA section, there is nothing the parser can do! You would have to feed the extracted text manually ("re-entrantly") into a parser that handles entities properly.
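A rough sketch of that "re-entrant" step, under the same assumptions as before (JDK DOM parser, one entity declared inline): take the string that came out of the CDATA section, wrap it in a tiny document that declares the entity, and parse it a second time. The class `ReparseDemo` and the helper `decodeEntities` are hypothetical names, not part of Digester.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class ReparseDemo {
    // Hypothetical helper: re-parses text that still contains entity
    // references (for example, text taken out of a CDATA section).
    // Assumption: the text is well-formed XML content on its own,
    // i.e. it has no stray '<' or '&' characters.
    static String decodeEntities(String escaped) throws Exception {
        // Wrap the text in a throwaway document that declares the entity.
        String wrapped =
            "<!DOCTYPE t [ <!ENTITY uuml \"&#252;\"> ]>"
          + "<t>" + escaped + "</t>";
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(wrapped)))
            .getDocumentElement()
            .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The value Digester hands back for the CDATA form.
        System.out.println(decodeEntities("Gr&uuml;ber"));
    }
}
```

Only entities declared in the wrapper are expanded; any other named entity in the text would make this second parse fail, which is the same behaviour described for the non-CDATA form.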


paul

PS: I would feel lucky that the XML parsing did not blow up in the second case, as it would with a normal XML parser: a missing entity declaration means unparseable XML, while a missing element declaration is a much less dangerous thing.


On 16 Apr 2009, at 00:06, Otis Gospodnetic wrote:


Hello,

I'm using Digester 2.0 to process XML that may include HTML entities, and I'm trying to get Digester to decode them when parsing.

For example, my XML contains:
 <name><![CDATA[Gr&uuml;ber]]></name>

Currently, Digester parses this as:  Gr&uuml;ber

But what I am really after is "Grüber", so I am looking for a way to get this &uuml; entity decoded by Digester.

How do I tell Digester to decode HTML entities?

Also, if I don't use CDATA, like this:
 <name>Gr&uuml;ber</name>

Digester gives me: Grber
