For my application, it would be very helpful if there were a parser mode
that did not raise an error when an undeclared entity is encountered.
The specific scenario where this would help is recognizing an XML
document. I would like to parse the input XML once in a permissive mode
that lets me examine the result without undeclared entities causing a
failure. The input might be an XML fragment, without a DOCTYPE, that
contains entity references. After examining the document, I could go on
to use a specific DTD with it if I recognized it (using a heuristic).
The previously undeclared entities would then be declared in a second
parse, using a reused validator corresponding to my DTD, which contains
the entities.
The alternative is to preprocess the input to the parser in such a way that
entity references are removed. The problem with this is that I would need
to handle lots of issues which the parser has already solved, like
character encodings. Another option is to recognize the document without
using the parser, but again I would lose all the services of the parser. I
would prefer to use the parser for the initial scan of the input.
I have managed to add an option to the XMLScanner such that it will simply
write an unknown entity reference into the output. If the parser saw
"&notdeclared;" and notdeclared is not a declared entity, it would simply
pass through as "&notdeclared;" and continue without error. This is
similar to what many web browsers do when they encounter an unknown
character entity.
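
To make the intended behavior concrete, here is a minimal Python sketch of
the pass-through idea. It is not the actual XMLScanner change: the names
expand_entities, UndeclaredEntityError, the permissive flag, and the
declared mapping are all hypothetical, and the sketch only models entity
expansion on a plain string rather than a real parse.

```python
import re

# Matches a general entity reference such as "&lt;" or "&notdeclared;".
# Character references like "&#38;" are deliberately not matched.
ENTITY_REF = re.compile(r"&([A-Za-z_:][\w.:\-]*);")


class UndeclaredEntityError(ValueError):
    """Raised in strict mode when a reference has no declaration."""


def expand_entities(text, declared, permissive=False):
    """Expand declared entity references in text.

    declared is a hypothetical stand-in for the entities a DTD provides.
    In permissive mode, unknown references are copied through unchanged.
    """
    def replace(match):
        name = match.group(1)
        if name in declared:
            return declared[name]
        if permissive:
            # Pass the unknown reference through exactly as written.
            return match.group(0)
        raise UndeclaredEntityError("undeclared entity: " + name)

    return ENTITY_REF.sub(replace, text)


declared = {"amp": "&", "lt": "<"}
fragment = "a &lt; b &notdeclared; c"
permissive_result = expand_entities(fragment, declared, permissive=True)
# permissive_result == "a < b &notdeclared; c"
```

With permissive=False the same input raises UndeclaredEntityError, which
models the strict behavior of rejecting undeclared entities.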
This of course would not be the default mode of the parser. I have seen
some complaints on the list in the past about the parser failing on
undeclared entities, which led me to believe that this might be a useful
addition to the parser. It's true that with this option enabled the parser
would accept input that is not well-formed, but recognizing XML fragments
with entity references would be much less difficult using this approach.
Chris