For my application, it would be very helpful if there were a parser mode
that did not report an error when an undeclared entity is encountered.
The specific scenario where this would be particularly helpful is
recognizing an XML document.  I would like to parse the input XML once in
a permissive mode that lets me examine the result without undeclared
entities causing a failure.  The input might be an XML fragment, without
a DOCTYPE, that contains entity references.  After examining the
document, I could go on to use a specific DTD with it if I recognized it
(using a heuristic).  The previously undeclared entities would then be
declared in a second parse, using a reused validator corresponding to my
DTD, which contains the entity declarations.

The alternative is to preprocess the input to the parser so that entity
references are removed.  The problem with this is that I would need to
handle lots of issues which the parser has already solved, like
character encodings.  Another option is to recognize the document without
using the parser, but again I would lose all the services of the parser.
I would prefer to use the parser for the initial scan of the input.

I have managed to add an option to the XMLScanner such that it simply
writes an unknown entity reference into the output.  If the parser sees
"&notdeclared;" and notdeclared is not a declared entity, the reference
simply passes through as "&notdeclared;" and parsing continues without
error.  This is similar to what many web browsers do when they encounter
an unknown character entity.
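
To make the intended behavior concrete, here is a minimal sketch in
Python.  It is not the actual XMLScanner change; the function name
expand_entities and the "declared" mapping are hypothetical and only
illustrate the idea that known references are expanded while unknown
ones are written back unchanged.

```python
import re

# Matches a general entity reference such as "&name;".  The name pattern
# is simplified, and numeric character references ("&#65;") are not
# matched, so they are left alone.
ENTITY_REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9_.:-]*);")


def expand_entities(text, declared):
    """Expand declared entity references; pass undeclared ones through."""

    def replace(match):
        name = match.group(1)
        # Permissive mode: an undeclared name is not an error, the
        # original "&name;" text is returned as-is.
        return declared.get(name, match.group(0))

    return ENTITY_REF.sub(replace, text)


# Hypothetical set of declared entities for this illustration.
declared = {"lt": "<", "amp": "&"}
print(expand_entities("a &lt; b &notdeclared; c", declared))
# prints: a < b &notdeclared; c
```

A second pass with a fuller "declared" mapping (standing in for the DTD)
would then expand the references that were left in place the first time.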

This of course would not be the default mode of the parser.  I have seen
some complaints on the list in the past about the current behavior that
led me to believe that this might be a useful addition to the parser.
It's true that with this option enabled the parser would accept input
which is not well-formed, but recognizing XML fragments with entity
references would be much less difficult using this approach.

Chris

