It's not ivory-tower theory; it's sheer pragmatism.  Once you start
trying to repair broken documents, you enter a world where everything is
gray (not happy and shiny, believe me), and there's no guarantee that
one XML processor will behave the same as another.  It's unlikely that
any two will implement the same heuristics, and implementors end up
trying to emulate each other and pointing fingers at each other.  We've
been down that road with HTML, and it was (is) an ugly mess.  It really
just makes sense to require input to be well formed, so the result of
any processing is predictable.  Whether you require it to be valid as
well is up to you.

I think the blame belongs with your clients' authoring tools, which
should help them produce well-formed documents.  On the other hand, if
you want to work around the presence of certain illegal characters, you
could (as a service) translate them into numeric character references
before handing them off to a parser.  (I wouldn't, though.  XML 1.0
rejects references to most control characters as well, and you'll come
to regret it, perhaps when you have to accept a UTF-16 document, and a
byte of data is no longer even roughly equivalent to a character.)
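
For what it's worth, here is a rough sketch of what such a pre-parse
filter might look like.  It is only an illustration under stated
assumptions, not anything Xerces provides: the function names are made
up, the input is assumed to be UTF-8 or plain ASCII (it would mangle
UTF-16), and it swaps each offending byte for a placeholder rather than
a character reference, since an XML 1.0 parser would reject a reference
such as &#18; too.

```cpp
#include <cassert>
#include <string>

// True if a single byte is acceptable to an XML 1.0 parser: tab, LF, CR,
// or anything from 0x20 upward. Bytes >= 0x80 are passed through on the
// ASSUMPTION that they belong to valid UTF-8 sequences.
bool is_legal_xml10_byte(unsigned char c) {
    return c >= 0x20 || c == 0x09 || c == 0x0A || c == 0x0D;
}

// Hypothetical helper (not part of Xerces): returns a copy of `input`
// with every illegal control byte replaced by `replacement`.
// ASSUMPTION: `input` is UTF-8 or ASCII; this is wrong for UTF-16.
std::string replace_illegal_controls(const std::string& input,
                                     char replacement = ' ') {
    // The placeholder must itself be a legal XML character.
    assert(is_legal_xml10_byte(static_cast<unsigned char>(replacement)));

    std::string out;
    out.reserve(input.size());
    for (unsigned char c : input) {
        out.push_back(is_legal_xml10_byte(c) ? static_cast<char>(c)
                                             : replacement);
    }
    return out;
}
```

You would run the raw document bytes through replace_illegal_controls()
and hand the result to the parser.  Note that this silently changes the
client's data, which is part of why I'd rather not do it at all.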

> -----Original Message-----
> From: Marc Seldin [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, September 23, 2004 11:18 AM
> To: [EMAIL PROTECTED]
> Subject: Making Xerces less strict?
> 
> While in theory it makes good sense to have a strict parser, in the
> world of my clients getting them to make generally well formed
> documents is difficult enough. I've been getting a number of
> complaints about the "invalid character" error; usually they've
> included some control character, like an ASCII 18.
> (http://xml.apache.org/xerces-c/faq-parse.html#faq-20)
>
> Is there any way to make the xerces parser less strict? If not, I'd
> like to put in a feature request for this. It would really make the
> world a happier, shinier place.
