It's not ivory-tower theory; it's sheer pragmatism. Once you start trying to repair broken documents, you enter a world where everything is gray (not happy and shiny, believe me), and there's no guarantee that one XML processor will behave the same as another. It's unlikely that any two will implement the same heuristics, so implementors end up trying to emulate each other and pointing fingers at each other. We've been down that road with HTML, and it was (is) an ugly mess. It really just makes sense to require input to be well formed, so that the result of any processing is predictable. Whether you require it to be valid as well is up to you.
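One way to keep the parser strict and still give clients something more useful than a bare "invalid character" error is to scan the input for forbidden control characters before parsing and report where they are. What follows is only a minimal sketch in Python, not anything Xerces provides: the function name `find_illegal_controls` is made up for illustration, and it assumes an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where any byte below 0x20 is a control character. It checks only the C0 control range and reports positions in bytes, not characters.

```python
# Hypothetical pre-check, not part of Xerces: locate bytes that XML 1.0 forbids.
# Assumes an ASCII-compatible encoding (e.g. UTF-8 or ISO-8859-1), where any
# byte below 0x20 is a control character. This does NOT hold for UTF-16.

# Among the C0 control characters, XML 1.0 allows only tab (0x09),
# line feed (0x0A) and carriage return (0x0D).
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def find_illegal_controls(data: bytes) -> list:
    """Return a (line, column, byte value) tuple for each forbidden control byte.

    Lines and columns are 1-based; columns count bytes, not characters.
    """
    problems = []
    line, column = 1, 1
    for byte in data:
        if byte < 0x20 and byte not in ALLOWED_CONTROLS:
            problems.append((line, column, byte))
        if byte == 0x0A:
            # A line feed ends the current line.
            line, column = line + 1, 1
        else:
            column += 1
    return problems


if __name__ == "__main__":
    # Sample document containing an "ASCII 18" (0x12) control character.
    sample = b"<doc>\n  <p>bad\x12char</p>\n</doc>"
    for line, column, byte in find_illegal_controls(sample):
        print(f"line {line}, column {column}: illegal character 0x{byte:02X}")
```

The document is still rejected, but the client learns exactly which byte to fix, and nothing is silently repaired.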
I think the blame belongs with your clients' authoring tools, which should help them produce well-formed documents. On the other hand, if you want to work around the presence of certain illegal characters, you could (as a service) translate them into character entities before handing them off to a parser. (I wouldn't, though. You'll come to regret it, perhaps when you have to accept a UTF-16 document, and a byte of data is no longer even roughly equivalent to a character.)

> -----Original Message-----
> From: Marc Seldin [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 23, 2004 11:18 AM
> To: [EMAIL PROTECTED]
> Subject: Making Xerces less strict?
>
> While in theory it makes good sense to have a strict parser, in the world of
> my clients getting them to make generally well formed documents is difficult
> enough. I've been getting a number of complaints about the "invalid
> character" error; usually they've included some control character, like an
> ASCII 18. (http://xml.apache.org/xerces-c/faq-parse.html#faq-20)
>
> Is there any way to make the xerces parser less strict? If not, I'd like to
> put in a feature request for this. It would really make the world a happier,
> shinier place.