I think the issue here is that DIH uses Woodstox "BasicStreamReader" (see http://woodstox.codehaus.org/3.2.9/javadoc/com/ctc/wstx/sr/BasicStreamReader.html) which has only minimal DTD support. It might be best to use ValidatingStreamReader (http://woodstox.codehaus.org/3.2.9/javadoc/com/ctc/wstx/sr/ValidatingStreamReader.html) instead.

I think you could get this by requesting a validating XmlReader; that's a setting exposed at the factory level, and the factory is what returns the parser (i.e. an XmlReader). But then you would probably also get validation turned on, which might not be desirable in all cases. Perhaps this should be a user-configurable setting on XPathEntityProcessor?
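
To make the "factory level" point concrete, here is a minimal sketch using only the standard javax.xml.stream (StAX) API. It is not DIH's actual code: the class name DtdEntityDemo, the helper readRootText, and the sample document are hypothetical, and it only covers an entity declared in an internal DTD subset. StAX also defines XMLInputFactory.IS_VALIDATING, but support for it is implementation-dependent, so the sketch leaves it off and only asks for DTD support and entity replacement.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of Solr or DIH.
class DtdEntityDemo {

    // Returns the text content of the root element, with entity
    // references expanded from the document's internal DTD subset.
    static String readRootText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Factory-level settings: process the DTD and replace entity
        // references with their declared replacement text.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // Reads all text up to the matching end tag.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Sample document with an illustrative ENTITY declaration.
        String xml = "<!DOCTYPE doc [ <!ENTITY uuml \"&#252;\"> ]>"
                   + "<doc>M&uuml;ller</doc>";
        System.out.println(readRootText(xml));
    }
}
```

If XPathEntityProcessor grew a user setting along these lines, it would presumably just toggle properties like these before the reader is created; whether that is enough for Woodstox and for external DTDs is something I have not verified.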

-Mike

On 07/10/2012 07:10 PM, Chris Hostetter wrote:
: Does anybody have an idea? Solr seems to ignore the DTD definition and
: therefore does not understand entities like &uuml; or &auml; that are
: defined in the DTD. Is that the problem? If so, how can I tell Solr to
: consider the DTD definition?

Solr is just using the built-in Java XML parser for this, so there's
nothing you can configure to tell Solr to "consider the DTD". It is odd,
though, that this isn't working by default with Java's parser; I suspect
there is some "hint" XPathEntityProcessor should be giving the parser to
ask it to look at these ENTITY declarations.

I've filed a Jira issue to track this (and included a test case),
but unfortunately I don't really know what the fix is...

https://issues.apache.org/jira/browse/SOLR-3614



-Hoss
