I think the issue here is that DIH uses Woodstox "BasicStreamReader"
(see
http://woodstox.codehaus.org/3.2.9/javadoc/com/ctc/wstx/sr/BasicStreamReader.html)
which has only minimal DTD support. It might be best to use
ValidatingStreamReader
(http://woodstox.codehaus.org/3.2.9/javadoc/com/ctc/wstx/sr/ValidatingStreamReader.html)
instead.
I think you could get this by requesting a validating XmlReader; that's
a setting exposed at the factory level, and the factory returns the
parser (i.e. an XmlReader). But then you would probably also get
validation turned on, which might not be so great in all cases. This
should probably be a user setting on XPathEntityProcessor somewhere?
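For illustration only, here is a rough sketch of what such factory-level
settings look like with the standard StAX API (javax.xml.stream). This is
not the DIH code; the class name DtdEntityDemo, the method readText and
the sample document with an internal DTD subset are made up for the
example. It asks for DTD support and entity replacement while leaving
validation off, then collects the element text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Solr or DIH.
public class DtdEntityDemo {

    // Parses the given XML and returns all character data with entities expanded.
    static String readText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Read the DTD so that ENTITY declarations are known to the parser.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        // Expand entity references into their replacement text.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);
        // Keep validation off: we only want the entity declarations.
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        // Merge adjacent text pieces into a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // Assumed sample input: the entity is declared in an internal DTD subset.
        String xml = "<!DOCTYPE doc [<!ENTITY uuml \"&#252;\">]>"
                   + "<doc>M&uuml;nchen</doc>";
        System.out.println(readText(xml));
    }
}
```

Whether XPathEntityProcessor can pass properties like these through to
its factory is exactly the open question; the sketch only shows that DTD
support and validation are separate switches in the API.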
-Mike
On 07/10/2012 07:10 PM, Chris Hostetter wrote:
: Does anybody have an idea? Solr seems to ignore the DTD definition and
: therefore does not understand the entities like &uuml; or &auml; that
: are defined in the DTD. Is that the problem? If yes, how can I tell Solr
: to consider the DTD definition?
Solr is just utilizing the built-in Java XML parser for this, so there's
nothing you can set to tell Solr to "consider the DTD", but it is odd
that this isn't working by default with Java's parser -- I suspect there
is some "hint" XPathEntityProcessor should be giving the parser to ask it
to look at these ENTITY declarations.
I've filed a Jira issue to try and track this (and included a test case),
but unfortunately I don't really know what the fix is...
https://issues.apache.org/jira/browse/SOLR-3614
-Hoss