I think the issue here is that DIH uses Woodstox BasicStreamReader
(see
http://woodstox.codehaus.org/3.2.9/javadoc/com/ctc/wstx/sr/BasicStreamReader.html)
which has only minimal DTD support. It might be best to use
ValidatingStreamReader instead.
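
For what it's worth, here is a minimal sketch of how DTD entity expansion can be requested through the standard StAX API. It is not DIH's actual code: the class name DtdEntityDemo, the helper readText and the sample document are all made up for illustration, and it runs against whichever StAX implementation is on the classpath (the JDK's built-in one, or Woodstox if present). The entity is declared in an internal DTD subset so the example needs no external files.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of Solr or DIH.
class DtdEntityDemo {

    // Returns the concatenated character data of the document.
    static String readText(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to process the DTD and expand entity references.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.TRUE);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document (assumption): declares uuml in an internal subset.
        String xml = "<!DOCTYPE doc [<!ENTITY uuml \"&#252;\">]>"
                   + "<doc>M&uuml;ller</doc>";
        System.out.println(readText(xml));
    }
}
```

If the entity reference comes back expanded here but not inside DIH, that would point at how DIH configures its reader rather than at the documents themselves.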
I don't have any experience with DIH: maybe XPathEntityProcessor doesn't
use a true XML parser?
You might want to try passing your documents through xmllint --noent
(basically parse and reserialize) - that should replace the entity
references with the actual characters, as UTF-8?
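
If xmllint is not at hand, the same parse-and-reserialize idea can be sketched in plain Java with the JDK's DOM parser and an identity Transformer. This is only an illustration under my own assumptions: ReserializeDemo and inlineEntities are hypothetical names, and the sample document again declares the entity in an internal subset.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: parse with entity expansion, then write back out.
class ReserializeDemo {

    static String inlineEntities(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Expand entity references while building the DOM tree.
        dbf.setExpandEntityReferences(true);
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));

        // An identity transform serializes the tree; the DOCTYPE is not copied.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ENTITY uuml \"&#252;\">]>"
                   + "<doc>M&uuml;ller</doc>";
        System.out.println(inlineEntities(xml));
    }
}
```

The output no longer depends on the DTD, so it could be fed to a parser that ignores entity declarations.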
On 07/09/2012 03:18 PM, Michael Belenki wrote:
: Does anybody have an idea? Solr seems to ignore the DTD definition and
: therefore does not understand entities like &uuml; or &auml; that are
: defined in the DTD. Is that the problem? If so, how can I tell Solr to
: take the DTD definition into account?
Solr is just utilizing the builtin java XML parser for
Does anybody have an idea? Solr seems to ignore the DTD definition and
therefore does not understand entities like &uuml; or &auml; that are
defined in the DTD. Is that the problem? If so, how can I tell Solr to
take the DTD definition into account?
On Fri, 06 Jul 2012 10:58:59 +0200, Michael Belenki