Hi list!
I want to index my data using Solr, but the data contains some special XML
tags that Tika cannot parse by itself. That's why I wrote my own parser
(similar to DcXMLParser). I verified that it works in tika-app, where I
set the parser variable to an instance of my parser class. Now I want it
to work when I post files to my Solr server. As far as I understand, I have
to do something with AutoDetectParser, based on what the Tika website says
("Finally, you should explicitly tell the AutoDetectParser to
include your new parser. This step is only needed if you want to use the
AutoDetectParser functionality. If you figure out the correct parser in a
different way, it isn't needed. List your new parser in:
tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parse").
I've added my parser to that list, but I don't know how to explicitly tell
AutoDetectParser to use it. I've tried many different ways...
The main problem is that I can't import
org.apache.tika.parser.xml.MyXmlParser, because this package is in the
tika-parsers module and I can't add that dependency without creating a
cyclic reference between tika-parsers and tika-core.
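To show what I expected the services file to do: as far as I understand, a
file under META-INF/services is normally read through the standard Java
service-provider lookup, which finds implementations by class name at
runtime, so the code doing the lookup needs no compile-time reference to my
class. Below is a minimal sketch of that mechanism using only the JDK. It is
not Tika's actual code: DemoParser is a hypothetical stand-in for Tika's
Parser interface, the nested MyXmlParser is a dummy, and the services file
is written to a temp directory only to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ServiceLookupSketch {

    /** Hypothetical stand-in for Tika's Parser interface (assumption, not the real API). */
    public interface DemoParser {
        String describe();
    }

    /** Dummy provider; it must be public and have a public no-arg constructor. */
    public static class MyXmlParser implements DemoParser {
        public MyXmlParser() {}

        @Override
        public String describe() {
            return "MyXmlParser";
        }
    }

    /** Writes a provider file to a temp dir and returns what ServiceLoader finds there. */
    static List<String> discover() {
        try {
            Path root = Files.createTempDirectory("services-demo");
            Path services = Files.createDirectories(
                    root.resolve("META-INF").resolve("services"));

            // File name: the service interface's class name.
            // File content: one implementation class name per line.
            Files.write(services.resolve(DemoParser.class.getName()),
                    (MyXmlParser.class.getName() + "\n").getBytes(StandardCharsets.UTF_8));

            List<String> found = new ArrayList<>();
            // Put the temp dir on a class loader's search path, as a jar or
            // resources directory would be on the real classpath.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() },
                    ServiceLookupSketch.class.getClassLoader())) {
                // The lookup refers only to the interface, never to MyXmlParser.
                for (DemoParser parser : ServiceLoader.load(DemoParser.class, loader)) {
                    found.add(parser.describe());
                }
            }
            return found;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover());
    }
}
```

If AutoDetectParser really discovers parsers this way, then I would expect
the services entry alone to be enough, and my mistake is somewhere else
(for example, the rebuilt jar not being the one Solr actually loads).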
Regards,
AN
P.S. Sorry if this is a stupid question; this is my first time using Apache
Tika...