Re: How to:- Extending Tika within Solr

Jan Høydahl Sat, 25 Jul 2015 14:28:20 -0700

Moving discussion from dev to user-list (CC to Aditya in case you’re not
on the user list)


Since you are defining a new parser, you should be able to simply drop
your parser’s JAR(s) on Solr’s class path, without modifying core Tika
at all, and it will be discovered through the SPI mechanism.
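
To check that the JAR really is visible, here is a rough sketch that
lists every parser class registered through the SPI file that Tika
parsers use (META-INF/services/org.apache.tika.parser.Parser). The class
name ListTikaParsers is made up for illustration, and it only reports
what is on the class path it is run with, so it would need to be run
with the same JARs that Solr loads.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper, not part of Tika or Solr.
public class ListTikaParsers {
    // SPI registration file that a parser JAR is expected to contain.
    static final String SERVICE_FILE =
            "META-INF/services/org.apache.tika.parser.Parser";

    // Drops '#' comments and surrounding whitespace; returns null if nothing is left.
    static String parseLine(String line) {
        int hash = line.indexOf('#');
        if (hash >= 0) {
            line = line.substring(0, hash);
        }
        line = line.trim();
        return line.isEmpty() ? null : line;
    }

    // Collects the parser class names from every copy of the service file on the class path.
    static List<String> registeredParsers(ClassLoader loader) throws IOException {
        List<String> names = new ArrayList<>();
        Enumeration<URL> files = loader.getResources(SERVICE_FILE);
        while (files.hasMoreElements()) {
            URL url = files.nextElement();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String name = parseLine(line);
                    if (name != null) {
                        names.add(name + "  (" + url + ")");
                    }
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String entry : registeredParsers(ListTikaParsers.class.getClassLoader())) {
            System.out.println(entry);
        }
    }
}
```

If your .mx parser class does not show up in the output, the JAR is
probably missing the service file or is not on that class path.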

You can quickly test this by posting your .mx file to Solr with the
parameter &extractOnly=true. Solr will then return the extracted XML
instead of indexing the document.
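
A minimal sketch of such a test request, assuming Java 11+, a Solr
instance at http://localhost:8983/solr, a core named collection1, and
the extract handler mapped to /update/extract (all three are
assumptions, adjust them to your setup). The resource.name parameter
passes the file name along as a hint for type detection.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical test client, not part of Solr.
public class ExtractOnlyCheck {
    // Builds the extract URL with extractOnly=true so nothing gets indexed.
    static URI extractUri(String solrBase, String core, String fileName) {
        String name = URLEncoder.encode(fileName, StandardCharsets.UTF_8);
        return URI.create(solrBase + "/" + core
                + "/update/extract?extractOnly=true&resource.name=" + name);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Path to the test file; defaults to test.mx in the working directory.
        Path file = Path.of(args.length > 0 ? args[0] : "test.mx");
        URI uri = extractUri("http://localhost:8983/solr", "collection1",
                file.getFileName().toString());

        // Send the raw file as the request body.
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The body should contain whatever your parser extracted.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```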

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 25 Jul 2015, at 12:45, Jan Høydahl <[email protected]> wrote:
> 
> You can place a file called tika.config in your Solr core’s conf
> directory, and Solr’s ExtractingRequestHandler will parse it. In there
> you can define your custom new parser.
> 
> See 
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
>> On 23 Jul 2015, at 22:03, Aditya Dhulipala <[email protected]> wrote:
>> 
>> Hi,
>> 
>> 
>> I have implemented a new file-type parser for Tika. It parses a custom
>> filetype (*.mx).
>> 
>> 
>> I would like my Solr instance to use my version of Tika with the mx parser.
>> 
>> I found this through a Google search:
>> 
>> https://lucidworks.com/blog/extending-apache-tika-capabilities/
>> 
>> But it seems to be over 5 years old, and the "download project" link is
>> broken.
>> 
>> 
>> Can anybody help me with this?
>> 
>> 
>> I tried replacing the tika-* jars within contrib/extraction/lib under
>> solr-root with my compiled tika-* jars. But that didn't work; Solr is still
>> using the old Tika binaries (i.e. without the .mx parser). I know that my
>> tika-* jars are working correctly, because I can run them in GUI mode and
>> parse a test .mx file.
>> 
>> 
>> 
>> Thanks!
>> 
>> -
>> 
>> Aditya
>
