RE: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

Allison, Timothy B. Thu, 12 Apr 2018 09:26:26 -0700

There's also, of course, tika-server. 😊

No matter the method, it is always best to isolate Tika in its own JVM, VM, or machine.
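
For anyone who wants to try the tika-server route, here is a minimal sketch of a client that runs outside the Tika JVM. It assumes tika-server is already running on localhost at its default port 9998 and uses its PUT /tika endpoint with an Accept header of text/plain. The function names and the sample file name are hypothetical, chosen only for illustration.

```python
import urllib.request

# Assumption: tika-server is running locally on its default port (9998).
TIKA_URL = "http://localhost:9998/tika"


def build_extract_request(html_bytes, tika_url=TIKA_URL):
    """Build a PUT request asking tika-server for plain text from an HTML document."""
    return urllib.request.Request(
        tika_url,
        data=html_bytes,
        method="PUT",
        headers={"Content-Type": "text/html", "Accept": "text/plain"},
    )


def extract_text(html_bytes, tika_url=TIKA_URL):
    """Send the document to tika-server and return the extracted text."""
    request = build_extract_request(html_bytes, tika_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # "page.html" is a hypothetical input file.
    with open("page.html", "rb") as handle:
        print(extract_text(handle.read()))
```

The extracted text can then be sent to Solr as an ordinary field value, so a parser crash or memory blowup in Tika stays in the tika-server process rather than in Solr.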

-----Original Message-----
From: Charlie Hull [mailto:char...@flax.co.uk] 
Sent: Monday, April 9, 2018 4:15 PM
To: solr-user@lucene.apache.org
Subject: Re: How to use Tika (Solr Cell) to extract content from HTML document 
instead of Solr's MostlyPassthroughHtmlMapper ?

As a bonus, here's a Dropwizard Tika wrapper that gives you a Tika web service:
https://github.com/mattflax/dropwizard-tika-server. It was written by a
colleague of mine at Flax. Hope this is useful.

Cheers

Charlie
