There's also, of course, tika-server. 😊 Whichever method you use, it is always best to isolate Tika in its own JVM, VM, or machine.
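
For what it's worth, here is a minimal sketch of what calling a separately running tika-server from your indexing code might look like, so that Tika stays out of the Solr JVM. It assumes the server is listening on its default port 9998 and uses the /tika endpoint with "Accept: text/plain". The class and method names (TikaServerClient, buildExtractRequest, extractText) are made up for illustration, and the 60-second timeout is an arbitrary choice.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

// Hypothetical client: sends a document to a tika-server running in its own process.
public class TikaServerClient {

    // Builds a PUT request for the /tika endpoint; asks for plain-text output.
    static HttpRequest buildExtractRequest(URI serverBase, byte[] document) {
        return HttpRequest.newBuilder(serverBase.resolve("/tika"))
                .timeout(Duration.ofSeconds(60)) // assumption: tune for your documents
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    // Sends the document and returns the extracted text, or fails on a non-200 status.
    static String extractText(HttpClient client, URI serverBase, byte[] document)
            throws Exception {
        HttpResponse<String> response = client.send(
                buildExtractRequest(serverBase, document),
                HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException(
                    "tika-server returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: TikaServerClient <file> [server-url]");
            return;
        }
        // Assumption: tika-server's default address when none is given.
        URI server = URI.create(args.length > 1 ? args[1] : "http://localhost:9998");
        byte[] document = Files.readAllBytes(Path.of(args[0]));
        System.out.println(extractText(HttpClient.newHttpClient(), server, document));
    }
}
```

The extracted text can then be sent to Solr as an ordinary field, and a document that crashes or hangs the parser only affects the tika-server process.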

-----Original Message-----
From: Charlie Hull [mailto:char...@flax.co.uk]
Sent: Monday, April 9, 2018 4:15 PM
To: solr-user@lucene.apache.org
Subject: Re: How to use Tika (Solr Cell) to extract content from HTML document instead of Solr's MostlyPassthroughHtmlMapper ?

As a bonus here's a Dropwizard Tika wrapper that gives you a Tika web service https://github.com/mattflax/dropwizard-tika-server written by a colleague of mine at Flax. Hope this is useful.

Cheers
Charlie