Re: Tika as server/daemon for content, metadata and language

Mr Havercamp Fri, 24 Jun 2011 04:03:16 -0700

We've had a great deal of success running Tika from a Solr server as a document extractor (I believe Solr refers to it as Solr Cell).


http://wiki.apache.org/solr/ExtractingRequestHandler
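
For what it's worth, here is a rough sketch of how a client could ask an already-running Solr for extraction only, so no new JVM is started per document. It assumes the handler is mapped at /solr/update/extract on localhost:8983 (the path used in the wiki examples; yours may differ) and that the extractOnly, extractFormat and wt parameters behave as described on the wiki page. The function names are made up for illustration, and I have not checked the exact layout of the response, so inspect the returned keys yourself.

```python
import json
import urllib.parse
import urllib.request

# Assumption: default example install; change host, port and path to match your setup.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_url(base_url=SOLR_EXTRACT_URL):
    """Build the request URL for an extract-only call (nothing gets indexed)."""
    params = {
        "extractOnly": "true",    # return the Tika output instead of indexing it
        "extractFormat": "text",  # plain text rather than XHTML
        "wt": "json",             # ask Solr for a JSON response
    }
    return base_url + "?" + urllib.parse.urlencode(params)


def extract(path, base_url=SOLR_EXTRACT_URL):
    """POST the raw bytes of a file to Solr and return the parsed JSON response."""
    with open(path, "rb") as handle:
        payload = handle.read()
    request = urllib.request.Request(
        build_extract_url(base_url),
        data=payload,
        # Generic content type, on the assumption that Tika detects the real one.
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage, with Solr running:
#   result = extract("/path/to/document.pdf")
#   print(sorted(result.keys()))
```

Since Solr stays up between calls, the per-document cost is an HTTP round trip rather than a JVM start.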


Cheers


Hayden


On 24/06/11 18:31, Marian Steinbach wrote:

Hi!

I have tested the Tika client for extraction of content, metadata and
language and I'm really happy with the results.

For performance reasons, when extracting larger numbers of documents, I
think it would be worthwhile to avoid starting the client three times
for each document, since every start also launches a new virtual
machine etc.

I was thinking about having Tika running as a daemon and pushing
document path info to it, in order to get the metadata, content and
language as a response.

Is there a best practice for this? Maybe a servlet/jsp solution? Does
the current Tika release include an out of the box solution for that?

(I only found https://issues.apache.org/jira/browse/TIKA-169 on this
topic, which is pretty old and has "won't fix" status.)

Thanks!

Marian
