Tika as server/daemon for content, metadata and language

Marian Steinbach Fri, 24 Jun 2011 03:32:39 -0700

Hi!

I have tested the Tika client for extraction of content, metadata and
language and I'm really happy with the results.


When extracting from larger numbers of documents, I think it would be
worthwhile for performance reasons to avoid starting the client three
times per document (once each for content, metadata and language),
since every start also launches a new Java virtual machine.

I was thinking about having Tika running as a daemon and pushing
document path info to it, in order to get the metadata, content and
language as a response.
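
A minimal sketch of that idea, under stated assumptions: one long-running
JVM process reads document paths line by line and writes one result block
per path, so the startup cost is paid once rather than per document. The
names ExtractionWorker, Extractor and serve, and the "OK"/"ERROR"/"."
line protocol, are hypothetical and not part of Tika. The extractor here
is a placeholder that only reports the file size; a real version would
presumably call Tika at that point, for example new Tika().parseToString(file).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ExtractionWorker {

    /** Hypothetical hook: this is where the Tika calls would go. */
    interface Extractor {
        String extract(Path document) throws Exception;
    }

    /**
     * Reads one document path per line until end of input and writes one
     * result block per path. Each block ends with a line containing only
     * "." so the caller can tell where a response stops. (A real protocol
     * would need escaping or length prefixes, since extracted text may
     * itself contain such a line.)
     */
    static void serve(BufferedReader in, PrintWriter out, Extractor extractor)
            throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String path = line.trim();
            if (path.isEmpty()) {
                continue; // ignore blank lines
            }
            try {
                // Extract first, so a failure never follows an "OK" header.
                String result = extractor.extract(Paths.get(path));
                out.println("OK " + path);
                out.println(result);
            } catch (Exception e) {
                // One bad document must not stop the worker.
                out.println("ERROR " + path + ": " + e.getMessage());
            }
            out.println(".");
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder extractor so the sketch runs without Tika on the classpath.
        Extractor placeholder = doc -> "size=" + Files.size(doc);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(System.out);
        serve(in, out, placeholder);
    }
}
```

The caller keeps the process open and writes paths to its standard input
as documents arrive. The same loop could sit behind a socket or a servlet
instead; stdin/stdout is used here only to keep the sketch small.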

Is there a best practice for this? Maybe a servlet/JSP solution? Does
the current Tika release include an out-of-the-box solution for that?

(I only found https://issues.apache.org/jira/browse/TIKA-169 on this
topic, which is pretty old and has "won't fix" status.)

Thanks!

Marian
