On Thu, 7 Aug 2014, Bratislav Stojanovic wrote:
> This was exactly what I was afraid of... you see, I have to extract thousands and thousands of documents and calling java command *three times* for each of them is highly inefficient.
The Tika App is largely intended for testing, debugging, demos and light use from non-Java environments. It was never really intended for very heavy use.
> I want to keep Tika in memory somehow, in a single VM, rather than instantiating a new VM every time I need to extract something.
Have you thought about calling the Java code from C? It's not as bad as it used to be... What you want to do is pretty easy in Java, so that's one way to tackle it.
Otherwise, it might be best to look into adding your own custom CXF endpoint to the Tika server, to return everything you need in one go.
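
For the single-VM route, something along these lines might be a starting point. It is only a rough sketch: one long-lived JVM reads file paths from stdin, one per line, and runs each through an extractor, so the VM startup cost is paid once rather than per document. The class name BatchExtractor, the Extractor interface and the output format are all made up for illustration. Tika itself is deliberately left out so the sketch compiles on its own; the comment in main() shows where I'd expect the Tika facade to plug in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

// Hypothetical helper, not part of Tika: keeps one JVM alive and
// processes many documents, one path per input line.
final class BatchExtractor {

    /** Hypothetical hook: turns one file path into extracted text. */
    interface Extractor {
        String extract(String path) throws Exception;
    }

    /**
     * Reads paths until end of input, extracts each one, and writes the
     * result to out. A failure on one document is reported and skipped.
     * Returns the number of documents extracted successfully.
     */
    static int run(BufferedReader paths, Extractor extractor, PrintWriter out)
            throws IOException {
        int ok = 0;
        String line;
        while ((line = paths.readLine()) != null) {
            String path = line.trim();
            if (path.isEmpty()) {
                continue; // ignore blank lines
            }
            try {
                String text = extractor.extract(path);
                out.println("== " + path + " (" + text.length() + " chars)");
                out.println(text);
                ok++;
            } catch (Exception e) {
                out.println("!! " + path + ": " + e);
            }
        }
        out.flush();
        return ok;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: with tika-core and tika-parsers on the classpath, the
        // placeholder below could be replaced with something like
        //   org.apache.tika.Tika tika = new org.apache.tika.Tika();
        //   Extractor ex = p -> tika.parseToString(new java.io.File(p));
        // creating the Tika object once and reusing it for every path.
        Extractor placeholder = p -> "(no extractor wired in for " + p + ")";
        run(new BufferedReader(new InputStreamReader(System.in)),
            placeholder,
            new PrintWriter(System.out, true));
    }
}
```

The C side would then start this process once and feed it paths over a pipe, instead of launching java per document. If you also need metadata and the detected type, I'd expect the same loop could collect them in the same pass, but I haven't shown that here.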
Nick
