In my case - initially at least - the Tika server would be on the same physical
server as the application that needs to extract text from the documents
uploaded to it, so network traffic is not much of an issue.

The main advantages I can see are:

1. Speed - the server is up and running all the time, so it can process a
document immediately. Obviously, if many requests arrive in quick succession
they could back up in a queue, but I'm hoping that queue would clear faster
than starting a fresh JVM for every document.

2. Memory usage - by running the server, memory usage can be more easily
controlled. It would use memory all the time it was running, but in a process
completely independent of the web application that needs the documents
processed. If the web application had to run a command-line script every time,
loading a 25 MB JAR file (before it is decompressed), a Java runtime and the
document being processed into memory, then I can see all sorts of memory
issues getting in the way of its operation.
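
As a rough sketch of what I have in mind on the application side: the document
is sent to the already-running server over HTTP instead of launching a JVM per
upload. This assumes the server listens on localhost port 9998 and returns
plain text for a PUT to /tika; the function names are only mine for
illustration, and I have not measured any of this.

```python
import urllib.request

# Assumed address of a locally running Tika server; adjust to your setup.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(document_bytes, server_url=TIKA_URL):
    """Build (but do not send) a PUT request carrying the raw document."""
    return urllib.request.Request(
        server_url,
        data=document_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},  # ask for plain extracted text
    )


def extract_text(document_bytes, server_url=TIKA_URL, timeout=30):
    """Send the document to the server and return the extracted text."""
    request = build_tika_request(document_bytes, server_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The web application would then call extract_text(uploaded_bytes) for each
upload, so the JAR and the Java runtime stay loaded in the server process and
never in the web application's own memory.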

-- Jason


On 01/07/2012 13:28, Jukka Zitting wrote:
> Hi,
>
> On Sun, Jul 1, 2012 at 2:17 PM, Mark Kerzner <[email protected]> wrote:
>> Out of curiosity, what would be the performance benefit of server vs
>> initialising every time?
> You replace JVM startup overhead with that of transmitting the
> document over a network connection. How that affects overall system
> performance depends on your deployment details.
>
> BR,
>
> Jukka Zitting

