Hi,

On Thu, Jun 16, 2011 at 8:55 AM, Charles <[email protected]> wrote:
> The problem was fixed by increasing the VM memory from 1 GB to 3 GB
> (intermediate sizes not explored, JAVA_OPTS fix attempts backed out) so it
> seems it really was a memory shortage despite top's and vmstat's
> re-assurance.  I wonder what triggered it.

It sounds unlikely for Tika to be using that much memory unless you're
processing some huge documents.

To better investigate the issue you could start your JVM with the
-XX:+HeapDumpOnOutOfMemoryError option, and inspect the heap dump that
gets created when an OOM error is encountered.
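
A minimal sketch of that setup for a single document, assuming a bash-like shell. The helper name run_with_heap_dump and the default dump directory /tmp/tika-dumps are hypothetical choices for illustration. -XX:HeapDumpPath is a standard HotSpot option, and when it points at a directory the JVM writes a java_pid<pid>.hprof file there.

```shell
# Hypothetical helper: parse one document with a deliberately small heap and
# have the JVM write a heap dump if it hits an OutOfMemoryError.
# The 100 MB limit and the default dump directory are illustrative only.
run_with_heap_dump() {
    local file="$1"
    local dump_dir="${2:-/tmp/tika-dumps}"
    mkdir -p "$dump_dir"
    java -Xmx100m \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:HeapDumpPath="$dump_dir" \
        -jar tika-app-0.9.jar "$file" > /dev/null
}
```

For example, "run_with_heap_dump /path/to/documents/big.pdf" and then open the resulting .hprof file in a heap analyzer such as Eclipse Memory Analyzer (MAT) or the JDK's jhat to see which objects are filling the heap.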

Alternatively, you can try identifying the troublesome document by
running a script like the following:

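    for file in /path/to/documents/*; do
        echo "$file"
        # Cap the heap at 100 MB so a document that needs excessive
        # memory fails quickly with an OutOfMemoryError
        java -Xmx100m -jar tika-app-0.9.jar "$file" > /dev/null
    done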
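
If the directory contains many documents, a variant that prints only the failing files may be easier to scan. This is a sketch assuming a bash-like shell and assuming the JVM exits with a non-zero status when parsing fails (as it does for an uncaught OutOfMemoryError); the function name find_failing_documents is hypothetical.

```shell
# Hypothetical variant of the loop above: print only the documents for
# which the Tika command exits with a non-zero status.
find_failing_documents() {
    local dir="$1"
    local file
    for file in "$dir"/*; do
        # Skip subdirectories and the literal pattern of an empty directory
        [ -f "$file" ] || continue
        if ! java -Xmx100m -jar tika-app-0.9.jar "$file" > /dev/null 2>&1; then
            echo "$file"
        fi
    done
}
```

For example, "find_failing_documents /path/to/documents > failing.txt" leaves a list of candidate documents to investigate further.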

The ForkParser feature introduced in Tika 0.9 can be used to run text
extraction in a background process so that a possible OOM error or
even a JVM crash won't affect your application.

BR,

Jukka Zitting

Reply via email to