Hello :-)

Using Xapian-Omega's omindex binary to run Tika on 400 files, Tika gives the error in the subject 247 times. The files triggering the error have the extensions doc, pdf, ppt, rtf and xls, so the problem is probably not specific to one file type.

Running vmstat with a 1 second delay during the omindex run shows no swapping and consistently ~0.5 GB (of 1 GB) free memory, so the problem does not appear to be a shortage of physical memory.

The bash ulimit command reported "unlimited" and /etc/security/limits.conf is all comments or empty lines.
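
Note that in bash a plain "ulimit" with no option reports only the file-size limit, so "unlimited" there says nothing about the virtual memory limit. A fuller check might look like the sketch below; show_limits is a made-up helper name, and the output would need comparing with the live system where Tika still works.

```shell
# Sketch: report the per-process limits most likely to affect a JVM.
# show_limits is a hypothetical helper, not part of omindex or Tika.
show_limits() {
    echo "virtual memory limit: $(ulimit -v)"
    echo "file size limit: $(ulimit -f)"
    # Full list, for comparison with the system where Tika still works.
    ulimit -a
}
show_limits
```

These are the limits of the shell it is run in, so it should be run as the same user and in the same kind of session that starts omindex.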

Searching the web widely (the most informative pages are listed below my signature) suggested adjusting the Java heap sizes, but neither export JAVA_OPTS='-Xms256m -Xmx512m' nor export JAVA_OPTS='-Xmx512m' before running omindex fixed the problem. I do not know what the defaults are.
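
One possible way to find the defaults is to ask the JVM itself, as in the sketch below. It assumes a HotSpot JVM recent enough to accept -XX:+PrintFlagsFinal; show_heap_defaults is a made-up helper name. It is also worth bearing in mind that the java launcher does not read JAVA_OPTS itself; that variable only has an effect when a wrapper script passes it on, so the settings above may never have reached the JVM.

```shell
# Sketch: show the heap sizes the JVM picks when no -Xms/-Xmx is given.
# Assumes a HotSpot JVM that supports -XX:+PrintFlagsFinal.
# show_heap_defaults is a hypothetical helper name.
show_heap_defaults() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH"
        return 0
    fi
    # The reported values are in bytes.
    java -XX:+PrintFlagsFinal -version 2>/dev/null \
        | grep -E 'InitialHeapSize|MaxHeapSize' \
        || echo "this JVM did not report heap flags"
}
show_heap_defaults
```

If the figures differ between the development and live systems, that would be one concrete difference to follow up.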

Tika worked on this development system from installation on 31mar11 until it was last used on 14apr11. All system changes are logged, but none of the changes since 14apr11 is obviously relevant. Tomcat 6 was installed for GeoServer and did take ~1 GB of virtual memory, perhaps triggering the problem, but it and MySQL have since been disabled in the boot scripts and the system has been rebooted. Tika is still working on the live system, which is similar to the development system in terms of installed software and versions.

I wanted to try Tika 0.9, but its bmp, jpeg and png parsing tests failed during the Maven build. I do not know enough Java/Maven to see whether those failures are related to this problem.

The OS is Debian Squeeze 64 bit running in a virtual machine -- hence the small sample of 400 files and the 1 GB memory -- running headless.

What can I do to analyse this further and, hopefully, fix it?

Best

Charles

*Good pages re "OutOfMemoryError/Out of swap space?"*:
* JVM Lies: The OutOfMemory Myth:
  http://www.codingthearchitecture.com/2008/01/14/jvm_lies_the_outofmemory_myth.html
* Troubleshooting Guide for Java SE 6 with HotSpot VM:
  http://www.oracle.com/technetwork/java/javase/memleaks-137499.html#gbyvj
