I have observed memory leaks with some of the parsers used by Tika, for example when parsing a large number of files (about 200,000) with my application. Why is there a memory leak? Because some closeable objects, such as streams, are not closed when a parsing exception occurs.

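For me, the best way to isolate Tika parsing is to run it in a separate operating-system process (not a Java thread) and to restart that process periodically so that the parsing work can continue. When the process exits, the operating system reclaims whatever memory it leaked.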
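
To illustrate the idea, here is a minimal sketch in Java. It is not the code from USB-Search, only an assumption of how such a supervisor could look. The class name IsolatedParsing, the worker main class ParserWorker (which would call Tika on each path it receives), the batch size, and the -Xmx value are all hypothetical. Only standard JDK calls (ProcessBuilder, Process.waitFor, Process.destroyForcibly) are used.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

class IsolatedParsing {

    /** Splits the file list into batches; each batch is handled by a fresh child process. */
    static List<List<String>> batches(List<String> files, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < files.size(); i += batchSize) {
            int end = Math.min(i + batchSize, files.size());
            out.add(new ArrayList<>(files.subList(i, end)));
        }
        return out;
    }

    /** Builds the command line for a child JVM. "ParserWorker" is a hypothetical main class. */
    static List<String> workerCommand(String classpath, List<String> batch) {
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        List<String> cmd = new ArrayList<>();
        cmd.add(javaBin);
        cmd.add("-Xmx256m");      // assumed heap cap for the child, so a leak cannot grow unbounded
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add("ParserWorker");  // hypothetical: parses each path given as an argument
        cmd.addAll(batch);        // small batches only; long lists should go through a file or stdin
        return cmd;
    }

    /** Runs one batch in a child process; returns its exit code, or -1 if it was killed on timeout. */
    static int runBatch(String classpath, List<String> batch, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(workerCommand(classpath, batch))
                .inheritIO()      // share the parent's stdout/stderr so the child never blocks on a full pipe
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();  // a hung parser is killed without affecting the main application
            p.waitFor();
            return -1;
        }
        return p.exitValue();
    }
}
```

The main application would loop over batches(...) and call runBatch(...) for each one. Because every batch gets a new process, any stream left open by a failing parser disappears when that process ends, and a crash or timeout costs only one batch.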

For example, with this trick I can parse the content and metadata of about one million files without any stability problems in the main application.

I have published my work freely on the web (Apache License 2.0) as part of my application USB-Search: http://sourceforge.net/projects/usb-search/ Feel free to download
the sources and look at some ideas for isolating the parsing process with Tika.

BR

Raphaël GUYOT,
Software engineer at French
Signal school ETRS,
Rennes, France.
