On Sat, 13 Sep 2014, Mugat Gurkowsky wrote:
i am trying to use tika in combination with lucene to parse and index very large xml files. so far without success, because of memory limitations. tika's BodyContentHandler seems to try to copy the whole content into memory, which doesn't work as the files are several gigabytes in size.

It depends on what the BodyContentHandler is doing with the resulting content. Make sure whatever is downstream of it is streaming the content rather than buffering it.
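
To illustrate the difference, here is a rough sketch of a downstream handler that streams rather than buffers. It uses only the JDK's SAX classes, not Tika itself, and the names StreamingTextExample, StreamingTextHandler and streamText are made up for this example. Each chunk of character data is written to a Writer as soon as it arrives, so memory use should not grow with the size of the file, provided the Writer does not accumulate the text itself.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of Tika or Lucene.
class StreamingTextExample {

    /** Forwards each chunk of character data to a Writer instead of collecting it. */
    static final class StreamingTextHandler extends DefaultHandler {
        private final Writer out;
        private long charsSeen = 0;

        StreamingTextHandler(Writer out) {
            this.out = out;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                // Write the chunk straight through; nothing is kept in this handler.
                out.write(ch, start, length);
                charsSeen += length;
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }

        long charsSeen() {
            return charsSeen;
        }
    }

    /** Parses the XML stream, writes its text content to out, and returns the character count. */
    static long streamText(InputStream xml, Writer out) throws Exception {
        StreamingTextHandler handler = new StreamingTextHandler(out);
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        out.flush();
        return handler.charsSeen();
    }

    public static void main(String[] args) throws Exception {
        // With a file argument, stream that file; otherwise use a tiny in-memory sample.
        InputStream in = args.length > 0
                ? new FileInputStream(args[0])
                : new ByteArrayInputStream(
                        "<doc><p>hello</p> world</doc>".getBytes(StandardCharsets.UTF_8));
        Writer out = new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
        try (InputStream xml = in) {
            long n = streamText(xml, out);
            out.write(System.lineSeparator() + n + " characters streamed" + System.lineSeparator());
            out.flush();
        }
    }
}
```

With Tika, the analogous step would be to construct the BodyContentHandler around a Writer or another ContentHandler of your own, instead of using the no-argument form and asking it for a String afterwards. Whether that alone is enough depends on how the text is then handed to Lucene, which this sketch does not cover.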

Nick
