On Sat, 13 Sep 2014, Mugat Gurkowsky wrote:
I am trying to use Tika in combination with Lucene to parse and index very large XML files. So far I have had no success because of memory limitations: Tika's BodyContentHandler seems to copy the whole content into memory, which doesn't work, as the files are several gigabytes in size.
It depends on what the BodyContentHandler is doing with the resulting content. Make sure that whatever is downstream of it is streaming the content, not buffering it.
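The following is a minimal, stdlib-only sketch of the difference. It is an analogue, not Tika's own code. A SAX handler writes each `characters()` chunk straight to a `Writer` instead of accumulating the whole document in memory. The class and method names (`StreamingTextExtractor`, `extract`) are hypothetical. In Tika, the comparable choice is to construct the handler around a `Writer`, e.g. `new BodyContentHandler(writer)`, rather than using the no-argument constructor, which collects the text into an internal string buffer.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingTextExtractor {

    /** Hypothetical handler: forwards each text chunk to a Writer instead of buffering it. */
    static final class StreamingTextHandler extends DefaultHandler {
        private final Writer out;
        private long charsSeen = 0;

        StreamingTextHandler(Writer out) {
            this.out = out;
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            try {
                out.write(ch, start, length); // only this chunk is held here, not the whole document
                charsSeen += length;
            } catch (IOException e) {
                throw new SAXException(e);
            }
        }

        long charsSeen() {
            return charsSeen;
        }
    }

    /** Parses the XML stream with SAX and streams its text content to {@code out}. */
    static long extract(InputStream xml, Writer out) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StreamingTextHandler handler = new StreamingTextHandler(out);
        parser.parse(xml, handler);
        out.flush();
        return handler.charsSeen();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Hello</title><body>streamed text</body></doc>";
        // StringWriter is for the demo only; real multi-GB input needs a non-buffering sink.
        StringWriter sink = new StringWriter();
        long n = extract(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), sink);
        System.out.println(sink + " (" + n + " chars)");
    }
}
```

The `StringWriter` in `main` only keeps the demo self-contained. For multi-gigabyte input, the `Writer` would point at a file or some other sink that does not itself hold everything in memory. Otherwise the buffering simply moves downstream.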
Nick
