Data storage allocation for what? The parsed text? Unless you have really large documents, it's simplest to just use Tika to parse, then build a Lucene Document, then index it.
With really large documents it's possible to create the Lucene Field from a Reader, so that Lucene reads the characters incrementally, and to use Tika's ParsingReader to produce that Reader. But I suspect this won't save that much memory in general (many parsers require loading the full binary document into RAM, I believe). A rough sketch of the ParsingReader approach follows below the quoted mail.

Mike McCandless

http://blog.mikemccandless.com

On Sun, Feb 24, 2013 at 9:36 PM, <[email protected]> wrote:
> .. but how to streamline the two, lucene and tika, through some internal
> interface, that avoids the whole piece of data storage allocation ?
>
> ----- Original Message -----
> From: Michael McCandless <[email protected]>
> To: [email protected], [email protected]
> Subject: Re: hello , how to utilize tika inside lucene ?
> Date: 2013-02-25 03:55
>
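For reference, here is a minimal, untested sketch of the Reader-based approach, written against roughly Lucene 4.x and Tika 1.x APIs of that era. The file names, field names, and index path are made up for illustration; adjust the Version constants and directory handling to whatever release you're actually on.

```java
import java.io.File;
import java.io.Reader;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.parser.ParsingReader;

public class TikaStreamingIndexer {
    public static void main(String[] args) throws Exception {
        File input = new File("large-document.pdf");   // hypothetical input file
        Directory dir = FSDirectory.open(new File("index"));
        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_41, new StandardAnalyzer(Version.LUCENE_41));

        try (IndexWriter writer = new IndexWriter(dir, config)) {
            // ParsingReader runs Tika's parser in a background thread and
            // exposes the extracted text as a plain character stream.
            Reader textReader = new ParsingReader(input);

            Document doc = new Document();
            // Store the path so hits can be mapped back to the original file.
            doc.add(new StringField("path", input.getPath(), Field.Store.YES));
            // TextField(name, Reader) lets Lucene pull characters incrementally;
            // the content is tokenized and indexed but not stored.
            doc.add(new TextField("contents", textReader));

            // The Reader is consumed while this document is being indexed.
            writer.addDocument(doc);
        }
    }
}
```

Note the caveat from above still applies: even though Lucene only sees a character stream, the underlying Tika parser may still buffer the whole binary document in memory, so the savings depend on the parser for that file type.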
