Data storage allocation for what?  The parsed text?

Unless you have really large documents, it's simplest to just use Tika
to parse, then build a Lucene Document, then index it.
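
For example, something along these lines (just a rough sketch against
recent Lucene/Tika APIs; the index path and field names are placeholders):

  import java.io.File;
  import java.nio.file.Paths;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.StringField;
  import org.apache.lucene.document.TextField;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.tika.Tika;

  public class IndexWithTika {
    public static void main(String[] args) throws Exception {
      File file = new File(args[0]);

      // 1) Tika parses the binary document and returns the extracted text
      //    (the facade truncates very long text by default; raise
      //    setMaxStringLength, or use the Reader approach below, if needed)
      String text = new Tika().parseToString(file);

      // 2) Build a Lucene Document from that text
      Document doc = new Document();
      doc.add(new StringField("path", file.getAbsolutePath(), Field.Store.YES));
      doc.add(new TextField("content", text, Field.Store.NO));

      // 3) Index it
      try (IndexWriter writer = new IndexWriter(
          FSDirectory.open(Paths.get("index")),
          new IndexWriterConfig(new StandardAnalyzer()))) {
        writer.addDocument(doc);
      }
    }
  }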

With really large documents it's possible to build the Lucene Field from a
Reader, so Lucene reads the characters incrementally, and to use Tika's
ParsingReader to create that Reader.  But I suspect this won't save that
much memory in general (many parsers require loading the full binary
document into RAM, I believe).
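
Roughly like this (again just a sketch; ParsingReader parses in a
background thread and streams the extracted characters to Lucene):

  import java.io.File;
  import java.io.Reader;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.TextField;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.tika.parser.ParsingReader;

  public class IndexLargeDoc {
    // Index one large document without first materializing the whole
    // extracted text as a String in RAM.
    public static void index(IndexWriter writer, File file) throws Exception {
      Reader reader = new ParsingReader(file);    // auto-detects the file type
      Document doc = new Document();
      doc.add(new TextField("content", reader));  // unstored; Lucene pulls chars as it tokenizes
      writer.addDocument(doc);                    // the reader is consumed here
    }
  }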

Mike McCandless

http://blog.mikemccandless.com

On Sun, Feb 24, 2013 at 9:36 PM,  <[email protected]> wrote:
>  .. but how to streamline the two, Lucene and Tika, through some internal
> interface that avoids allocating storage for the whole parsed data?
>
> ----- Original Message -----
> From: Michael McCandless <[email protected]>
> To: [email protected], [email protected]
> Subject: Re: hello , how to utilize tika inside lucene ?
> Date: February 25, 2013, 03:55
>
