Data storage allocation for what?  The parsed text?

Unless you have really large documents, it's simplest to just use Tika
to parse, then build a Lucene Document, then index it.
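
For example, something along these lines (just a rough sketch against
recent Lucene/Tika APIs; the index path and field names are placeholders):

  import java.io.File;
  import java.nio.file.Paths;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.document.StringField;
  import org.apache.lucene.document.TextField;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.tika.Tika;

  public class IndexWithTika {
    public static void main(String[] args) throws Exception {
      File file = new File(args[0]);

      // 1) Tika parses the binary document and returns the extracted text
      //    (the facade truncates very long text by default; raise
      //    setMaxStringLength, or use the Reader approach below, if needed)
      String text = new Tika().parseToString(file);

      // 2) Build a Lucene Document from that text
      Document doc = new Document();
      doc.add(new StringField("path", file.getAbsolutePath(), Field.Store.YES));
      doc.add(new TextField("content", text, Field.Store.NO));

      // 3) Index it
      try (IndexWriter writer = new IndexWriter(
          FSDirectory.open(Paths.get("index")),
          new IndexWriterConfig(new StandardAnalyzer()))) {
        writer.addDocument(doc);
      }
    }
  }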

With really large documents it's possible to build the Lucene Field from a
Reader, so Lucene reads the characters incrementally, and to use Tika's
ParsingReader to create that Reader.  But I suspect this won't save that
much memory in general (many parsers require loading the full binary
document into RAM, I believe).
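
Roughly like this (again just a sketch; ParsingReader parses in a
background thread and streams the extracted characters to Lucene):

  import java.io.File;
  import java.io.Reader;

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.TextField;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.tika.parser.ParsingReader;

  public class IndexLargeDoc {
    // Index one large document without first materializing the whole
    // extracted text as a String in RAM.
    public static void index(IndexWriter writer, File file) throws Exception {
      Reader reader = new ParsingReader(file);    // auto-detects the file type
      Document doc = new Document();
      doc.add(new TextField("content", reader));  // unstored; Lucene pulls chars as it tokenizes
      writer.addDocument(doc);                    // the reader is consumed here
    }
  }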

Mike McCandless

http://blog.mikemccandless.com

On Sun, Feb 24, 2013 at 9:36 PM,  <[email protected]> wrote:
>  .. but how to streamline the two, Lucene and Tika, through some internal
> interface that avoids allocating storage for the whole parsed data?
>
> ----- Original Message -----
> From: Michael McCandless <[email protected]>
> To: [email protected], [email protected]
> Subject: Re: hello , how to utilize tika inside lucene ?
> Date: February 25, 2013, 03:55
>
