...@gmail.com]
Sent: Friday, July 11, 2014 1:38 PM
To: user@tika.apache.org
Subject: Re: How to index the parsed content effectively
Hi Tim, All.
On 02/07/14 14:32, Allison, Timothy B. wrote:
Hi Sergey,
I'd take a look at what the DataImportHandler in Solr does. If you want to
store the field, you need to create the field with a String (as opposed to a
Reader), which means you have to have the whole thing in memory.
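In Lucene terms, Tim's point maps onto two `TextField` constructors: `TextField(String name, Reader reader)` creates an indexed but un-stored field, while `TextField(String name, String value, Field.Store store)` can store the value, which forces the whole extracted text into memory. The following is only a language-neutral illustration in Python, not Lucene code: `index_streaming` and `store_field` are hypothetical names, `io.StringIO` stands in for a `java.io.Reader`, and the `\w+` tokenization is a simplification of a real Lucene analyzer.

```python
import io
import re
from collections import Counter

_TOKEN = re.compile(r"\w+")
_TRAILING_TOKEN = re.compile(r"\w+\Z")


def index_streaming(reader, chunk_size=4096):
    """Index-only path (like a Reader-based field): term counts are built
    chunk by chunk, so the text buffer never holds more than one chunk
    plus a partial token, regardless of document size."""
    counts = Counter()
    carry = ""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        buf = carry + chunk
        # A token touching the end of the buffer may continue in the next chunk,
        # so hold it back instead of counting it now.
        m = _TRAILING_TOKEN.search(buf)
        carry = buf[m.start():] if m else ""
        complete = buf[:m.start()] if m else buf
        counts.update(t.lower() for t in _TOKEN.findall(complete))
    # Whatever is left over at end of stream is a complete token.
    counts.update(t.lower() for t in _TOKEN.findall(carry))
    return counts


def store_field(reader):
    """Stored path (like a String-valued stored field): the exact value must
    be kept, so the whole stream is materialized as one string."""
    return reader.read()
```

Both paths yield the same terms; only the stored path needs the full text in memory at once, which is why a stored field cannot be fed from a Reader.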
On Jul 2, 2014, at 5:27am, Sergey Beryozkin sberyoz...@gmail.com wrote:
Hi All,
We've been experimenting with indexing the parsed content in Lucene and
our initial attempt was to index the output from
ToTextContentHandler.toString() as a Lucene Text field.
This is unlikely to be
If you want to have a try, we created a crawling Tika parser, which gives
recursive, incremental crawling capabilities to Tika. There we also
implemented a handler as a decorator that writes into a Lucene index.
Check out 'Create a Lucene index'
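Christian's message does not show how the decorator is written, so this is only a sketch under assumptions. Tika parsers report their output as XHTML SAX events to a `ContentHandler`, and Tika ships a `ContentHandlerDecorator` base class for this kind of wrapping; to stay runnable without Tika or Lucene, the sketch uses Python's `xml.sax` instead. `IndexingHandlerDecorator`, `TextCollector` and the `sink` callable (a stand-in for a Lucene `IndexWriter`) are hypothetical names.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class IndexingHandlerDecorator(ContentHandler):
    """Hypothetical decorator: forwards every SAX event to the wrapped
    handler and, at the end of each document, passes the collected
    text to an index sink."""

    def __init__(self, wrapped, sink):
        super().__init__()
        self._wrapped = wrapped
        self._sink = sink  # assumption: stands in for a Lucene IndexWriter
        self._parts = []

    def startDocument(self):
        self._parts = []
        self._wrapped.startDocument()

    def startElement(self, name, attrs):
        self._wrapped.startElement(name, attrs)

    def endElement(self, name):
        self._wrapped.endElement(name)

    def characters(self, content):
        self._parts.append(content)        # keep a copy for the index
        self._wrapped.characters(content)  # and pass the event on unchanged

    def ignorableWhitespace(self, whitespace):
        self._parts.append(whitespace)
        self._wrapped.ignorableWhitespace(whitespace)

    def endDocument(self):
        self._wrapped.endDocument()
        self._sink("".join(self._parts))  # one index entry per parsed document


class TextCollector(ContentHandler):
    """Stands in for whatever handler the application already uses."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)
```

Because the decorator forwards every event unchanged, it can wrap the handler the application already uses (for example a text-extracting handler), while the index write happens once per document in `endDocument`.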
Hi,
On 02/07/14 13:54, Ken Krugler wrote:
On Jul 2, 2014, at 5:27am, Sergey Beryozkin sberyoz...@gmail.com wrote:
Hi All,
We've been experimenting with indexing the parsed content in Lucene and
our initial attempt was to index the output from
Hi Sergey,
I'd take a look at what the DataImportHandler in Solr does. If you want to
store the field, you need to create the field with a String (as opposed to a
Reader), which means you have to have the whole thing in memory. Also, if
you're proposing adding a field entry in a
Hi Tim
Thanks for sharing your thoughts. I find them very helpful,
On 02/07/14 14:32, Allison, Timothy B. wrote:
Hi Sergey,
I'd take a look at what the DataImportHandler in Solr does. If you want to
store the field, you need to create the field with a String (as opposed to a
Reader),
Another aspect is that if you index such large documents, you also receive
these documents in your search results, which is then again a bit ambiguous
for a user (if there is one in the use case).
The search problem is only partially solved in this
Hi
On 02/07/14 17:32, Christian Reuschling wrote:
Another aspect is that if you index such large documents, you also receive
these documents in your search results, which is then again a bit ambiguous
for a user (if there is one in the use case).