Yonik/Erick,
We are building a custom search that is to be done in two parts executed at
different points in time. In the first step we want to tokenize the
information and store it; at a later point in time we want to retrieve it
for further processing and then store it back into the index. This
processed information is what we want the users to be able to search on.
Regards,
Eswar
On Dec 20, 2007 8:15 PM, Erick Erickson [EMAIL PROTECTED] wrote:
I think that what Yonik wants is a higher-level response.
*Why* do you want to process the tokens later? What is the
use case you're trying to satisfy?
Best
Erick
On Dec 20, 2007 1:37 AM, Rishabh Joshi [EMAIL PROTECTED] wrote:
What are you trying to do with the tokens?
Yonik, we wanted a tokenizer that would tokenize the content of a document
as per our requirements, and then store the tokens in the index so that we
could retrieve those tokens at search time, for further processing in our
application.
Regards,
Rishabh
On Dec 19, 2007 10:02 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
On Dec 19, 2007 10:59 AM, Rishabh Joshi [EMAIL PROTECTED] wrote:
I have created my own Tokenizer and I am indexing the documents using the
same.
I wanted to know if there is a way to retrieve the tokens (created by my
custom tokenizer) from the index.
If you want the tokens in the index, see the luke request handler.
If you want the tokens for a specific document, it's more
complicated... Lucene maintains an *inverted* index... terms point to
documents, so by default there is no way to ask for all of the terms
in a certain document. One could ask Lucene to store the terms for
certain fields (called term vectors), but that requires extra space in
the index, and Solr doesn't yet have a way to ask that they be
retrieved.
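To illustrate the point above, here is a small conceptual sketch in plain Python (not Lucene code; the structures and names are invented for illustration). It shows why an inverted index makes "which documents contain this term?" cheap, while "which terms are in this document?" requires either a full scan of the index or a separately stored per-document term list, which is the role term vectors play:

```python
from collections import defaultdict

# Two toy "documents", keyed by doc id.
docs = {
    0: "the quick brown fox",
    1: "the lazy dog",
}

# Inverted index: term -> set of doc ids containing it.
inverted = defaultdict(set)
# Per-document term lists (the "term vector" idea): costs extra space.
term_vectors = {}

for doc_id, text in docs.items():
    tokens = text.split()          # stand-in for a real tokenizer
    term_vectors[doc_id] = tokens  # stored alongside the inverted index
    for tok in tokens:
        inverted[tok].add(doc_id)

# Easy direction: the inverted index answers term -> docs directly.
assert inverted["the"] == {0, 1}

# Hard direction without term vectors: recovering doc 1's terms means
# scanning every term in the whole index.
terms_by_scan = sorted(t for t, ids in inverted.items() if 1 in ids)
assert terms_by_scan == ["dog", "lazy", "the"]

# With stored term vectors it is a direct lookup.
assert term_vectors[1] == ["the", "lazy", "dog"]
```

The scan in the "hard direction" touches every term in the index, which is why per-document term retrieval needs the extra stored data rather than the inverted structure alone.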
What are you trying to do with the tokens?
-Yonik