Re: No Analyzer, tokenizer or stemmer works at Solr

MitchK Thu, 07 Jan 2010 11:09:12 -0800

The difference between stored and indexed is clear now.

You are right, if you are responding only to "normal users".


Use case:
You have a stored field containing "The good, the bad and the ugly",
and you have a really fantastic analyzer that does some magic to this
movie title.
Let's say the analyzer translates the title into an MD5 hash or some other
abstract expression.
Instead of repeating the same magical function on the client side again and
again, the client only needs to take the prepared data from your response.
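
To make this concrete, here is a minimal sketch of what I have in mind. It is
plain Python, not Solr code, and it only illustrates the idea of computing the
derived value once at indexing time and keeping it next to the original. The
field name `title_md5` and the helper `prepare_document` are hypothetical, and
MD5 merely stands in for whatever the analyzer would really do.

```python
import hashlib


def prepare_document(title: str) -> dict:
    """Build a document holding the original title plus a precomputed digest.

    Hypothetical helper: `title_md5` is an illustrative field name, and the
    MD5 step is a stand-in for the analyzer's "magic".
    """
    digest = hashlib.md5(title.encode("utf-8")).hexdigest()
    return {"title": title, "title_md5": digest}


doc = prepare_document("The good, the bad and the ugly")
print(doc["title"])      # original text, unchanged
print(doc["title_md5"])  # hex digest, computed once at indexing time
```

A client that receives both fields never has to run the function itself.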

Another use case could be:
Imagine you have two categories, cheap and expensive, and your document
has a title, a label, an owner and a price field.
Imagine you analyze, index and store them as you normally do, and
afterwards you want to set whether the document belongs to the expensive
item group or not.
If the price of the item is higher than $500, it belongs to the expensive
ones, otherwise not.
I think this would be a job for a special analyzer, and it only makes
sense if I also store the analyzed data.
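
As a rough sketch of that rule (again plain Python rather than a real Solr
analyzer), the derived value could be written into its own field before the
document is indexed. The field name `price_group`, the helper
`add_price_group` and the sample item are all made up for illustration; only
the 500 threshold comes from the example above.

```python
EXPENSIVE_THRESHOLD = 500  # dollars, taken from the example above


def add_price_group(doc: dict) -> dict:
    """Return a copy of doc with a derived `price_group` field.

    Hypothetical helper: a price strictly above the threshold is
    "expensive", anything else is "cheap".
    """
    group = "expensive" if doc["price"] > EXPENSIVE_THRESHOLD else "cheap"
    return {**doc, "price_group": group}


item = add_price_group(
    {"title": "Lamp", "label": "home", "owner": "mitch", "price": 650}
)
print(item["price_group"])  # expensive
```

The original fields are left untouched, so the response can carry both the raw
price and the derived group.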

I think information retrieval is a really interesting use case.


Erick Erickson wrote:
> 
> What is your use case for "responding sometimes with the indexed value"?
> Other than reconstructing a field that hasn't been stored, I can't think
> of one.
> 
> I still think you're missing the point. Indexing and storing are
> orthogonal operations that have (almost) nothing to do with each
> other, for all that they happen at the same time on the same field.
> 
> You never search against the stored data in a field. You *always*
> search against the indexed data.
> 
> Contrariwise, you never display the indexed form to the user, you
> *always* show the stored data (unless you come up with
> a really interesting use case).
> 
> Step back and consider what happens when you index data,
> it gets broken up all kinds of ways. Stop words are removed,
> case may change, etc, etc, etc. It makes no sense to
> then display this data for a user. Would you really like
> to have, say a movie title "The Good, The Bad, and The
> Ugly". Remove stopwords, punctuation and lowercase
> and you index three tokens "good", "bad", "ugly".
> Even if you reconstruct this field, the user would see
> "good bad ugly". Bad, very bad.
> 
> Yet I want to display the original title to the user in
> response to searching on "ugly", so I need the
> original, unanalyzed data.
> 
> Perhaps it would help to think of it this way.
> 1> take some data and index it in f1
>     but do NOT store it in f1. Store it in f2
>     but do NOT index it in f2.
> 2> take that same data, index AND store
>     it in f3.
> 
> <1> is almost entirely equivalent to <2>
> in terms of index resources.
> 
> Practically though, <1> is harder to use,
> because you have to remember
> to use f1 for searching and f2 for getting
> the raw data.
> 
> HTH
> Erick
> 
> On Thu, Jan 7, 2010 at 12:11 PM, MitchK <mitc...@web.de> wrote:
> 
>>
>> Thank you, Ryan. I will have a look at Lucene's material and Luke.
>>
>> I think I got it. :)
>>
>> Sometimes there will be a need to respond with both the value and
>> the indexed version of the value.
>> How can I fulfill such needs? By doing a copyField on indexed-only fields?
>>
>>
>>
>> ryantxu wrote:
>> >
>> >
>> > On Jan 7, 2010, at 10:50 AM, MitchK wrote:
>> >
>> >>
>> >> Eric,
>> >>
>> >> you mean, everything is okay, but I do not see it?
>> >>
>> >>>> Internally for searching the analysis takes place and writes to the
>> >>>> index in an inverted fashion, but the stored stuff is left alone.
>> >>
>> >> if I use an analyzer, Solr "stores" its output two ways?
>> >> One public output, which is similar to the original input
>> >> and one "hidden" or internal output, which is based on the
>> >> analyzer's work?
>> >> Did I understand that right?
>> >
>> > yes.
>> >
>> > indexed fields and stored fields are different.
>> >
>> > Solr results show stored fields in the results (however facets are
>> > based on indexed fields)
>> >
>> > Take a look at Lucene in Action for a better description of what is
>> > happening.  The best tool to get your head around what is happening is
>> > probably luke (http://www.getopt.org/luke/)
>> >
>> >
>> >>
>> >> If yes, I have got another problem:
>> >> I don't want to waste any disk space.
>> >
>> > You have control over what is stored and what is indexed -- how that
>> > is configured is up to you.
>> >
>> > ryan
>> >
>> >
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27063452.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27065305.html
Sent from the Solr - User mailing list archive at Nabble.com.
