Somewhere in your code, you have to create the document XML that you
send to Solr. Just add the calculated data to
your new field there...
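
A minimal sketch of what that could look like, assuming the documents are posted to Solr as plain XML `<add>` messages. The two use cases come from this thread (an md5 of the title, and a cheap/expensive flag with a threshold of 500); the field names `price_category` and `title_md5` and the helper `build_add_xml` are made up for illustration and would have to match whatever your schema defines.

```python
import hashlib
import xml.etree.ElementTree as ET

# Threshold taken from the example in this thread; adjust as needed.
EXPENSIVE_THRESHOLD = 500


def build_add_xml(title, price):
    """Return a Solr <add> XML message that includes the computed fields."""
    fields = {
        "title": title,
        "price": str(price),
        # Computed in the indexing code, not in an Analyzer.
        # Both field names below are hypothetical.
        "price_category": "expensive" if price > EXPENSIVE_THRESHOLD else "cheap",
        "title_md5": hashlib.md5(title.encode("utf-8")).hexdigest(),
    }
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_xml("The good, the bad and the ugly", 750))
```

The computed fields can then be declared as stored (and indexed only if you want to search on them), so the client gets the prepared values back in the response.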

HTH
Erick

On Fri, Jan 8, 2010 at 9:30 AM, MitchK <mitc...@web.de> wrote:

>
> Okay, you're right. It really would be cleaner if I did such things in the
> code which populates the document for Solr.
>
> Is there a way to prepare a document in the described way with Lucene/Solr,
> before I analyze it?
> My use case is to categorize several documents automatically, which
> means that I have to "create" data from the given input by doing some
> information retrieval.
>
> The problem is I am really new to Solr and Lucene - as you can see - and I
> do not know whether there are any classes that fit my needs.
>
> Any idea?
>
>
> Erick Erickson wrote:
> >
> > Well, I'd approach either of these use cases
> > by simply performing my computations on
> > the input and storing the result in another
> > (non-indexed unless I wanted to search it)
> > field. This wouldn't happen in the Analyzer,
> > but in the code that populated the document
> > fields.....
> >
> > Which is a much cleaner solution IMO than creating
> > some sort of "index this but store that" capability.
> > The purpose of analysis is to produce *searchable*
> > tokens after all.
> >
> > But we're getting into angels dancing on pins here. Do
> > you actually have a use case you're trying to implement
> > or is this mostly theoretical?
> >
> > Erick
> >
> > On Thu, Jan 7, 2010 at 2:08 PM, MitchK <mitc...@web.de> wrote:
> >
> >>
> >> The difference between stored and indexed is clear now.
> >>
> >> You are right, if you are responding only to "normal users".
> >>
> >> Use case:
> >> You have a stored field "The good, the bad and the ugly".
> >> And you have a really fantastic analyzer, which does some magic on
> >> this movie title.
> >> Let's say the analyzer translates the title into md5 or into another
> >> abstract expression.
> >> Instead of doing the same magical function on the client's side again
> >> and again, the client only needs to take the prepared data from your
> >> response.
> >>
> >> Another use case could be:
> >> Imagine you have two categories, cheap and expensive, and your
> >> document has a title, a label, an owner and a price field.
> >> Imagine you analyze, index and store them as you normally do, and
> >> afterwards you want to set whether the document belongs to the
> >> expensive item group or not.
> >> If the price of the item is higher than $500, it belongs to the
> >> expensive ones, otherwise not.
> >> I think this would be a job for a special analyzer - and this only
> >> makes sense if I also store the analyzed data.
> >>
> >> I think information retrieval is a really interesting use case.
> >>
> >>
> >> Erick Erickson wrote:
> >> >
> >> > What is your use case for "responding sometimes with the indexed
> >> > value"?
> >> > Other than reconstructing a field that hasn't been stored, I can't
> >> > think of one.
> >> >
> >> > I still think you're missing the point. Indexing and storing are
> >> > orthogonal operations that have (almost) nothing to do with each
> >> > other, for all that they happen at the same time on the same field.
> >> >
> >> > You never search against the stored data in a field. You *always*
> >> > search against the indexed data.
> >> >
> >> > Contrariwise, you never display the indexed form to the user, you
> >> > *always* show the stored data (unless you come up with
> >> > a really interesting use case).
> >> >
> >> > Step back and consider what happens when you index data:
> >> > it gets broken up in all kinds of ways. Stop words are removed,
> >> > case may change, etc, etc, etc. It makes no sense to
> >> > then display this data to a user. Say you have
> >> > a movie title "The Good, The Bad, and The
> >> > Ugly". Remove stopwords and punctuation, lowercase it,
> >> > and you index three tokens: "good", "bad", "ugly".
> >> > Even if you reconstruct this field, the user would see
> >> > "good bad ugly". Bad, very bad.
> >> >
> >> > Yet I want to display the original title to the user in
> >> > response to searching on "ugly", so I need the
> >> > original, unanalyzed data.
> >> >
> >> > Perhaps it would help to think of it this way.
> >> > 1> take some data and index it in f1
> >> >     but do NOT store it in f1. Store it in f2
> >> >     but do NOT index it in f2.
> >> > 2> take that same data, index AND store
> >> >     it in f3.
> >> >
> >> > <1> is almost entirely equivalent to <2>
> >> > in terms of index resources.
> >> >
> >> > Practically though, <1> is harder to use,
> >> > because you have to remember
> >> > to use f1 for searching and f2 for getting
> >> > the raw data.
> >> >
> >> > HTH
> >> > Erick
> >> >
> >> > On Thu, Jan 7, 2010 at 12:11 PM, MitchK <mitc...@web.de> wrote:
> >> >
> >> >>
> >> >> Thank you, Ryan. I will have a look at the Lucene material and Luke.
> >> >>
> >> >> I think I got it. :)
> >> >>
> >> >> Sometimes there will be a need to respond with the value on the
> >> >> one hand and the indexed version of the value on the other.
> >> >> How can I fulfill such needs? Using copyField on indexed-only
> >> >> fields?
> >> >>
> >> >>
> >> >>
> >> >> ryantxu wrote:
> >> >> >
> >> >> >
> >> >> > On Jan 7, 2010, at 10:50 AM, MitchK wrote:
> >> >> >
> >> >> >>
> >> >> >> Erick,
> >> >> >>
> >> >> >> you mean, everything is okay, but I do not see it?
> >> >> >>
> >> >> >>>> Internally for searching the analysis takes place and writes to
> >> >> >>>> the index in an inverted fashion, but the stored stuff is left
> >> >> >>>> alone.
> >> >> >>
> >> >> >> If I use an analyzer, Solr "stores" its output two ways?
> >> >> >> One public output, which is similar to the original input
> >> >> >> and one "hidden" or internal output, which is based on the
> >> >> >> analyzer's work?
> >> >> >> Did I understand that right?
> >> >> >
> >> >> > yes.
> >> >> >
> >> >> > indexed fields and stored fields are different.
> >> >> >
> >> >> > Solr results show stored fields in the results (however facets are
> >> >> > based on indexed fields)
> >> >> >
> >> >> > Take a look at Lucene in Action for a better description of what is
> >> >> > happening.  The best tool to get your head around what is happening
> >> >> > is probably luke (http://www.getopt.org/luke/)
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> If yes, I have got another problem:
> >> >> >> I don't want to waste any diskspace.
> >> >> >
> >> >> > You have control over what is stored and what is indexed -- how
> >> >> > that is configured is up to you.
> >> >> >
> >> >> > ryan
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27063452.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27065305.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27076795.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
