Somewhere, you have to create the document XML you send to SOLR. Just add the calculated data to your new field there...
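Here is a minimal Python sketch of that idea. It assumes Solr's XML update format (`<add><doc><field name="...">`). The names `price_category` and `title_md5` are hypothetical fields for the two use cases quoted below (the $500 price rule and the MD5 of the title), and the schema would need matching field definitions.

```python
import hashlib
import xml.etree.ElementTree as ET

# Threshold taken from the thread's example: items above $500 are "expensive".
EXPENSIVE_THRESHOLD = 500


def derive_fields(doc: dict) -> dict:
    """Return a copy of doc with derived fields computed on the client,
    before Solr ever analyzes anything."""
    enriched = dict(doc)
    # Hypothetical field: category computed from the raw price.
    price = float(doc["price"])
    enriched["price_category"] = "expensive" if price > EXPENSIVE_THRESHOLD else "cheap"
    # Hypothetical field: MD5 digest of the title, precomputed once so
    # clients can read it from the response instead of recomputing it.
    enriched["title_md5"] = hashlib.md5(doc["title"].encode("utf-8")).hexdigest()
    return enriched


def to_solr_add_xml(docs: list) -> str:
    """Serialize documents into Solr's XML update format:
    <add><doc><field name="...">value</field>...</doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    raw = {"id": "1", "title": "The Good, the Bad and the Ugly", "price": 650}
    # POST this body to Solr's update handler, then commit.
    print(to_solr_add_xml([derive_fields(raw)]))
```

Each derived field can then be indexed, stored, or both through its own per-field setting in schema.xml. No custom Analyzer is needed.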
HTH
Erick

On Fri, Jan 8, 2010 at 9:30 AM, MitchK <mitc...@web.de> wrote:
>
> Okay, you're right. It really would be cleaner if I did such stuff in the
> code that populates the documents for Solr.
>
> Is there a way to prepare a document the described way with Lucene/Solr
> before I analyze it?
> My use case is to categorize several documents automatically, which
> means I have to "create" data from the given input by doing some
> information retrieval.
>
> The problem is I am really new to Solr and Lucene (as you can see) and I
> do not know whether there are some classes that fit my needs.
>
> Any idea?
>
>
> Erick Erickson wrote:
>>
>> Well, I'd approach either of these use cases
>> by simply performing my computations on
>> the input and storing the result in another
>> (non-indexed unless I wanted to search it)
>> field. This wouldn't happen in the Analyzer,
>> but in the code that populated the document
>> fields...
>>
>> Which is a much cleaner solution IMO than creating
>> some sort of "index this but store that" capability.
>> The purpose of analysis is to produce *searchable*
>> tokens after all.
>>
>> But we're getting into angels dancing on pins here. Do
>> you actually have a use case you're trying to implement
>> or is this mostly theoretical?
>>
>> Erick
>>
>> On Thu, Jan 7, 2010 at 2:08 PM, MitchK <mitc...@web.de> wrote:
>>
>>>
>>> The difference between stored and indexed is clear now.
>>>
>>> You are right if you are responding only to "normal users".
>>>
>>> Use case:
>>> You have a stored field "The good, the bad and the ugly".
>>> And you have a really fantastic analyzer, which does some magic to this
>>> movie title.
>>> Let's say the analyzer translates the title into md5 or into another
>>> abstract expression.
>>> Instead of doing the same magical function on the client's side again and
>>> again, the client only needs to take the prepared data from your response.
>>>
>>> Another use case could be:
>>> Imagine you have two categories, cheap and expensive, and your document
>>> has a title, a label, an owner and a price field.
>>> Imagine you analyze, index and store them as you normally would, and
>>> afterwards you want to record whether the document belongs to the
>>> expensive item group or not.
>>> If the price of the item is higher than $500, it belongs to the
>>> expensive ones, otherwise not.
>>> I think this would be a job for a special analyzer, and this only makes
>>> sense if I also store the analyzed data.
>>>
>>> I think information retrieval is a really interesting use case.
>>>
>>>
>>> Erick Erickson wrote:
>>>>
>>>> What is your use case for "responding sometimes with the indexed value"?
>>>> Other than reconstructing a field that hasn't been stored, I can't think
>>>> of one.
>>>>
>>>> I still think you're missing the point. Indexing and storing are
>>>> orthogonal operations that have (almost) nothing to do with each
>>>> other, for all that they happen at the same time on the same field.
>>>>
>>>> You never search against the stored data in a field. You *always*
>>>> search against the indexed data.
>>>>
>>>> Contrariwise, you never display the indexed form to the user; you
>>>> *always* show the stored data (unless you come up with
>>>> a really interesting use case).
>>>>
>>>> Step back and consider what happens when you index data:
>>>> it gets broken up all kinds of ways. Stop words are removed,
>>>> case may change, etc, etc, etc. It makes no sense to
>>>> then display this data to a user. Take, say, the movie
>>>> title "The Good, The Bad, and The Ugly". Remove stopwords
>>>> and punctuation, lowercase the rest,
>>>> and you index three tokens: "good", "bad", "ugly".
>>>> Even if you reconstruct this field, the user would see
>>>> "good bad ugly". Bad, very bad.
>>>>
>>>> Yet I want to display the original title to the user in
>>>> response to searching on "ugly", so I need the
>>>> original, unanalyzed data.
>>>>
>>>> Perhaps it would help to think of it this way.
>>>> 1> take some data and index it in f1
>>>> but do NOT store it in f1. Store it in f2
>>>> but do NOT index it in f2.
>>>> 2> take that same data, index AND store
>>>> it in f3.
>>>>
>>>> <1> is almost entirely equivalent to <2>
>>>> in terms of index resources.
>>>>
>>>> Practically though, <1> is harder to use,
>>>> because you have to remember
>>>> to use f1 for searching and f2 for getting
>>>> the raw data.
>>>>
>>>> HTH
>>>> Erick
>>>>
>>>> On Thu, Jan 7, 2010 at 12:11 PM, MitchK <mitc...@web.de> wrote:
>>>>
>>>>>
>>>>> Thank you, Ryan. I will have a look at Lucene's material and Luke.
>>>>>
>>>>> I think I got it. :)
>>>>>
>>>>> Sometimes there will be the need to return both the value
>>>>> and the indexed version of the value.
>>>>> How can I fulfill such needs? Doing copyField on indexed-only fields?
>>>>>
>>>>>
>>>>>
>>>>> ryantxu wrote:
>>>>>>
>>>>>>
>>>>>> On Jan 7, 2010, at 10:50 AM, MitchK wrote:
>>>>>>
>>>>>>>
>>>>>>> Erick,
>>>>>>>
>>>>>>> you mean everything is okay, but I do not see it?
>>>>>>>
>>>>>>>>>> Internally for searching the analysis takes place and writes to the
>>>>>>>>>> index in an inverted fashion, but the stored stuff is left alone.
>>>>>>>
>>>>>>> If I use an analyzer, Solr "stores" its output two ways?
>>>>>>> One public output, which is similar to the original input,
>>>>>>> and one "hidden" or internal output, which is based on the
>>>>>>> analyzer's work?
>>>>>>> Did I understand that right?
>>>>>>
>>>>>> yes.
>>>>>>
>>>>>> indexed fields and stored fields are different.
>>>>>>
>>>>>> Solr results show stored fields in the results (however, facets are
>>>>>> based on indexed fields).
>>>>>>
>>>>>> Take a look at Lucene in Action for a better description of what is
>>>>>> happening. The best tool to get your head around what is happening is
>>>>>> probably Luke (http://www.getopt.org/luke/).
>>>>>>
>>>>>>>
>>>>>>> If yes, I have another problem:
>>>>>>> I don't want to waste any disk space.
>>>>>>
>>>>>> You have control over what is stored and what is indexed -- how that
>>>>>> is configured is up to you.
>>>>>>
>>>>>> ryan
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27063452.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
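The indexed-versus-stored distinction discussed in the thread can be made concrete with a toy Python model. It rests on simplifying assumptions: a made-up stopword list, a lowercase/split/stopword chain standing in for a real Solr analyzer, and plain dicts standing in for Lucene's inverted index and stored fields. The field names f1, f2 and f3 follow Erick's example; everything else (`analyze`, `ToyIndex`) is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real Solr field type configures its
# analysis chain (tokenizer + filters) in schema.xml instead.
STOPWORDS = {"the", "and", "a", "an", "of"}


def analyze(text: str) -> list:
    """Toy analysis chain: lowercase, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class ToyIndex:
    """Models only the indexed/stored split, not Lucene's real file layout."""

    def __init__(self, schema: dict):
        # schema maps field name -> {"indexed": bool, "stored": bool}
        self.schema = schema
        self.postings = defaultdict(set)  # (field, token) -> matching doc ids
        self.stored = {}                  # doc id -> {field: raw value}

    def add(self, doc_id: str, fields: dict) -> None:
        kept = {}
        for name, value in fields.items():
            opts = self.schema[name]
            if opts["indexed"]:
                # Only analyzed tokens go into the inverted index.
                for token in analyze(value):
                    self.postings[(name, token)].add(doc_id)
            if opts["stored"]:
                # The raw, unanalyzed value is kept verbatim for display.
                kept[name] = value
        self.stored[doc_id] = kept

    def search(self, field: str, query: str) -> list:
        """Match the analyzed query against indexed tokens (OR semantics);
        return the stored fields of each matching document."""
        hits = set()
        for token in analyze(query):
            hits |= self.postings.get((field, token), set())
        return [self.stored[doc_id] for doc_id in sorted(hits)]


if __name__ == "__main__":
    title = "The Good, The Bad, and The Ugly"
    index = ToyIndex({
        "f1": {"indexed": True, "stored": False},   # searchable, never returned
        "f2": {"indexed": False, "stored": True},   # returned, not searchable
        "f3": {"indexed": True, "stored": True},    # both
    })
    index.add("doc1", {"f1": title, "f2": title, "f3": title})
    print(analyze(title))              # -> ['good', 'bad', 'ugly']
    print(index.search("f1", "ugly"))  # stored f2 and f3 values of doc1
    print(index.search("f2", "ugly"))  # -> [] (f2 is not indexed)
```

Searching f1 for "ugly" matches the document, but only the stored f2 and f3 values come back. The user therefore sees the original title, not "good bad ugly". Searching f2 finds nothing because it is stored but not indexed.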