Hmmm, I may have misled you. Re-reading my text, it wasn't very well written...

TF/IDF calculations are, indeed, per-field. What I was trying to say is that, as far as relevance is concerned, there is no difference between storing all the data for an individual field as a single long string of text in a single-valued field and storing it as several shorter strings in a multi-valued field.

Best
Erick

On Tue, May 31, 2011 at 12:16 PM, Ian Holsman <had...@holsman.net> wrote:
>
> On May 31, 2011, at 12:11 PM, Erick Erickson wrote:
>
>> Can you explain the use-case a bit more here? Especially the post-query
>> processing and how you expect the multiple documents to help here.
>>
>
> we have a collection of related stories. when a user searches for something,
> we might not want to display the story that is most-relevant (according to
> SOLR), but according to other home-grown rules. by combining all the
> possibilities in one SolrDocument, we can avoid a DB-hit to get related
> stories.
>
>
>> But TF/IDF is calculated over all the values in the field. There's really no
>> difference between a multi-valued field and storing all the data in a
>> single field as far as relevance calculations are concerned.
>>
>
> so.. it will suck regardless.. I thought we had per-field relevance in the
> current trunk. :-(
>
>
>> Best
>> Erick
>>
>> On Tue, May 31, 2011 at 11:02 AM, Ian Holsman <had...@holsman.net> wrote:
>>> Hi.
>>>
>>> I want to store a list of documents (say each being 30-60k of text) into a
>>> single SolrDocument. (to speed up post-retrieval querying)
>>>
>>> In order to do this, I need to know if lucene calculates the TF/IDF score
>>> over the entire field or does it treat each value in the list as a unique
>>> field?
>>>
>>> If I can't store it as a multi-value, I could create a schema where I put
>>> each document into a unique field, but I'm not sure how to create the query
>>> to search all the fields.
>>>
>>>
>>> Regards
>>> Ian
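
To make the point in the reply at the top concrete, here is a toy sketch in Python. It is not Lucene's code: `tokenize`, `field_stats` and `tf_norm` are hypothetical helpers, the analyzer is a plain lowercase/whitespace split, and IDF is left out because it depends on the whole index rather than on how one document's field is laid out. The sketch only shows that the term frequency and the field length come out the same whether the text is stored as several values or as one long string. Position gaps between values, which as far as I know only matter for phrase and proximity matching, are ignored.

```python
import math


def tokenize(text):
    # Hypothetical analyzer: lowercase and split on whitespace.
    return text.lower().split()


def field_stats(values, term):
    """Return (term frequency, field length) counted over ALL values of a field."""
    tokens = [tok for value in values for tok in tokenize(value)]
    return tokens.count(term), len(tokens)


def tf_norm(values, term):
    # Simplified tf * length-norm in the style of classic TF/IDF scoring:
    # sqrt(freq) scaled by 1 / sqrt(number of tokens in the field).
    freq, length = field_stats(values, term)
    return math.sqrt(freq) / math.sqrt(length) if length else 0.0


# The same text, stored as a multi-valued field and as one single value.
multi_valued = ["solr scores each field", "each field has one score"]
single_valued = [" ".join(multi_valued)]

# Both layouts produce identical statistics for the field.
assert field_stats(multi_valued, "field") == field_stats(single_valued, "field")
assert tf_norm(multi_valued, "field") == tf_norm(single_valued, "field")
```

Under these assumptions the two layouts are indistinguishable to the scoring step, which is why splitting the stories into separate values of one field would not give each story its own relevance score.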