Yes, field collapsing is like faceting, only more so, and very useful, I believe. As my project gets going, I have already imagined uses for it.

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Thu, 9/16/10, Andre Bickford <abickf...@softrek.com> wrote:

> From: Andre Bickford <abickf...@softrek.com>
> Subject: Re: Simple Filter Query (fq) Use Case Question
> To: solr-user@lucene.apache.org
> Date: Thursday, September 16, 2010, 4:45 PM
>
> Thanks to everyone for your suggestions.
>
> It seems that creating the index using gifts as the top level entity
> is the appropriate approach so I can effectively filter gifts on both
> the gift amount and gift date without running into multiValued field
> issues. It introduces a problem of listing donors multiple times, but
> that can be addressed by the field collapsing feature which will
> hopefully be completed in trunk soon.
>
> For anyone else who is looking for information on the Solr equivalent
> of "select distinct", check out these resources:
>
> http://wiki.apache.org/solr/FieldCollapsing
> https://issues.apache.org/jira/browse/SOLR-236
>
> On Sep 16, 2010, at 2:26 PM, Dennis Gearon wrote:
>
> > So THAT'S what a core is! I have been wondering. Thank you very much!
> >
> > Dennis Gearon
> >
> > --- On Thu, 9/16/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> >
> >> From: Jonathan Rochkind <rochk...@jhu.edu>
> >> Subject: Re: Simple Filter Query (fq) Use Case Question
> >> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >> Date: Thursday, September 16, 2010, 11:20 AM
> >>
> >> One solr core has essentially one index in it (not only one
> >> 'field', but one indexed collection of documents). There are weird
> >> hacks, like I believe the spellcheck component kind of creates its
> >> own sub-indexes, not sure how it does that.
> >>
> >> You can have more than one core in a single solr instance, but
> >> they're essentially separate, there's no easy way to 'join' across
> >> them or anything, a given request targets one core.
> >>
> >> Dennis Gearon wrote:
> >>> This brings me to ask a question that's been on my mind for a
> >>> while.
> >>>
> >>> Are indexes set up for the whole site, or a set of searches, with
> >>> several different indexes for a site?
> >>>
> >>> How many instances does one Solr/Lucene instance have access to
> >>> (not counting shards/segments)?
> >>>
> >>> Dennis Gearon
> >>>
> >>> --- On Thu, 9/16/10, Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:
> >>>
> >>>> From: Chantal Ackermann <chantal.ackerm...@btelligent.de>
> >>>> Subject: RE: Simple Filter Query (fq) Use Case Question
> >>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >>>> Date: Thursday, September 16, 2010, 1:05 AM
> >>>>
> >>>> Hi Andre,
> >>>>
> >>>> changing the entity in your index from donor to gift changes of
> >>>> course the scope of your search results. I found it helpful to
> >>>> re-think such a change from that "other" side (the result side).
> >>>> If the users of your search application look for individual
> >>>> gifts, in the end, then changing the index to gift is for the
> >>>> better.
> >>>>
> >>>> If they are searching for donors, then I would rethink the
> >>>> change but not discard it completely: you can still get the list
> >>>> of distinct donors by faceting over donors. You can show the
> >>>> users that list of donors (the facets), and they can choose from
> >>>> it and get all information on that donor (restricted to the
> >>>> original query, of course). The information would include the
> >>>> actual search result of a list of gifts that passed the query.
> >>>>
> >>>> Cheers,
> >>>> Chantal
> >>>>
> >>>> On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:
> >>>>
> >>>>> Thanks for the response Erick.
> >>>>>
> >>>>> I did actually try exactly what you suggested. I flipped the
> >>>>> index over so that a gift is the document. This solution
> >>>>> certainly solves the previous problem, but introduces a new
> >>>>> issue where the search results show duplicate donors. If a
> >>>>> donor gave 12 times in a year, and we offer full years as facet
> >>>>> ranges, my understanding is that you'd see that donor 12 times
> >>>>> in the search results, once for each gift document. Obviously I
> >>>>> could do some client side filtering to list only distinct
> >>>>> donors, but I was hoping to avoid that.
> >>>>>
> >>>>> If I've simply stumbled into the basic tradeoffs of
> >>>>> denormalization, I can live with client side de-duplication,
> >>>>> but if you have any further suggestions I'm all eyes.
> >>>>>
> >>>>> As for sizing, we have some huge charities as clients. However,
> >>>>> right now I'm testing on a copy of prod data from a smaller
> >>>>> client with ~350,000 donors and ~8,000,000 gift records. So,
> >>>>> when I "flipped" the index around as you suggested, it went
> >>>>> from 350,000 documents to 8,000,000 documents. No issues with
> >>>>> performance at all.
> >>>>>
> >>>>> Thanks again,
> >>>>> Andre
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
> >>>>> Sent: Wednesday, September 15, 2010 3:09 PM
> >>>>> To: solr-user@lucene.apache.org
> >>>>> Subject: Re: Simple Filter Query (fq) Use Case Question
> >>>>>
> >>>>> One strategy is to denormalize all the way. That is, each Solr
> >>>>> "document" is a single gift, so Gift Amount and Gift Date would
> >>>>> not be multiValued. You'd create a different "document" for
> >>>>> each gift, so you'd have multiple documents with the same Id,
> >>>>> Name, and Address. Be careful, though, if you've defined Id as
> >>>>> a UniqueKey, you'd only have one record/donor. You can handle
> >>>>> this easily enough by making a composite key of Id+Gift Date
> >>>>> (assuming no donor made more than one gift on exactly the same
> >>>>> date).
> >>>>>
> >>>>> I know this goes completely against all the reflexes you've
> >>>>> built up with working with DBs, but...
> >>>>>
> >>>>> Can you give us a clue how many donations we're talking about
> >>>>> here? You'd have to be working with a really big nonprofit to
> >>>>> get enough documents to have to start worrying about making
> >>>>> your index smaller.
> >>>>>
> >>>>> HTH
> >>>>> Erick
> >>>>>
> >>>>> On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford <abickf...@softrek.com> wrote:
> >>>>>
> >>>>>> I'm working on creating a solr index search for a charitable
> >>>>>> organization. The solr index stores documents of donors. Each
> >>>>>> donor document has the following five fields:
> >>>>>>
> >>>>>> Id
> >>>>>> Name
> >>>>>> Address
> >>>>>> Gift Amount (multiValued)
> >>>>>> Gift Date (multiValued)
> >>>>>>
> >>>>>> In our relational database, there is a one-to-many
> >>>>>> relationship between the DONOR table and the GIFT table. One
> >>>>>> donor can of course give many gifts over time. Consequently, I
> >>>>>> created the Gift Amount and Gift Date fields to be
> >>>>>> multiValued.
> >>>>>>
> >>>>>> Now, consider the following query filtered for gifts last
> >>>>>> month between $0 and $100:
> >>>>>>
> >>>>>> q=name:Jones
> >>>>>> fq=giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]
> >>>>>> fq=giftAmount:[0 TO 100]
> >>>>>>
> >>>>>> The results show me donors who donated ANY amount in the past
> >>>>>> month and donors who had EVER in the past given a gift between
> >>>>>> $0 and $100. I was hoping to only see donors who had given a
> >>>>>> gift between $0 and $100 in the past month exclusively. I
> >>>>>> believe the problem is that I neglected to consider that for
> >>>>>> two multiValued fields, while the values might align "index
> >>>>>> wise", there is really no other association between the two
> >>>>>> fields, so the filter query intersection isn't really behaving
> >>>>>> as I expected.
> >>>>>>
> >>>>>> I think this is a fundamental question of one-to-many
> >>>>>> denormalization, but obviously I'm not yet experienced enough
> >>>>>> with Lucene/Solr to find a solution. As to why not just keep
> >>>>>> using a relational database, it's because I'm trying to
> >>>>>> provide a faceting solution to "drill down" to donors. The
> >>>>>> aforementioned fq parameters would come from faceting. Oh,
> >>>>>> that and Oracle Text indexes are a PITA. :-)
> >>>>>>
> >>>>>> Thanks for any help you can provide.
> >>>>>>
> >>>>>> André Bickford
> >>>>>> Software Engineering Team Leader
> >>>>>> SofTrek Corporation
> >>>>>> 30 Bryant Woods North  Amherst, NY 14228
> >>>>>> 716.691.2800 x154  800.442.9211  Fax: 716.691.2828
> >>>>>> abickf...@softrek.com  www.softrek.com
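
For anyone reading this thread in the archive later, here is a rough sketch of why the two filter queries over multiValued fields behave the way Andre describes, and what flipping the index to one document per gift changes. It is plain Python over made-up sample data, not Solr client code: the donors, the `gifts` structure, the fixed August 2010 window and every function name are assumptions for illustration only. The composite key follows Erick's Id+Gift Date suggestion, and the last step is the client-side "select distinct" that field collapsing is meant to replace.

```python
from datetime import date

# Hypothetical sample data: each donor has a list of (gift date, gift amount).
DONORS = [
    {"id": 1, "name": "Jones", "gifts": [(date(2010, 8, 10), 500.0), (date(2009, 3, 1), 50.0)]},
    {"id": 2, "name": "Jones", "gifts": [(date(2010, 8, 20), 75.0)]},
    {"id": 3, "name": "Jones", "gifts": [(date(2010, 8, 5), 20.0), (date(2010, 8, 25), 30.0)]},
]

# Assumed "last month" window, fixed so the example is reproducible.
MONTH_START, MONTH_END = date(2010, 8, 1), date(2010, 8, 31)


def in_month(gift_date):
    return MONTH_START <= gift_date <= MONTH_END


def in_amount(amount):
    return 0 <= amount <= 100


def donor_level_match(donor):
    """Donor-as-document: each filter is satisfied by ANY value of its own
    multiValued field, so the date and the amount may come from different gifts."""
    dates = [d for d, _ in donor["gifts"]]
    amounts = [a for _, a in donor["gifts"]]
    return any(in_month(d) for d in dates) and any(in_amount(a) for a in amounts)


def gift_documents(donors):
    """Gift-as-document: one flat document per gift, with a composite key of
    donor id + gift date (assumes at most one gift per donor per day)."""
    for donor in donors:
        for gift_date, amount in donor["gifts"]:
            yield {
                "key": f"{donor['id']}_{gift_date.isoformat()}",
                "donorId": donor["id"],
                "name": donor["name"],
                "giftDate": gift_date,
                "giftAmount": amount,
            }


def gift_level_match(doc):
    """Both filters now apply to the same gift."""
    return in_month(doc["giftDate"]) and in_amount(doc["giftAmount"])


def distinct_donors(docs):
    """Client-side de-duplication: keep the first matching gift per donor."""
    first_by_donor = {}
    for doc in docs:
        first_by_donor.setdefault(doc["donorId"], doc)
    return list(first_by_donor.values())


donor_hits = [d["id"] for d in DONORS if donor_level_match(d)]
gift_hits = [g for g in gift_documents(DONORS) if gift_level_match(g)]
distinct_hits = [g["donorId"] for g in distinct_donors(gift_hits)]

print("donor-level hits:", donor_hits)
print("gift-level hits (one per gift):", [g["key"] for g in gift_hits])
print("distinct donors:", distinct_hits)
```

Donor 1 is the interesting case: a $500 gift in the window plus an old $50 gift is enough to pass both filters at the donor level, but no single gift passes both at the gift level. Donor 3 shows the duplicate-donor side effect that the de-duplication step (or, eventually, field collapsing) has to clean up.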