Yes, field collapsing is like faceting, only more so, and I believe it is
very useful. As my project gets going, I have already imagined uses for it.
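
To make the comparison concrete, here is a rough sketch in plain Python. It is not Solr and does not use any Solr API. It only mimics, in memory, what faceting and collapsing would each return over gift-level documents like the ones discussed further down the thread. The field names, the sample gifts, and the fixed "August 2010" window are all made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical gift-level documents (one document per gift). All values are invented.
gifts = [
    {"donor_id": 1, "name": "Jones", "gift_amount": 50, "gift_date": date(2010, 8, 3)},
    {"donor_id": 1, "name": "Jones", "gift_amount": 75, "gift_date": date(2010, 8, 20)},
    {"donor_id": 1, "name": "Jones", "gift_amount": 500, "gift_date": date(2009, 1, 5)},
    {"donor_id": 2, "name": "Jones", "gift_amount": 20, "gift_date": date(2008, 6, 1)},
    {"donor_id": 2, "name": "Jones", "gift_amount": 900, "gift_date": date(2010, 8, 9)},
]


def matches(gift):
    """Apply both filters to the SAME gift: $0 to $100, dated in August 2010."""
    in_amount_range = 0 <= gift["gift_amount"] <= 100
    in_date_range = date(2010, 8, 1) <= gift["gift_date"] <= date(2010, 8, 31)
    return in_amount_range and in_date_range


hits = [g for g in gifts if matches(g)]

# Faceting-style view: how many matching gifts each donor has.
facet_counts = Counter(g["donor_id"] for g in hits)

# Collapsing-style view: keep only the first matching gift per donor,
# so each donor appears once in the result list.
collapsed = {}
for g in hits:
    collapsed.setdefault(g["donor_id"], g)

print(dict(facet_counts))
print([(donor_id, g["gift_amount"]) for donor_id, g in collapsed.items()])
```

Donor 2 has one gift in August 2010 and a separate gift under $100, but no single gift that satisfies both filters, so that donor drops out of both views. The facet view gives a count per donor, while the collapsed view gives one representative gift per donor, which is the "select distinct" behavior being asked about.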


Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Thu, 9/16/10, Andre Bickford <abickf...@softrek.com> wrote:

> From: Andre Bickford <abickf...@softrek.com>
> Subject: Re: Simple Filter Query (fq) Use Case Question
> To: solr-user@lucene.apache.org
> Date: Thursday, September 16, 2010, 4:45 PM
> Thanks to everyone for your suggestions.
> 
> It seems that creating the index using gifts as the top-level
> entity is the appropriate approach, so I can effectively filter
> gifts on both the gift amount and gift date without running into
> multiValued field issues. It introduces a problem of listing
> donors multiple times, but that can be addressed by the field
> collapsing feature, which will hopefully be completed in trunk soon.
> 
> For anyone else who is looking for information on the Solr
> equivalent of "select distinct", check out these resources:
> 
> http://wiki.apache.org/solr/FieldCollapsing
> https://issues.apache.org/jira/browse/SOLR-236
>  
> 
> 
> On Sep 16, 2010, at 2:26 PM, Dennis Gearon wrote:
> 
> > So THAT'S what a core is! I have been wondering. Thank you very much!
> > Dennis Gearon
> > 
> > 
> > 
> > --- On Thu, 9/16/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> > 
> >> From: Jonathan Rochkind <rochk...@jhu.edu>
> >> Subject: Re: Simple Filter Query (fq) Use Case Question
> >> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >> Date: Thursday, September 16, 2010, 11:20 AM
> >> One Solr core has essentially one index in it (not only one
> >> 'field', but one indexed collection of documents). There are
> >> weird hacks; for example, I believe the spellcheck component
> >> kind of creates its own sub-indexes, though I'm not sure how
> >> it does that.
> >> 
> >> You can have more than one core in a single Solr instance, but
> >> they're essentially separate. There's no easy way to 'join'
> >> across them or anything; a given request targets one core.
> >> 
> >> Dennis Gearon wrote:
> >>> This brings me to ask a question that's been on my mind for a while.
> >>> 
> >>> Are indexes set up for the whole site, or for a set of searches,
> >>> with several different indexes for a site?
> >>> 
> >>> How many instances does one Solr/Lucene instance have access to
> >>> (not counting shards/segments)?
> >>> Dennis Gearon
> >>> 
> >>> 
> >>> 
> >>> --- On Thu, 9/16/10, Chantal Ackermann <chantal.ackerm...@btelligent.de> wrote:
> >>> 
> >>>> From: Chantal Ackermann <chantal.ackerm...@btelligent.de>
> >>>> Subject: RE: Simple Filter Query (fq) Use Case Question
> >>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >>>> Date: Thursday, September 16, 2010, 1:05 AM
> >>>> Hi Andre,
> >>>> 
> >>>> changing the entity in your index from donor to gift of course
> >>>> changes the scope of your search results. I found it helpful to
> >>>> re-think such a change from that "other" side (the result side).
> >>>> If the users of your search application are ultimately looking
> >>>> for individual gifts, then changing the index to gift is for
> >>>> the better.
> >>>> 
> >>>> If they are searching for donors, then I would rethink the change
> >>>> but not discard it completely: you can still get the list of
> >>>> distinct donors by faceting over donors. You can show the users
> >>>> that list of donors (the facets), and they can choose from it and
> >>>> get all information on that donor (restricted to the original
> >>>> query, of course). The information would include the actual
> >>>> search result: a list of gifts that passed the query.
> >>>> 
> >>>> Cheers,
> >>>> Chantal
> >>>> 
> >>>> On Wed, 2010-09-15 at 21:49 +0200, Andre Bickford wrote:
> >>>>> Thanks for the response Erick.
> >>>>> 
> >>>>> I did actually try exactly what you suggested. I flipped the
> >>>>> index over so that a gift is the document. This solution
> >>>>> certainly solves the previous problem, but introduces a new
> >>>>> issue where the search results show duplicate donors. If a
> >>>>> donor gave 12 times in a year, and we offer full years as
> >>>>> facet ranges, my understanding is that you'd see that donor 12
> >>>>> times in the search results, once for each gift document.
> >>>>> Obviously I could do some client side filtering to list only
> >>>>> distinct donors, but I was hoping to avoid that.
> >>>>> 
> >>>>> If I've simply stumbled into the basic tradeoffs of
> >>>>> denormalization, I can live with client side de-duplication,
> >>>>> but if you have any further suggestions I'm all eyes.
> >>>>> 
> >>>>> As for sizing, we have some huge charities as clients.
> >>>>> However, right now I'm testing on a copy of prod data from a
> >>>>> smaller client with ~350,000 donors and ~8,000,000 gift
> >>>>> records. So, when I "flipped" the index around as you
> >>>>> suggested, it went from 350,000 documents to 8,000,000
> >>>>> documents. No issues with performance at all.
> >>>>> 
> >>>>> Thanks again,
> >>>>> Andre
> >>>>> 
> >>>>> -----Original Message-----
> >>>>> From: Erick Erickson [mailto:erickerick...@gmail.com]
> >>>>> Sent: Wednesday, September 15, 2010 3:09 PM
> >>>>> To: solr-user@lucene.apache.org
> >>>>> Subject: Re: Simple Filter Query (fq) Use Case Question
> >>>>> 
> >>>>> One strategy is to denormalize all the way. That is, each
> >>>>> Solr "document" is a single gift, so Gift Amount and Gift Date
> >>>>> would not be multiValued. You'd create a different "document"
> >>>>> for each gift, so you'd have multiple documents with the same
> >>>>> Id, Name, and Address. Be careful, though: if you've defined Id
> >>>>> as a UniqueKey, you'd only have one record/donor. You can
> >>>>> handle this easily enough by making a composite key of Id+Gift
> >>>>> Date (assuming no donor made more than one gift on exactly the
> >>>>> same date).
> >>>>> 
> >>>>> I know this goes completely against all the reflexes you've
> >>>>> built up with working with DBs, but...
> >>>>> 
> >>>>> Can you give us a clue how many donations we're talking about
> >>>>> here? You'd have to be working with a really big nonprofit to
> >>>>> get enough documents to have to start worrying about making
> >>>>> your index smaller.
> >>>>> 
> >>>>> HTH
> >>>>> Erick
> >>>>> 
> >>>>> On Wed, Sep 15, 2010 at 1:41 PM, Andre Bickford <abickf...@softrek.com> wrote:
> >>>>> 
> >>>>>> I'm working on creating a Solr index search for a charitable
> >>>>>> organization. The Solr index stores documents of donors. Each
> >>>>>> donor document has the following five fields:
> >>>>>> 
> >>>>>> Id
> >>>>>> Name
> >>>>>> Address
> >>>>>> Gift Amount (multiValued)
> >>>>>> Gift Date (multiValued)
> >>>>>> 
> >>>>>> In our relational database, there is a one-to-many
> >>>>>> relationship between the DONOR table and the GIFT table. One
> >>>>>> donor can of course give many gifts over time. Consequently, I
> >>>>>> created the Gift Amount and Gift Date fields to be multiValued.
> >>>>>> 
> >>>>>> Now, consider the following query filtered for gifts last
> >>>>>> month between $0 and $100:
> >>>>>> 
> >>>>>> q=name:Jones
> >>>>>> fq=giftDate:[NOW/MONTH-1MONTH TO NOW/MONTH]
> >>>>>> fq=giftAmount:[0 TO 100]
> >>>>>> 
> >>>>>> The results show me donors who donated ANY amount in the past
> >>>>>> month and who had EVER in the past given a gift between $0 and
> >>>>>> $100. I was hoping to only see donors who had given a gift
> >>>>>> between $0 and $100 in the past month exclusively. I believe
> >>>>>> the problem is that I neglected to consider that for two
> >>>>>> multiValued fields, while the values might align "index wise",
> >>>>>> there is really no other association between the two fields,
> >>>>>> so the filter query intersection isn't really behaving as I
> >>>>>> expected.
> >>>>>> 
> >>>>>> I think this is a fundamental question of one-to-many
> >>>>>> denormalization, but obviously I'm not yet experienced enough
> >>>>>> with Lucene/Solr to find a solution. As to why not just keep
> >>>>>> using a relational database, it's because I'm trying to
> >>>>>> provide a faceting solution to "drill down" to donors. The
> >>>>>> aforementioned fq parameters would come from faceting. Oh,
> >>>>>> that and Oracle Text indexes are a PITA. :-)
> >>>>>> 
> >>>>>> Thanks for any help you can provide.
> >>>>>> 
> >>>>>> André Bickford
> >>>>>> Software Engineering Team Leader
> >>>>>> SofTrek Corporation
> >>>>>> 30 Bryant Woods North  Amherst, NY 14228
> >>>>>> 716.691.2800 x154  800.442.9211  Fax: 716.691.2828
> >>>>>> abickf...@softrek.com  www.softrek.com
