RE: Aggregated facet value counts?

Peter S Fri, 29 Jan 2010 06:28:43 -0800

Well, it wouldn't be 'every' combination - more of 'any' combination at 
query-time.
 
The 'arbitrary' part of the requirement is because it's not practical to 
predict every combination a user might ask for, although generally users would 
tend to search for similar/the same query combinations (but perhaps with 
different date ranges, for example).
 
If 'predicted aggregate fields' were calculated at index-time on, say, 10 
fields (the schema in question actually has 73 fields), that's on the order of 
10! = 3,628,800 new fields. A large percentage of these would likely never be 
used (which ones would depend on the user, environment etc.).
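
As a rough check on the scale involved, the sketch below counts how many aggregate fields 10 source fields could require under different readings of 'combination'. The function names are illustrative only, and which reading applies depends on whether field order matters to the caller.

```python
import math


def ordered_combinations(n_fields: int, min_size: int = 2) -> int:
    """Ordered selections (no repeats) of min_size..n_fields fields."""
    return sum(math.perm(n_fields, k) for k in range(min_size, n_fields + 1))


def unordered_combinations(n_fields: int, min_size: int = 2) -> int:
    """Unordered selections of min_size..n_fields fields."""
    return sum(math.comb(n_fields, k) for k in range(min_size, n_fields + 1))


# Orderings of all 10 fields at once.
print(math.factorial(10))          # 3628800
# Every ordered combination of 2 or more fields.
print(ordered_combinations(10))    # 9864090
# Every unordered combination of 2 or more fields.
print(unordered_combinations(10))  # 1013
```

Even the unordered count grows as 2^n, so with 73 fields precomputing every combination is impractical under either reading.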


Perhaps a more 'typical' use case than my network-based example would be a 
product search web page, where you want to show the number of products that are 
made by a manufacturer and within a certain price range (e.g. Sony [$600-$800] 
(15) ). To obtain the (15) facet count value, you would have to correlate the 
number of Sony products (say, (861)), and the products that fall into the [600 
TO 800] price range (say, (1226) ). The (15) would be the intersection of the 
Sony hits ('manufacturer:Sony') and the price range hits. Am I right that 
filter queries could only do this for document hits if you know the field 
values ahead of time (e.g. fq=manufacturer:Sony&fq=price:[600 TO 800])? The 
facets could then be derived by simply counting the numFound for each result 
set.
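
For the case where the values are known ahead of time, a minimal sketch of building such a request follows. The base URL is an assumed local Solr instance and the helper name is hypothetical. Setting rows=0 means only numFound comes back, and Solr intersects multiple fq parameters.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance; adjust host, port and path as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def intersection_count_url(filters, base_url=SOLR_SELECT_URL):
    """Build a select URL whose numFound is the count of docs matching ALL filters."""
    params = [("q", "*:*"), ("rows", "0")]      # match everything, return no docs
    params += [("fq", f) for f in filters]      # each fq narrows the result set
    return base_url + "?" + urlencode(params)


url = intersection_count_url(["manufacturer:Sony", "price:[600 TO 800]"])
print(url)
```

One such request is needed per manufacturer/price-range pair, which is why this only works when the field values are already known.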

 

If there were subsearch support in Solr (i.e. take the output of a query and 
use it as input into another) that included facets [perhaps there is such 
support?], it might be used to achieve this effect.
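
Without built-in subsearch, the same effect can be approximated client-side in two stages: facet on the first field, then for each value returned, facet on the second field with that value applied as a filter. The sketch below assumes this approach. `fake_fetch` and `DOCS` are stand-ins for a real HTTP call to Solr so that the example runs on its own, and the function names are hypothetical.

```python
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_2", "user": "user_1"},
]


def facet_params(field, filters=()):
    """Solr-style parameters for a facet-only request on one field."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true"),
              ("facet.field", field), ("facet.mincount", "1")]
    params += [("fq", f) for f in filters]
    return params


def fake_fetch(params):
    """Stand-in for querying Solr: evaluates the params against DOCS."""
    field = dict(params)["facet.field"]
    filters = [value.split(":", 1) for key, value in params if key == "fq"]
    counts = {}
    for doc in DOCS:
        if all(doc[f] == v for f, v in filters):
            counts[doc[field]] = counts.get(doc[field], 0) + 1
    return counts


def two_stage_facets(outer, inner, fetch):
    """Facet on `outer`, then facet on `inner` once per outer value."""
    result = {}
    for outer_value in fetch(facet_params(outer)):
        inner_counts = fetch(facet_params(inner, [f"{outer}:{outer_value}"]))
        for inner_value, count in inner_counts.items():
            result[(outer_value, inner_value)] = count
    return result


pairs = two_stage_facets("host", "user", fake_fetch)
print(pairs)
```

The cost is one extra request per distinct value of the outer field, so this only scales when that field has few distinct values.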


A custom query parser plugin could work, maybe? I suppose it would need to 
gather up all the separate facets and correlate them according to the input 
query (e.g. host and user, or manufacturer and price range). Such a mechanism 
would be crying out for caching, but perhaps it could leverage the existing 
field and query caches.
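
The correlation step such a plugin would need to perform can be sketched in memory, using the host/user sample from the original message quoted below. This is plain Python rather than Solr plugin code, and `correlated_facets` is a hypothetical name. It only illustrates counting documents by the tuple of values for whichever fields the caller names.

```python
from collections import Counter

# Sample documents from the original question (host and user fields only).
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]


def correlated_facets(docs, fields):
    """Count documents per combination of values for the given fields, in order."""
    return Counter(tuple(doc[f] for f in fields) for doc in docs)


counts = correlated_facets(DOCS, ["host", "user"])
for values, count in sorted(counts.items()):
    print("/".join(values), f"({count})")
```

Because the field list is an argument, any combination can be requested at query time. The caller needs only the field names, not the values.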
 

Peter

 


> From: erik.hatc...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Aggregated facet value counts?
> Date: Fri, 29 Jan 2010 07:39:44 -0500
> 
> Creating values for every possible combination is what you're asking 
> Solr to do at query-time, and as far as I know there isn't really a 
> way to accomplish that like you're asking. Is the need really to be 
> arbitrary here?
> 
> Erik
> 
> On Jan 29, 2010, at 7:25 AM, Peter S wrote:
> 
> >
> > Hi Erik,
> >
> >
> >
> > Thanks for your reply. That's an interesting idea doing it at index- 
> > time, and a good idea for known field combinations.
> >
> > The only thing is........
> >
> > How to handle arbitrary field combinations? - i.e. to allow the 
> > caller to specify any combination of fields at query-time?
> >
> > So, yes, the data is available at index-time, but the combination 
> > isn't (short of creating fields for every possible combination).
> >
> >
> >
> > Peter
> >
> >
> >
> >> From: erik.hatc...@gmail.com
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Aggregated facet value counts?
> >> Date: Fri, 29 Jan 2010 06:30:27 -0500
> >>
> >> When faced with this type of situation where the data is entirely
> >> available at index-time, simply create an aggregated field that glues
> >> the two pieces together, and facet on that.
> >>
> >> Erik
> >>
> >> On Jan 29, 2010, at 6:16 AM, Peter S wrote:
> >>
> >>>
> >>> Hi,
> >>>
> >>>
> >>>
> >>> I was wondering if anyone had come across this use case, and if this
> >>> type of faceting is possible:
> >>>
> >>>
> >>>
> >>> The requirement is to build a query such that an aggregated facet
> >>> count of common (and'ed) field values form the basis of each
> >>> returned facet count.
> >>>
> >>>
> >>>
> >>> For example:
> >>>
> >>> Let's say I have a number of documents in an index with, among
> >>> others, the fields 'host' and 'user':
> >>>
> >>>
> >>>
> >>> Doc1 host:machine_1 user:user_1
> >>>
> >>> Doc2 host:machine_1 user:user_2
> >>>
> >>> Doc3 host:machine_1 user:user_1
> >>>
> >>> Doc3 host:machine_1 user:user_1
> >>>
> >>>
> >>>
> >>> Doc4 host:machine_2 user:user_1
> >>>
> >>> Doc5 host:machine_2 user:user_1
> >>>
> >>> Doc6 host:machine_2 user:user_4
> >>>
> >>>
> >>>
> >>> Doc7 host:machine_1 user:user_4
> >>>
> >>>
> >>>
> >>> Is it possible to get facets back that would give the count of
> >>> documents that have common host AND user values (preferably ordered
> >>> - i.e. host then user for this example, so as not to create a
> >>> factorial explosion)? Note that the caller wouldn't know what
> >>> machine and user values exist, only the field names.
> >>>
> >>> I've tried using facet queries in various ways to see if they could
> >>> work for this, but I believe facet queries work on a different plane
> >>> than this requirement (narrowing the term count, a.o.t. 
> >>> aggregating).
> >>>
> >>>
> >>>
> >>> For the example above, the desired result would be:
> >>>
> >>>
> >>>
> >>> machine_1/user_1 (3)
> >>>
> >>> machine_1/user_2 (1)
> >>>
> >>> machine_1/user_4 (1)
> >>>
> >>>
> >>>
> >>> machine_2/user_1 (2)
> >>>
> >>> machine_2/user_4 (1)
> >>>
> >>>
> >>>
> >>> Has anyone had a need for this type of faceting and found a way to
> >>> achieve it?
> >>>
> >>>
> >>>
> >>> Many thanks,
> >>>
> >>> Peter
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> > 
> 

                                          
