Well, it wouldn't be 'every' combination - more like 'any' combination at query-time. The 'arbitrary' part of the requirement is there because it's not practical to predict every combination a user might ask for, although users generally tend to run the same or similar query combinations (perhaps with different date ranges, for example). If 'predicted aggregate fields' were calculated at index-time on, say, 10 fields (the schema in question actually has 73 fields), that's on the order of 10! = 3,628,800 new fields. A large percentage of these would likely never be used (which ones would depend on the user, environment, etc.).
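As a back-of-envelope check on that scale, here is a minimal sketch that counts how many combined fields index-time precomputation would need. The function names are made up for illustration, and the count depends on an assumption about whether field order matters (host then user vs. user then host). The 10! figure corresponds to full orderings of all 10 fields; the other two counts cover every combination of two or more fields.

```python
from math import comb, factorial, perm


def unordered_field_combinations(n: int) -> int:
    """Combinations of 2..n fields where order does not matter (2^n - n - 1)."""
    return sum(comb(n, k) for k in range(2, n + 1))


def ordered_field_combinations(n: int) -> int:
    """Combinations of 2..n fields where order matters (host/user != user/host)."""
    return sum(perm(n, k) for k in range(2, n + 1))


if __name__ == "__main__":
    n = 10
    print("10! (orderings of all 10 fields):", factorial(n))
    print("unordered combinations of 2+ fields:", unordered_field_combinations(n))
    print("ordered combinations of 2+ fields:", ordered_field_combinations(n))
```

Either way, this only counts field combinations. The number of distinct values inside each combined field depends on the data, which is a further reason to avoid precomputing them all.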
Perhaps a more 'typical' use case than my network-based example would be a product search web page, where you want to show the number of products that are made by a manufacturer and fall within a certain price range (e.g. Sony [$600-$800] (15) ). To obtain the (15) facet count value, you would have to correlate the number of Sony products (say, (861)) with the products that fall into the [600 TO 800] price range (say, (1226) ). The (15) would be the intersection of the 'manufacturer:Sony' hits and the price range hits.

Am I right that filter queries could only do this for document hits if you know the field values ahead of time (e.g. fq=manufacturer:Sony&fq=price:[600 TO 800])? The facets could then be derived by simply counting the numFound for each result set.

If there were subsearch support in Solr (i.e. take the output of a query and use it as input into another) that included facets [perhaps there is such support?], it might be used to achieve this effect.

A custom query parser plugin could work, maybe? I suppose it would need to gather up all the separate facets and correlate them according to the input query (e.g. host and user, or manufacturer and price range). Such a mechanism would be crying out for caching, but perhaps it could leverage the existing field and query caches.

Peter

> From: erik.hatc...@gmail.com
> To: solr-user@lucene.apache.org
> Subject: Re: Aggregated facet value counts?
> Date: Fri, 29 Jan 2010 07:39:44 -0500
>
> Creating values for every possible combination is what you're asking
> Solr to do at query-time, and as far as I know there isn't really a
> way to accomplish that like you're asking. Is the need really to be
> arbitrary here?
>
> Erik
>
> On Jan 29, 2010, at 7:25 AM, Peter S wrote:
>
> > Hi Erik,
> >
> > Thanks for your reply. That's an interesting idea doing it at
> > index-time, and a good idea for known field combinations.
> >
> > The only thing is...
> >
> > How to handle arbitrary field combinations? - i.e. to allow the
> > caller to specify any combination of fields at query-time?
> >
> > So, yes, the data is available at index-time, but the combination
> > isn't (short of creating fields for every possible combination).
> >
> > Peter
> >
> >> From: erik.hatc...@gmail.com
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Aggregated facet value counts?
> >> Date: Fri, 29 Jan 2010 06:30:27 -0500
> >>
> >> When faced with this type of situation where the data is entirely
> >> available at index-time, simply create an aggregated field that glues
> >> the two pieces together, and facet on that.
> >>
> >> Erik
> >>
> >> On Jan 29, 2010, at 6:16 AM, Peter S wrote:
> >>
> >>> Hi,
> >>>
> >>> I was wondering if anyone had come across this use case, and if this
> >>> type of faceting is possible:
> >>>
> >>> The requirement is to build a query such that an aggregated facet
> >>> count of common (and'ed) field values form the basis of each
> >>> returned facet count.
> >>>
> >>> For example:
> >>>
> >>> Let's say I have a number of documents in an index with, among
> >>> others, the fields 'host' and 'user':
> >>>
> >>> Doc1 host:machine_1 user:user_1
> >>> Doc2 host:machine_1 user:user_2
> >>> Doc3 host:machine_1 user:user_1
> >>> Doc3 host:machine_1 user:user_1
> >>>
> >>> Doc4 host:machine_2 user:user_1
> >>> Doc5 host:machine_2 user:user_1
> >>> Doc6 host:machine_2 user:user_4
> >>>
> >>> Doc7 host:machine_1 user:user_4
> >>>
> >>> Is it possible to get facets back that would give the count of
> >>> documents that have common host AND user values (preferably ordered
> >>> - i.e. host then user for this example, so as not to create a
> >>> factorial explosion)? Note that the caller wouldn't know what
> >>> machine and user values exist, only the field names.
> >>>
> >>> I've tried using facet queries in various ways to see if they could
> >>> work for this, but I believe facet queries work on a different plane
> >>> than this requirement (narrowing the term count, a.o.t.
> >>> aggregating).
> >>>
> >>> For the example above, the desired result would be:
> >>>
> >>> machine_1/user_1 (3)
> >>> machine_1/user_2 (1)
> >>> machine_1/user_4 (1)
> >>>
> >>> machine_2/user_1 (2)
> >>> machine_2/user_4 (1)
> >>>
> >>> Has anyone had a need for this type of faceting and found a way to
> >>> achieve it?
> >>>
> >>> Many thanks,
> >>>
> >>> Peter
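To make the desired output from the original question concrete, here is a minimal client-side sketch. It is not a Solr feature or API. It assumes the matching documents have already been fetched into memory as plain dictionaries, and the function name `combined_facets` is made up for illustration. The caller supplies only the field names, in the order they should be combined, and the values are discovered from the documents.

```python
from collections import Counter

# The eight documents from the original question. The second "Doc3" is kept
# as a separate document, which is what makes the machine_1/user_1 count 3.
DOCS = [
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_2"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_1", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_1"},
    {"host": "machine_2", "user": "user_4"},
    {"host": "machine_1", "user": "user_4"},
]


def combined_facets(docs, fields):
    """Count documents per distinct combination of values for `fields`.

    `fields` is an ordered list of field names chosen at query-time; the
    values themselves do not need to be known in advance.
    """
    counts = Counter("/".join(doc[f] for f in fields) for doc in docs)
    return sorted(counts.items())


if __name__ == "__main__":
    for key, count in combined_facets(DOCS, ["host", "user"]):
        print(f"{key} ({count})")
```

Counting this way means pulling every matching document back to the client, so it is probably only workable for small result sets. It does show the shape of the aggregation that a server-side mechanism would need to produce.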