> Does the facet aggregation take place on the Solr search server, or the
> Solr client?

Solr server.
Faceting is an expensive operation by nature, especially when the hits are
large in number. Solr caches these values once computed. You might want to
tweak cache related parameters in your solr config for better performance.
Read up on the caching section
(http://wiki.apache.org/solr/SolrConfigXml#head-ffe19c34abf267ca2d49d9e7102feab8c79b5fb5)
for details.

Cheers
Avlesh

On Sat, Jul 11, 2009 at 12:01 AM, Bradford Stephens
<bradfordsteph...@gmail.com> wrote:

> Does the facet aggregation take place on the Solr search server, or
> the Solr client?
>
> It's pretty slow for me -- on a machine with 8 cores/ 8 GB RAM, 50
> million document index (about 36M unique values in the "author"
> field), a query that returns 131,000 hits takes about 20 seconds to
> calculate the top 50 authors. The query I'm running is this:
>
> http://dttest10:8983/solr/select/select?q=java&facet=true&facet.field=authorname
>
> On Thu, Jul 9, 2009 at 10:32 PM, Bradford
> Stephens<bradfordsteph...@gmail.com> wrote:
> > Oh, wow... I think that faceted search is the right path, especially
> > since seeing this amazing site:
> > http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
> >
> > I hope it's performant over hundreds of thousands of search results :)
> >
> > On Thu, Jul 9, 2009 at 10:13 PM, Bradford
> > Stephens<bradfordsteph...@gmail.com> wrote:
> >> It looks like field collapsing may be the key:
> >> http://issues.apache.org/jira/browse/SOLR-236
> >>
> >> But it also doesn't seem to be 'finalized' yet. I wonder how
> >> performant it is with indexes of 50 million documents+?
> >>
> >> On Thu, Jul 9, 2009 at 9:42 PM, shb<suh...@gmail.com> wrote:
> >>> you can refer to the facet search of solr, that might help you.
> >>>
> >>> 2009/7/10 Bradford Stephens <bradfordsteph...@gmail.com>
> >>>
> >>>> Greetings,
> >>>>
> >>>> We've been experimenting with grouping fields returned from document
> >>>> search results in Lucene, and we haven't gotten anything very
> >>>> encouraging. Basically, the more results we return, the longer it
> >>>> takes -- tens of seconds. Probably because we're doing expensive disk
> >>>> seeks. I'm hoping the SOLR crew out there may provide some insight :)
> >>>>
> >>>> What we're trying to do is similar to SQL's "GROUP BY". Let's say we
> >>>> have documents indexed by keyword for a content body, and also indexed
> >>>> by an Author name. If I search our document store (very large) for the
> >>>> word "laptop", I would like to be able to calculate the 10 authors
> >>>> that appeared the most.
> >>>>
> >>>> I've done some searching through the mailing list, but couldn't glean
> >>>> much insight. What do you think?
> >>>>
> >>>> --
> >>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
> >>>> Media, and Computer Science
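
P.S. Since the counting happens on the server, the client only has to ask for
the top values and read them back. Below is a rough Python sketch of that
round trip, not a tuned setup. The base URL and the helper names
(build_facet_url, top_facets) are made up for illustration, and the sample
response is invented data in the flat [value, count, ...] layout that Solr's
JSON writer uses by default. facet.limit, facet.mincount, rows and wt are
standard Solr request parameters. This only trims what comes back over the
wire; it does not by itself reduce the cost of counting, which is where the
cache settings mentioned above come in.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute your own host and core.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_facet_url(query, field, limit=50, base_url=SOLR_SELECT_URL):
    """Build a select URL asking Solr for the top `limit` values of `field`."""
    params = [
        ("q", query),
        ("rows", 0),             # skip the documents, only facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", limit),  # return at most `limit` values per field
        ("facet.mincount", 1),   # leave out values with a zero count
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


def top_facets(response, field):
    """Turn the flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Invented sample response, shaped like a parsed wt=json reply.
sample = {
    "facet_counts": {
        "facet_fields": {"authorname": ["alice", 120, "bob", 75, "carol", 3]}
    }
}

url = build_facet_url("java", "authorname")
authors = top_facets(sample, "authorname")
```

Against a live server you would fetch `url`, parse the body with json.loads,
and pass the result to top_facets in place of `sample`.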