So just make sure to use rows=1?

--
Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

On Mon, Aug 3, 2009 at 5:51 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Mon, Aug 3, 2009 at 8:26 PM, Mark Bennett <mbenn...@ideaeng.com> wrote:
> > Yonik, can you confirm reasoning below for 1.4 for a text field?
>
> The bit about warming? Looks right to me - a big base docset can
> trigger short-circuit logic in the enum faceting code... using a
> docset of size 1 currently avoids this.
>
> -Yonik
> http://www.lucidimagination.com
>
> > ( Of course faceting is so much faster in 1.4 anyway, it's probably worth
> > the upgrade.
> > https://issues.apache.org/jira/browse/SOLR-475 )
> >
> > A warning for folks NOT using 1.4:
> >
> > At the bottom of this wiki page: (very bottom)
> > http://wiki.apache.org/solr/SimpleFacetParameters
> > It says:
> > Warming
> > facet.field queries using the term enumeration method can avoid the
> > evaluation of some terms for greater efficiency. To force the evaluation of
> > all terms for warming, the base query should match a single document.
> >
> > I think this is OK in the newer version, because as of 1.4 the default is
> > "fc", not "enum". But prior to 1.4 there was no fc!
> >
> > Wiki info on the default (enum vs. fc)
> > http://wiki.apache.org/solr/SimpleFacetParameters
> >
> > facet.method
> > This parameter indicates what type of algorithm/method to use when
> > faceting a field.
> >
> > enum
> > Enumerates all terms in a field, calculating the set intersection of
> > documents that match the term with documents that match the query. This was
> > the default (and only) method for faceting multi-valued fields prior to Solr
> > 1.4.
> >
> > fc (stands for field cache)
> > The facet counts are calculated by iterating over documents that match
> > the query and summing the terms that appear in each document. This was the
> > default method for single valued fields prior to Solr 1.4.
> >
> > The default value is fc (except for BoolField) since it tends to use less
> > memory and is faster when a field has many unique terms in the index.
> >
> > --
> > Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
> >
> > On Mon, Aug 3, 2009 at 2:49 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> >
> >> Sounds like faceting?
> >> q=state:CA&facet=true&facet.field=title&facet.limit=1000
> >>
> >> -Yonik
> >> http://www.lucidimagination.com
> >>
> >> On Mon, Aug 3, 2009 at 5:39 PM, Mark Bennett <mbenn...@ideaeng.com> wrote:
> >> > You can get a nice list of terms for a field using the Luke handler:
> >> > http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000
> >> >
> >> > But what I'd really like is to get the terms for the docs that match a
> >> > particular slice of the index.
> >> >
> >> > For example, let's say I have records for all 50 states, but I want to get
> >> > the top 1,000 terms for documents in California.
> >> >
> >> > I'd like to add q or fq like this:
> >> > http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&q=state:CA
> >> > OR
> >> > http://localhost:8983/solr/admin/luke?fl=title&numTerms=1000&fq=state:CA
> >> >
> >> > Although I don't get any errors, this syntax doesn't seem to filter the
> >> > terms. Not a bug, nobody ever said it would.
> >> >
> >> > But has anybody written a utility to get term instances for a subset of the
> >> > index, based on a query? And to be clear, I was hoping to get all of the
> >> > terms in matching documents, not just terms that are also present in the
> >> > query.
> >> >
> >> > Thanks,
> >> > Mark
> >> >
> >> > --
> >> > Mark Bennett / New Idea Engineering, Inc. / mbenn...@ideaeng.com
> >> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
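
For anyone who wants to script the faceting request suggested earlier in the thread (q=state:CA&facet=true&facet.field=title&facet.limit=1000), here is a minimal sketch in Python that only builds the URL. The function name facet_terms_url is hypothetical, the /solr/select handler path is an assumption (the thread only shows the /admin/luke URL), and rows=0 is an illustrative default for the case where only the facet counts are wanted. The optional facet.method value comes from the wiki text quoted above.

```python
from urllib.parse import urlencode


def facet_terms_url(base_url, query, field, limit=1000, rows=0, method=None):
    """Build a Solr faceting URL for the top terms of `field` among docs matching `query`.

    base_url is assumed to point at a Solr select handler; adjust for your setup.
    """
    params = [
        ("q", query),             # the slice of the index, e.g. state:CA
        ("rows", rows),           # assumption: 0 rows, we only want facet counts
        ("facet", "true"),
        ("facet.field", field),   # field whose terms are counted
        ("facet.limit", limit),   # maximum number of terms to return
    ]
    if method is not None:
        # "enum" or "fc", per the SimpleFacetParameters wiki text quoted above
        params.append(("facet.method", method))
    return base_url + "?" + urlencode(params)


# Assumed host and handler path, for illustration only.
url = facet_terms_url("http://localhost:8983/solr/select", "state:CA", "title")
print(url)
```

Note that rows only limits how many documents come back in the response; the base docset used for faceting is determined by q (and any fq), so a warming query for the enum method would need a query that itself matches a single document.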