It seems that facet search is not well suited to a full-text field. ( http://search.lucidimagination.com/search/document/178f1a82ff19070c/solr_severe_error_when_doing_a_faceted_search#16562790cda76197 )
Maybe I should go in another direction. I am thinking of the HighFreqTerms approach, but I'm not sure how to start.

On Thu, Apr 21, 2011 at 2:23 AM, Ofer Fort <o...@tra.cx> wrote:

> thanks, but that's what i started with, but it took an even longer time and
> threw this:
> Approaching too many values for UnInvertedField faceting on field 'text' :
> bucket size=15560140
> Approaching too many values for UnInvertedField faceting on field 'text :
> bucket size=15619075
> Exception during facet counts:org.apache.solr.common.SolrException: Too
> many values for UnInvertedField faceting on field text
>
> On Thu, Apr 21, 2011 at 2:11 AM, Jonathan Rochkind <rochk...@jhu.edu> wrote:
>
>> I think faceting is probably the best way to do that, indeed. It might be
>> slow, but it's kind of set up for exactly that case, I can't imagine any
>> other technique being faster -- there's stuff that has to be done to look up
>> the info you want.
>>
>> BUT, I see your problem: don't use facet.method=enum. Use
>> facet.method=fc. Works a LOT better for very high arity fields (lots and
>> lots of unique values) like you have. I bet you'll see significant speed-up
>> if you use facet.method=fc instead, hopefully fast enough to be workable.
>>
>> With facet.method=enum, I would have indeed predicted it would be horribly
>> slow, before solr 1.4 when facet.method=fc became available, it was nearly
>> impossible to facet on very high arity fields, facet.method=fc is the magic.
>> I think facet.method=fc is even the default in Solr 1.4+, if you hadn't
>> explicitly set it to enum instead!
>>
>> Jonathan
>> ________________________________________
>> From: Ofer Fort [ofer...@gmail.com]
>> Sent: Wednesday, April 20, 2011 6:49 PM
>> To: solr-user@lucene.apache.org
>> Subject: Highest frequency terms for a subset of documents
>>
>> Hi,
>> I am looking for the best way to find the terms with the highest frequency
>> for a given subset of documents. (terms in the text field)
>> My first thought was to do a count facet search, where the query defines
>> the subset of documents and the facet.field is the text field, this gives me
>> the result but it is very very slow.
>> These are my params:
>> <str name="facet">true</str>
>> <str name="facet.offset">0</str>
>> <str name="facet.mincount">3</str>
>> <str name="indent">on</str>
>> <str name="facet.limit">500</str>
>> <str name="facet.method">enum</str>
>> <str name="wt">xml</str>
>> <str name="rows">0</str>
>> <str name="version">2.2</str>
>> <str name="facet.sort">count</str>
>> <str name="q">in_subset:1</str>
>> <str name="facet.field">text</str>
>> </lst>
>>
>> The index contains 7M documents, the subset is about 200K. A simple query
>> for the subset takes around 100ms, but the facet search takes 40s.
>>
>> Am i doing something wrong?
>>
>> If facet search is not the correct approach, i thought about using something
>> like org.apache.lucene.misc.HighFreqTerms, but i'm not sure how to do this
>> in solr. Should i implement a request handler that executes this kind of
>> code?
>>
>> thanks for any help
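
For reference, the computation this thread is after (the most frequent terms within a subset of documents) can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use any Solr or Lucene API, the names `top_terms`, `docs` and `in_subset` are hypothetical, and whitespace splitting stands in for whatever analyzer the real text field uses. The `limit` and `mincount` defaults mirror facet.limit=500 and facet.mincount=3 from the params quoted above.

```python
from collections import Counter
import heapq


def top_terms(docs, in_subset, limit=500, mincount=3):
    """Return (term, document_count) pairs for the most frequent terms.

    docs      -- iterable of (doc_id, text) pairs; a hypothetical in-memory
                 stand-in for the index, not a Solr/Lucene structure
    in_subset -- predicate on doc_id selecting the subset of documents
                 (plays the role of q=in_subset:1)
    """
    counts = Counter()
    for doc_id, text in docs:
        if not in_subset(doc_id):
            continue
        # Count each term once per document, the way facet counts do.
        counts.update(set(text.lower().split()))

    # Drop terms that appear in fewer than `mincount` documents.
    frequent = [(term, n) for term, n in counts.items() if n >= mincount]

    # Highest count first; ties are broken alphabetically for a stable order.
    return heapq.nsmallest(limit, frequent, key=lambda tn: (-tn[1], tn[0]))


if __name__ == "__main__":
    sample = [
        (1, "solr facet search"),
        (2, "solr facet"),
        (3, "solr lucene"),
        (4, "facet facet facet"),  # not in the subset below
    ]
    subset = {1, 2, 3}
    print(top_terms(sample, lambda d: d in subset, limit=2, mincount=2))
```

The sketch only touches documents in the subset, so its cost grows with the subset size rather than with the number of unique terms in the whole index. Whether that is faster inside Solr depends on how the per-document terms can be read, which this toy version does not address.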