Sorry for reviving this thread after 6 months.

But we still have problems with faceted search on full-text fields.

We are trying to get the most frequent words in a text field across the
documents created in the last hour. The faceted search takes too much time:
even though the number of matching documents (created_at within the last
hour) stays constant at 10-20K, the query gets slower as the total number of
documents grows (now 20M). Eventually Solr throws exceptions and stops
responding, and we have to restart it and delete old documents. The machine
has 3 GB of RAM and the index is around 2.2 GB.
We also store the data in Solr. The documents are small.

$response = $solr->search('created_at:[NOW-'.$hours.'HOUR TO NOW]', 0, 1,
array( 'facet' => 'true', 'facet.field'=> $field, 'facet.mincount' => 1,
'facet.method' => 'enum', 'facet.enum.cache.minDf' => 100 ));

Yonik suggested distributed search, but I am not sure we have set every
configuration option correctly. For example, are the Solr caches relevant to
faceted search?
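A minimal sketch of what Yonik's distributed setup might look like for the query above. It assumes two hypothetical cores on one host: `archive` (large, rarely committed) and `recent` (small, receiving the frequent commits). The same facet request is sent with Solr's `shards` parameter so the facet counts are merged across both cores. The core URLs and the `text` field name are placeholders, not our actual configuration.

```php
<?php
// Hypothetical cores: "archive" holds the big, rarely changing index,
// "recent" holds the small, frequently committed index of new documents.
$shards = array(
    'localhost:8983/solr/archive',
    'localhost:8983/solr/recent',
);

// Builds the same facet request as above, plus a comma-separated shard list.
function buildFacetParams($hours, $field, array $shards)
{
    return array(
        'q'                      => 'created_at:[NOW-' . $hours . 'HOUR TO NOW]',
        'start'                  => 0,
        'rows'                   => 1,
        'facet'                  => 'true',
        'facet.field'            => $field,
        'facet.mincount'         => 1,
        'facet.method'           => 'enum',
        'facet.enum.cache.minDf' => 100,
        // Solr fans the request out to each shard and merges the facet counts.
        'shards'                 => implode(',', $shards),
    );
}

$params = buildFacetParams(1, 'text', $shards);
// The request can go to either core; it distributes the query to all shards.
$url = 'http://localhost:8983/solr/recent/select?' . http_build_query($params);
echo $url, "\n";
```

With the PHP client used above, the same `'shards'` entry could instead be added to the params array passed to `$solr->search()`.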

We use the default values:

<filterCache
      class="solr.FastLRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>


<queryResultCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>



Any help is appreciated.



On Sun, Jun 6, 2010 at 8:54 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:

> On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru <furkank...@gmail.com> wrote:
> > We try to provide real-time search. So the index is changing almost in
> > every minute.
> >
> > We commit for every 100 documents received.
> >
> > The facet search is executed every 5 mins.
>
> OK, that's the problem - pretty much every facet search is rebuilding
> the facet cache, which takes most of the time (and facet.fc is more
> expensive than facet.enum in this regard).
>
> One strategy is to use distributed search... have some big cores that
> don't change often, and then small cores for the new stuff that
> changes rapidly.
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Furkan Kuru
