Another thing you can try is trunk.  This specific case has been
improved by an order of magnitude recently.
The cases that have been sped up are the initial population of the
filterCache, a filterCache that can't hold all of the unique
values, and faceting that is configured to bypass the filterCache much
of the time via facet.enum.cache.minDf.
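
To make the parameters concrete, here is a minimal sketch of the request quoted below, written in Python only to show the resulting URL. The host, port, and the field name "text" are assumptions and do not come from the thread. With facet.method=enum, terms whose document frequency is below facet.enum.cache.minDf are counted without going through the filterCache.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, field, hours, min_df=100):
    """Return a Solr select URL asking for term facets on `field`.

    base_url and field are placeholders chosen for illustration.
    """
    params = [
        ("q", f"created_at:[NOW-{hours}HOUR TO NOW]"),
        ("start", 0),
        ("rows", 1),                          # only the facet counts matter here
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),
        ("facet.method", "enum"),             # enumerate terms, intersect with the result set
        ("facet.enum.cache.minDf", min_df),   # terms below this doc frequency skip the filterCache
    ]
    return base_url + "?" + urlencode(params)


# Assumed local Solr instance and an assumed field name.
url = build_facet_url("http://localhost:8983/solr/select", "text", 1)
print(url)
```

Raising min_df keeps more rare terms out of the filterCache, which matters when the field has far more unique terms than the cache has entries.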

-Yonik
http://www.lucidimagination.com

On Thu, Dec 16, 2010 at 6:39 PM, Furkan Kuru <furkank...@gmail.com> wrote:
> I am sorry for reviving this thread after 6 months.
>
> But we still have problems with faceted search on full-text fields.
>
> We try to get the most frequent words in a text field for documents created
> within the last hour. The faceted search takes too long: even though the number
> of matching documents (created_at within 1 HOUR) stays constant (10-20K), the
> query gets slower as the total number of documents increases (now 20M). Solr
> throws exceptions and stops responding, so we have to restart and delete old
> docs. We have 3 GB of RAM; the index is around 2.2 GB.
> We also store the data in Solr. The documents are small.
>
> $response = $solr->search('created_at:[NOW-'.$hours.'HOUR TO NOW]', 0, 1,
> array( 'facet' => 'true', 'facet.field'=> $field, 'facet.mincount' => 1,
> 'facet.method' => 'enum', 'facet.enum.cache.minDf' => 100 ));
>
> Yonik had suggested distributed search, but I am not sure we have configured
> everything correctly, for example the Solr caches, if they are relevant to
> faceted search.
>
> We use default values:
>
> <filterCache
>       class="solr.FastLRUCache"
>       size="512"
>       initialSize="512"
>       autowarmCount="0"/>
>
>
> <queryResultCache
>       class="solr.LRUCache"
>       size="512"
>       initialSize="512"
>       autowarmCount="0"/>
>
>
>
> Any help is appreciated.
>
>
>
> On Sun, Jun 6, 2010 at 8:54 PM, Yonik Seeley <yo...@lucidimagination.com>
> wrote:
>>
>> On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru <furkank...@gmail.com> wrote:
>> > We try to provide real-time search, so the index changes almost every
>> > minute.
>> >
>> > We commit after every 100 documents received.
>> >
>> > The facet search is executed every 5 minutes.
>>
>> OK, that's the problem - pretty much every facet search is rebuilding
>> the facet cache, which takes most of the time (and facet.method=fc is
>> more expensive than facet.method=enum in this regard).
>>
>> One strategy is to use distributed search... have some big cores that
>> don't change often, and then small cores for the new stuff that
>> changes rapidly.
>>
>> -Yonik
>> http://www.lucidimagination.com
>
>
>
> --
> Furkan Kuru
>
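
The distributed-search strategy quoted above (large cores that rarely change plus a small core for new documents) could be queried roughly as sketched here. The core names "archive" and "recent", the host, and the field name are hypothetical. The sketch assumes Solr's `shards` request parameter, which takes a comma-separated list of host:port/path entries without the http:// prefix.

```python
from urllib.parse import urlencode


def build_sharded_url(base_url, shards, params):
    """Append a `shards` parameter so one request fans out to several cores."""
    query = list(params) + [("shards", ",".join(shards))]
    return base_url + "?" + urlencode(query)


# Hypothetical layout: one big, mostly static core and one small, fast-changing core.
shards = [
    "localhost:8983/solr/archive",
    "localhost:8983/solr/recent",
]
facet_params = [
    ("q", "created_at:[NOW-1HOUR TO NOW]"),
    ("rows", 0),
    ("facet", "true"),
    ("facet.field", "text"),      # assumed field name
    ("facet.mincount", 1),
]
sharded_url = build_sharded_url(
    "http://localhost:8983/solr/recent/select", shards, facet_params
)
print(sharded_url)
```

The idea is that frequent commits then only invalidate the caches of the small core, while the large cores keep theirs warm.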
