Re: Faceted Search Slows Down as index gets larger

2010-12-16 Thread Furkan Kuru
I am sorry for reviving this thread after 6 months.

But we still have problems with faceted search on full-text fields.

We try to get the most frequent words in a text field across the documents
created in the last hour. The faceted search takes too much time: even though
the number of matching documents (created_at within 1 HOUR) stays constant
(10-20K), the query gets slower as the total number of documents increases
(now 20M). Solr throws exceptions and stops responding, so we have to restart
it and delete old docs. We have 3 GB of RAM and the index is around 2.2 GB.
We store the data in Solr as well. The documents are small.

$response = $solr->search('created_at:[NOW-'.$hours.'HOUR TO NOW]', 0, 1,
array( 'facet' => 'true', 'facet.field'=> $field, 'facet.mincount' => 1,
'facet.method' => 'enum', 'facet.enum.cache.minDf' => 100 ));

Yonik had suggested distributed search, but I am not sure whether we have set
every configuration option correctly, for example the Solr caches, if they are
related to faceted search.

We use default values:








Any help is appreciated.



On Sun, Jun 6, 2010 at 8:54 PM, Yonik Seeley wrote:

> On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru  wrote:
> > We try to provide real-time search. So the index is changing almost in
> > every minute.
> >
> > We commit for every 100 documents received.
> >
> > The facet search is executed every 5 mins.
>
> OK, that's the problem - pretty much every facet search is rebuilding
> the facet cache, which takes most of the time (and facet.fc is more
> expensive than facet.enum in this regard).
>
> One strategy is to use distributed search... have some big cores that
> don't change often, and then small cores for the new stuff that
> changes rapidly.
>
> -Yonik
> http://www.lucidimagination.com
>
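
A minimal sketch of what the distributed-search strategy above could look
like from the client side, assuming two cores reachable over HTTP. The host,
port and the core names "archive" and "recent" are made up for illustration;
only the `shards` request parameter and the facet parameters are Solr's own.
The request is sent to one core, which fans it out to every core listed in
`shards` and merges the facet counts.

```python
from urllib.parse import urlencode

# Hypothetical core locations: one large core that rarely changes and one
# small core that receives the new documents. Hosts and names are assumptions.
SHARDS = [
    "localhost:8983/solr/archive",
    "localhost:8983/solr/recent",
]


def distributed_facet_url(base_url, field, hours=1):
    """Build a select URL that runs the facet query across all shards."""
    params = [
        ("q", f"created_at:[NOW-{hours}HOUR TO NOW]"),
        ("rows", 0),  # only the facet counts are needed, not the documents
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),
        ("shards", ",".join(SHARDS)),
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = distributed_facet_url("http://localhost:8983/solr/recent", "text")
print(url)
```

With this layout only the small core sees frequent commits, so only its
caches are invalidated often; whether that is enough for a given setup would
need to be measured.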



-- 
Furkan Kuru


Re: Faceted Search Slows Down as index gets larger

2010-06-06 Thread Furkan Kuru
OK, I will have a look at the distributed search, multi-core Solr solution.

Thank you Yonik,

On Sun, Jun 6, 2010 at 8:54 PM, Yonik Seeley wrote:

> On Sun, Jun 6, 2010 at 1:12 PM, Furkan Kuru  wrote:
> > We try to provide real-time search. So the index is changing almost in
> > every minute.
> >
> > We commit for every 100 documents received.
> >
> > The facet search is executed every 5 mins.
>
> OK, that's the problem - pretty much every facet search is rebuilding
> the facet cache, which takes most of the time (and facet.fc is more
> expensive than facet.enum in this regard).
>
> One strategy is to use distributed search... have some big cores that
> don't change often, and then small cores for the new stuff that
> changes rapidly.
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Furkan Kuru


Re: Faceted Search Slows Down as index gets larger

2010-06-06 Thread Furkan Kuru
We try to provide real-time search, so the index changes almost every minute.

We commit after every 100 documents received.

The facet search is executed every 5 minutes.

Here are the stats after a facet search with the normal facet.method=fc (it
took 95 seconds):

name: fieldValueCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1, initialSize=10, minSize=9000,
    acceptableSize=9500, cleanupThread=false)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 34905
cumulative_hits : 2109
cumulative_hitratio : 0.06
cumulative_inserts : 16396
cumulative_evictions : 0


name: filterCache
class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=512, initialSize=512, minSize=460,
    acceptableSize=486, cleanupThread=false)
stats:
lookups : 0
hits : 0
hitratio : 0.00
inserts : 3
evictions : 0
size : 3
warmupTime : 0
cumulative_lookups : 24533601
cumulative_hits : 149859
cumulative_hitratio : 0.00
cumulative_inserts : 24501766
cumulative_evictions : 24036089
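
To make the cumulative counters above easier to read, here is a small sketch
that recomputes the ratios from them. The numbers are copied from the stats
page output; the helper name `hit_ratio` is only illustrative. It shows that
the filterCache hit ratio is well under 1% (the stats page rounds it to 0.00)
and that nearly every entry inserted into the filterCache is later evicted.

```python
def hit_ratio(hits, lookups):
    """Fraction of lookups served from the cache (0.0 when there are none)."""
    return hits / lookups if lookups else 0.0


# Cumulative counters copied from the stats output above.
caches = {
    "fieldValueCache": {
        "lookups": 34905, "hits": 2109, "inserts": 16396, "evictions": 0,
    },
    "filterCache": {
        "lookups": 24533601, "hits": 149859,
        "inserts": 24501766, "evictions": 24036089,
    },
}

for name, s in caches.items():
    ratio = hit_ratio(s["hits"], s["lookups"])
    # Share of inserted entries that were thrown out again.
    evicted = s["evictions"] / s["inserts"] if s["inserts"] else 0.0
    print(f"{name}: hit ratio {ratio:.4f}, evicted/inserted {evicted:.4f}")
```

A cache that is filled and emptied at this rate does a lot of work without
saving much, which fits the advice elsewhere in the thread to use the
filterCache less.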


On Sun, Jun 6, 2010 at 3:27 PM, Yonik Seeley wrote:

> On Sun, Jun 6, 2010 at 7:38 AM, Furkan Kuru  wrote:
> > facet.limit = default value 100
> > facet.minCount is 1
> >
> > The document count that matches the query is 8-10K in average. I did not
> > calculate the terms (maybe using facet.limit=-1 and facet.minCount=1)
> >
> > My index entirely fits into memory.
>
> How often is the index changing (how often are you committing)?
> It takes time to build the UnInvertedField structure for the first
> facet request after the index changes.
>
> Also, with the normal facet.method=fc, after you run it, go to the
> statistics page and look for the whole entry for fieldValueCache (and
> cut'n'paste it here).
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Furkan Kuru


Re: Faceted Search Slows Down as index gets larger

2010-06-06 Thread Furkan Kuru
facet.limit = default value 100
facet.mincount is 1

The document count that matches the query is 8-10K on average. I did not
calculate the number of terms (maybe I could by using facet.limit=-1 and
facet.mincount=1).

My index fits entirely into memory.



On Sun, Jun 6, 2010 at 5:10 AM, Andy  wrote:

> This is strange.
>
> 1M unique facet terms and 10 terms per document -- sounds like this use
> case is exactly where fc would be faster. But your results  were the exact
> opposite.
>
> What value for facet.limit did you set?
>
> Was your 80/30 seconds query time spent mostly on returning the facet
> counts of all 1M of facet terms, or did you limit the number of facet terms
> returned to a small number?
>
> Also did your entire index fit within RAM?
>
>
> --- On Sat, 6/5/10, Furkan Kuru  wrote:
>
> > From: Furkan Kuru
> > Subject: Re: Faceted Search Slows Down as index gets larger
> > To: solr-user@lucene.apache.org, yo...@lucidimagination.com
> > Date: Saturday, June 5, 2010, 8:40 AM
> >
> > The documents full-text fields are 140 chars length (tweets).
> >
> > Actually I had looked at those parameters and thought no change was
> > necessary because the terms per document would be few and the unique
> > term count was nearly 1 M. I don't know exactly but average term count
> > per document text can be 10 in my case.
> >
> > I think I still do not get why facet.method=enum is faster.
> >
> > On Sat, Jun 5, 2010 at 5:00 AM, Yonik Seeley wrote:
> >
> > > On Fri, Jun 4, 2010 at 7:33 PM, Andy wrote:
> > > > Yonik,
> > > >
> > > > Just curious why does using enum improve the facet performance.
> > > >
> > > > Furkan was faceting on a text field with each word being a facet
> > > > value. I'd imagine that'd mean there's a large number of facet
> > > > values. According to the documentation (
> > > > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method)
> > > > facet.method=fc is faster when a field has many unique terms. So
> > > > how come enum, not fc, is faster in this case?
> > >
> > > facet.method=fc is faster when there are many unique terms, and
> > > relatively few terms per document.  A full-text field doesn't fit
> > > that bill.
> > >
> > > > Also why use filterCache less?
> > >
> > > Takes up a lot of memory.
> > >
> > > -Yonik
> > > http://www.lucidimagination.com
> > >
> >
> > --
> > Furkan Kuru
> >
>
>
>
>


-- 
Furkan Kuru


Re: Faceted Search Slows Down as index gets larger

2010-06-05 Thread Furkan Kuru
The documents' full-text fields are 140 characters long (tweets).

Actually I had looked at those parameters and thought no change was
necessary, because the terms per document would be few and the unique term
count was nearly 1M. I don't know exactly, but the average term count per
document is probably around 10 in my case.

I think I still do not get why facet.method=enum is faster.


On Sat, Jun 5, 2010 at 5:00 AM, Yonik Seeley wrote:

> On Fri, Jun 4, 2010 at 7:33 PM, Andy  wrote:
> > Yonik,
> >
> > Just curious why does using enum improve the facet performance.
> >
> > Furkan was faceting on a text field with each word being a facet value.
> > I'd imagine that'd mean there's a large number of facet values. According
> > to the documentation (
> > http://wiki.apache.org/solr/SimpleFacetParameters#facet.method)
> > facet.method=fc is faster when a field has many unique terms. So how come
> > enum, not fc, is faster in this case?
>
> facet.method=fc is faster when there are many unique terms, and
> relatively few terms per document.  A full-text field doesn't fit that
> bill.
>
> > Also why use filterCache less?
>
> Takes up a lot of memory.
>
> -Yonik
> http://www.lucidimagination.com
>
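
A toy model of the two counting strategies discussed above may help with the
"why". This is not Solr's implementation, only a sketch of the idea on
made-up data: the enum style walks the terms and intersects each term's
document set with the matching documents, while the fc style walks the
matching documents and counts the terms of each one. All names and documents
below are invented for illustration.

```python
from collections import Counter

# Toy corpus: doc id -> terms in the faceted field (made-up data).
DOCS = {
    1: ["solr", "facet", "slow"],
    2: ["solr", "cache"],
    3: ["facet", "cache", "solr"],
    4: ["lucene"],
}

# Inverted index: term -> set of ids of the documents containing it.
INVERTED = {}
for doc_id, terms in DOCS.items():
    for term in terms:
        INVERTED.setdefault(term, set()).add(doc_id)


def facet_enum(matching):
    """Enum style: visit every term in the index and intersect its document
    set with the matching documents. Work grows with the number of terms."""
    counts = {}
    for term, doc_ids in INVERTED.items():
        n = len(doc_ids & matching)
        if n:
            counts[term] = n
    return counts


def facet_fc(matching):
    """fc style: visit every matching document and count its terms, using a
    per-document term list. In this toy model DOCS plays that role; a real
    engine has to build such a structure again after the index changes."""
    counts = Counter()
    for doc_id in matching:
        counts.update(set(DOCS[doc_id]))
    return dict(counts)


matching = {1, 3}
print(facet_enum(matching))
print(facet_fc(matching))
```

Both functions return the same counts; they differ only in what they iterate
over, which is why the number of unique terms, the terms per document and the
cost of rebuilding per-document structures all affect which one is cheaper.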



-- 
Furkan Kuru


Re: Faceted Search Slows Down as index gets larger

2010-06-04 Thread Furkan Kuru
I am using version 1.4.

I have tried your suggestion; it takes around 25-30 seconds now.

Thank you,


On Fri, Jun 4, 2010 at 5:54 PM, Yonik Seeley wrote:

> Faceting on a full-text field is hard.
> What version of Solr are you using?
>
> If it's 1.4 or later, try setting
> facet.method=enum
>
> And to use the filterCache less, try
> facet.enum.cache.minDf=100
>
> -Yonik
> http://www.lucidimagination.com
>
> On Fri, Jun 4, 2010 at 10:31 AM, Furkan Kuru  wrote:
> > Hello,
> >
> > I have been dealing with real-time data.
> >
> > As the number of total indexed documents gets larger (now 5 M)
> >
> > a faceted search on a text field limited by the creation time, which we
> > use to find the most used word in all these text fields, gets slow down.
> >
> >
> > query string: created_time:[NOW-1HOUR TO NOW] facet.field=text
> > facet.mincount=1
> >
> > the document count matching the query is around 9000.
> >
> >
> > It takes around 80 seconds in a decent computer with 4GB ram, quad core
> > cpu
> >
> > I do not know the internal details of term indexing and their counts for
> > faceting.
> >
> > Any suggestion for speeding up this query is appreciated.
> >
> > Thanks in advance.
> >
> > --
> > Furkan Kuru
> >
>
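
For reference, the two parameters suggested above can be added to the
original query like this. It is only a sketch of how the request parameters
fit together; the function name is made up, and rows=0 is an assumption that
only the facet counts are wanted. With facet.enum.cache.minDf=100, terms that
occur in fewer than 100 documents are counted without going through the
filterCache.

```python
from urllib.parse import urlencode


def enum_facet_params(field, hours=1, min_df=100):
    """Request parameters for an enum-style facet over recent documents."""
    return [
        ("q", f"created_time:[NOW-{hours}HOUR TO NOW]"),
        ("rows", 0),  # assumption: only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", 1),
        ("facet.method", "enum"),
        # Terms below this document frequency skip the filterCache.
        ("facet.enum.cache.minDf", min_df),
    ]


query_string = urlencode(enum_facet_params("text"))
print(query_string)
```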



-- 
Furkan Kuru


Faceted Search Slows Down as index gets larger

2010-06-04 Thread Furkan Kuru
Hello,

I have been dealing with real-time data.

As the total number of indexed documents gets larger (now 5M), a faceted
search on a text field limited by creation time, which we use to find the
most used words in these text fields, slows down.


query string: created_time:[NOW-1HOUR TO NOW] facet.field=text
facet.mincount=1

the document count matching the query is around 9000.


It takes around 80 seconds on a decent computer with 4 GB of RAM and a
quad-core CPU.

I do not know the internal details of term indexing and their counts for
faceting.

Any suggestion for speeding up this query is appreciated.

Thanks in advance.

-- 
Furkan Kuru


Re: facet order

2010-05-29 Thread Furkan Kuru
use: facet.sort=count (written as facet.sort=true before Solr 1.4)


http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort


On Sat, May 29, 2010 at 3:53 PM,  wrote:

> Hi,
>
> how can i configuratively order facets according to total count of facet
> fields?
>
> for example - facets with the highest count be on top.
>
> facet1 [0]
> abc (20)
> def (18)
> ghi (16)
>
> facet2 [1]
> jkl (10)
> mno (9)
> pqr (2)
>
> thanks
>
> dev.
>
>


-- 
Furkan Kuru


getting documents sorted after a faceted search

2010-05-24 Thread Furkan Kuru
I run a faceted search and get document ids from the facet_field I have
used.

Then I search for these documents by their ids: id:(id1 id2 ...)

But the order is not predictable, because the ids are ORed together.

I do not want to sort the documents again.

Is there any way to get the documents in the given id order?


-- 
Furkan Kuru