Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
Right, I haven't used SOLR-475 yet and am more familiar with
Bobo. I believe there are differences, but I haven't gone into
them yet. As I'm using Solr 1.4 now, maybe I'll test the
UnInvertedField approach.

Feel free to report back results, as I don't think I've seen many
yet.

On Thu, Aug 13, 2009 at 10:51 AM, Fuad Efendi wrote:
> SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to
> be); check this
> http://issues.apache.org/jira/browse/SOLR-475
> (and probably http://issues.apache.org/jira/browse/SOLR-711)
>
> -Original Message-
> From: Jason Rutherglen
>
> Yeah we need a performance comparison, I haven't had time to put
> one together. If/when I do I'll compare Bobo performance against
> Solr bitset intersection based facets, compare memory
> consumption.
>
> For near realtime Solr needs to cache and merge bitsets at the
> SegmentReader level, and Bobo needs to be upgraded to work with
> Lucene 2.9's searching at the segment level (currently it uses a
> MultiSearcher).
>
> Distributed search on either should be fairly straightforward?
>
> On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
>> It seems BOBO-Browse is alternate faceting engine; would be interesting to
>> compare performance with SOLR... Distributed?
>>
>>
>> -Original Message-
>> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
>> Sent: August-12-09 6:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: facet performance tips
>>
>> For your fields with many terms you may want to try Bobo
>> http://code.google.com/p/bobo-browse/ which could work well with your
>> case.


RE: facet performance tips

2009-08-13 Thread Fuad Efendi
SOLR-1.4 trunk seems to use term counting instead of bitset
intersections; check this
http://issues.apache.org/jira/browse/SOLR-475
(and probably http://issues.apache.org/jira/browse/SOLR-711)
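
To make the difference concrete, here is a toy sketch in Python (not Solr
code; the function names and the tiny in-memory index are made up for
illustration). It contrasts the two strategies as described in this thread:
intersecting one cached doc set per term with the query result, versus
walking the matching docs and counting their terms.

```python
from collections import Counter

# Toy index (hypothetical data): doc id -> terms of a multi-valued facet field.
DOCS = {
    0: ["red", "large"],
    1: ["red"],
    2: ["blue", "large"],
    3: ["blue", "small"],
}


def build_term_docsets(docs):
    """Inverted view: term -> set of doc ids (stands in for a cached bitset)."""
    docsets = {}
    for doc_id, terms in docs.items():
        for term in terms:
            docsets.setdefault(term, set()).add(doc_id)
    return docsets


def facet_by_intersection(term_docsets, result_docs):
    """One set intersection per unique term in the field."""
    counts = {}
    for term, docset in term_docsets.items():
        n = len(docset & result_docs)
        if n:
            counts[term] = n
    return counts


def facet_by_term_counting(docs, result_docs):
    """Walk the matching docs and count the terms each one carries."""
    counts = Counter()
    for doc_id in result_docs:
        counts.update(docs[doc_id])
    return dict(counts)


if __name__ == "__main__":
    result = {0, 1, 2}  # doc ids matched by some query
    print(facet_by_intersection(build_term_docsets(DOCS), result))
    print(facet_by_term_counting(DOCS, result))
```

Both functions return the same counts. The first does work proportional to
the number of unique terms, the second proportional to the number of matching
documents, which is why the choice depends on the shape of the field and of
the result set.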

-Original Message-
From: Jason Rutherglen 

Yeah we need a performance comparison, I haven't had time to put
one together. If/when I do I'll compare Bobo performance against
Solr bitset intersection based facets, compare memory
consumption.

For near realtime Solr needs to cache and merge bitsets at the
SegmentReader level, and Bobo needs to be upgraded to work with
Lucene 2.9's searching at the segment level (currently it uses a
MultiSearcher).

Distributed search on either should be fairly straightforward?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
> It seems BOBO-Browse is alternate faceting engine; would be interesting to
> compare performance with SOLR... Distributed?
>
>
> -Original Message-
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: August-12-09 6:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> For your fields with many terms you may want to try Bobo
> http://code.google.com/p/bobo-browse/ which could work well with your
> case.




Re: facet performance tips

2009-08-13 Thread Jason Rutherglen
Yeah, we need a performance comparison; I haven't had time to put
one together. If/when I do, I'll compare Bobo's performance against
Solr's bitset-intersection-based facets, and compare memory
consumption.

For near-realtime search, Solr needs to cache and merge bitsets at the
SegmentReader level, and Bobo needs to be upgraded to work with
Lucene 2.9's searching at the segment level (currently it uses a
MultiSearcher).

Distributed search on either should be fairly straightforward?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi wrote:
> It seems BOBO-Browse is alternate faceting engine; would be interesting to
> compare performance with SOLR... Distributed?
>
>
> -Original Message-
> From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
> Sent: August-12-09 6:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> For your fields with many terms you may want to try Bobo
> http://code.google.com/p/bobo-browse/ which could work well with your
> case.


RE: facet performance tips

2009-08-13 Thread Fuad Efendi
Interesting: it has "BoboRequestHandler implements SolrRequestHandler",
so it is easy to try, and it has shards support.



[Fuad Efendi] It seems BOBO-Browse is an alternative faceting engine; it
would be interesting to compare its performance with SOLR... Is it distributed?


[Jason Rutherglen] For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.








RE: facet performance tips

2009-08-13 Thread Fuad Efendi
It seems BOBO-Browse is an alternative faceting engine; it would be
interesting to compare its performance with SOLR... Is it distributed?


-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: August-12-09 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.






RE: facet performance tips

2009-08-13 Thread Fuad Efendi
I took 1.4 from trunk three days ago, and it seems OK for production (at least
for my master instance, which only handles writes). I use the same config files.

500,000 terms are OK too; I am using several million with a pre-1.3 SOLR build
taken from trunk.

However, do not try to "facet" (probably an outdated term after SOLR-475) on
generic queries such as [* TO *] that return a huge result set. For smaller
query results (100,000 instead of 100,000,000), "counting terms" is fast enough
(a few milliseconds at http://www.tokenizer.org).

 

-Original Message-
From: Jérôme Etévé [mailto:jerome.et...@gmail.com] 
Sent: August-13-09 5:38 AM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

Thanks everyone for your advice.

I increased my filterCache, and the faceting performance improved greatly.

My faceted field can have at the moment ~4 different terms, so I
set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around
500 000, so I guess this approach won't work anymore, as I doubt a 500
000 sized filterCache would work.

So I guess my best move would be to upgrade to the soon to be 1.4
version of solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea about
when 1.4 will be an official release?
As well, is the current trunk OK for production? Is it compatible with
1.3 configuration files?

Thanks !

Jerome.

2009/8/13 Stephen Duncan Jr :
> Note that depending on the profile of your field (full text and how many
> unique terms on average per document), the improvements from 1.4 may not
> apply, as you may exceed the limits of the new faceting technique in Solr
> 1.4.
> -Stephen
>
> On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher  wrote:
>
>> Yes, increasing the filterCache size will help with Solr 1.3 performance.
>>
>> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
>> performance.
>>
>>Erik
>>
>>
>> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
>>
>>> Hi everyone,
>>>
>>>  I'm using some faceting on a solr index containing ~ 160K documents.
>>> I perform facets on multivalued string fields. The number of possible
>>> different values is quite large.
>>>
>>> Enabling facets degrades the performance by a factor 3.
>>>
>>> Because I'm using solr 1.3, I guess the facetting makes use of the
>>> filter cache to work. My filterCache is set
>>> to a size of 2048. I also noticed in my solr stats a very small ratio
>>> of cache hit (~ 0.01%).
>>>
>>> Can it be the reason why the faceting is slow? Does it make sense to
>>> increase the filterCache size so it matches more or less the number
>>> of different possible values for the faceted fields? Would that not
>>> make the memory usage explode?
>>>
>>> Thanks for your help !
>>>
>>> --
>>> Jerome Eteve.
>>>
>>> Chat with me live at http://www.eteve.net
>>>
>>> jer...@eteve.net
>>>
>>
>>
>
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net




Re: facet performance tips

2009-08-13 Thread Jérôme Etévé
Thanks everyone for your advice.

I increased my filterCache, and the faceting performance improved greatly.

My faceted field can have at the moment ~4 different terms, so I
set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around
500 000, so I guess this approach won't work anymore, as I doubt a 500
000 sized filterCache would work.

So I guess my best move would be to upgrade to the soon-to-be-released 1.4
version of Solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea of
when 1.4 will be officially released?
Also, is the current trunk OK for production? Is it compatible with
1.3 configuration files?

Thanks !

Jerome.

2009/8/13 Stephen Duncan Jr :
> Note that depending on the profile of your field (full text and how many
> unique terms on average per document), the improvements from 1.4 may not
> apply, as you may exceed the limits of the new faceting technique in Solr
> 1.4.
> -Stephen
>
> On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher  wrote:
>
>> Yes, increasing the filterCache size will help with Solr 1.3 performance.
>>
>> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
>> performance.
>>
>>Erik
>>
>>
>> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
>>
>>> Hi everyone,
>>>
>>>  I'm using some faceting on a solr index containing ~ 160K documents.
>>> I perform facets on multivalued string fields. The number of possible
>>> different values is quite large.
>>>
>>> Enabling facets degrades the performance by a factor 3.
>>>
>>> Because I'm using solr 1.3, I guess the facetting makes use of the
>>> filter cache to work. My filterCache is set
>>> to a size of 2048. I also noticed in my solr stats a very small ratio
>>> of cache hit (~ 0.01%).
>>>
>>> Can it be the reason why the faceting is slow? Does it make sense to
>>> increase the filterCache size so it matches more or less the number
>>> of different possible values for the faceted fields? Would that not
>>> make the memory usage explode?
>>>
>>> Thanks for your help !
>>>
>>> --
>>> Jerome Eteve.
>>>
>>> Chat with me live at http://www.eteve.net
>>>
>>> jer...@eteve.net
>>>
>>
>>
>
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>



-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


Re: facet performance tips

2009-08-12 Thread Stephen Duncan Jr
Note that depending on the profile of your field (full text and how many
unique terms on average per document), the improvements from 1.4 may not
apply, as you may exceed the limits of the new faceting technique in Solr
1.4.
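
For anyone who wants to compare the two approaches on their own field, here
is a small sketch that builds the request URLs. It assumes Solr 1.4's
facet.method parameter (enum for the filterCache-based method, fc for the
newer one); the host, the /select path and the field name "tags" are
placeholders, not anything taken from this thread.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, method):
    """Build a facet-only query URL using the given facet.method."""
    if method not in ("enum", "fc"):
        raise ValueError("facet.method must be 'enum' or 'fc'")
    params = [
        ("q", "*:*"),          # match everything; replace with a real query
        ("rows", "0"),         # only the facet counts are of interest here
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),
        ("facet.limit", "10"),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8983/solr/select"  # placeholder URL
    for method in ("enum", "fc"):
        print(facet_query_url(base, "tags", method))
```

Timing the same query with both values (after warming the caches) should
show whether the field profile suits the new technique or not.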
-Stephen

On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher  wrote:

> Yes, increasing the filterCache size will help with Solr 1.3 performance.
>
> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
> performance.
>
>Erik
>
>
> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
>
>> Hi everyone,
>>
>>  I'm using some faceting on a solr index containing ~ 160K documents.
>> I perform facets on multivalued string fields. The number of possible
>> different values is quite large.
>>
>> Enabling facets degrades the performance by a factor 3.
>>
>> Because I'm using solr 1.3, I guess the facetting makes use of the
>> filter cache to work. My filterCache is set
>> to a size of 2048. I also noticed in my solr stats a very small ratio
>> of cache hit (~ 0.01%).
>>
>> Can it be the reason why the faceting is slow? Does it make sense to
>> increase the filterCache size so it matches more or less the number
>> of different possible values for the faceted fields? Would that not
>> make the memory usage explode?
>>
>> Thanks for your help !
>>
>> --
>> Jerome Eteve.
>>
>> Chat with me live at http://www.eteve.net
>>
>> jer...@eteve.net
>>
>
>


-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Re: facet performance tips

2009-08-12 Thread Jason Rutherglen
For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.

On Wed, Aug 12, 2009 at 12:02 PM, Fuad Efendi wrote:
> I am currently faceting on tokenized multi-valued field at
> http://www.tokenizer.org (25 mlns simple docs)
>
> It uses some home-made quick fixes similar to SOLR-475 (SOLR-711) and
> non-synchronized cache (similar to LingPipe's FastCache, SOLR-665, SOLR-667)
>
> Average "faceting" on query results: 0.2 - 0.3 seconds; without those
> patches - 20-50 seconds.
>
> I am going to upgrade to SOLR-1.4 from trunk (with SOLR-475 & SOLR-667) and
> to compare results...
>
>
>
>
> P.S.
> Avoid faceting on a field with heavy distribution of terms (such as few
> millions of terms in my case); It won't work in SOLR 1.3.
>
> TIP: use non-tokenized single-valued field for faceting, such as
> non-tokenized "country" field.
>
>
>
> P.P.S.
> Would be nice to load/stress
> http://alias-i.com/lingpipe/docs/api/com/aliasi/util/FastCache.html against
> putting CPU in a spin loop ConcurrentHashMap.
>
>
>
> -Original Message-
> From: Erik Hatcher [mailto:ehatc...@apache.org]
> Sent: August-12-09 2:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: facet performance tips
>
> Yes, increasing the filterCache size will help with Solr 1.3
> performance.
>
> Do note that trunk (soon Solr 1.4) has dramatically improved faceting
> performance.
>
>        Erik
>
> On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:
>
>> Hi everyone,
>>
>>  I'm using some faceting on a solr index containing ~ 160K documents.
>> I perform facets on multivalued string fields. The number of possible
>> different values is quite large.
>>
>> Enabling facets degrades the performance by a factor 3.
>>
>> Because I'm using solr 1.3, I guess the facetting makes use of the
>> filter cache to work. My filterCache is set
>> to a size of 2048. I also noticed in my solr stats a very small ratio
>> of cache hit (~ 0.01%).
>>
>> Can it be the reason why the faceting is slow? Does it make sense to
>> increase the filterCache size so it matches more or less the number
>> of different possible values for the faceted fields? Would that not
>> make the memory usage explode?
>>
>> Thanks for your help !
>>
>> --
>> Jerome Eteve.
>>
>> Chat with me live at http://www.eteve.net
>>
>> jer...@eteve.net
>
>
>
>


RE: facet performance tips

2009-08-12 Thread Fuad Efendi
I am currently faceting on a tokenized multi-valued field at
http://www.tokenizer.org (25 million simple docs).

It uses some home-made quick fixes similar to SOLR-475 (SOLR-711) and a
non-synchronized cache (similar to LingPipe's FastCache, SOLR-665, SOLR-667).

Average "faceting" time on query results: 0.2 - 0.3 seconds; without those
patches: 20-50 seconds.

I am going to upgrade to SOLR-1.4 from trunk (with SOLR-475 & SOLR-667) and
compare results...




P.S.
Avoid faceting on a field with a very large number of distinct terms (such as
a few million terms in my case); it won't work in SOLR 1.3.

TIP: use a non-tokenized single-valued field for faceting, such as a
non-tokenized "country" field.



P.P.S.
It would be nice to load/stress test
http://alias-i.com/lingpipe/docs/api/com/aliasi/util/FastCache.html against
ConcurrentHashMap, which puts the CPU in a spin loop.



-Original Message-
From: Erik Hatcher [mailto:ehatc...@apache.org] 
Sent: August-12-09 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

Yes, increasing the filterCache size will help with Solr 1.3  
performance.

Do note that trunk (soon Solr 1.4) has dramatically improved faceting  
performance.

Erik

On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:

> Hi everyone,
>
>  I'm using some faceting on a solr index containing ~ 160K documents.
> I perform facets on multivalued string fields. The number of possible
> different values is quite large.
>
> Enabling facets degrades the performance by a factor 3.
>
> Because I'm using solr 1.3, I guess the facetting makes use of the
> filter cache to work. My filterCache is set
> to a size of 2048. I also noticed in my solr stats a very small ratio
> of cache hit (~ 0.01%).
>
> Can it be the reason why the faceting is slow? Does it make sense to
> increase the filterCache size so it matches more or less the number
> of different possible values for the faceted fields? Would that not
> make the memory usage explode?
>
> Thanks for your help !
>
> -- 
> Jerome Eteve.
>
> Chat with me live at http://www.eteve.net
>
> jer...@eteve.net





Re: facet performance tips

2009-08-12 Thread Erik Hatcher
Yes, increasing the filterCache size will help with Solr 1.3  
performance.


Do note that trunk (soon Solr 1.4) has dramatically improved faceting  
performance.


Erik

On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:


> Hi everyone,
>
> I'm using some faceting on a solr index containing ~ 160K documents.
> I perform facets on multivalued string fields. The number of possible
> different values is quite large.
>
> Enabling facets degrades the performance by a factor of 3.
>
> Because I'm using solr 1.3, I guess the faceting makes use of the
> filter cache to work. My filterCache is set
> to a size of 2048. I also noticed in my solr stats a very low
> cache hit ratio (~ 0.01%).
>
> Can it be the reason why the faceting is slow? Does it make sense to
> increase the filterCache size so it matches more or less the number
> of different possible values for the faceted fields? Would that not
> make the memory usage explode?
>
> Thanks for your help !
>
> --
> Jerome Eteve.
>
> Chat with me live at http://www.eteve.net
>
> jer...@eteve.net




RE: facet performance tips

2009-08-12 Thread Manepalli, Kalyan
Jerome,
Yes, you need to increase the filterCache size to something close to the
number of unique facet values. But also consider the RAM required to
accommodate the increase.
I did see a significant performance gain by increasing the filterCache size.
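
A rough back-of-the-envelope sketch of that RAM cost, assuming each cached
filter is stored as a plain bitset with one bit per document in the index.
That is an assumption made for illustration: small doc sets may be stored
more compactly, so treat the result as an upper bound. The function name is
made up; the 160K documents, the cache size of 2048 and the 500,000 terms are
the figures mentioned in this thread.

```python
def filter_cache_upper_bound_bytes(max_doc, cache_entries):
    """Upper bound on filterCache memory if every entry is a full bitset."""
    bytes_per_bitset = (max_doc + 7) // 8  # one bit per document, rounded up
    return bytes_per_bitset * cache_entries


if __name__ == "__main__":
    MAX_DOC = 160_000  # index size from the original question
    for entries in (2048, 500_000):
        size = filter_cache_upper_bound_bytes(MAX_DOC, entries)
        print(f"{entries:>7} entries -> up to about {size / 2**20:,.0f} MiB")
```

Under this assumption the cost grows linearly with both the index size and
the number of cached terms, so a cache sized for hundreds of thousands of
terms needs to be checked against the available heap first.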

Thanks,
Kalyan Manepalli

-Original Message-
From: Jérôme Etévé [mailto:jerome.et...@gmail.com] 
Sent: Wednesday, August 12, 2009 12:31 PM
To: solr-user@lucene.apache.org
Subject: facet performance tips

Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
I perform facets on multivalued string fields. The number of possible
different values is quite large.

Enabling facets degrades the performance by a factor of 3.

Because I'm using solr 1.3, I guess the faceting makes use of the
filter cache to work. My filterCache is set
to a size of 2048. I also noticed in my solr stats a very low
cache hit ratio (~ 0.01%).

Can it be the reason why the faceting is slow? Does it make sense to
increase the filterCache size so it matches more or less the number
of different possible values for the faceted fields? Would that not
make the memory usage explode?

Thanks for your help !

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net


facet performance tips

2009-08-12 Thread Jérôme Etévé
Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
I perform facets on multivalued string fields. The number of possible
different values is quite large.

Enabling facets degrades the performance by a factor of 3.

Because I'm using solr 1.3, I guess the faceting makes use of the
filter cache to work. My filterCache is set
to a size of 2048. I also noticed in my solr stats a very low
cache hit ratio (~ 0.01%).

Can it be the reason why the faceting is slow? Does it make sense to
increase the filterCache size so it matches more or less the number
of different possible values for the faceted fields? Would that not
make the memory usage explode?
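
The near-zero hit ratio can be reproduced with a toy simulation. This is a
simplified sketch, not Solr code: it assumes an LRU cache and that every
faceted query looks up one cache entry per unique term, always in the same
order. The function name and the figure of 3000 terms are made up for
illustration; 2048 is the cache size quoted above.

```python
from collections import OrderedDict


def lru_hit_ratio(capacity, num_terms, num_queries):
    """Hit ratio of an LRU cache when each query touches every term in order."""
    cache = OrderedDict()
    hits = 0
    lookups = 0
    for _ in range(num_queries):
        for term in range(num_terms):
            lookups += 1
            if term in cache:
                hits += 1
                cache.move_to_end(term)        # mark as most recently used
            else:
                cache[term] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return hits / lookups


if __name__ == "__main__":
    # Cache smaller than the number of terms: entries are evicted before reuse.
    print(lru_hit_ratio(capacity=2048, num_terms=3000, num_queries=5))
    # Cache larger than the number of terms: only the first query misses.
    print(lru_hit_ratio(capacity=4096, num_terms=3000, num_queries=5))
```

Under these assumptions, a cache even slightly smaller than the number of
unique terms gets no hits at all, because each entry is evicted before the
next query asks for it again.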

Thanks for your help !

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net