subject:"Facet Performance"

Hi Yonik,
We are using Solr 6.5
Both studentId and grades are double:
  

We have 1.5 million records.

Thanks
Mikhail

-Original Message-
From: Yonik Seeley [mailto:ysee...@gmail.com] 
Sent: Sunday, April 30, 2017 1:04 PM
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

It is odd there would be quite such a big performance delta.
What version of solr are you using?
What is the fieldType of "grades"?
-Yonik



Re: JSON facet performance for aggregations

2017-04-30 Thread Yonik Seeley

It is odd there would be quite such a big performance delta.
What version of solr are you using?
What is the fieldType of "grades"?
-Yonik


On Sun, Apr 30, 2017 at 5:15 AM, Mikhail Ibraheem
<mikhail.ibrah...@oracle.com> wrote:
> 1-
> studentId has docValues = true. It is of type double, which is
> <fieldType name="double" class="solr.TrieDoubleField" indexed="false" stored="true"
>   docValues="true" multiValued="false" required="false"/>
>
>
> 2- If we just facet without aggregation it finishes in good time 60ms:
>
> json.facet={
>studentId:{
>   type:terms,
>   limit:-1,
>   field:" studentId "
>
>}
> }
>
>
> Thanks
>
>

RE: JSON facet performance for aggregations

1- 
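studentId has docValues = true. It is of type double, which is
<fieldType name="double" class="solr.TrieDoubleField" indexed="false" stored="true"
  docValues="true" multiValued="false" required="false"/>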

2- If we just facet without aggregation it finishes in good time 60ms:

json.facet={  
   studentId:{  
  type:terms,
  limit:-1,
  field:" studentId "
  
   }
}


Thanks


-Original Message-
From: Vijay Tiwary [mailto:vijaykr.tiw...@gmail.com] 
Sent: Sunday, April 30, 2017 10:44 AM
To: solr-user@lucene.apache.org
Subject: RE: JSON facet performance for aggregations

Please enable doc values and try.
There is a bug in the source code which causes json facet on string field to 
run very slow. On numeric fields it runs fine with doc value enabled.


RE: JSON facet performance for aggregations

2017-04-30 Thread Vijay Tiwary

Please enable doc values and try.
There is a bug in the source code which causes json facet on string field
to run very slow. On numeric fields it runs fine with doc value enabled.

On Apr 30, 2017 1:41 PM, "Mikhail Ibraheem" <mikhail.ibrah...@oracle.com>
wrote:

> Hi Vijay,
> It is already numeric field.
> It is huge difference between json and flat here. Do you know the reason
> for this? Is there a way to improve it ?
>

RE: JSON facet performance for aggregations

Hi Vijay,
It is already numeric field.
It is huge difference between json and flat here. Do you know the reason for 
this? Is there a way to improve it ?

-Original Message-
From: Vijay Tiwary [mailto:vijaykr.tiw...@gmail.com] 
Sent: Sunday, April 30, 2017 9:58 AM
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

Json facet on string fields run lot slower than on numeric fields. Try and see 
if you can represent studentid as a numeric field.


Re: JSON facet performance for aggregations

2017-04-30 Thread Vijay Tiwary

Json facet on string fields run lot slower than on numeric fields. Try and
see if you can represent studentid as a numeric field.

On Apr 30, 2017 1:19 PM, "Mikhail Ibraheem" <mikhail.ibrah...@oracle.com>
wrote:

> Hi,
>
> I am trying to do aggregation with JSON faceting but performance is very
> bad for one of the requests:
>
> json.facet={
>
>studentId:{
>
>   type:terms,
>
>   limit:-1,
>
>   field:"studentId",
>
>   facet:{
>
>   x:"sum(grades)"
>
>   }
>
>}
>
> }
>
>
>
> This request finishes in 250 seconds, and we can't paginate for this
> service for functional reason so we have to use limit:-1, and the
> cardinality of the studentId is 7500.
>
>
>
> If I try the same with flat facet it finishes in 3 seconds :
> stats=true&facet=true&stats.field={!tag=piv1 sum=true}grades&facet.pivot={!stats=piv1}studentId
>
>
>
> We are hoping to use one approach json or flat for all our services. JSON
> facet performance is better for many case.
>
>
>
> Please advise on why the performance for this is so bad and if we can
> improve it. Also what is the default algorithm used for json facet.
>
>
>
> Thanks
>
> Mikhail
>

JSON facet performance for aggregations

2017-04-30 Thread Mikhail Ibraheem

Hi,

I am trying to do aggregation with JSON faceting but performance is very bad 
for one of the requests:

json.facet={  

   studentId:{  

  type:terms,

  limit:-1,

  field:"studentId",

  facet:{

  x:"sum(grades)"

  }

   }

}

 

This request finishes in 250 seconds, and we can't paginate for this service 
for functional reason so we have to use limit:-1, and the cardinality of the 
studentId is 7500.

 

If I try the same with flat facet it finishes in 3 seconds :  
stats=true&facet=true&stats.field={!tag=piv1 sum=true}grades&facet.pivot={!stats=piv1}studentId
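
For anyone wanting to reproduce the comparison, the two requests can be built side by side roughly as follows. This is only a sketch: q=*:* and rows=0 are assumptions (the message does not show the main query), and the flat request assumes the usual stats.field / facet.pivot parameter names for a pivot facet with tagged stats.

```python
import json
from urllib.parse import urlencode

# JSON Facet API request: one bucket per studentId, with sum(grades) per bucket.
json_facet = {
    "studentId": {
        "type": "terms",
        "field": "studentId",
        "limit": -1,
        "facet": {"x": "sum(grades)"},
    }
}
# q and rows are assumed values, not taken from the original message.
json_params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(json_facet)}

# Flat equivalent: pivot facet on studentId tied to a tagged stats.field.
flat_params = [
    ("q", "*:*"),
    ("rows", 0),
    ("stats", "true"),
    ("facet", "true"),
    ("stats.field", "{!tag=piv1 sum=true}grades"),
    ("facet.pivot", "{!stats=piv1}studentId"),
]

json_query = urlencode(json_params)
flat_query = urlencode(flat_params)
print(json_query)
print(flat_query)
```

Sending both query strings to the same /select handler and comparing QTime gives a like-for-like measurement.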

 

We are hoping to use one approach json or flat for all our services. JSON facet 
performance is better for many case.

 

Please advise on why the performance for this is so bad and if we can improve 
it. Also what is the default algorithm used for json facet.

 

Thanks

Mikhail

Re: prefix facet performance

2017-04-24 Thread Yonik Seeley

In SimpleFacets.getFacetTermEnumCounts, we seek to the first term
matching the prefix using the index and then for each term after
compare the prefix until it no longer matches.

-Yonik


On Mon, Apr 24, 2017 at 5:04 AM, alessandro.benedetti
<a.benede...@sease.io> wrote:
> Thanks Yonik and Maria.
> It make sense, if we reduce the number of terms, term enum becomes a very
> good solution.
> @Yonik : do we still check the prefix on the term dictionary one by one, or
> an FST is used to identify the set of candidate terms ?
>
> I will check the code later,
>
> Regards
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/prefix-facet-performance-tp4330684p4331553.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: prefix facet performance

2017-04-24 Thread alessandro.benedetti

Thanks Yonik and Maria.
It make sense, if we reduce the number of terms, term enum becomes a very
good solution.
@Yonik : do we still check the prefix on the term dictionary one by one, or
an FST is used to identify the set of candidate terms ?

I will check the code later,

Regards



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/prefix-facet-performance-tp4330684p4331553.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: prefix facet performance

2017-04-21 Thread Maria Muslea

I see. Once I specify a prefix the number of terms is MUCH smaller.

Thank you again for all your help.

Maria

On Fri, Apr 21, 2017 at 1:46 PM, Yonik Seeley  wrote:

> On Fri, Apr 21, 2017 at 4:25 PM, Maria Muslea 
> wrote:
> > The field is:
> >
> > 
> >
> > and using unique() I found that it has 700K+ unique values.
> >
> > The query before (that takes ~10s):
> >
> > wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
> >
> > the query after (that is almost instant):
> >
> > wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/&facet.method=enum
>
> Ah, the fact that you specify a facet.prefix makes this perfectly
> aligned for the "enum" method, which can skip directly to the first
> term on-or-after "A/"
> facet.method=enum goes term-by-term, calculating the intersection with
> the facet domain.
> In this case, it's the number of terms that start with "A/" that
> matters, not the number of terms in the entire field (hence the
> speedup).
>
> -Yonik
>

Re: prefix facet performance

2017-04-21 Thread Yonik Seeley

On Fri, Apr 21, 2017 at 4:25 PM, Maria Muslea  wrote:
> The field is:
>
> 
>
> and using unique() I found that it has 700K+ unique values.
>
> The query before (that takes ~10s):
>
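> wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
>
> the query after (that is almost instant):
>
> wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/&facet.method=enum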

Ah, the fact that you specify a facet.prefix makes this perfectly
aligned for the "enum" method, which can skip directly to the first
term on-or-after "A/"
facet.method=enum goes term-by-term, calculating the intersection with
the facet domain.
In this case, it's the number of terms that start with "A/" that
matters, not the number of terms in the entire field (hence the
speedup).
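
A rough illustration of that behaviour, in case it helps. This is a toy model and not Solr's code: the term dictionary is a sorted Python list, posting lists and the facet domain are plain sets, and the function name is made up for the example.

```python
from bisect import bisect_left

def enum_prefix_counts(term_postings, domain, prefix):
    """Count domain docs per term, visiting only terms that start with prefix."""
    # Stand-in for the index's term dictionary, which is already kept sorted.
    terms = sorted(term_postings)
    counts = {}
    # Seek directly to the first term on-or-after the prefix.
    i = bisect_left(terms, prefix)
    # Walk term-by-term until the prefix no longer matches.
    while i < len(terms) and terms[i].startswith(prefix):
        # Intersect the term's postings with the facet domain.
        hits = len(term_postings[terms[i]] & domain)
        if hits:
            counts[terms[i]] = hits
        i += 1
    return counts

index = {"A/B": {1, 2}, "A/C": {2, 3}, "C/E": {1}, "M/F": {3}}
print(enum_prefix_counts(index, domain={1, 2}, prefix="A/"))
```

The loop touches only the terms under "A/", so its cost follows the number of matching terms rather than the size of the whole field.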

-Yonik

Re: prefix facet performance

2017-04-21 Thread Maria Muslea

The field is:



and using unique() I found that it has 700K+ unique values.

The query before (that takes ~10s):

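wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/

the query after (that is almost instant):

wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/&facet.method=enum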

Maria

On Fri, Apr 21, 2017 at 8:59 AM, alessandro.benedetti <a.benede...@sease.io>
wrote:

> That is quite interesting !
> You can use the stats module ( in association with the Json facets if you
> need it) to calculate an accurate approximation of the unique values [1]
> [2]
> .
>
> Good to know it improved your scenario, I may need to update my knowledge
> of
> term enum internals!
> Can you describe your schema configuration for the field and the way you
> were faceting before in comparison to the way you facet now ( with the
> related benefit)
>
> [1] https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
> [2] http://yonik.com/solr-count-distinct/
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/prefix-facet-performance-tp4330684p4331309.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: prefix facet performance

2017-04-21 Thread alessandro.benedetti

That is quite interesting !
You can use the stats module ( in association with the Json facets if you
need it) to calculate an accurate approximation of the unique values [1] [2]
.

Good to know it improved your scenario, I may need to update my knowledge of
term enum internals!
Can you describe your schema configuration for the field and the way you
were faceting before in comparison to the way you facet now ( with the
related benefit)

[1] https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
[2] http://yonik.com/solr-count-distinct/
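
As a sketch of the JSON facet route, the request below asks for a distinct count on one field. unique() gives an exact count and hll() an approximate one; the helper name, the "distinct" label, q=*:* and rows=0 are assumptions made for the example.

```python
import json
from urllib.parse import urlencode

def distinct_count_params(field, approximate=False):
    """Build request parameters asking Solr for the number of distinct values."""
    # hll() trades accuracy for memory; unique() is exact.
    func = "hll" if approximate else "unique"
    facet = {"distinct": f"{func}({field})"}
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}

params = distinct_count_params("concept")
print(urlencode(params))
```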



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/prefix-facet-performance-tp4330684p4331309.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: prefix facet performance

2017-04-21 Thread Maria Muslea

Actually using facet.method=enum made a HUGE difference even in my case
where I have many unique values. I am happy with the query response time
now.

Is there a way in SOLR to count the unique values for a field? If not, I
could run the reindexing and count the unique values while I add them to
give you a more accurate count of how many I have (there is a good chance
that I have more than 500K).

Thanks,
Maria

On Fri, Apr 21, 2017 at 1:16 AM, alessandro.benedetti <a.benede...@sease.io>
wrote:

> Hi Maria,
> If you have 100-500.000 unique values for the field you are interested in,
> and the cardinality of your search results is actually quite small in
> comparison, I am not that sure term enum will help you that much ...
>
> To simplify, with the term enum approach, you iterate over each unique
> value, if it matches the prefix and then you count the intersection of the
> result set with the posting list for that term.
> In your case, your result set is likely to be much smaller than the number
> of unique values.
> I would assume you are using the fc approach, which in my opinion was not a
> bad idea.
> Let's start from the algorithm you are using and the schema config for your
> field,
>
> Cheers
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/prefix-facet-performance-tp4330684p4331221.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

Re: prefix facet performance

2017-04-21 Thread alessandro.benedetti

Hi Maria,
If you have 100-500.000 unique values for the field you are interested in,
and the cardinality of your search results is actually quite small in
comparison, I am not that sure term enum will help you that much ...

To simplify, with the term enum approach, you iterate over each unique
value, if it matches the prefix and then you count the intersection of the
result set with the posting list for that term.
In your case, your result set is likely to be much smaller than the number
of unique values.
I would assume you are using the fc approach, which in my opinion was not a
bad idea.
Let's start from the algorithm you are using and the schema config for your
field,

Cheers



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
View this message in context: 
http://lucene.472066.n3.nabble.com/prefix-facet-performance-tp4330684p4331221.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: prefix facet performance

2017-04-18 Thread Maria Muslea

Hmmm, not sure. Probably in the range of 100K-500K.

Before writing the email I was just looking at:
http://yonik.com/facet-performance/

Wow, using facet.method=enum makes a big difference. I will read on it to
understand what it does.

Thank you so much.

Maria

On Tue, Apr 18, 2017 at 5:21 PM, Yonik Seeley <ysee...@gmail.com> wrote:

> How many unique values in the index?
> You could try facet.method=enum
>
> -Yonik
>
>
> On Tue, Apr 18, 2017 at 8:16 PM, Maria Muslea <maria.mus...@gmail.com>
> wrote:
> > Hi,
> >
> > I have ~40K documents in SOLR (not many) and a multivalued facet field
> that
> > contains at least 2K values per document.
> >
> > The values of the facet field look like: A/B, A/C, A/D, C/E, M/F, etc,
> and
> > I use facet.prefix.
> >
> > q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
> >
> >
> > with "concept" defined as:
> >
> >
> > 
> >
> >
> > This generates the output that I am looking for, but it takes more than
> 10
> > seconds per query.
> >
> >
> > Is there any way that I could improve the facet query performance for
> this
> > example?
> >
> >
> > Thank you,
> >
> > Maria
>

Re: prefix facet performance

2017-04-18 Thread Yonik Seeley

How many unique values in the index?
You could try facet.method=enum

-Yonik


On Tue, Apr 18, 2017 at 8:16 PM, Maria Muslea  wrote:
> Hi,
>
> I have ~40K documents in SOLR (not many) and a multivalued facet field that
> contains at least 2K values per document.
>
> The values of the facet field look like: A/B, A/C, A/D, C/E, M/F, etc, and
> I use facet.prefix.
>
> q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
>
>
> with "concept" defined as:
>
>
> 
>
>
> This generates the output that I am looking for, but it takes more than 10
> seconds per query.
>
>
> Is there any way that I could improve the facet query performance for this
> example?
>
>
> Thank you,
>
> Maria

prefix facet performance

2017-04-18 Thread Maria Muslea

Hi,

I have ~40K documents in SOLR (not many) and a multivalued facet field that
contains at least 2K values per document.

The values of the facet field look like: A/B, A/C, A/D, C/E, M/F, etc, and
I use facet.prefix.

q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/


with "concept" defined as:





This generates the output that I am looking for, but it takes more than 10
seconds per query.


Is there any way that I could improve the facet query performance for this
example?


Thank you,

Maria

Re: 5.4 facet performance thumbs-up

2015-12-23 Thread Yonik Seeley

Awesome, thanks for the feedback!

-Yonik

On Tue, Dec 22, 2015 at 5:36 PM, Aigner, Max  wrote:
> I'm happy to report that we are seeing significant speed-ups in our queries 
> with Json facets on 5.4 vs regular facets on 5.1. Our queries contain mostly 
> terms facets, many of them with exclusion tags and prefix filtering.
> Nice work!

5.4 facet performance thumbs-up

2015-12-22 Thread Aigner, Max

I'm happy to report that we are seeing significant speed-ups in our queries 
with Json facets on 5.4 vs regular facets on 5.1. Our queries contain mostly 
terms facets, many of them with exclusion tags and prefix filtering.
Nice work!

Re: (Issue) How improve solr facet performance

2014-05-27 Thread Alice.H.Yang (mis.cnsh04.Newegg) 41493

Hi, Toke

1.
I set the 3 fields with hundreds of values uses fc and the rest uses
enum, the performance is improved 2 times compared with no parameter, and then
I add facet.method=20 , the performance is improved about 4 times compared with
no parameter.
And I also tried setting 9 facet field to one copyfield, I test the
performance, it is improved about 2.5 times compared with no parameter.
So, It is improved a lot under your advice, thanks a lot.
2.
Now I have another performance issue, It's the group performance. The
number of data is as same as facet performance scenario.
When the keyword search hits about one million documents, the QTime is about
600ms.(It doesn't query the first time, it's in cache)

Query url:
select?fl=item_catalog&q=default_search:paramter&defType=edismax&rows=50&group=true&group.field=item_group_id&group.ngroups=true&group.sort=stock4sort%20desc,final_price%20asc,is_selleritem%20asc&sort=score%20desc,default_sort%20desc

It need Qtime about 600ms.

This query have two parameter:
1. fl one field
2. group=true,
group.ngroups=true

If I set group=false,, the QTime is only 1 ms.
But I need do group and group.ngroups, How can I improve the group performance
under this demand. Do you have some advice for me. I'm looking forward to your
reply.

Best Regards,
Alice Yang
+86-021-51530666*41493
Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: May 24, 2014 15:17
To: solr-user@lucene.apache.org
Subject: RE: (Issue) How improve solr facet performance

Alice.H.Yang (mis.cnsh04.Newegg) 41493 [alice.h.y...@newegg.com] wrote:
1. I'm sorry, I have made a mistake, the total number of documents is 32
Million, not 320 Million.
2. The system memory is large for solr index, OS total has 256G, I set the
solr tomcat HEAPSIZE=-Xms25G -Xmx100G

100G is a very high number. What special requirements dictates such a large
heap size?

Reply: 9 fields I facet on.

Solr treats each facet separately and with facet.method=fc and 10M hits, this
means that it will iterate 9*10M = 90M document IDs and update the counters for
those.
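
To make the cost concrete, here is a toy model of fc-style counting. It is a sketch only: it assumes single-valued fields stored as one value ordinal per document, and the field names and data are made up.

```python
def fc_counts(doc_to_ord, num_values, hits):
    """Count hits per value for one single-valued field (fc-style)."""
    counts = [0] * num_values
    for doc_id in hits:                  # one pass over every matching document
        counts[doc_to_ord[doc_id]] += 1  # bump the counter for that doc's value
    return counts

def facet_all(fields, hits):
    """Facet each field separately, so the work grows with fields x hits."""
    return {name: fc_counts(ords, n, hits) for name, (ords, n) in fields.items()}

# Toy data: 6 documents, two hypothetical fields given as per-document ordinals.
fields = {
    "color": ([0, 1, 1, 2, 0, 1], 3),
    "size": ([0, 0, 1, 1, 1, 0], 2),
}
hits = [0, 2, 3, 5]
result = facet_all(fields, hits)
lookups = len(fields) * len(hits)  # counter updates performed in total
print(result, lookups)
```

With 9 fields and 10M hits the same product gives the 90M updates mentioned above.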

Reply: 3 facet fields have one hundred unique values, other 6 facet fields'
unique values are between 3 to 15.

So very low cardinality. This is confirmed by your low response time of 6ms for
2925 hits.

And we test this scenario: If the number of facet fields' unique values is
less we add facet.method=enum, there is a little to improve performance.

That is a shame: enum is normally the simple answer to a setup like yours. Have
you tried fine-tuning your fc/enum selection, so that the 3 fields with
hundreds of values uses fc and the rest uses enum? That might halve your
response time.
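
A per-field selection like this can be expressed with Solr's f.<fieldname>.facet.method override. The sketch below only builds the parameter list; the helper and the field names (brand, in_stock, rating) are hypothetical stand-ins for the real schema.

```python
from urllib.parse import urlencode

def facet_params(fc_fields, enum_fields):
    """Build facet parameters with a facet.method chosen per field."""
    params = [("facet", "true")]
    for name in fc_fields + enum_fields:
        params.append(("facet.field", name))
    for name in fc_fields:
        # Higher-cardinality fields: keep the field-cache method.
        params.append((f"f.{name}.facet.method", "fc"))
    for name in enum_fields:
        # Fields with only a handful of values: enumerate the terms.
        params.append((f"f.{name}.facet.method", "enum"))
    return params

params = facet_params(["brand"], ["in_stock", "rating"])
query = urlencode(params)
print(query)
```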

Since the number of unique facets is so low, I do not think that DocValues can
help you here. Besides the fine-grained fc/enum-selection above, you could try
collapsing all 9 facet-fields into a single field. The idea behind this is that
for facet.method=fc, performing faceting on a field with (for example) 300
unique values takes practically the same amount of time as faceting on a field
with 1000 unique values: Faceting on a single slightly larger field is much
faster than faceting on 9 smaller fields. After faceting with facet.limit=-1 on
the single super-facet-field, you must match the returned values back to their
original fields:

If you have the facet-fields

field0: 34
field1: 187
field2: 78432
field3: 3
...

then collapse them by or-ing a field-specific mask that is bigger than the max
in any field, then put it all into a single field:

fieldAll: 0xA000 | 34
fieldAll: 0xA100 | 187
fieldAll: 0xA200 | 78432
fieldAll: 0xA300 | 3
...

perform the facet request on fieldAll with facet.limit=-1 and split the
resulting counts with

for (entry: facetResultAll) {
    // Mask off the field marker to find which original field the value belongs to
    switch (0xFF00 & entry.value) {
        case 0xA000:
            field0.add(entry.value, entry.count);
            break;
        case 0xA100:
            field1.add(entry.value, entry.count);
            break;
        ...
    }
}
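
The same idea can be sketched end to end in Python. Note that a value such as 78432 does not fit in the bits below a 0xFF00-style marker, so this version uses a wider shift-based layout; the 24-bit width, the helper names and the sample counts are assumptions for illustration, not part of the scheme described above.

```python
VALUE_BITS = 24                      # assumed width; must exceed the largest value
VALUE_MASK = (1 << VALUE_BITS) - 1

def pack(field_index, value):
    """Combine a field index and a value into one token for the merged field."""
    if not 0 <= value <= VALUE_MASK:
        raise ValueError("value does not fit in VALUE_BITS")
    return (field_index << VALUE_BITS) | value

def split_counts(facet_result_all, field_names):
    """Route counts from the merged field back to their original fields."""
    per_field = {name: {} for name in field_names}
    for token, count in facet_result_all.items():
        name = field_names[token >> VALUE_BITS]          # high bits: which field
        per_field[name][token & VALUE_MASK] = count      # low bits: original value
    return per_field

names = ["field0", "field1", "field2", "field3"]
merged = {pack(0, 34): 5, pack(1, 187): 2, pack(2, 78432): 7}
print(split_counts(merged, names))
```

Storing the original value without the marker in the per-field result saves a second unmasking step on the client side.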

Regards,
Toke Eskildsen, State and University Library, Denmark

Re: Re: (Issue) How improve solr facet performance

2014-05-27 Thread david.w.smi...@gmail.com

Alice,

RE grouping, try Solr 4.8’s new “collapse” qparser w/ the “expand”
SearchComponent. The ref guide has the docs. It’s usually a faster
equivalent approach to group=true.
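
A minimal sketch of what the request might look like with collapse and expand, reusing the field and query from the grouped request. It assumes a plain {!collapse field=...} filter with expand=true and leaves out the group.sort ordering, which would need separate handling.

```python
from urllib.parse import urlencode

# Collapse to one document per item_group_id instead of using group=true.
params = [
    ("q", "default_search:paramter"),
    ("defType", "edismax"),
    ("rows", 50),
    ("fl", "item_catalog"),
    ("fq", "{!collapse field=item_group_id}"),
    ("expand", "true"),   # return the collapsed group members in a separate section
    ("sort", "score desc,default_sort desc"),
]
query = urlencode(params)
print(query)
```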

Do you care to comment further on NewEgg’s apparent switch from Endeca to
Solr? (confirm true/false and rationale)

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Tue, May 27, 2014 at 4:17 AM, Alice.H.Yang (mis.cnsh04.Newegg) 41493
alice.h.y...@newegg.com wrote:

Hi, Toke

1.
I set the 3 fields with hundreds of values uses fc and the rest
uses enum, the performance is improved 2 times compared with no parameter,
and then I add facet.method=20 , the performance is improved about 4 times
compared with no parameter.
And I also tried setting 9 facet field to one copyfield, I test
the performance, it is improved about 2.5 times compared with no parameter.
So, It is improved a lot under your advice, thanks a lot.
2.
Now I have another performance issue, It's the group performance.
The number of data is as same as facet performance scenario.
When the keyword search hits about one million documents, the QTime is
about 600ms.(It doesn't query the first time, it's in cache)

Query url:

select?fl=item_catalog&q=default_search:paramter&defType=edismax&rows=50&group=true&group.field=item_group_id&group.ngroups=true&group.sort=stock4sort%20desc,final_price%20asc,is_selleritem%20asc&sort=score%20desc,default_sort%20desc

It need Qtime about 600ms.

This query have two parameter:
1. fl one field
2. group=true,
group.ngroups=true

If I set group=false,, the QTime is only 1 ms.
But I need do group and group.ngroups, How can I improve the group
performance under this demand. Do you have some advice for me. I'm looking
forward to your reply.

Best Regards,
Alice Yang
+86-021-51530666*41493
Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)


RE: (Issue) How improve solr facet performance

2014-05-24 Thread Toke Eskildsen

Alice.H.Yang (mis.cnsh04.Newegg) 41493 [alice.h.y...@newegg.com] wrote:
 1.  I'm sorry, I have made a mistake, the total number of documents is 32 
 Million, not 320 Million.
 2.  The system memory is large for solr index, OS total has 256G, I set the 
 solr tomcat HEAPSIZE=-Xms25G -Xmx100G

100G is a very high number. What special requirements dictate such a large 
heap size?

 Reply:  9 fields I facet on.

Solr treats each facet separately and with facet.method=fc and 10M hits, this 
means that it will iterate 9*10M = 90M document IDs and update the counters for 
those.

 Reply:  3 facet fields have one hundred unique values, other 6 facet fields' 
 unique values are between 3 to 15.

So very low cardinality. This is confirmed by your low response time of 6ms for 
2925 hits.

 We also tested this scenario: for the facet fields with few unique values 
 we added facet.method=enum, but performance improved only a little.

That is a shame: enum is normally the simple answer to a setup like yours. Have 
you tried fine-tuning your fc/enum selection, so that the 3 fields with 
hundreds of values use fc and the rest use enum? That might halve your 
response time.
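
The per-field selection suggested above could be sketched roughly as follows. It relies on Solr's per-field override syntax f.<fieldname>.facet.method; the field names, the cardinality threshold and the class name are hypothetical, and the field names are assumed to need no URL encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FacetMethodParams {
    /** Picks enum for very low-cardinality fields and fc otherwise (the threshold is a guess to tune). */
    static String methodFor(int uniqueValues, int enumThreshold) {
        return uniqueValues <= enumThreshold ? "enum" : "fc";
    }

    /** Builds the facet part of a query string with one f.<field>.facet.method per field. */
    static String build(Map<String, Integer> uniqueValuesByField, int enumThreshold) {
        StringJoiner params = new StringJoiner("&");
        params.add("facet=true");
        for (Map.Entry<String, Integer> e : uniqueValuesByField.entrySet()) {
            String field = e.getKey();
            params.add("facet.field=" + field);
            params.add("f." + field + ".facet.method=" + methodFor(e.getValue(), enumThreshold));
        }
        return params.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("brand_id", 100); // hypothetical field with ~100 unique values
        fields.put("color_id", 15);  // hypothetical field with ~15 unique values
        System.out.println(build(fields, 20));
    }
}
```

Measuring QTime with a few different thresholds is probably the only reliable way to find the split that works for a given index.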


Since the number of unique facets is so low, I do not think that DocValues can 
help you here. Besides the fine-grained fc/enum-selection above, you could try 
collapsing all 9 facet-fields into a single field. The idea behind this is that 
for facet.method=fc, performing faceting on a field with (for example) 300 
unique values takes practically the same amount of time as faceting on a field 
with 1000 unique values: Faceting on a single slightly larger field is much 
faster than faceting on 9 smaller fields. After faceting with facet.limit=-1 on 
the single super-facet-field, you must match the returned values back to their 
original fields:


If you have the facet-fields

field0: 34
field1: 187
field2: 78432
field3: 3
...

then collapse them by or-ing a field-specific mask that is bigger than the max 
in any field (78432 needs 17 bits, so the masks below leave the low 20 bits 
free), then put it all into a single field:

fieldAll: 0xA000000 | 34
fieldAll: 0xA100000 | 187
fieldAll: 0xA200000 | 78432
fieldAll: 0xA300000 | 3
...

perform the facet request on fieldAll with facet.limit=-1 and split the 
resulting counts with

for (Entry entry : facetResultAll) {
  int value = entry.value & 0x00FFFFF;  // strip the field mask
  switch (entry.value & 0xFF00000) {    // keep only the field mask
    case 0xA000000:
      field0.add(value, entry.count);
      break;
    case 0xA100000:
      field1.add(value, entry.count);
      break;
    ...
  }
}
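
The same idea as a self-contained sketch. It assumes integer facet values that fit in 24 bits and uses the bits above them for a field index instead of hand-picked masks; the class name SuperFacet and the map-based input are illustrative assumptions, not part of Solr.

```java
import java.util.HashMap;
import java.util.Map;

final class SuperFacet {
    static final int VALUE_BITS = 24;                    // assumption: every facet value < 2^24
    static final int VALUE_MASK = (1 << VALUE_BITS) - 1; // low bits hold the original value

    /** Packs a field index and a value into one int for the combined field. */
    static int encode(int fieldIndex, int value) {
        if (fieldIndex < 0 || fieldIndex > 127 || value < 0 || value > VALUE_MASK) {
            throw new IllegalArgumentException("field index or value out of range");
        }
        return (fieldIndex << VALUE_BITS) | value;
    }

    static int fieldOf(int encoded) { return encoded >>> VALUE_BITS; }

    static int valueOf(int encoded) { return encoded & VALUE_MASK; }

    /** Splits counts for the combined field back into per-field value -> count maps. */
    static Map<Integer, Map<Integer, Long>> split(Map<Integer, Long> combinedCounts) {
        Map<Integer, Map<Integer, Long>> perField = new HashMap<>();
        for (Map.Entry<Integer, Long> e : combinedCounts.entrySet()) {
            perField.computeIfAbsent(fieldOf(e.getKey()), k -> new HashMap<>())
                    .put(valueOf(e.getKey()), e.getValue());
        }
        return perField;
    }
}
```

The range check in encode matters: a value wider than the reserved bits would silently spill into the field index and be counted under the wrong field.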


Regards,
Toke Eskildsen, State and University Library, Denmark

fw: (Issue) How improve solr facet performance

2014-05-23 Thread Alice.H.Yang (mis.cnsh04.Newegg) 41493

Hi, Solr Developer

  Thanks very much for your timely reply.

1.  I'm sorry, I have made a mistake, the total number of documents is 32 
Million, not 320 Million.
2.  The system memory is large for solr index, OS total has 256G, I set the 
solr tomcat HEAPSIZE=-Xms25G -Xmx100G

-How many fields are you faceting on?

Reply:  9 fields I facet on.

- How many unique values do your facet fields have (approximately)?

Reply:  3 facet fields have one hundred unique values each; the other 6 facet 
fields have between 3 and 15 unique values.


- What is the content of your facets (Strings, numbers?)

Reply:  9 fields are all numbers.

- Which facet.method do you use?

Reply:  Used the default facet.method=fc

We also tested this scenario: for the facet fields with few unique values 
we added facet.method=enum, but performance improved only a little.

- What is the response time with faceting and a few thousand hits?

Reply:   <result name="response" numFound="2925" start="0">
   QTime is  <int name="QTime">6</int>


Best Regards,
Alice Yang

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Friday, May 23, 2014 8:08 PM
To: d...@lucene.apache.org
Subject: Re: (Issue) How improve solr facet performance

On Fri, 2014-05-23 at 11:45 +0200, Alice.H.Yang (mis.cnsh04.Newegg)
41493 wrote:
We are blocked by solr facet performance when query hits many 
 documents. (about 10,000,000)

[320M documents, immediate response for plain search with 1M hits]

 But when we add several facet.field parameters to facet, QTime increases to 
 220ms or more.

It is not clear whether your observation of increased response time is due to 
many hits or faceting in itself.

- How many fields are you faceting on?
- How many unique values do your facet fields have (approximately)?
- What is the content of your facets (Strings, numbers?)
- Which facet.method do you use?
- What is the response time with faceting and a few thousand hits?

 Do you have some advice on how to improve the facet performance when a query 
 hits many documents?

That depends on whether your bottleneck is the hitcount itself, the number of 
unique facet values or something else such as I/O.


- Toke Eskildsen, State and University Library, Denmark




RE: Facet performance

2013-10-23 Thread Toke Eskildsen

On Tue, 2013-10-22 at 17:25 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
 On Tue, October 22, 2013 11:54 AM Andre Bois-Crettez wrote:
  This is with Solr 1.4.
 Really ?
 This sounds really outdated to me.
 Have you tried a more recent version? 4.5 just came out.
 
 Sorry, can't.  Too much `grown' stuff.

I did not see that. I guess I parsed it as 4.1.

Well, that rules out DocValues and fcs (as far as I remember). I am a
bit surprised that the limit on #terms with fc is also in 1.4. I thought
it was introduced in a later version.

We too have been in a position where upgrading was hard due to homegrown
addons. We even scrapped some DidYouMean-like functionality when going
from 3.x to 4.x, but 4.x was so much better that there was little
choice.

Last suggestion for using fc: Create 2 or more CONTENT-fields and choose
between them randomly when indexing. Facet on all the CONTENT fields and
merge the results. It will take a bit more RAM though, so it is still
out on your (assumedly) 32 bit machine.
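
A minimal sketch of the merge step for the split-field suggestion above, assuming each document was indexed into exactly one of the CONTENT fields so that per-term counts can simply be added. The class and method names are made up, and the per-field results are assumed to be available as term -> count maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FacetMerge {
    /** Sums the per-term counts returned for each of the split fields. */
    static Map<String, Long> merge(List<Map<String, Long>> perFieldCounts) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> counts : perFieldCounts) {
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    /** Returns the n terms with the highest merged counts, ties broken alphabetically. */
    static List<Map.Entry<String, Long>> top(Map<String, Long> merged, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(merged.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }
}
```

To get a trustworthy top 10 after merging, each field would likely have to be requested with a facet.limit larger than 10, since a term can rank low in every single field and still rank high in the sum.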

Regards,
Toke Eskildsen, State and University Library, Denmark

RE: Facet performance

2013-10-23 Thread Lemke, Michael SZ/HZA-ZSW

On Tue, October 22, 2013 5:23 PM Michael Lemke wrote:
On Tue, October 22, 2013 9:23 AM Toke Eskildsen wrote:
On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
 QTime fc:
never returns, webserver restarts itself after 30 min with 100% CPU 
 load

It might be because it dies due to garbage collection. But since more
memory (as your test server presumably has) just leads to the too many
values-error, there isn't much to do.

Essentially, fc is out then.


 QTime=41205   facet.prefix=   q=frequent_word  numFound=44532
 
 Same query repeated:
 QTime=225810  facet.prefix=   q=ottomotor      numFound=909
 QTime=199839  facet.prefix=   q=ottomotor      numFound=909

I am stumped on this, sorry. I do not understand why the 'ottomotor'
query can take 5 times as long as the 'frequent_word'-one.

I looked into this some more this morning.  I noticed the java process was 
doing a lot of I/O as shown in Process Explorer.  For the frequent_word it 
read about 180MB, for ottomotor it was about seven times as much, ~ 1,200 MB.


Got another observation today.  The response time for q=ottomotor depends on 
facet.limit:

QTime=59300  facet.limit=2
QTime=69395  facet.limit=4
QTime=85208  facet.limit=6
QTime=158150 facet.limit=8
QTime=186276 facet.limit=10
QTime=231763 facet.limit=15
QTime=260437 facet.limit=20
QTime=312268 facet.limit=30

For q=frequent_word the effect is much less pronounced and shows only
for facet.limit >= 15:

QTime=0  facet.limit=0
QTime=20535  facet.limit=1
QTime=13456  facet.limit=2
QTime=13925  facet.limit=4
QTime=13705  facet.limit=6
QTime=13924  facet.limit=8
QTime=13799  facet.limit=10
QTime=14361  facet.limit=15
QTime=14704  facet.limit=20
QTime=15189  facet.limit=30
QTime=16783  facet.limit=50
QTime=57128  facet.limit=500

It looks to me as if Solr, to collect enough facets to fulfill the limit
constraint, has to read much more of the index in the case of the infrequent word.

jconsole didn't show anything unusual according to our more experienced Java 
experts here.  Nor was the machine swapping.

Is it possible to screw up an index such that this sort of faceting leads to
constant reading of the index?  Something like full table scans in a db?


Michael

RE: Facet performance

2013-10-22 Thread Toke Eskildsen

On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
 QTime enum:
  1st call: 1200
  subsequent calls: 200

Those numbers seem fine.

 QTime fc:
never returns, webserver restarts itself after 30 min with 100% CPU 
 load

It might be because it dies due to garbage collection. But since more
memory (as your test server presumably has) just leads to the too many
values-error, there isn't much to do.

 QTime=41205   facet.prefix=   q=frequent_word  numFound=44532
 
 Same query repeated:
 QTime=225810  facet.prefix=   q=ottomotor      numFound=909
 QTime=199839  facet.prefix=   q=ottomotor      numFound=909

I am stumped on this, sorry. I do not understand why the 'ottomotor'
query can take 5 times as long as the 'frequent_word'-one.

 QTime=185948  facet.prefix=   q=ottomotor      numFound=909
 
 QTime=3344    facet.prefix=d  q=ottomotor      numFound=909

Fits with expectations.

 - Documents in your index
 13,434,414
 
 - Unique values in the CONTENT field
 Not sure how to get this.  In luke I find
 21,797,514 term count CONTENT

Those are the relevant numbers for faceting. There is a limit of 2^24
(16M) terms for facet.method=fc, although I am a bit unsure if that is
for the whole index or per segment.

Come to think of it, if you have a multi-segmented index, you might want
to try facet.method=fcs. It should have faster startup than fc and
better performance than enum for fields with a large number of unique
values. Memory requirements should be between fc and enum.

 - Xmx
 The maximum the system allows me to get: 1612m
 
 Maybe I have a hopelessly under-dimensioned server for this sort of things?

Well, 1612m should be enough for the faceting in itself; it is the
startup that is the killer. 

A rule of thumb for fc is that the internal structure takes at least
#docs*log2(#references) + #references*log2(#unique_values) bits

If your content field is a description, let's say that each description
has 40 words, which gives us 500M references from documents to facet
values. This translates to
13M*log2(500M) + 500M*log2(22M) bits ~= 13M*29 + 500M*25 bits
~= 12.9 Gbit ~= 1.6GB.

Taking into account that building the structure has an overhead of 2-3
times that, we are well beyond the memory limit of 1612m. If the index
is updated, a new facet structure is built all over again while the old
structure is still in memory.
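
The rule of thumb above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the logarithms are base 2 and rounded up to whole bits, so the sum is a bit count that is divided by 8 to get bytes. The class and method names are made up for illustration.

```java
final class FcMemoryEstimate {
    /** Bits needed to address n distinct items: ceil(log2(n)). */
    static int bitsFor(long n) {
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    /** Rule-of-thumb size in bytes: (#docs*log2(#refs) + #refs*log2(#unique)) bits / 8. */
    static long estimateBytes(long docs, long references, long uniqueValues) {
        long bits = docs * bitsFor(references) + references * bitsFor(uniqueValues);
        return bits / 8;
    }

    public static void main(String[] args) {
        // Rounded figures from this thread: 13M docs, ~40 words each, 22M unique terms.
        long bytes = estimateBytes(13_000_000L, 500_000_000L, 22_000_000L);
        System.out.println(bytes / (1024 * 1024) + " MiB before build overhead");
    }
}
```

The second term dominates, so the number of document-to-term references matters far more than the number of documents.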


If you need better performance on your large field I would suggest, in
order of priority:

- facet.method=fcs
- facet.method=fcs with DocValues
- Shard your index and use facet.method=fc
- SOLR-2412 (https://issues.apache.org/jira/browse/SOLR-2412)

SOLR-2412 is a last resort, but it does have the same speed as
facet.method=fc only without the 16M unique values limitation.

Regards,
Toke Eskildsen, State and University Library, Denmark

Re: Facet performance

2013-10-22 Thread Andre Bois-Crettez


This is with Solr 1.4.

Really ?
This sounds really outdated to me.
Have you tried a more recent version? 4.5 just came out.

--
André Bois-Crettez

Software Architect
Search Developer
http://www.kelkoo.com/



RE: Facet performance

2013-10-22 Thread Lemke, Michael SZ/HZA-ZSW

On Tue, October 22, 2013 9:23 AM Toke Eskildsen wrote:
On Mon, 2013-10-21 at 16:57 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
 QTime fc:
never returns, webserver restarts itself after 30 min with 100% CPU 
 load

It might be because it dies due to garbage collection. But since more
memory (as your test server presumably has) just leads to the too many
values-error, there isn't much to do.

Essentially, fc is out then.


 QTime=41205   facet.prefix=   q=frequent_word  numFound=44532
 
 Same query repeated:
 QTime=225810  facet.prefix=   q=ottomotor      numFound=909
 QTime=199839  facet.prefix=   q=ottomotor      numFound=909

I am stumped on this, sorry. I do not understand why the 'ottomotor'
query can take 5 times as long as the 'frequent_word'-one.

I looked into this some more this morning.  I noticed the java process was doing
a lot of I/O as shown in Process Explorer.  For the frequent_word it read about 
180MB, for ottomotor it was about seven times as much, ~ 1,200 MB.

jconsole didn’t show anything unusual according to our more experienced Java 
experts here.  Nor was the machine swapping.

Is it possible to screw up an index such that this sort of faceting leads to
constant reading of the index?  Something like full table scans in a db?

Michael

RE: Facet performance

2013-10-22 Thread Lemke, Michael SZ/HZA-ZSW

On Tue, October 22, 2013 11:54 AM Andre Bois-Crettez wrote:

 This is with Solr 1.4.
Really ?
This sounds really outdated to me.
Have you tried a more recent version? 4.5 just came out.

Sorry, can't.  Too much `grown' stuff.

Michael

RE: Facet performance

2013-10-21 Thread Toke Eskildsen

On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
 Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote:
  Unfortunately the enum-solution is normally quite slow when there
  are enough unique values to trigger the too many  values-exception.
  [...]
 
 [...] And yes, the fc method was terribly slow in a case where it did
 work.  Something like 20 minutes whereas enum returned within a few
 seconds.

Err.. What? That sounds _very_ strange. You have millions of unique
values so fc should be a lot faster than enum, not the other way around.

I assume the 20 minutes was for the first call. How fast do subsequent
calls return for fc?


Maybe you could provide some approximate numbers?

- Documents in your index
- Unique values in the CONTENT field
- Hits are returned from a typical query
- Xmx

Regards,
Toke Eskildsen, State and University Library, Denmark

RE: Facet performance

2013-10-21 Thread Lemke, Michael SZ/HZA-ZSW

On Mon, October 21, 2013 10:04 AM, Toke Eskildsen wrote:
On Fri, 2013-10-18 at 18:30 +0200, Lemke, Michael SZ/HZA-ZSW wrote:
 Toke Eskildsen wrote:
  Unfortunately the enum-solution is normally quite slow when there
  are enough unique values to trigger the too many  values-exception.
  [...]
 
 [...] And yes, the fc method was terribly slow in a case where it did
 work.  Something like 20 minutes whereas enum returned within a few
 seconds.

Err.. What? That sounds _very_ strange. You have millions of unique
values so fc should be a lot faster than enum, not the other way around.

I assume the 20 minutes was for the first call. How fast do subsequent
calls return for fc?

QTime enum:
 1st call: 1200
 subsequent calls: 200

QTime fc:
   never returns, webserver restarts itself after 30 min with 100% CPU load


This is on the test system, the production system managed to return with
... Too many values for UnInvertedField faceting 

However, I also have different faceting queries I played with today.

One complete example:

q=ottomotor&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0

These are the results, all with facet.method=enum (fc doesn't work).  They
were executed in the sequence shown on an otherwise unused server:

QTime=41205   facet.prefix=    q=frequent_word       numFound=44532

Same query repeated:
QTime=225810  facet.prefix=    q=ottomotor           numFound=909
QTime=199839  facet.prefix=    q=ottomotor           numFound=909

QTime=0       facet.prefix=    q=ottomotor jkdhwjfh  numFound=0
QTime=0       facet.prefix=    q=jkdhwjfh            numFound=0

QTime=185948  facet.prefix=    q=ottomotor           numFound=909

QTime=3344    facet.prefix=d   q=ottomotor           numFound=909
QTime=3078    facet.prefix=d   q=ottomotor           numFound=909
QTime=3141    facet.prefix=d   q=ottomotor           numFound=909

The response time is obviously not dependent on the number of documents found.
Caching doesn't kick in either.



Maybe you could provide some approximate numbers?

I'll try, see below.  Thanks for asking and having a closer look.


- Documents in your index
13,434,414

- Unique values in the CONTENT field
Not sure how to get this.  In luke I find
21,797,514 term count CONTENT

Is that what you mean?

- Hits are returned from a typical query
Hm, that can be anything between 0 and 40,000 or more.
Or do you mean from the facets?  Or do my tests above
answer it?

- Xmx
The maximum the system allows me to get: 1612m


Maybe I have a hopelessly under-dimensioned server for this sort of thing?

Thanks a lot for your help,
Michael

Facet performance

2013-10-18 Thread Lemke, Michael SZ/HZA-ZSW

I am working with Solr facet fields and come across a 
performance problem I don't understand. Consider these 
two queries:

1. 
q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0

2. 
q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0

The only difference is an empty facet.prefix in the first query.

The first query returns after some 20 seconds (QTime 2 in the result) while 
the second one takes only 80 msec (QTime 80). Why is this?

And as side note: facet.method=fc makes the queries run 'forever' and 
eventually 
fail with org.apache.solr.common.SolrException: Too many values for 
UnInvertedField faceting on field CONTENT.

This is with Solr 1.4.

RE: Facet performance

2013-10-18 Thread Toke Eskildsen

Lemke, Michael  SZ/HZA-ZSW [lemke...@schaeffler.com] wrote:
 1. 
 q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
 2. 
 q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0

 The only difference is an empty facet.prefix in the first query.

 The first query returns after some 20 seconds (QTime 2 in the result) 
 while
 the second one takes only 80 msec (QTime 80). Why is this?

If your index was just opened when you issued your queries, the first request 
will be notably slower than the second as the facet values might not be in the 
disk cache.

Furthermore, for enum the difference between no prefix and some prefix is huge. 
As enum iterates values first (as opposed to fc that iterates hits first), 
limiting to only the values that start with 'a' ought to speed up retrieval by 
a factor of 10 or more.
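
As a toy illustration of the difference described above (this is not Solr's actual implementation), the two strategies can be sketched over in-memory maps. The class name, the map layouts and the prefix range trick are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

final class FacetStrategies {
    /** enum-style: walk the terms (narrowed by prefix), intersect each term's docs with the hits. */
    static Map<String, Integer> enumStyle(TreeMap<String, Set<Integer>> termToDocs,
                                          Set<Integer> hits, String prefix) {
        // Keys starting with prefix lie in [prefix, prefix + '\uffff'); good enough for a sketch.
        SortedMap<String, Set<Integer>> range = prefix.isEmpty()
                ? termToDocs
                : termToDocs.subMap(prefix, prefix + Character.MAX_VALUE);
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<Integer>> e : range.entrySet()) {
            int n = 0;
            for (int doc : e.getValue()) {
                if (hits.contains(doc)) n++;
            }
            if (n > 0) counts.put(e.getKey(), n);
        }
        return counts;
    }

    /** fc-style: walk the hits, bump a counter for every term each hit document contains. */
    static Map<String, Integer> fcStyle(Map<Integer, List<String>> docToTerms,
                                        Set<Integer> hits, String prefix) {
        Map<String, Integer> counts = new HashMap<>();
        for (int doc : hits) {
            for (String term : docToTerms.getOrDefault(doc, Collections.emptyList())) {
                if (term.startsWith(prefix)) counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

In the enum-style loop the work grows with the number of terms in the prefix range, which is why an empty prefix over millions of terms is so much slower than a one-letter prefix. In the fc-style loop the work grows with the number of hits instead.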

 And as side note: facet.method=fc makes the queries run 'forever' and 
 eventually
 fail with org.apache.solr.common.SolrException: Too many values for 
 UnInvertedField faceting on field CONTENT.

An internal memory structure optimization in Solr limits the amount of possible 
unique values when using fc. It is not a bug as such, but more a consequence of 
a choice. Unfortunately the enum-solution is normally quite slow when there are 
enough unique values to trigger the too many values-exception. I know too 
little about the structures for DocValues to say if they will help here, but 
you might want to take a look at those.

- Toke Eskildsen

RE: Facet performance

2013-10-18 Thread Lemke, Michael SZ/HZA-ZSW

Toke Eskildsen [mailto:t...@statsbiblioteket.dk] wrote:
Lemke, Michael SZ/HZA-ZSW [lemke...@schaeffler.com] wrote:
1.
q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
2.
q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0

The only difference is an empty facet.prefix in the first query.

The first query returns after some 20 seconds (QTime 2 in the result)
while
the second one takes only 80 msec (QTime 80). Why is this?

If your index was just opened when you issued your queries, the first request
will be notably slower than the second as the facet values might not be in
the disk cache.

I know but it shouldn't be orders of magnitude as in this example, should it?

Furthermore, for enum the difference between no prefix and some prefix is
huge. As enum iterates values first (as opposed to fc that iterates hits
first), limiting to only the values that starts with 'a' ought to speed up
retrieval by a factor 10 or more.

Thanks. That is what we sort of figured but it's good to know for sure. Of
course it begs the question if there is a way to speed this up?

And as side note: facet.method=fc makes the queries run 'forever' and
eventually
fail with org.apache.solr.common.SolrException: Too many values for
UnInvertedField faceting on field CONTENT.

An internal memory structure optimization in Solr limits the amount of
possible unique values when using fc. It is not a bug as such, but more a
consequence of a choice. Unfortunately the enum-solution is normally quite
slow when there are enough unique values to trigger the too many
values-exception. I know too little about the structures for DocValues to say
if they will help here, but you might want to take a look at those.

What is DocValues? Haven't heard of it yet. And yes, the fc method was
terribly slow in a case where it did work. Something like 20 minutes whereas
enum returned within a few seconds.

Michael

Re: Facet performance

2013-10-18 Thread Otis Gospodnetic

DocValues is the new black
http://wiki.apache.org/solr/DocValues

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
SOLR Performance Monitoring -- http://sematext.com/spm


RE: Facet performance

2013-10-18 Thread Chris Hostetter


:  1. 
q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
:  2. 
q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
: 
:  The only difference is an empty facet.prefix in the first query.

: If your index was just opened when you issued your queries, the first 
: request will be notably slower than the second as the facet values might 
: not be in the disk cache.
: 
: I know but it shouldn't be orders of magnitude as in this example, should it?

in and of itself: it can be if your index is large enough and none of the 
disk pages are in the file system buffer.

more significantly however, is that depending on how big your filterCache 
is, the first request could easily be caching all of the filters needed for 
the second query -- at a minimum it's definitely caching your main query 
which will be re-used and save a lot of time independent of the faceting.


-Hoss

Re: Multivalued fields and facet performance

2011-01-10 Thread Otis Gospodnetic

Hi Howard,

This is normal.  Your first query is reading a bunch of index data from disk 
and your RAM is then caching it.  If your first query involves sorting, some 
more data for FieldCache is being read and stored.  If there are multiple sort 
fields, one such thing for each.  If facets are involved, more of that stuff.  
If you are optimizing your index you are likely to be forcing more disk IO.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Howard Lee how...@workdigital.co.uk
 To: solr-user@lucene.apache.org
 Sent: Mon, January 10, 2011 8:59:03 AM
 Subject: Multivalued fields and facet performance
 
 Hi,
 
 I'd appreciate some explanation on what may be going on in the following
 scenario using multivalued fields and facets.
 
 Solr version: 1.5
 
 Our index contains 35 million docs, and our search is using 2 multivalued
 fields as facets. There are approx 5 million different values in one field
 and 5000 in the other. We are seeing the following, and I'm curious as to what
 is actually happening in the background.
 
 The first search can take up to 5 minutes, all subsequent queries of any q
 return in under a second. This is fine unless you are the first search or
 new searcher.
 
 I plan on adding a first searcher and new searcher in the config to avoid
 long delays every time the index is updated (once a day) but I have concerns
 about the length of the delay in launching a new searcher, and whether this is
 causing too much overhead.
 
 Can someone explain to me what processes are going on in the background that
 cause this behaviour so I can understand the implications or make some
 adjustments in the config to compensate.
 
 thanx
 
 Howard

Re: Multivalued fields and facet performance

2011-01-10 Thread Howard Lee

Otis,
The reason I ask is that I run a number of sites on Solr, some with 10
million+ docs faceting on similar types of data, and have not seen anywhere
near this length of initial delay. The main difference is that these sites
facet on single value fields rather than multivalued and that this site is
searching on 3 times the volume of data. Would switching to single valued
(I'd rather not) make much of a difference?

I've also noticed that multivalued fields aren't populating the lucene field
cache. Is this the correct behaviour?

Regards

Howard

 





facet performance when number of values is large

2010-03-03 Thread Andy

I have a facet field whose values are created by users. So potentially there 
could be a very large number of values. Is that going to be a problem 
performance-wise?

A few more questions to help me understand how facet works:
- after the filter cache warmed up, will the (if any) performance problems 
caused by large number of facet values go away?
I thought that would be the case but according to the benchmark here: 
http://wiki.apache.org/solr/HierarchicalFaceting
SOLR-64 still had very poor performance even after the filter caches are warmed 

- In the wiki it was stated that facet.method=fc is excellent for situations 
where the number of indexed values for the field is high. Would that be the 
solution?

Re: facet performance tips

2009-08-13 Thread Jérôme Etévé

Thanks everyone for your advice.

I increased my filterCache, and the faceting performance improved greatly.

My faceted field can have at the moment ~4 different terms, so I
did set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around
500 000, so I guess this approach won't work anymore, as I doubt a 500
000 sized fieldCache would work.

So I guess my best move would be to upgrade to the soon to be 1.4
version of solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea about
when 1.4 will be an official release?
As well, is the current trunk OK for production? Is it compatible with
1.3 configuration files?

Thanks !

Jerome.

2009/8/13 Stephen Duncan Jr stephen.dun...@gmail.com:
 Note that depending on the profile of your field (full text and how many
 unique terms on average per document), the improvements from 1.4 may not
 apply, as you may exceed the limits of the new faceting technique in Solr
 1.4.
 -Stephen

 On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher ehatc...@apache.org wrote:

 Yes, increasing the filterCache size will help with Solr 1.3 performance.

 Do note that trunk (soon Solr 1.4) has dramatically improved faceting
 performance.

Erik


 On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:

  Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
 I perform facets on multivalued string fields. The number of possible
 different values is quite large.

 Enabling facets degrades the performance by a factor 3.

 Because I'm using solr 1.3, I guess the facetting makes use of the
 filter cache to work. My filterCache is set
 to a size of 2048. I also noticed in my solr stats a very small ratio
 of cache hit (~ 0.01%).

 Can it be the reason why the faceting is slow? Does it make sense to
 increase the filterCache size so it matches more or less the number
 of different possible values for the faceted fields? Would that not
 make the memory usage explode?

 Thanks for your help !

 --
 Jerome Eteve.

 Chat with me live at http://www.eteve.net

 jer...@eteve.net





 --
 Stephen Duncan Jr
 www.stephenduncanjr.com




-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net

RE: facet performance tips

I took 1.4 from trunk three days ago, it seems Ok for production (at least for 
my Master instance which is doing writes-only). I use the same config files.

500 000 terms are Ok too; I am using several millions with pre-1.3 SOLR taken 
from trunk.

However, do not try to facet (probably outdated term after SOLR-475) on 
generic queries such as [* TO *] (with huge resultset). For smaller query 
results (100,000 instead of 100,000,000) counting terms is fast enough (few 
milliseconds at http://www.tokenizer.org)

 

-Original Message-
From: Jérôme Etévé [mailto:jerome.et...@gmail.com] 
Sent: August-13-09 5:38 AM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

Thanks everyone for your advice.

I increased my filterCache, and the faceting performance improved greatly.

My faceted field can have at the moment ~4 different terms, so I
did set a filterCache size of 5 and it works very well.

However, I'm planning to increase the number of terms to maybe around
500 000, so I guess this approach won't work anymore, as I doubt a 500
000 sized fieldCache would work.

So I guess my best move would be to upgrade to the soon to be 1.4
version of solr to benefit from its new faceting method.

I know this is a bit off-topic, but do you have a rough idea about
when 1.4 will be an official release?
As well, is the current trunk OK for production? Is it compatible with
1.3 configuration files?

Thanks !

Jerome.

2009/8/13 Stephen Duncan Jr stephen.dun...@gmail.com:
 Note that depending on the profile of your field (full text and how many
 unique terms on average per document), the improvements from 1.4 may not
 apply, as you may exceed the limits of the new faceting technique in Solr
 1.4.
 -Stephen

 On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher ehatc...@apache.org wrote:

 Yes, increasing the filterCache size will help with Solr 1.3 performance.

 Do note that trunk (soon Solr 1.4) has dramatically improved faceting
 performance.

Erik


 On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:

  Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
 I perform facets on multivalued string fields. The number of possible
 different values is quite large.

 Enabling facets degrades the performance by a factor 3.

 Because I'm using solr 1.3, I guess the facetting makes use of the
 filter cache to work. My filterCache is set
 to a size of 2048. I also noticed in my solr stats a very small ratio
 of cache hit (~ 0.01%).

 Can it be the reason why the faceting is slow? Does it make sense to
 increase the filterCache size so it matches more or less the number
 of different possible values for the faceted fields? Would that not
 make the memory usage explode?

 Thanks for your help !

 --
 Jerome Eteve.

 Chat with me live at http://www.eteve.net

 jer...@eteve.net





 --
 Stephen Duncan Jr
 www.stephenduncanjr.com




-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net

RE: facet performance tips

It seems BOBO-Browse is alternate faceting engine; would be interesting to
compare performance with SOLR... Distributed?

-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] 
Sent: August-12-09 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.

RE: facet performance tips

Interesting, it has BoboRequestHandler implements SolrRequestHandler
- easy to try it; and shards support



[Fuad Efendi] It seems BOBO-Browse is alternate faceting engine; would be
interesting to
compare performance with SOLR... Distributed?


[Jason Rutherglen] For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen

Yeah we need a performance comparison, I haven't had time to put
one together. If/when I do I'll compare Bobo performance against
Solr bitset intersection based facets, compare memory
consumption.

For near realtime Solr needs to cache and merge bitsets at the
SegmentReader level, and Bobo needs to be upgraded to work with
Lucene 2.9's searching at the segment level (currently it uses a
MultiSearcher).

Distributed search on either should be fairly straightforward?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi f...@efendi.ca wrote:
 It seems BOBO-Browse is alternate faceting engine; would be interesting to
 compare performance with SOLR... Distributed?


 -Original Message-
 From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
 Sent: August-12-09 6:12 PM
 To: solr-user@lucene.apache.org
 Subject: Re: facet performance tips

 For your fields with many terms you may want to try Bobo
 http://code.google.com/p/bobo-browse/ which could work well with your
 case.

RE: facet performance tips

SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to
be); check this
http://issues.apache.org/jira/browse/SOLR-475
(and probably http://issues.apache.org/jira/browse/SOLR-711)

-Original Message-
From: Jason Rutherglen 

Yeah we need a performance comparison, I haven't had time to put
one together. If/when I do I'll compare Bobo performance against
Solr bitset intersection based facets, compare memory
consumption.

For near realtime Solr needs to cache and merge bitsets at the
SegmentReader level, and Bobo needs to be upgraded to work with
Lucene 2.9's searching at the segment level (currently it uses a
MultiSearcher).

Distributed search on either should be fairly straightforward?

On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi f...@efendi.ca wrote:
 It seems BOBO-Browse is alternate faceting engine; would be interesting to
 compare performance with SOLR... Distributed?


 -Original Message-
 From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
 Sent: August-12-09 6:12 PM
 To: solr-user@lucene.apache.org
 Subject: Re: facet performance tips

 For your fields with many terms you may want to try Bobo
 http://code.google.com/p/bobo-browse/ which could work well with your
 case.

Re: facet performance tips

2009-08-13 Thread Jason Rutherglen

Right, I haven't used SOLR-475 yet and am more familiar with
Bobo. I believe there are differences but I haven't gone into
them yet. As I'm using Solr 1.4 now, maybe I'll test the
UnInvertedField modality.

Feel free to report back results as I don't think I've seen much
yet?

On Thu, Aug 13, 2009 at 10:51 AM, Fuad Efendi f...@efendi.ca wrote:
 SOLR-1.4-trunk uses terms counting instead of bitset intersects (seems to
 be); check this
 http://issues.apache.org/jira/browse/SOLR-475
 (and probably http://issues.apache.org/jira/browse/SOLR-711)

 -Original Message-
 From: Jason Rutherglen

 Yeah we need a performance comparison, I haven't had time to put
 one together. If/when I do I'll compare Bobo performance against
 Solr bitset intersection based facets, compare memory
 consumption.

 For near realtime Solr needs to cache and merge bitsets at the
 SegmentReader level, and Bobo needs to be upgraded to work with
 Lucene 2.9's searching at the segment level (currently it uses a
 MultiSearcher).

 Distributed search on either should be fairly straightforward?

 On Thu, Aug 13, 2009 at 9:55 AM, Fuad Efendi f...@efendi.ca wrote:
 It seems BOBO-Browse is alternate faceting engine; would be interesting to
 compare performance with SOLR... Distributed?


 -Original Message-
 From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
 Sent: August-12-09 6:12 PM
 To: solr-user@lucene.apache.org
 Subject: Re: facet performance tips

 For your fields with many terms you may want to try Bobo
 http://code.google.com/p/bobo-browse/ which could work well with your
 case.

facet performance tips

2009-08-12 Thread Jérôme Etévé

Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
I perform facets on multivalued string fields. The number of possible
different values is quite large.

Enabling facets degrades the performance by a factor 3.

Because I'm using solr 1.3, I guess the facetting makes use of the
filter cache to work. My filterCache is set
to a size of 2048. I also noticed in my solr stats a very small ratio
of cache hit (~ 0.01%).

Can it be the reason why the faceting is slow? Does it make sense to
increase the filterCache size so it matches more or less the number
of different possible values for the faceted fields? Would that not
make the memory usage explode?

Thanks for your help !

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net

RE: facet performance tips

2009-08-12 Thread Manepalli, Kalyan

Jerome,
Yes you need to increase the filterCache size to something close to 
unique number of facet elements. But also consider the RAM required to 
accommodate the increase. 
I did see a significant performance gain by increasing the filterCache size
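
A rough way to reason about the RAM cost is sketched below. It assumes every
cached filter is a plain bitset with one bit per document in the index, which
is a worst case: Solr can store small document sets more compactly, so treat
the result as an upper bound rather than a measurement. The function name is
hypothetical.

```python
def bitset_filter_cache_bytes(max_doc, num_filters):
    """Upper bound on filterCache memory if every entry is a full bitset."""
    bytes_per_filter = (max_doc + 7) // 8  # one bit per document, rounded up
    return bytes_per_filter * num_filters


docs = 160_000  # index size mentioned in the question
for filters in (2048, 500_000):
    total = bitset_filter_cache_bytes(docs, filters)
    print(f"{filters} filters: up to {total / 1e9:.2f} GB")
```

With 160K documents each bitset is 20,000 bytes, so a cache sized for 500,000
terms could in principle need on the order of 10 GB, which is why the heap
size has to be considered together with the cache size.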

Thanks,
Kalyan Manepalli

-Original Message-
From: Jérôme Etévé [mailto:jerome.et...@gmail.com] 
Sent: Wednesday, August 12, 2009 12:31 PM
To: solr-user@lucene.apache.org
Subject: facet performance tips

Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
I perform facets on multivalued string fields. The number of possible
different values is quite large.

Enabling facets degrades the performance by a factor 3.

Because I'm using solr 1.3, I guess the facetting makes use of the
filter cache to work. My filterCache is set
to a size of 2048. I also noticed in my solr stats a very small ratio
of cache hit (~ 0.01%).

Can it be the reason why the faceting is slow? Does it make sense to
increase the filterCache size so it matches more or less the number
of different possible values for the faceted fields? Would that not
make the memory usage explode?

Thanks for your help !

-- 
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net

RE: facet performance tips

2009-08-12 Thread Fuad Efendi

I am currently faceting on a tokenized multi-valued field at
http://www.tokenizer.org (25 million simple docs)

It uses some home-made quick fixes similar to SOLR-475 (SOLR-711) and
non-synchronized cache (similar to LingPipe's FastCache, SOLR-665, SOLR-667)

Average faceting on query results: 0.2 - 0.3 seconds; without those
patches - 20-50 seconds.

I am going to upgrade to SOLR-1.4 from trunk (with SOLR-475 & SOLR-667) and
compare results...




P.S.
Avoid faceting on a field with heavy distribution of terms (such as few
millions of terms in my case); It won't work in SOLR 1.3.

TIP: use non-tokenized single-valued field for faceting, such as
non-tokenized country field.



P.P.S.
Would be nice to load/stress
http://alias-i.com/lingpipe/docs/api/com/aliasi/util/FastCache.html against
putting CPU in a spin loop ConcurrentHashMap.



-Original Message-
From: Erik Hatcher [mailto:ehatc...@apache.org] 
Sent: August-12-09 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: facet performance tips

Yes, increasing the filterCache size will help with Solr 1.3  
performance.

Do note that trunk (soon Solr 1.4) has dramatically improved faceting  
performance.

Erik

On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:

 Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
 I perform facets on multivalued string fields. The number of possible
 different values is quite large.

 Enabling facets degrades the performance by a factor 3.

 Because I'm using solr 1.3, I guess the facetting makes use of the
 filter cache to work. My filterCache is set
 to a size of 2048. I also noticed in my solr stats a very small ratio
 of cache hit (~ 0.01%).

 Can it be the reason why the faceting is slow? Does it make sense to
 increase the filterCache size so it matches more or less the number
 of different possible values for the faceted fields? Would that not
 make the memory usage explode?

 Thanks for your help !

 -- 
 Jerome Eteve.

 Chat with me live at http://www.eteve.net

 jer...@eteve.net

Re: facet performance tips

2009-08-12 Thread Erik Hatcher

Yes, increasing the filterCache size will help with Solr 1.3  
performance.


Do note that trunk (soon Solr 1.4) has dramatically improved faceting  
performance.


Erik

On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:


Hi everyone,

 I'm using some faceting on a solr index containing ~ 160K documents.
I perform facets on multivalued string fields. The number of possible
different values is quite large.

Enabling facets degrades the performance by a factor 3.

Because I'm using solr 1.3, I guess the facetting makes use of the
filter cache to work. My filterCache is set
to a size of 2048. I also noticed in my solr stats a very small ratio
of cache hit (~ 0.01%).

Can it be the reason why the faceting is slow? Does it make sense to
increase the filterCache size so it matches more or less the number
of different possible values for the faceted fields? Would that not
make the memory usage explode?

Thanks for your help !

--
Jerome Eteve.

Chat with me live at http://www.eteve.net

jer...@eteve.net

Re: facet performance tips

2009-08-12 Thread Jason Rutherglen

For your fields with many terms you may want to try Bobo
http://code.google.com/p/bobo-browse/ which could work well with your
case.

On Wed, Aug 12, 2009 at 12:02 PM, Fuad Efendi f...@efendi.ca wrote:
 I am currently faceting on a tokenized multi-valued field at
 http://www.tokenizer.org (25 million simple docs)

 It uses some home-made quick fixes similar to SOLR-475 (SOLR-711) and
 non-synchronized cache (similar to LingPipe's FastCache, SOLR-665, SOLR-667)

 Average faceting on query results: 0.2 - 0.3 seconds; without those
 patches - 20-50 seconds.

 I am going to upgrade to SOLR-1.4 from trunk (with SOLR-475 & SOLR-667) and
 compare results...




 P.S.
 Avoid faceting on a field with heavy distribution of terms (such as few
 millions of terms in my case); It won't work in SOLR 1.3.

 TIP: use non-tokenized single-valued field for faceting, such as
 non-tokenized country field.



 P.P.S.
 Would be nice to load/stress
 http://alias-i.com/lingpipe/docs/api/com/aliasi/util/FastCache.html against
 putting CPU in a spin loop ConcurrentHashMap.



 -Original Message-
 From: Erik Hatcher [mailto:ehatc...@apache.org]
 Sent: August-12-09 2:12 PM
 To: solr-user@lucene.apache.org
 Subject: Re: facet performance tips

 Yes, increasing the filterCache size will help with Solr 1.3
 performance.

 Do note that trunk (soon Solr 1.4) has dramatically improved faceting
 performance.

        Erik

 On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:

 Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
 I perform facets on multivalued string fields. The number of possible
 different values is quite large.

 Enabling facets degrades the performance by a factor 3.

 Because I'm using solr 1.3, I guess the facetting makes use of the
 filter cache to work. My filterCache is set
 to a size of 2048. I also noticed in my solr stats a very small ratio
 of cache hit (~ 0.01%).

 Can it be the reason why the faceting is slow? Does it make sense to
 increase the filterCache size so it matches more or less the number
 of different possible values for the faceted fields? Would that not
 make the memory usage explode?

 Thanks for your help !

 --
 Jerome Eteve.

 Chat with me live at http://www.eteve.net

 jer...@eteve.net

Re: facet performance tips

2009-08-12 Thread Stephen Duncan Jr

Note that depending on the profile of your field (full text and how many
unique terms on average per document), the improvements from 1.4 may not
apply, as you may exceed the limits of the new faceting technique in Solr
1.4.
-Stephen

On Wed, Aug 12, 2009 at 2:12 PM, Erik Hatcher ehatc...@apache.org wrote:

 Yes, increasing the filterCache size will help with Solr 1.3 performance.

 Do note that trunk (soon Solr 1.4) has dramatically improved faceting
 performance.

Erik


 On Aug 12, 2009, at 1:30 PM, Jérôme Etévé wrote:

  Hi everyone,

  I'm using some faceting on a solr index containing ~ 160K documents.
 I perform facets on multivalued string fields. The number of possible
 different values is quite large.

 Enabling facets degrades the performance by a factor 3.

 Because I'm using solr 1.3, I guess the facetting makes use of the
 filter cache to work. My filterCache is set
 to a size of 2048. I also noticed in my solr stats a very small ratio
 of cache hit (~ 0.01%).

 Can it be the reason why the faceting is slow? Does it make sense to
 increase the filterCache size so it matches more or less the number
 of different possible values for the faceted fields? Would that not
 make the memory usage explode?

 Thanks for your help !

 --
 Jerome Eteve.

 Chat with me live at http://www.eteve.net

 jer...@eteve.net





-- 
Stephen Duncan Jr
www.stephenduncanjr.com

Re: Facet Performance

2008-07-31 Thread Funtick


Hoss,

This is still an extremely interesting area for possible improvements; I simply
don't want the topic to die:
http://www.nabble.com/Facet-Performance-td7746964.html

http://issues.apache.org/jira/browse/SOLR-665
http://issues.apache.org/jira/browse/SOLR-667
http://issues.apache.org/jira/browse/SOLR-669

I am currently using faceting on single-valued _tokenized_ field with huge
amount of documents; _unsynchronized_ version of FIFOCache; 1.5 seconds
average response time (for faceted queries only!)

I think we can use an additional cache for facet results (to store calculated
values!); Lucene's FieldCache can be used only for non-tokenized
single-valued non-boolean fields.

-Fuad



hossman_lucene wrote:
 
 
 : Unfortunately which strategy will be chosen is currently undocumented
 : and control is a bit oblique:  If the field is tokenized or multivalued
 : or Boolean, the FilterQuery method will be used; otherwise the
 : FieldCache method.  I expect I or others will improve that shortly.
 
 Bear in mind, what's provided out of the box is SimpleFacets ... it's
 designed to meet simple faceting needs ... when you start talking about
 100s or thousands of constraints per facet, you are getting outside the
 scope of what it was intended to serve efficiently.
 
 At a certain point the only practical thing to do is write a custom
 request handler that makes the best choices for your data.
 
 For the record: a really simple patch someone could submit would be to
 add an optional field-based param indicating which type of faceting
 (termenum/fieldcache) should be used to generate the list of terms and
 then make SimpleFacets.getFacetFieldCounts use that and call the
 appropriate method instead of calling getTermCounts -- that way you could
 force one or the other if you know it's better for your data/query.
 
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Facet-Performance-tp7746964p18756500.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Facet Performance


Yonik Seeley wrote:


1) facet on single-valued strings if you can
2) if you can't do (1) then enlarge the filterCache so that the number
of filters (one per possible term in the field you are faceting on)
can fit.


I changed the filterCache to the following:
   <filterCache
     class="solr.LRUCache"
     size="25600"
     initialSize="5120"
     autowarmCount="1024"/>

However a search that normally takes .04s is taking 74 seconds once I 
use the facets since I am faceting on 4 fields.


Can you suggest a better configuration that would solve this performance 
issue, or should I not use faceting?
I figure I could run the query twice, once limited to 20 records and 
then again with the limit set to the total number of records, and develop 
my own facets.  I have in fact done this before with a different back-end 
and my code is processed in under .01 seconds.
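
The "develop my own facets" idea could look roughly like the sketch below. It
assumes the full result set has already been fetched as a list of dicts, with
multivalued fields held as lists; the function name and the sample records are
made up for illustration.

```python
from collections import Counter


def count_facets(docs, fields):
    """Count how often each value of each field occurs in the given docs."""
    counts = {field: Counter() for field in fields}
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value is None:
                continue  # document has no value for this field
            # Treat single values and multivalued (list) fields the same way.
            values = value if isinstance(value, list) else [value]
            counts[field].update(values)
    return counts


# Made-up sample records, for illustration only.
sample = [
    {"author": ["A. Smith", "B. Jones"], "language": "English", "format": "Book"},
    {"author": ["A. Smith"], "language": "French", "format": "Book"},
]
facets = count_facets(sample, ["author", "language", "format"])
print(facets["author"].most_common(5))
```

The counting itself is cheap; the cost of this approach is in pulling every
matching record out of the index, which grows with the size of the result set.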


Why is faceting so slow?

Andrew

Re: Facet Performance


Chris Hostetter wrote:


: Could you suggest a better configuration based on this?

If that's what your stats look like after a single request, then i would
guess you would need to make your cache size at least 1.6 million in order
for it to be of any use in improving your facet speed.
 

Would this have any strong impacts on my system?  Should I just set it 
to an even 2 million to allow for growth?



: My data is 492,000 records of book data.  I am faceting on 4 fields:
: author, subject, language, format.
: Format and language are fairly simple as there are only a few unique
: terms.  Author and subject however are much different in that there are
: thousands of unique terms.

by the looks of it, you have a lot more than a few thousand unique terms
in those two fields ... are you tokenizing on these fields?  that's
probably not what you want for fields you're going to facet on.
 

All of these fields are set as string in my schema, so if I understand 
the fields correctly, they are not being tokenized.  I also have an 
author field that is set as text for searching.


Thanks
Andrew

Re: Facet Performance

2006-12-08 Thread Yonik Seeley


On 12/8/06, Andrew Nagy [EMAIL PROTECTED] wrote:

Chris Hostetter wrote:

: Could you suggest a better configuration based on this?

If that's what your stats look like after a single request, then i would
guess you would need to make your cache size at least 1.6 million in order
for it to be of any use in improving your facet speed.


Would this have any strong impacts on my system?  Should I just set it
to an even 2 million to allow for growth?


Change the following in solrconfig.xml, and you should be fine with a
higher setting.
<useFilterForSortedQuery>true</useFilterForSortedQuery>
to
<useFilterForSortedQuery>false</useFilterForSortedQuery>

That will prevent the filterCache from being used for anything but
filters and faceting, so if you set it too high, it won't be utilized
anyway.


: My data is 492,000 records of book data.  I am faceting on 4 fields:
: author, subject, language, format.
: Format and language are fairly simple as there are only a few unique
: terms.  Author and subject however are much different in that there are
: thousands of unique terms.

by the looks of it, you have a lot more than a few thousand unique terms
in those two fields ... are you tokenizing on these fields?  that's
probably not what you want for fields you're going to facet on.


All of these fields are set as string in my schema


Are they multivalued, and do they need to be?
Anything that is of type string and not multivalued will use the
lucene FieldCache rather than the filterCache.

-Yonik

Re: Facet Performance


Yonik Seeley wrote:


Are they multivalued, and do they need to be?
Anything that is of type string and not multivalued will use the
lucene FieldCache rather than the filterCache.


The author field is multivalued.  Will this be a strong performance issue?

I could make multiple author fields so as not to have the multivalued field 
and then only facet on the first author.


Thanks
Andrew

Re: Facet Performance

2006-12-08 Thread J.J. Larrea

Andrew Nagy, ditto on what Yonik said.  Here is some further elaboration:

I am doing much the same thing (faceting on Author etc.). When my Author field 
was defined as a solr.TextField, even using solr.KeywordTokenizerFactory so it 
wasn't actually tokenized, the faceting code chose the QueryFilter approach, 
and faceting on Author for 100k+ documents took about 4 seconds.

When I changed the field to string e.g. solr.StrField, the faceting code 
recognized it as untokenized and used the FieldCache approach.  Times have 
dropped to about 120ms for the first query (when the FieldCache is generated) 
and  10ms for subsequent queries returning a few thousand results.  Quite a 
difference.

The strategy must be chosen on a field-by-field basis.  While QueryFilter is 
excellent for fields with a small set of enumerated values such as Language or 
Format, it is inappropriate for large value sets such as Author.

Unfortunately which strategy will be chosen is currently undocumented and 
control is a bit oblique:  If the field is tokenized or multivalued or Boolean, 
the FilterQuery method will be used; otherwise the FieldCache method.  I expect 
I or others will improve that shortly.
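
The rule in the paragraph above can be written down as a small sketch. This
only restates the selection logic as described in this message; it is not
taken from the Solr source, and the function name and return labels are
hypothetical.

```python
def choose_facet_strategy(tokenized, multivalued, is_boolean):
    """Mirror the rule described above: any of the three flags selects the
    filter-based method, otherwise the FieldCache method is used."""
    if tokenized or multivalued or is_boolean:
        return "filter-query"
    return "field-cache"


# A plain single-valued string field (e.g. Language) takes the FieldCache path.
print(choose_facet_strategy(tokenized=False, multivalued=False, is_boolean=False))
# A multivalued Author field falls back to one filter per term.
print(choose_facet_strategy(tokenized=False, multivalued=True, is_boolean=False))
```

Read this way, a field defined as a text type counts as tokenized whatever its
tokenizer actually does, which matches the solr.TextField behaviour reported
above.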

- J.J.

At 2:58 PM -0500 12/8/06, Yonik Seeley wrote:
Right, if any of these are tokenized, then you could make them
non-tokenized (use string type).  If they really need to be
tokenized (author for example), then you could use copyField to make
another copy to a non-tokenized field that you can use for faceting.

After that, as Hoss suggests, run a single faceting query with all 4
fields and look at the filterCache statistics.  Take the lookups
number and multiply it by, say, 1.5 to leave some room for future
growth, and use that as your cache size.  You probably want to bump up
both initialSize and autowarmCount as well.
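
As a worked example of that sizing rule, here is a minimal sketch. The 1.5
factor is the one suggested above; the lookups figure is the 1.6 million
mentioned earlier in the thread, used here purely as sample input, and the
function name is hypothetical.

```python
import math


def suggested_filter_cache_size(lookups, headroom=1.5):
    """Cache size = lookups seen for one faceting query, plus room to grow."""
    return math.ceil(lookups * headroom)


lookups = 1_600_000  # sample value; read the real one from the cache statistics
size = suggested_filter_cache_size(lookups)
print(f"filterCache size to try: {size}")
```

Whatever number comes out, the OOM caveat below still applies: the heap has to
be large enough to hold that many cached filters.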

The first query will still be slow.  The second should be relatively fast.
You may hit an OOM error.  Increase the JVM heap size if this happens.

-Yonik

Re: Facet Performance

2006-12-08 Thread Yonik Seeley


On 12/8/06, J.J. Larrea [EMAIL PROTECTED] wrote:

Unfortunately which strategy will be chosen is currently undocumented and 
control is a bit oblique:  If the field is tokenized or multivalued or Boolean, 
the FilterQuery method will be used; otherwise the FieldCache method.


If anyone had time some of this could be documented here:
http://wiki.apache.org/solr/SimpleFacetParameters
The wiki is open to all.

Or perhaps a new top level FacetedSearching page that references
SimpleFacetParameters

-Yonik

Re: Facet Performance


J.J. Larrea wrote:


Unfortunately which strategy will be chosen is currently undocumented and 
control is a bit oblique:  If the field is tokenized or multivalued or Boolean, 
the FilterQuery method will be used; otherwise the FieldCache method.  I expect 
I or others will improve that shortly.
 

Good to hear, cause I can't really get away with not having a 
multi-valued field for author.


I'm really excited by Solr and really impressed so far.

Thanks!
Andrew

Re: Facet Performance

2006-12-08 Thread Chris Hostetter


: Unfortunately which strategy will be chosen is currently undocumented
: and control is a bit oblique:  If the field is tokenized or multivalued
: or Boolean, the FilterQuery method will be used; otherwise the
: FieldCache method.  I expect I or others will improve that shortly.

Bear in mind, what's provided out of the box is SimpleFacets ... it's
designed to meet simple faceting needs ... when you start talking about
100s or thousands of constraints per facet, you are getting outside the
scope of what it was intended to serve efficiently.

At a certain point the only practical thing to do is write a custom
request handler that makes the best choices for your data.

For the record: a really simple patch someone could submit would be to
add an optional field-based param indicating which type of faceting
(termenum/fieldcache) should be used to generate the list of terms and
then make SimpleFacets.getFacetFieldCounts use that and call the
appropriate method instead of calling getTermCounts -- that way you could
force one or the other if you know it's better for your data/query.



-Hoss

Re: Facet Performance