(Issue) How improve solr group performance

2014-05-28 Thread Alice.H.Yang (mis.cnsh04.Newegg) 41493
Hi all,
Does anybody have some advice for me on Solr grouping performance? I have
no idea how to improve it.

To David Smiley:
I am not responsible for Endeca, so unfortunately I have no comment on that.

Best Regards,
Alice Yang
Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)

From: david.w.smi...@gmail.com [mailto:david.w.smi...@gmail.com] 
Sent: May 27, 2014 21:29
To: solr-user@lucene.apache.org
Subject: Re: RE: (Issue) How improve solr facet performance


RE grouping, try Solr 4.8’s new “collapse” qparser with the “expand”
SearchComponent. The Reference Guide has the docs. It’s usually a faster
equivalent approach to group=true.
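
To make the suggestion concrete, here is a minimal Java sketch (not from the thread) that builds the query string for a classic grouping request with group.ngroups and for the collapse/expand equivalent. The keyword "laptop" and the field names brand_id and title are hypothetical. The parameters group, group.field, group.ngroups, the {!collapse field=...} filter and expand=true are standard Solr parameters, but check the Reference Guide for your version. As far as we understand it, numFound of the collapsed result counts one document per group (documents without a group value are dropped under the default nullPolicy), so it may cover the group.ngroups requirement; verify that against your data.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds Solr /select query strings for the two approaches.
// Field names passed in by callers (e.g. "brand_id", "title") are hypothetical.
class GroupVsCollapse {

    // URL-encodes an ordered parameter map as key=value pairs joined by '&'.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // Classic result grouping, including the total number of groups.
    static String groupingQuery(String q, String groupField, String fl) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("fl", fl);
        p.put("group", "true");
        p.put("group.field", groupField);
        p.put("group.ngroups", "true");
        return toQueryString(p);
    }

    // Collapse filter (one document per group in the main result) plus the
    // expand component, which returns the other members of each group.
    static String collapseQuery(String q, String groupField, String fl) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("fl", fl);
        p.put("fq", "{!collapse field=" + groupField + "}");
        p.put("expand", "true");
        return toQueryString(p);
    }
}
```

For example, collapseQuery("laptop", "brand_id", "title") yields q=laptop&fl=title&fq=%7B%21collapse+field%3Dbrand_id%7D&expand=true, which can be appended to the core's /select URL and timed against the group=true version.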

Do you care to comment further on NewEgg’s apparent switch from Endeca to Solr? 
 (confirm true/false and rationale)

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer 

On Tue, May 27, 2014 at 4:17 AM, Alice.H.Yang (mis.cnsh04.Newegg) 41493  
alice.h.y...@newegg.com wrote:

RE: (Issue) How improve solr facet performance

2014-05-27 Thread Alice.H.Yang (mis.cnsh04.Newegg) 41493
Hi Toke,

I set the 3 fields with hundreds of values to use fc and the rest to use
enum, and performance improved 2 times compared with no parameter. Then I
added facet.method=20, and performance improved about 4 times compared with
no parameter.
I also tried copying the 9 facet fields into one copyField; when I tested the
performance, it improved about 2.5 times compared with no parameter.
So it improved a lot with your advice, thanks a lot.
Now I have another performance issue: grouping performance. The amount of
data is the same as in the facet performance scenario.
When the keyword search hits about one million documents, the QTime is about
600ms. (This is not the first query; the result is served from the cache.)

Query URL:

It takes a QTime of about 600ms.

This query has two parameters:
1. fl with one field
2. group=true

If I set group=false, the QTime is only 1 ms.
But I need grouping and group.ngroups. How can I improve grouping performance
under this requirement? Do you have any advice for me? I'm looking forward to
your reply.

Best Regards,
Alice Yang
Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)

From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: May 24, 2014 15:17
To: solr-user@lucene.apache.org
Subject: RE: (Issue) How improve solr facet performance

Alice.H.Yang (mis.cnsh04.Newegg) 41493 [alice.h.y...@newegg.com] wrote:
 1.  Sorry, I made a mistake: the total number of documents is 32 million,
 not 320 million.
 2.  The system has plenty of memory for the Solr index: the OS has 256G in
 total, and I set the Solr Tomcat heap to HEAPSIZE=-Xms25G -Xmx100G.

100G is a very high number. What special requirements dictate such a large
heap size?

 Reply:  I facet on 9 fields.

Solr treats each facet separately, and with facet.method=fc and 10M hits this
means that it will iterate 9*10M = 90M document IDs and update the counters for
those.
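
The arithmetic can be illustrated with a deliberately simplified Java model of fc-style counting. This is an assumption for illustration, not Solr's actual code: each facet field is modeled as a per-document array of value ordinals, and every hit is visited once per field, so the work grows with (number of facet fields) x (number of hits) regardless of how few unique values each field has.

```java
// Simplified model of facet.method=fc: one pass over the hits per facet field.
// Assumption: each field is single-valued and its values are already mapped to
// ordinals 0..cardinality-1 (Solr keeps such a mapping in its field caches).
class FcCountingModel {

    // Total counter updates = number of facet fields * number of hits.
    static long counterUpdates(int facetFields, long hits) {
        return facetFields * hits;
    }

    // ords[f][doc] is the value ordinal of document doc in facet field f;
    // cardinality[f] is the number of unique values in field f;
    // hits holds the document IDs matched by the query.
    static int[][] count(int[][] ords, int[] cardinality, int[] hits) {
        int[][] counts = new int[ords.length][];
        for (int f = 0; f < ords.length; f++) {
            counts[f] = new int[cardinality[f]];
            for (int doc : hits) {      // every hit is visited once per field
                counts[f][ords[f][doc]]++;
            }
        }
        return counts;
    }
}
```

With the numbers from this thread, counterUpdates(9, 10_000_000L) is 90,000,000, matching the 9*10M estimate.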

 Reply:  3 facet fields have one hundred unique values; the other 6 facet
 fields have between 3 and 15 unique values.

So very low cardinality. This is confirmed by your low response time of 6ms for 
2925 hits.

 We also tested this scenario: when we add facet.method=enum for the facet
 fields with fewer unique values, performance improves only a little.

That is a shame: enum is normally the simple answer to a setup like yours. Have
you tried fine-tuning your fc/enum selection, so that the 3 fields with
hundreds of values use fc and the rest use enum? That might halve your
response time.
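
A minimal sketch of that fine-grained selection, assuming Solr's per-field parameter override f.<field>.facet.method: fields above a unique-value threshold get fc, the rest get enum. The field names and the threshold of 50 are hypothetical; the right cut-off has to come from measurements like the ones reported in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Picks facet.method per field via the per-field override f.<field>.facet.method.
// Field names and the threshold are hypothetical; tune them on real data.
class FacetMethodSelector {

    static List<String> facetParams(Map<String, Integer> uniqueValuesPerField, int enumThreshold) {
        List<String> params = new ArrayList<>();
        params.add("facet=true");
        for (Map.Entry<String, Integer> e : uniqueValuesPerField.entrySet()) {
            String field = e.getKey();
            // Few unique values: enum (one set intersection per unique value).
            // More unique values: fc (one pass over the hits).
            String method = e.getValue() <= enumThreshold ? "enum" : "fc";
            params.add("facet.field=" + field);
            params.add("f." + field + ".facet.method=" + method);
        }
        return params;
    }
}
```

Joining the returned list with "&" gives parameters such as facet.field=category_id&f.category_id.facet.method=fc&facet.field=color_id&f.color_id.facet.method=enum.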

Since the number of unique facet values is so low, I do not think that DocValues
can help you here. Besides the fine-grained fc/enum selection above, you could
try collapsing all 9 facet fields into a single field. The idea behind this is
that for facet.method=fc, faceting on a field with (for example) 300 unique
values takes practically the same amount of time as faceting on a field with
1000 unique values, so faceting on a single slightly larger field is much
faster than faceting on 9 smaller fields. After faceting with facet.limit=-1 on
the single super-facet field, you must map the returned values back to their
original fields:

If you have the facet-fields

field0: 34
field1: 187
field2: 78432
field3: 3

then collapse them by OR-ing a field-specific mask that is bigger than the max
in any field, then put it all into a single field:

fieldAll: 0xA000000 | 34
fieldAll: 0xA100000 | 187
fieldAll: 0xA200000 | 78432
fieldAll: 0xA300000 | 3

perform the facet request on fieldAll with facet.limit=-1 and split the
resulting counts with

// the low 20 bits hold the original value, the bits above hold the field mask
for (FacetEntry entry : facetResultAll) {
  long value = entry.value & 0xFFFFF;
  switch ((int) (entry.value & 0xFF00000)) {
    case 0xA000000: field0.add(value, entry.count); break;
    case 0xA100000: field1.add(value, entry.count); break;
    // ... one case per remaining field
  }
}
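
A self-contained Java sketch of the whole round trip, assuming non-negative integer values below 2^20. Instead of the hex masks above, it shifts the field index into the bits above the value, which keeps a value such as 78432 from overlapping the field tag. The class and method names are hypothetical, and the facet counts on the combined field are assumed to arrive as a map from encoded value to count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Round trip for the "single super-facet-field" idea: encode (fieldIndex, value)
// into one number at index time, facet once on that field, then split the counts.
// Assumption: values are non-negative and below 2^20; the field index sits above them.
class SuperFacetField {
    static final int VALUE_BITS = 20;
    static final long VALUE_MASK = (1L << VALUE_BITS) - 1;

    static long encode(int fieldIndex, long value) {
        if (value < 0 || value > VALUE_MASK) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        return ((long) fieldIndex << VALUE_BITS) | value;
    }

    static int fieldOf(long encoded) { return (int) (encoded >>> VALUE_BITS); }

    static long valueOf(long encoded) { return encoded & VALUE_MASK; }

    // combinedCounts: counts from faceting on the super field with facet.limit=-1,
    // keyed by encoded value. Returns one value->count map per original field.
    static List<Map<Long, Long>> split(Map<Long, Long> combinedCounts, int fieldCount) {
        List<Map<Long, Long>> perField = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) {
            perField.add(new LinkedHashMap<>());
        }
        for (Map.Entry<Long, Long> e : combinedCounts.entrySet()) {
            perField.get(fieldOf(e.getKey())).put(valueOf(e.getKey()), e.getValue());
        }
        return perField;
    }
}
```

At index time each document would get one encode(i, value) entry per original facet field in the combined field; at query time split() restores the per-field counts.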

Toke Eskildsen, State and University Library, Denmark

fw: (Issue) How improve solr facet performance

2014-05-23 Thread Alice.H.Yang (mis.cnsh04.Newegg) 41493
Hi, Solr Developer

  Thanks very much for your timely reply.

1.  Sorry, I made a mistake: the total number of documents is 32 million,
not 320 million.
2.  The system has plenty of memory for the Solr index: the OS has 256G in
total, and I set the Solr Tomcat heap to HEAPSIZE=-Xms25G -Xmx100G.

- How many fields are you faceting on?

Reply:  I facet on 9 fields.

- How many unique values do your facet fields have (approximately)?

Reply:  3 facet fields have one hundred unique values; the other 6 facet
fields have between 3 and 15 unique values.

- What is the content of your facets (Strings, numbers?)

Reply:  All 9 fields are numbers.

- Which facet.method do you use?

Reply:  We use the default, facet.method=fc.

We also tested this scenario: when we add facet.method=enum for the facet
fields with fewer unique values, performance improves only a little.

- What is the response time with faceting and a few thousand hits?

Reply:   <result name="response" numFound="2925" start="0">
   QTime is <int name="QTime">6</int>

Best Regards,
Alice Yang
Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Friday, May 23, 2014 8:08 PM
To: d...@lucene.apache.org
Subject: Re: (Issue) How improve solr facet performance

On Fri, 2014-05-23 at 11:45 +0200, Alice.H.Yang (mis.cnsh04.Newegg)
41493 wrote:
 We are held back by Solr facet performance when a query hits many documents
 (about 10,000,000).

[320M documents, immediate response for plain search with 1M hits]

 But when we add several facet.field parameters, QTime increases to
 220ms or more.

It is not clear whether your observation of increased response time is due to 
many hits or faceting in itself.

- How many fields are you faceting on?
- How many unique values do your facet fields have (approximately)?
- What is the content of your facets (Strings, numbers?)
- Which facet.method do you use?
- What is the response time with faceting and a few thousand hits?

 Do you have any advice on how to improve facet performance when a query
 hits many documents?

That depends on whether your bottleneck is the hit count itself, the number of
unique facet values, or something else, like I/O.

- Toke Eskildsen, State and University Library, Denmark
