Re: new faceting algorithm

2008-12-12 Thread wojtekpia

It looks like my filterCache was too big. I reduced my filterCache size from
700,000 to 20,000 (without changing the heap size) and all my performance
issues went away. I experimented with various GC settings, but none of them
made a significant difference.
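
As a rough illustration of why a very large filterCache can pressure the heap, the sketch below estimates a worst-case footprint. It assumes every cached entry is a bitset with one bit per document, which overstates real usage because small result sets are stored more compactly. The index size of 1,000,000 documents is a hypothetical figure, not taken from this deployment, and the class and method names are made up for the example.

```java
/** Back-of-the-envelope estimate only; names and the index size are hypothetical. */
final class FilterCacheEstimate {

    /** Bytes needed for a bitset holding one bit per document (rounded up to 64-bit words). */
    static long bitsetBytes(int maxDoc) {
        long words = (maxDoc + 63L) / 64L;
        return words * 8L;
    }

    /** Worst case: every cache entry is a full bitset (assumption, see lead-in). */
    static long worstCaseBytes(long entries, int maxDoc) {
        return entries * bitsetBytes(maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000; // hypothetical index size
        System.out.println("700,000 entries: " + worstCaseBytes(700_000, maxDoc) + " bytes");
        System.out.println(" 20,000 entries: " + worstCaseBytes(20_000, maxDoc) + " bytes");
    }
}
```

Under these assumptions the upper bound drops from 87.5 billion bytes to 2.5 billion bytes when the cache size goes from 700,000 to 20,000 entries. Actual usage depends on how many entries are really populated and how large each cached set is.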

I see a 16% increase in throughput by applying this patch.


Yonik Seeley wrote:
 
 ... This can be a big chunk of memory
 per-request, and is most likely what changed your GC profile (i.e.
 changing the GC settings may help).
 
 

-- 
View this message in context: 
http://www.nabble.com/new-faceting-algorithm-tp20674902p20984502.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: new faceting algorithm

2008-12-07 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 2:28 PM, wojtekpia wrote:
 I'm seeing some strange behavior with my garbage collector that disappears
 when I turn off this optimization.

I just changed the new faceting code to use a solr cache.
Look for the fieldValueCache on the statistics page now.

It just occurred to me that there is a big difference in how memory is
used with facet.method=fc.
Since we traverse documents and count up terms, we need to allocate an
int[nTerms] to accumulate those counts.  This can be a big chunk of memory
per-request, and is most likely what changed your GC profile (i.e.
changing the GC settings may help).
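
The per-request counter array described above can be sketched roughly as follows. This is a minimal illustration, not the actual UnInvertedField code: the class name, the method names, and the docTerms layout (one array of term ordinals per document) are all assumptions made for the example.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical names and data layout, not Solr's implementation. */
final class FacetCountSketch {

    /**
     * Counts term occurrences across the matching documents.
     * docTerms[doc] holds the term ordinals (0..nTerms-1) present in that document.
     */
    static int[] countTerms(int[][] docTerms, int[] matchingDocs, int nTerms) {
        // One counter per unique term, allocated fresh for every request.
        int[] counts = new int[nTerms];
        for (int doc : matchingDocs) {
            for (int termOrd : docTerms[doc]) {
                counts[termOrd]++;
            }
        }
        return counts;
    }

    /** Payload size of the counter array, at 4 bytes per int (object header ignored). */
    static long counterBytes(int nTerms) {
        return 4L * nTerms;
    }

    public static void main(String[] args) {
        int[][] docTerms = { {0, 2}, {1}, {0, 1, 2}, {2} };
        int[] matchingDocs = {0, 2, 3};
        System.out.println(Arrays.toString(countTerms(docTerms, matchingDocs, 3)));
        System.out.println(counterBytes(2_000_000) + " bytes per request for 2,000,000 terms");
    }
}
```

With a field that has around 2 million unique terms, each request would allocate roughly 8 MB for the counters alone, so many concurrent requests can generate a lot of short-lived garbage.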

-Yonik


 I'm running load tests on my deployment.
 For the first few minutes, everything is fine (and this patch does make
 things faster - I haven't quantified the improvement yet). After that, the
 garbage collector stops collecting. Specifically, the new generation part of
 the heap is full, but never garbage collected, and the old generation is
 emptied, then never gets anything more. This throttles Solr performance
 (average response times that used to be ~500ms are now ~25s).

 I described my deployment scenario in an earlier post:
 http://www.nabble.com/Throughput-Optimization-td20335132.html

 Does it sound like the new faceting algorithm could be the culprit?


 wojtekpia wrote:

 Definitely, but it'll take me a few days. I'll also report findings on
 SOLR-465. (I've been on holiday for a few weeks)


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 wojtek, you can report back the numbers if possible

 It would be nice to know how the new impl performs in real-world










Re: new faceting algorithm

2008-12-05 Thread Till Kinstler

Yonik Seeley wrote:


We'd love some feedback on how it works to
ensure that it actually is a win for the majority and should be the
default.


I just did a quick test using Solr nightly 2008-11-30. I have an index
of about 2.9 million bibliographic records (size: 16 GB). I tested faceting
on author names. Each index document may contain multiple author names, so
author names go into a multivalued field (not analyzed). Queries used
for testing were extracted from log files of a prototype application.
With facet.method=enum and 50 request threads, I get an average response
time of about 19(!) ms, with no cache evictions. With 1 request thread:
about 1800 ms.
With facet.method=fc and 50 threads, I get an average response time of
around 300 ms. With 1 thread: 16 ms.

Seems to be a major improvement at first sight :-)

Regards,
Till

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
+49 (0) 551 39-13431, http://www.gbv.de


Re: new faceting algorithm

2008-12-05 Thread Andre Hagenbruch

Till Kinstler wrote:

Hi,

 I just did a quick test using Solr nightly 2008-11-30. I have an index
 of about 2.9 million bibliographic records (size: 16 GB). I tested faceting
 on author names. Each index document may contain multiple author names, so
 author names go into a multivalued field (not analyzed). Queries used
 for testing were extracted from log files of a prototype application.
 With facet.method=enum and 50 request threads, I get an average response
 time of about 19(!) ms, with no cache evictions. With 1 request thread:
 about 1800 ms.
 With facet.method=fc and 50 threads, I get an average response time of
 around 300 ms. With 1 thread: 16 ms.
 Seems to be a major improvement at first sight :-)

Same here: multi-valued author fields were the bottleneck with 1.3 for
us, too. I'm currently testing with 1.5 million records, ~1.2 million of
which have values for the author field, but with ~2 million distinct
values. With Solr 1.3 we had average response times of 15000-25000 ms
for 10 parallel requests (depending on cache settings); with 1.4 they
are now down to 230 ms...

Regards,

Andre
--
Andre Hagenbruch
Projekt Integriertes Bibliotheksportal
Universitaetsbibliothek Bochum, Etage 4/Raum 6
Fon: +49 234 3229346, Fax: +49 234 3214736


Re: new faceting algorithm

2008-12-05 Thread Peter Keegan
Hi Yonik,

May I ask in which class(es) this improvement was made? I've been using the
DocSet, DocList, BitDocSet, and HashDocSet classes from a Solr version of a
few years ago in a Lucene-based app to do faceting.

Thanks,
Peter


On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley wrote:

 A new faceting algorithm has been committed to the development version
 of Solr, and should be available in the next nightly test build (will
 be dated 11-25).  This change should generally improve field faceting
 where the field has many unique values but relatively few values per
 document.  This new algorithm is now the default for multi-valued
 fields (including tokenized fields) so you shouldn't have to do
 anything to enable it.  We'd love some feedback on how it works to
 ensure that it actually is a win for the majority and should be the
 default.

 -Yonik



Re: new faceting algorithm

2008-12-05 Thread Rob Casson
very similar situation to those already reported.  2.9M bibliographic
records, with authors being the (previous) bottleneck, and the one
we're starting to test with the new algorithm.

so far, no load tests, but just in single requests i'm seeing the same
improvements...phenomenal improvements, btw, with most example queries
taking less than 1/100th of the time

always very impressed with this project/product, and just thought i'd
add a me-too to the list...cheers, and have a great weekend,

rob

On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley wrote:
 A new faceting algorithm has been committed to the development version
 of Solr, and should be available in the next nightly test build (will
 be dated 11-25).  This change should generally improve field faceting
 where the field has many unique values but relatively few values per
 document.  This new algorithm is now the default for multi-valued
 fields (including tokenized fields) so you shouldn't have to do
 anything to enable it.  We'd love some feedback on how it works to
 ensure that it actually is a win for the majority and should be the
 default.

 -Yonik



Re: new faceting algorithm

2008-12-05 Thread Koji Sekiguchi

Peter,

It is the UnInvertedField class. See also:
https://issues.apache.org/jira/browse/SOLR-475


Peter Keegan wrote:

Hi Yonik,

May I ask in which class(es) this improvement was made? I've been using the
DocSet, DocList, BitDocSet, and HashDocSet classes from a Solr version of a
few years ago in a Lucene-based app to do faceting.

Thanks,
Peter

  




Re: new faceting algorithm

2008-12-04 Thread wojtekpia

I'm seeing some strange behavior with my garbage collector that disappears
when I turn off this optimization. I'm running load tests on my deployment.
For the first few minutes, everything is fine (and this patch does make
things faster - I haven't quantified the improvement yet). After that, the
garbage collector stops collecting. Specifically, the new generation part of
the heap is full, but never garbage collected, and the old generation is
emptied, then never gets anything more. This throttles Solr performance
(average response times that used to be ~500ms are now ~25s). 

I described my deployment scenario in an earlier post:
http://www.nabble.com/Throughput-Optimization-td20335132.html

Does it sound like the new faceting algorithm could be the culprit?


wojtekpia wrote:
 
 Definitely, but it'll take me a few days. I'll also report findings on
 SOLR-465. (I've been on holiday for a few weeks)
 
 
 Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 wojtek, you can report back the numbers if possible
 
 It would be nice to know how the new impl performs in real-world
 
 
 
 
 




Re: new faceting algorithm

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 2:28 PM, wojtekpia wrote:
 I'm seeing some strange behavior with my garbage collector that disappears
 when I turn off this optimization. I'm running load tests on my deployment.
 For the first few minutes, everything is fine (and this patch does make
 things faster - I haven't quantified the improvement yet). After that, the
 garbage collector stops collecting. Specifically, the new generation part of
 the heap is full, but never garbage collected, and the old generation is
 emptied, then never gets anything more.

Are you doing commits at any time?
One possibility is the caching mechanism (weak-ref on the
IndexReader)... that's going to be changing soon hopefully.

-Yonik


 This throttles Solr performance
 (average response times that used to be ~500ms are now ~25s).

 I described my deployment scenario in an earlier post:
 http://www.nabble.com/Throughput-Optimization-td20335132.html

 Does it sound like the new faceting algorithm could be the culprit?


 wojtekpia wrote:

 Definitely, but it'll take me a few days. I'll also report findings on
 SOLR-465. (I've been on holiday for a few weeks)


 Noble Paul നോബിള്‍ नोब्ळ् wrote:

 wojtek, you can report back the numbers if possible

 It would be nice to know how the new impl performs in real-world










Re: new faceting algorithm

2008-12-04 Thread wojtekpia


Yonik Seeley wrote:
 
 
 Are you doing commits at any time?
 One possibility is the caching mechanism (weak-ref on the
 IndexReader)... that's going to be changing soon hopefully.
 
 -Yonik
 


No commits during this test. Should I start looking into my heap size
distribution and garbage collector selection?



Re: new faceting algorithm

2008-12-04 Thread Yonik Seeley
On Thu, Dec 4, 2008 at 2:57 PM, wojtekpia wrote:
 Yonik Seeley wrote:

 Are you doing commits at any time?
 One possibility is the caching mechanism (weak-ref on the
 IndexReader)... that's going to be changing soon hopefully.

 -Yonik


 No commits during this test. Should I start looking into my heap size
 distribution and garbage collector selection?

Hmmm, OK.  The other big difference would then be that retrieving the
top facets requires creating a Lucene TermEnum (not all facet values
are stored in memory).  The Lucene version in Solr has changed since I
did long-running tests, with various Lucene changes to thread-local
caching, etc.  I'll try to reproduce.  Or maybe this is somehow a GC
bug just tickled by the current caching mechanism (weak hash map)?

-Yonik


Re: new faceting algorithm

2008-12-02 Thread wojtekpia

Is there a configurable way to switch to the previous implementation? I'd
like to see exactly how it affects performance in my case.


Yonik Seeley wrote:
 
 And if you want to verify that the new faceting code has indeed kicked
 in, some statistics are logged, like:
 
 Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
 INFO: UnInverted multi-valued field features, memSize=14584, time=47,
 phase1=47,
  nTerms=285, bigTerms=99, termInstances=186
 
 -Yonik
 
 




Re: new faceting algorithm

2008-12-02 Thread Yonik Seeley
On Tue, Dec 2, 2008 at 1:10 PM, wojtekpia wrote:
 Is there a configurable way to switch to the previous implementation? I'd
 like to see exactly how it affects performance in my case.

Thanks for the reminder, I need to document this in the wiki.

facet.method=enum  (enumerate terms and do intersections, the old default)
facet.method=fc  (fieldcache method, the new default)
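
To compare the two implementations, the parameter can be set per request. The snippet below is a minimal sketch that builds such a request URL. The base URL, the "author" field, and the class and method names are placeholders chosen for the example; q, facet, facet.field, and facet.method are the request parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative helper; the base URL and field name used below are hypothetical. */
final class FacetMethodUrl {

    /** Builds a select URL that facets on one field with the given facet.method. */
    static String facetQuery(String baseUrl, String query, String facetField, String method) {
        return baseUrl
            + "?q=" + enc(query)
            + "&facet=true"
            + "&facet.field=" + enc(facetField)
            + "&facet.method=" + enc(method);
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/select"; // placeholder endpoint
        System.out.println(facetQuery(base, "*:*", "author", "enum")); // old default
        System.out.println(facetQuery(base, "*:*", "author", "fc"));   // new default
    }
}
```

Running the same set of queries once with each value gives a like-for-like comparison on a given index.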

-Yonik


 Yonik Seeley wrote:

 And if you want to verify that the new faceting code has indeed kicked
 in, some statistics are logged, like:

 Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
 INFO: UnInverted multi-valued field features, memSize=14584, time=47,
 phase1=47,
  nTerms=285, bigTerms=99, termInstances=186

 -Yonik







Re: new faceting algorithm

2008-12-02 Thread Noble Paul നോബിള്‍ नोब्ळ्
wojtek, please report back the numbers if possible.

It would be nice to know how the new implementation performs in the real world.

On Tue, Dec 2, 2008 at 11:45 PM, Yonik Seeley wrote:
 On Tue, Dec 2, 2008 at 1:10 PM, wojtekpia wrote:
 Is there a configurable way to switch to the previous implementation? I'd
 like to see exactly how it affects performance in my case.

 Thanks for the reminder, I need to document this in the wiki.

 facet.method=enum  (enumerate terms and do intersections, the old default)
 facet.method=fc  (fieldcache method, the new default)

 -Yonik


 Yonik Seeley wrote:

 And if you want to verify that the new faceting code has indeed kicked
 in, some statistics are logged, like:

 Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
 INFO: UnInverted multi-valued field features, memSize=14584, time=47,
 phase1=47,
  nTerms=285, bigTerms=99, termInstances=186

 -Yonik









-- 
--Noble Paul


Re: new faceting algorithm

2008-12-02 Thread wojtekpia

Definitely, but it'll take me a few days. I'll also report findings on
SOLR-465. (I've been on holiday for a few weeks)


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 wojtek, you can report back the numbers if possible
 
 It would be nice to know how the new impl performs in real-world
 
 
 




Re: new faceting algorithm

2008-11-24 Thread Yonik Seeley
And if you want to verify that the new faceting code has indeed kicked
in, some statistics are logged, like:

Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field features, memSize=14584, time=47, phase1=47,
 nTerms=285, bigTerms=99, termInstances=186
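
If you want to collect these statistics from a log file, the key=value pairs can be pulled out with a small parser. The sketch below is only one possible approach and assumes the log line keeps the shape shown above; the class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for the uninvert statistics line; not part of Solr. */
final class UnInvertLogParser {

    // Matches pairs such as "memSize=14584" or "nTerms=285".
    private static final Pattern STAT = Pattern.compile("(\\w+)=(\\d+)");

    /** Returns the numeric statistics found in the line, in order of appearance. */
    static Map<String, Long> parseStats(String logLine) {
        Map<String, Long> stats = new LinkedHashMap<>();
        Matcher m = STAT.matcher(logLine);
        while (m.find()) {
            stats.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return stats;
    }

    public static void main(String[] args) {
        String line = "INFO: UnInverted multi-valued field features, memSize=14584, time=47, "
            + "phase1=47, nTerms=285, bigTerms=99, termInstances=186";
        System.out.println(parseStats(line));
    }
}
```

For the sample line this yields six entries, including nTerms=285 and memSize=14584.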

-Yonik

On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley wrote:
 A new faceting algorithm has been committed to the development version
 of Solr, and should be available in the next nightly test build (will
 be dated 11-25).  This change should generally improve field faceting
 where the field has many unique values but relatively few values per
 document.  This new algorithm is now the default for multi-valued
 fields (including tokenized fields) so you shouldn't have to do
 anything to enable it.  We'd love some feedback on how it works to
 ensure that it actually is a win for the majority and should be the
 default.

 -Yonik