Re: new faceting algorithm
It looks like my filterCache was too big. I reduced my filterCache size from 700,000 to 20,000 (without changing the heap size) and all my performance issues went away. I experimented with various GC settings, but none of them made a significant difference. I see a 16% increase in throughput by applying this patch.

Yonik Seeley wrote:
> ... This can be a big chunk of memory per-request, and is most likely what changed your GC profile (i.e. changing the GC settings may help).

--
View this message in context: http://www.nabble.com/new-faceting-algorithm-tp20674902p20984502.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: new faceting algorithm
On Thu, Dec 4, 2008 at 2:28 PM, wojtekpia [EMAIL PROTECTED] wrote:
> I'm seeing some strange behavior with my garbage collector that disappears when I turn off this optimization.

I just changed the new faceting code to use a solr cache. Look for the fieldValueCache on the statistics page now.

It just occurred to me that there is a big difference in how memory is used with facet.method=fc. Since we traverse documents and count up terms, we need to allocate an int[nTerms] to accumulate those counts. This can be a big chunk of memory per-request, and is most likely what changed your GC profile (i.e. changing the GC settings may help).

-Yonik

> I'm running load tests on my deployment. For the first few minutes, everything is fine (and this patch does make things faster - I haven't quantified the improvement yet). After that, the garbage collector stops collecting. Specifically, the new generation part of the heap is full, but never garbage collected, and the old generation is emptied, then never gets anything more. This throttles Solr performance (average response times that used to be ~500ms are now ~25s).
>
> I described my deployment scenario in an earlier post:
> http://www.nabble.com/Throughput-Optimization-td20335132.html
>
> Does it sound like the new faceting algorithm could be the culprit?
>
> wojtekpia wrote:
>> Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks)
>>
>> Noble Paul നോബിള് नोब्ळ् wrote:
>>> wojtek, you can report back the numbers if possible
>>> It would be nice to know how the new impl performs in real-world
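To make the per-request memory point concrete, here is a minimal sketch of counting terms while traversing documents, plus a back-of-the-envelope size for the int[nTerms] array. It is a toy illustration only, not Solr's UnInvertedField code: the function names and the list-of-lists data layout are hypothetical, the 4 bytes per counter assumes a Java int, JVM object overhead is ignored, and the ~2 million distinct values and 50 request threads are figures reported elsewhere in this thread, used here purely as example inputs.

```python
def facet_counts_fc(doc_term_ords, matching_docs, n_terms):
    """Count term occurrences over the matching documents.

    doc_term_ords[doc] is the list of term ordinals for that document
    (a toy stand-in for an un-inverted field, not Solr's real structure).
    """
    counts = [0] * n_terms  # one counter per unique term, allocated per request
    for doc in matching_docs:
        for term_ord in doc_term_ords[doc]:
            counts[term_ord] += 1
    return counts


def counts_array_bytes(n_terms, concurrent_requests=1, bytes_per_int=4):
    """Rough size of the per-request counter arrays (assumes 4-byte ints)."""
    return n_terms * bytes_per_int * concurrent_requests


# Four toy documents over three terms; documents 0 and 2 match the query.
doc_term_ords = [[0, 2], [1], [0, 1, 2], []]
print(facet_counts_fc(doc_term_ords, [0, 2], 3))  # [2, 1, 2]

# Illustrative sizes: ~2 million unique terms, 1 request vs. 50 in flight.
print(counts_array_bytes(2_000_000))      # 8000000 bytes, about 8 MB
print(counts_array_bytes(2_000_000, 50))  # 400000000 bytes, about 400 MB
```

Under these assumptions the arrays are short-lived but large, which is consistent with the suggestion that the garbage collector sees a different allocation pattern with facet.method=fc.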
Re: new faceting algorithm
Yonik Seeley wrote:
> We'd love some feedback on how it works to ensure that it actually is a win for the majority and should be the default.

I just did a quick test using Solr nightly 2008-11-30. I have an index of about 2.9 million bibliographic records, size: 16G. I tested faceting on author names. Each index document may contain multiple author names, so author names go into a multivalued field (not analyzed). Queries used for testing were extracted from log files of a prototype application.

With facet.method=enum, 50 request threads, I get an average response time of about 19(!) s, no cache evictions. With 1 request thread: about 1800 ms.

With facet.method=fc, 50 threads I get an average response time of around 300 ms. 1 thread: 16 ms.

Seems to be a major improvement at first sight :-)

Regards,
Till

--
Till Kinstler
Verbundzentrale des Gemeinsamen Bibliotheksverbundes (VZG)
Platz der Göttinger Sieben 1, D 37073 Göttingen
[EMAIL PROTECTED], +49 (0) 551 39-13431, http://www.gbv.de
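For anyone wanting to repeat this kind of measurement, the following is a rough sketch of timing a list of queries with a fixed number of request threads and reporting the average response time. It is not the harness used for the numbers above: `average_response_ms`, `fake_request`, and the example query string are hypothetical names, and the stand-in request function just sleeps so the sketch runs without a Solr server. Swap in a real HTTP call to measure an actual deployment.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def average_response_ms(send_request, queries, n_threads):
    """Run all queries on n_threads workers; return mean latency in ms."""
    def timed(query):
        start = time.perf_counter()
        send_request(query)
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return mean(pool.map(timed, queries))


def fake_request(query):
    """Stand-in for an HTTP request to Solr (assumption: ~10 ms per call)."""
    time.sleep(0.01)


queries = ["author:smith"] * 20  # hypothetical queries, e.g. taken from logs
print(average_response_ms(fake_request, queries, 1))
print(average_response_ms(fake_request, queries, 5))
```

Measuring per-request latency inside each worker, rather than dividing total wall time by the query count, keeps the result comparable between the 1-thread and 50-thread runs.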
Re: new faceting algorithm
Till Kinstler wrote:
> Hi,
>
> I just did a quick test using Solr nightly 2008-11-30. I have an index of about 2.9 million bibliographic records, size: 16G. I tested faceting on author names. Each index document may contain multiple author names, so author names go into a multivalued field (not analyzed). Queries used for testing were extracted from log files of a prototype application.
>
> With facet.method=enum, 50 request threads, I get an average response time of about 19(!) s, no cache evictions. With 1 request thread: about 1800 ms.
>
> With facet.method=fc, 50 threads I get an average response time of around 300 ms. 1 thread: 16 ms.
>
> Seems to be a major improvement at first sight :-)

Same here: multi-valued author fields were the bottleneck with 1.3 for us, too. I'm currently testing with 1.5 million records, ~1.2 million of which have values for the author field, but with ~2 million distinct values. With Solr 1.3 we had average response times of 15000-25000 ms for 10 parallel requests (depending on cache settings), with 1.4 they are now down to 230 ms...

Regards,
Andre

--
Andre Hagenbruch
Projekt Integriertes Bibliotheksportal
Universitaetsbibliothek Bochum, Etage 4/Raum 6
Fon: +49 234 3229346, Fax: +49 234 3214736
Re: new faceting algorithm
Hi Yonik,

May I ask in which class(es) this improvement was made? I've been using the DocSet, DocList, BitDocSet, HashDocSet from Solr from a few years ago with a Lucene based app. to do faceting.

Thanks,
Peter

On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
> A new faceting algorithm has been committed to the development version of Solr, and should be available in the next nightly test build (will be dated 11-25). This change should generally improve field faceting where the field has many unique values but relatively few values per document. This new algorithm is now the default for multi-valued fields (including tokenized fields) so you shouldn't have to do anything to enable it. We'd love some feedback on how it works to ensure that it actually is a win for the majority and should be the default.
>
> -Yonik
Re: new faceting algorithm
very similar situation to those already reported. 2.9M bibliographic records, with authors being the (previous) bottleneck, and the one we're starting to test with the new algorithm.

so far, no load tests, but just in single requests i'm seeing the same improvements...phenomenal improvements, btw, with most example queries taking less than 1/100th of the time

always very impressed with this project/product, and just thought i'd add a me-too to the list...

cheers, and have a great weekend,
rob

On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
> A new faceting algorithm has been committed to the development version of Solr, and should be available in the next nightly test build (will be dated 11-25). This change should generally improve field faceting where the field has many unique values but relatively few values per document. This new algorithm is now the default for multi-valued fields (including tokenized fields) so you shouldn't have to do anything to enable it. We'd love some feedback on how it works to ensure that it actually is a win for the majority and should be the default.
>
> -Yonik
Re: new faceting algorithm
Peter,

It is the UnInvertedField class. See also:
https://issues.apache.org/jira/browse/SOLR-475

Peter Keegan wrote:
> Hi Yonik,
>
> May I ask in which class(es) this improvement was made? I've been using the DocSet, DocList, BitDocSet, HashDocSet from Solr from a few years ago with a Lucene based app. to do faceting.
>
> Thanks,
> Peter
Re: new faceting algorithm
I'm seeing some strange behavior with my garbage collector that disappears when I turn off this optimization. I'm running load tests on my deployment. For the first few minutes, everything is fine (and this patch does make things faster - I haven't quantified the improvement yet). After that, the garbage collector stops collecting. Specifically, the new generation part of the heap is full, but never garbage collected, and the old generation is emptied, then never gets anything more. This throttles Solr performance (average response times that used to be ~500ms are now ~25s).

I described my deployment scenario in an earlier post:
http://www.nabble.com/Throughput-Optimization-td20335132.html

Does it sound like the new faceting algorithm could be the culprit?

wojtekpia wrote:
> Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks)
>
> Noble Paul നോബിള് नोब्ळ् wrote:
>> wojtek, you can report back the numbers if possible
>> It would be nice to know how the new impl performs in real-world
Re: new faceting algorithm
On Thu, Dec 4, 2008 at 2:28 PM, wojtekpia [EMAIL PROTECTED] wrote:
> I'm seeing some strange behavior with my garbage collector that disappears when I turn off this optimization. I'm running load tests on my deployment. For the first few minutes, everything is fine (and this patch does make things faster - I haven't quantified the improvement yet). After that, the garbage collector stops collecting. Specifically, the new generation part of the heap is full, but never garbage collected, and the old generation is emptied, then never gets anything more.

Are you doing commits at any time? One possibility is the caching mechanism (weak-ref on the IndexReader)... that's going to be changing soon hopefully.

-Yonik

> This throttles Solr performance (average response times that used to be ~500ms are now ~25s).
>
> I described my deployment scenario in an earlier post:
> http://www.nabble.com/Throughput-Optimization-td20335132.html
>
> Does it sound like the new faceting algorithm could be the culprit?
>
> wojtekpia wrote:
>> Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks)
>>
>> Noble Paul നോബിള് नोब्ळ् wrote:
>>> wojtek, you can report back the numbers if possible
>>> It would be nice to know how the new impl performs in real-world
Re: new faceting algorithm
Yonik Seeley wrote:
> Are you doing commits at any time? One possibility is the caching mechanism (weak-ref on the IndexReader)... that's going to be changing soon hopefully.
>
> -Yonik

No commits during this test. Should I start looking into my heap size distribution and garbage collector selection?
Re: new faceting algorithm
On Thu, Dec 4, 2008 at 2:57 PM, wojtekpia [EMAIL PROTECTED] wrote:
> Yonik Seeley wrote:
>> Are you doing commits at any time? One possibility is the caching mechanism (weak-ref on the IndexReader)... that's going to be changing soon hopefully.
>>
>> -Yonik
>
> No commits during this test. Should I start looking into my heap size distribution and garbage collector selection?

Hmmm, OK. The other big difference would then be that retrieving the top facets requires creating a Lucene TermEnum (not all facet values are stored in memory). The lucene version in Solr has changed since I did long running tests... with various Lucene changes to thread-local caching, etc. I'll try and reproduce.

Or maybe this is somehow a GC bug just tickled by the current caching mechanism? (weak hash map)

-Yonik
Re: new faceting algorithm
Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case.

Yonik Seeley wrote:
> And if you want to verify that the new faceting code has indeed kicked in, some statistics are logged, like:
>
> Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
> INFO: UnInverted multi-valued field features, memSize=14584, time=47, phase1=47, nTerms=285, bigTerms=99, termInstances=186
>
> -Yonik
Re: new faceting algorithm
On Tue, Dec 2, 2008 at 1:10 PM, wojtekpia [EMAIL PROTECTED] wrote:
> Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case.

Thanks for the reminder, I need to document this in the wiki.

facet.method=enum (enumerate terms and do intersections, the old default)
facet.method=fc (fieldcache method, the new default)

-Yonik

> Yonik Seeley wrote:
>> And if you want to verify that the new faceting code has indeed kicked in, some statistics are logged, like:
>>
>> Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
>> INFO: UnInverted multi-valued field features, memSize=14584, time=47, phase1=47, nTerms=285, bigTerms=99, termInstances=186
>>
>> -Yonik
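As a quick illustration of switching between the two methods per request, here is a small sketch that builds the query string for each. The facet, facet.field, and facet.method parameters are the ones discussed in this thread and in Solr's faceting documentation; the base URL, the `author` field name, and the `facet_query_url` helper are assumptions made up for the example, so adjust them to your own deployment.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, method, q="*:*"):
    """Build a Solr select URL that facets on `field` with the given method."""
    if method not in ("enum", "fc"):
        raise ValueError("facet.method must be 'enum' or 'fc'")
    params = [
        ("q", q),
        ("rows", 0),              # only the facet counts are of interest
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", method),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and field name, for illustration only.
BASE = "http://localhost:8983/solr/select"
print(facet_query_url(BASE, "author", "enum"))  # old default
print(facet_query_url(BASE, "author", "fc"))    # new default
```

Running the same query set against both URLs is a simple way to compare the old and new behaviour on your own index.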
Re: new faceting algorithm
wojtek, you can report back the numbers if possible
It would be nice to know how the new impl performs in real-world

On Tue, Dec 2, 2008 at 11:45 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
> On Tue, Dec 2, 2008 at 1:10 PM, wojtekpia [EMAIL PROTECTED] wrote:
>> Is there a configurable way to switch to the previous implementation? I'd like to see exactly how it affects performance in my case.
>
> Thanks for the reminder, I need to document this in the wiki.
>
> facet.method=enum (enumerate terms and do intersections, the old default)
> facet.method=fc (fieldcache method, the new default)
>
> -Yonik
>
>> Yonik Seeley wrote:
>>> And if you want to verify that the new faceting code has indeed kicked in, some statistics are logged, like:
>>>
>>> Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
>>> INFO: UnInverted multi-valued field features, memSize=14584, time=47, phase1=47, nTerms=285, bigTerms=99, termInstances=186
>>>
>>> -Yonik

--
--Noble Paul
Re: new faceting algorithm
Definitely, but it'll take me a few days. I'll also report findings on SOLR-465. (I've been on holiday for a few weeks)

Noble Paul നോബിള് नोब्ळ् wrote:
> wojtek, you can report back the numbers if possible
> It would be nice to know how the new impl performs in real-world
Re: new faceting algorithm
And if you want to verify that the new faceting code has indeed kicked in, some statistics are logged, like:

Nov 24, 2008 11:14:32 PM org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field features, memSize=14584, time=47, phase1=47, nTerms=285, bigTerms=99, termInstances=186

-Yonik

On Mon, Nov 24, 2008 at 11:12 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
> A new faceting algorithm has been committed to the development version of Solr, and should be available in the next nightly test build (will be dated 11-25). This change should generally improve field faceting where the field has many unique values but relatively few values per document. This new algorithm is now the default for multi-valued fields (including tokenized fields) so you shouldn't have to do anything to enable it. We'd love some feedback on how it works to ensure that it actually is a win for the majority and should be the default.
>
> -Yonik
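If you want to collect these statistics from a log file rather than read them by eye, a small parser along the following lines may help. It is only a sketch: the regular expressions are written against the single sample line quoted above, so the exact log format in other Solr versions is an assumption, and `parse_uninvert_stats` is a hypothetical helper name. No meaning or unit is assigned to the individual counters here.

```python
import re

# The sample INFO line from the message above.
LOG_LINE = (
    "INFO: UnInverted multi-valued field features, memSize=14584, "
    "time=47, phase1=47, nTerms=285, bigTerms=99, termInstances=186"
)


def parse_uninvert_stats(line):
    """Return the field name and numeric key=value stats, or None if no match."""
    match = re.search(r"UnInverted multi-valued field (\S+?),", line)
    if match is None:
        return None
    stats = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
    stats["field"] = match.group(1)
    return stats


stats = parse_uninvert_stats(LOG_LINE)
print(stats["field"], stats["nTerms"], stats["memSize"])  # features 285 14584
```

Seeing a line like this for a field is a sign that the new code path was used for it, and nTerms gives the length of the per-request counter array for that field.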