[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369433#comment-16369433 ]

Nikolay Khitrin commented on SOLR-8096:
---

Please take a look at the LUCENE-8178 patch. I got up to a 2-2.5x faceting performance boost on a real index (35M docs) by unpacking DocValues blocks and reducing position lookups.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
> Reporter: Yonik Seeley
> Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields
> over relatively static indexes was removed as part of LUCENE-5666, causing
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index,
> with each field having between 0 and 5 values per document. *Higher numbers
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10 | 351.17% | 1587.08% | 3057.28% |
> |100 | 158.10% | 203.61% | 1421.93% |
> |1000 | 143.78% | 168.01% | 1325.87% |
> |10000 | 137.98% | 175.31% | 1233.97% |
> |100000 | 142.98% | 159.42% | 1252.45% |
> |1000000 | 255.15% | 165.17% | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were
> faceted.
> One user who brought the performance problem to our attention:
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
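Nikolay's comment attributes the LUCENE-8178 speed-up to unpacking DocValues blocks and reducing position lookups. As a rough illustration of that general idea only (this is not the Lucene code, and the layout of fixed-width values packed low-bits-first into 64-bit words is an assumption chosen for clarity), the Python sketch below contrasts per-value access, which recomputes the word index and bit offset for every value, with block access, which computes the start position once and then shifts through consecutive values:

```python
WORD_BITS = 64  # assumption: values are packed low-bits-first into 64-bit words


def pack(values, bits_per_value):
    """Bit-pack non-negative ints (each < 2**bits_per_value) into 64-bit words."""
    word_mask = (1 << WORD_BITS) - 1
    words = [0] * ((len(values) * bits_per_value + WORD_BITS - 1) // WORD_BITS)
    for i, v in enumerate(values):
        word, offset = divmod(i * bits_per_value, WORD_BITS)
        words[word] |= (v << offset) & word_mask
        if offset + bits_per_value > WORD_BITS:  # value straddles two words
            words[word + 1] |= v >> (WORD_BITS - offset)
    return words


def get_one(words, bits_per_value, index):
    """Per-value access: locate this value's word and bit offset, then extract it."""
    word, offset = divmod(index * bits_per_value, WORD_BITS)
    value = words[word] >> offset
    if offset + bits_per_value > WORD_BITS:
        value |= words[word + 1] << (WORD_BITS - offset)
    return value & ((1 << bits_per_value) - 1)


def unpack_block(words, bits_per_value, start, count):
    """Block access: compute the start position once, then decode `count`
    consecutive values by shifting through a rolling bit buffer."""
    mask = (1 << bits_per_value) - 1
    word, offset = divmod(start * bits_per_value, WORD_BITS)
    buf = words[word] >> offset       # unread bits of the current word
    available = WORD_BITS - offset    # number of valid bits in buf
    word += 1
    out = []
    for _ in range(count):
        if available < bits_per_value:  # refill from the next word
            buf |= words[word] << available
            word += 1
            available += WORD_BITS
        out.append(buf & mask)
        buf >>= bits_per_value
        available -= bits_per_value
    return out
```

In Python this only demonstrates the access pattern, not the speed; any real gain depends on the JVM implementation and on the index, as the benchmark figures in the comment suggest.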
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153967#comment-16153967 ]

Shawn Heisey commented on SOLR-8096:

bq. I never used the so called optimize functionality so far and now realized that the index is completely rebuilt which means e.g. duplication of disk space. Actually we can't do this because our infrastructure isn't designed for this.

I think the Solr reference guide may be missing one of the most critical recommendations with *any* Lucene-based software: Always run with enough disk space so that your index can triple in size temporarily.

This recommendation is not just for running an optimize -- normal segment merging that happens during indexing can also double the size of the index temporarily.

There is only one scenario I know of that can actually triple the index size (temporarily). It is a very specific scenario that may be uncommon in practice, but it does happen in the wild. Therefore perhaps the recommendation should be amended a little bit to read:

"Always run with enough disk space so your indexes can double in size temporarily, unless you frequently perform reindexes without deleting all the index data first, in which case you should allow for the index to triple in size temporarily."
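Shawn's rule of thumb translates into simple arithmetic: for an index to double temporarily you need free space equal to its current size, and to triple you need twice its size. A minimal Python sketch of such a pre-flight check follows; the index path in the usage line is hypothetical, and the check only looks at one directory on one volume:

```python
import os
import shutil


def index_size_bytes(index_dir):
    """Total size of the regular files under index_dir (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def extra_space_needed(index_bytes, growth_factor):
    """Free space needed for an index to grow temporarily to growth_factor
    times its current size (2.0 = double, 3.0 = triple)."""
    return int(index_bytes * (growth_factor - 1))


def has_headroom(index_dir, growth_factor=2.0):
    """True if the volume holding index_dir has enough free space."""
    needed = extra_space_needed(index_size_bytes(index_dir), growth_factor)
    return shutil.disk_usage(index_dir).free >= needed
```

For example, `has_headroom("/var/solr/data/mycore/data/index", 3.0)` would apply the stricter "triple" rule for setups that reindex without deleting the old data first.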
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153322#comment-16153322 ]

Guenter Hipler commented on SOLR-8096:
--

[~emaijala] I can understand your argument.

I came across another hurdle. I had never used the so-called optimize functionality before, and I only now realized that the index is completely rebuilt, which means, e.g., duplicating the disk space. We can't actually do this because our infrastructure isn't designed for it, not to mention the more complicated workflows we would have to take into account.

Contrary to what I thought yesterday, I now think that using the conventional master/slave model as we currently run it, together with the uif method for facets, isn't an option for us.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153272#comment-16153272 ]

Ere Maijala commented on SOLR-8096:
---

[~guenterh], I believe one should be able to use Solr's date range fields as intended. Besides, it's not possible to handle multivalued date ranges with ints.
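Ere's point about multivalued ranges can be made concrete. Suppose (hypothetically) each range were indexed as a start year and an end year in two separate multi-valued int fields. The Python sketch below shows that a document covering 1990-1995 and 2005-2010 would then wrongly match the year 2000, because the query can pair the start of one range with the end of another, while a range-aware check does not:

```python
def matches_split_int_fields(starts, ends, year):
    """Query against two independent multi-valued int fields:
    'some start <= year' AND 'some end >= year'. The pairing of each
    start with its own end is lost once the values are split."""
    return any(s <= year for s in starts) and any(e >= year for e in ends)


def matches_ranges(ranges, year):
    """Range-aware check: a single (start, end) range must contain year."""
    return any(start <= year <= end for start, end in ranges)


# Hypothetical document covering two separate periods.
doc_ranges = [(1990, 1995), (2005, 2010)]
starts = [start for start, _ in doc_ranges]
ends = [end for _, end in doc_ranges]
```

A single-valued year stored as an int, as Guenter describes for publication dates, does not have this problem; it only appears once a document carries several ranges.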
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152866#comment-16152866 ]

Guenter Hipler commented on SOLR-8096:
--

I ran a lot of tests over the last few days. For some of them I could use old archived queries from our production system (based on 4.10) together with the original query times, so I was able to compare processing times. My findings:
* using the uif method for multi-valued fields, but without docValues (uif with docValues doesn't work at all), seems to solve most of our current use cases
* [~emaijala] >> Trying to use facet.method=uif with a solr.DateRangeField causes the following exception: << we use only Int types for publication dates, and this works for range facets. Perhaps that's a possibility for you?
* all our disks are SSD-based; the index is not cached in memory, which wouldn't be possible for us with a 110G index
* so in general I think our overdue upgrade from version 4.10 to 6.x might now be an option
* the use case described by [~emaijala], where more than 200 facet buckets cause a performance penalty, is not very common from my point of view, so I guess/hope we can live with it
* but I have one big concern: I think it's problematic if we have to run an aggressive segment-merging policy quite often, because it's really resource-intensive
* my question: [~yo...@apache.org] Yonik, do you have an idea/plan for how to unify (bring back together) the diverging developments in the Lucene area (docValues) with the current Solr facet algorithms? I don't think making only some optimizations here and there is an option, at least in the medium term.

I would be happy to support this process with hints and metrics from the user side.

Günter
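A facet request that explicitly asks for the UnInvertedField method, as in Guenter's first finding, can be built as a query string. This minimal Python sketch assumes the standard /select handler on a hypothetical core named mycore and uses the field name "building" from Ere's tests; the parameter names are standard Solr facet parameters:

```python
from urllib.parse import urlencode


def facet_params(field, limit=20, method="uif"):
    """Query-string parameters for a plain field-facet request that asks
    Solr to use the UnInvertedField method (facet.method=uif)."""
    return urlencode({
        "q": "*:*",
        "rows": 0,             # only the facet counts are of interest
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.method": method,
    })


# Hypothetical core name; "building" is the field from Ere's tests.
url = "http://localhost:8983/solr/mycore/select?" + facet_params("building", limit=200)
```

Per the observations in this thread, the faceted field should not have docValues enabled if uif is expected to help.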
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151085#comment-16151085 ]

Toke Eskildsen commented on SOLR-8096:
--

[~elyograg] using the Lucene faceting code is discussed in part in SOLR-7296. As I have zero experience with that code, I cannot say how hard it would be to implement.

[~emaijala] for what it's worth, I find all of your points to be valid. Faceting tweaking is too much of a dark art.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150525#comment-16150525 ]

Ere Maijala commented on SOLR-8096:
---

Chiming in as one of those affected by performance issues with faceting. I've been testing with a 57 million record index of bibliographic data. A faceting request that used to take around 20ms in Solr 4.10.2 now takes at least 2600ms in Solr 6.6.0. While in general I find it fine to change the default behavior to something that works better than before for a majority of use cases, there should be a way to maintain performance in other cases.

My main issue at the moment is that even facet.method=uif is slow if you request more than a few items. In a smaller test index of 6 million records I can get the top 20 results in 4ms, but facet.limit=200 takes ~100ms and facet.limit=2000 takes ~1300ms (the facet has 1960 buckets). Params used for the query: q=*:*=0=true=building=1=[20-2000]=true=uif

Anyway, here's a list of issues that, for me, seem to contribute to all the confusion around faceting performance:
# As far as I can see, facet.method=uif is completely undocumented apart from a short entry in the release notes.
# Also undocumented is the fact (as observed during testing) that docValues must not be enabled for facet.method=uif to do any good. Otherwise the performance can be even worse than with FC.
# There's no proper documentation on what the introduction of docValues means in practice. There are several articles about the benefits it brings, but I couldn't find much analysis of any possible downsides.
# facet.method=uif with Solr 6.6.0 is still very slow compared to Solr 4.10.2 if you request more than a few entries.
# There was no way to get back UIF before SOLR-8466.
# Changes in behavior haven't really been documented. This is how the introduction of docValues was documented in the release notes of Solr 4.2.0: "SOLR-3855, SOLR-4490: Doc values support". That doesn't help a poor developer like me get the big picture. Then I read in https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/ that compared to what we used to have _"DocValues aim to alleviate both of these problems while keeping performance comparable."_ Of course that's just something I read on the internet, but so far it's the best description of docValues I've read, and it makes it sound like there won't be significant performance differences.
# It should be possible to make an informed decision to go with something that uses more JVM memory and is slower to warm up if the use case requires it. This is difficult because information is so scattered and the Solr reference guide doesn't go into much detail. For instance, the effect of docValues is not mentioned in the reference guide where facet.method is described.
# Solr's documentation on DocValues (https://lucene.apache.org/solr/guide/6_6/docvalues.html) highlights the positive effects it has on performance, memory consumption etc. It starts with _"DocValues are a way of recording field values internally that is more efficient for some purposes, such as sorting and faceting, than traditional indexing."_ That sounds like something you should enable as quickly as possible to reap the benefits!
# Discussions about docValues on the solr-user list also mostly recommend enabling docValues without discussing any caveats.
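Given Ere's observation that facet.method=uif only helped when docValues were disabled, the field definition has to be chosen up front, since changing docValues on an existing field generally means reindexing. A minimal Python sketch of a Schema API "add-field" command for such a field follows; the field type name `string` is assumed to exist in the schema, and the core the body would be POSTed to (its /schema endpoint) is hypothetical:

```python
import json


def add_uif_facet_field(name, field_type="string"):
    """Schema API 'add-field' command for a multi-valued facet field with
    docValues disabled, following the observation that facet.method=uif
    only helped when docValues were off."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": True,       # uif un-inverts the indexed terms
            "multiValued": True,
            "docValues": False,
        }
    }


# Body to POST to the hypothetical core's /schema endpoint.
body = json.dumps(add_uif_facet_field("building"))
```

The trade-off Ere asks to be documented applies here: without docValues, the un-inverted structure lives on the JVM heap and must be rebuilt when a new searcher opens.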
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149190#comment-16149190 ]

Yonik Seeley commented on SOLR-8096:

bq. For them, enabling docValues, which is supposed to be the magic bullet for faceting performance, causes performance to get even worse.

Yep. DocValues is a better default because it uses little heap memory compared to the FieldCache. But in general, docValues can be slower than the old 4.x FieldCache, and definitely slower than UnInvertedField for multi-valued faceting. For dense fields, the newest iterator-based docValues is also somewhat slower than the old docValues. This isn't just Solr... for example, sorting on dense docValues fields is also slower since the cut-over to iterator docValues.

Anyway, specific use cases can pretty much always be sped up, but there's no magic bullet and we need to tackle them one at a time. For example, facet.method=uif was added to re-enable access to the UnInvertedField faceting method.
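To illustrate the API difference Yonik mentions, here is a conceptual Python sketch (not Lucene's classes; the method name only loosely mirrors Lucene's `advanceExact`). Random-access docValues return a value for any docID directly, while iterator-based docValues must be advanced forward to each document before its value can be read, which adds per-document bookkeeping even when every document has a value:

```python
class RandomAccessValues:
    """Older docValues style: the value of any docID can be read directly."""

    def __init__(self, values):
        self._values = values  # one value per docID

    def get(self, doc):
        return self._values[doc]


class IteratorValues:
    """Iterator style: targets must be non-decreasing, and each lookup
    first advances the iterator to the target doc."""

    def __init__(self, docs, values):
        self._docs = docs        # sorted docIDs that have a value
        self._values = values    # values aligned with self._docs
        self._pos = -1
        self._last_target = -1

    def advance_exact(self, target):
        if target < self._last_target:
            raise ValueError("targets must be non-decreasing")
        self._last_target = target
        while self._pos + 1 < len(self._docs) and self._docs[self._pos + 1] <= target:
            self._pos += 1
        return self._pos >= 0 and self._docs[self._pos] == target

    def value(self):
        return self._values[self._pos]


def count_random(dv, matching_docs, num_ords):
    counts = [0] * num_ords
    for doc in matching_docs:
        counts[dv.get(doc)] += 1
    return counts


def count_iterator(dv, matching_docs, num_ords):
    counts = [0] * num_ords
    for doc in matching_docs:        # matching docs must arrive in docID order
        if dv.advance_exact(doc):    # extra positioning step per document
            counts[dv.value()] += 1
    return counts
```

Both counting functions produce the same facet counts; the sketch only shows where the extra work per document comes from, not how large the cost is on the JVM.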
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149168#comment-16149168 ]

Patrick Schemitz commented on SOLR-8096:

Thank you for your message. I am not reachable at the office until further notice (expected back from 04.09.). In urgent cases, please contact Isabel Kraut. Please note that your e-mail will not be forwarded during my absence.

Best regards from Karlsruhe
Patrick Schemitz

--
Dr. Patrick Schemitz
Senior Scientist
billiger.de
solute gmbh
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149165#comment-16149165 ]

Shawn Heisey commented on SOLR-8096:

Discussing how we got here and who might be to blame is not something to do here. The fact is that current Solr versions have a major performance regression for faceting, and probably for other things like grouping.

In the last couple of weeks, someone on the solr-user mailing list has encountered very slow results with our most recent version (6.6.0 right now) compared to 4.x versions. For them, enabling docValues, which is supposed to be the magic bullet for faceting performance, causes performance to get even worse.

If I had any understanding of how this code worked and the precise reasons it has become slower, I would be working on a solution.

For those Solr committers who *do* know that part of the code: Is there anything a user can do to speed this up? Is there anything we can do in the Solr code to fix the regression?

Possibly insane idea: Can Solr leverage the faceting code in Lucene itself?
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949319#comment-15949319 ]

Erick Erickson commented on SOLR-8096:
--

OK, what's the status of this JIRA? The last comment was 9 months ago.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331393#comment-15331393 ] Alessandro Benedetti commented on SOLR-8096:
Yes David, you know, it's not the cleanest solution, but it's the same approach used for a lot of other legacy facet method "bugs" or incompatibilities. Debug output for the applied facet method is already in trunk as part of SOLR-9176: it logs both the method requested by the user and the method selected by Solr. I can contribute a small patch this afternoon to force UIF when docValues are not available. Cheers
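The proposed work-around can be sketched as a selection rule: force UIF when FC/FCS is requested on a field without docValues, and report both the requested and the selected method. `FacetMethod` and `FacetMethodSelector` are hypothetical names, not Solr classes. The rule encodes only what this thread states and nothing more from SimpleFacets.

```java
import java.util.logging.Logger;

/** Hypothetical enum mirroring the facet.method values discussed in this issue. */
enum FacetMethod { ENUM, FC, FCS, UIF }

final class FacetMethodSelector {
    private static final Logger LOG = Logger.getLogger(FacetMethodSelector.class.getName());

    /**
     * Sketch of the proposed rule: FC or FCS on a field without docValues is
     * switched to UIF. Both methods are logged so the substitution is visible
     * instead of silent.
     */
    static FacetMethod select(FacetMethod requested, boolean hasDocValues) {
        boolean fieldCacheMethod = requested == FacetMethod.FC || requested == FacetMethod.FCS;
        FacetMethod applied = (fieldCacheMethod && !hasDocValues) ? FacetMethod.UIF : requested;
        LOG.fine(() -> "facet.method requested=" + requested + ", applied=" + applied);
        return applied;
    }
}
```

Logging both values answers the concern raised in this thread that a forced method "hides" what actually runs. Anyone reading the debug output can see when the request was overridden.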
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331106#comment-15331106 ] David Smiley commented on SOLR-8096:
bq. A work-around could be to force UIF if you have selected FC/FCS without docValues.
+1. Then it's just as before (in 4x); no? Separately it'd be nice if debug output showed which method was chosen.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329683#comment-15329683 ] Alessandro Benedetti commented on SOLR-8096:
Mmmm, actually it is still a regression: if you were using fc/fcs without docValues, you will still see it. A work-around could be to force UIF if you have selected FC/FCS without docValues. But I really don't like this approach of "hiding" legacy facet bugs by forcing other methods :( What do you think? Cheers
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329588#comment-15329588 ] David Smiley commented on SOLR-8096:
Given facet.method=uif (SOLR-8466), and now that facet.method=enum works again (SOLR-9176), is there anything left to do here, or should this issue be closed?
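A facet request that names its method explicitly might be built like this. `facet`, `facet.field` and `facet.method` are standard Solr request parameters. The field name `category` and the `FacetRequest` helper are hypothetical. The resulting query string is meant to be appended to a core's base URL. The encoding overload used here requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a /select query string for a single facet field. */
final class FacetRequest {
    static String selectQuery(String query, String field, String method) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("rows", "0");             // only the facet counts are wanted
        params.put("facet", "true");
        params.put("facet.field", field);
        params.put("facet.method", method);  // e.g. "uif" (SOLR-8466) or "enum" (SOLR-9176)
        StringJoiner qs = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            qs.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return "/select?" + qs;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

As this thread shows, the method Solr applies can differ from the one requested. Check which method was actually used instead of assuming the parameter was honoured.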
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308387#comment-15308387 ] Joel Bernstein commented on SOLR-8096:
I haven't reviewed the code, but if the enum faceting is actually using FCS then this is a bug. It would also explain the regression on enum faceting that has been reported.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308099#comment-15308099 ] Alessandro Benedetti commented on SOLR-8096:
We finally found our suspect! https://issues.apache.org/jira/browse/SOLR-9176 I would like an opinion soon: it looks like a mistake to me, as the code is not equivalent, and the commit message gives no reason why we lost the ability to use term enum for single-valued numeric fields. For static indexes I can confirm this causes a visible performance regression (a very hidden one, since you think you are using term enum while Solr actually uses FCS under the hood).
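Silently replacing enum with FCS matters because the two methods do different work. Term-enum faceting walks the field's terms and intersects each term's document set with the query's base set. The sketch below is a simplified version that uses in-memory `BitSet`s. `EnumFaceting` is a hypothetical class. In Solr the per-term doc sets can be served from the filterCache, which is one reason enum can be attractive on relatively static indexes with few unique terms.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified term-enum faceting over hypothetical in-memory postings. */
final class EnumFaceting {
    /**
     * For every term: count = |docs(term) AND base|. The cost grows with the
     * number of unique terms, not with the number of matching documents.
     */
    static Map<String, Integer> count(Map<String, BitSet> postings, BitSet base, int minCount) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> term : postings.entrySet()) {
            BitSet hits = (BitSet) term.getValue().clone(); // copy so the postings stay intact
            hits.and(base);
            int c = hits.cardinality();
            if (c >= minCount) {
                counts.put(term.getKey(), c);
            }
        }
        return counts;
    }
}
```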
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298469#comment-15298469 ] Alessandro Benedetti commented on SOLR-8096:
Just adding some additional information, as I have just run into this issue with Solr 6.0:
Static index, around 50 million (50 * 10^6) docs, 20 fields to facet on (one of them with high cardinality), on top of grouping. Grouping was not a factor at all.
All the symptoms are there: Solr 4.10.2 takes around 150 ms and Solr 6.0 around 550 ms. The 'fieldValueCache' seems to be unused (no inserts or lookups) in Solr 6.0. In Solr 4.10 the 'fieldValueCache' is in heavy use, with a cumulative_hitratio of 0.96.
Switching from enum to fc to fcs to uif did not change much. Moving to DocValues didn't improve the situation much either (but I was on an optimized index, so I need to try a multi-segment one, following [~mkhludnev]'s contribution in Solr 5.4.0).
Moving to field collapsing brought the query down to 110-120 ms (but this is expected: we were faceting on 260/1 million original docs). Adding facet.threads=NCores brought the query time down to 100 ms; in combination with field collapsing we reached 80-90 ms when warmed.
What are the plans for the future related to this? Do we want to deprecate the legacy facet implementation and move everything to JSON facets (as happened with UIF)? So, backward compatible but with a different implementation? Cheers
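`facet.threads` lets Solr facet on several fields concurrently, which fits a request with 20 facet fields. The sketch shows only the general pattern: one task per field on a fixed thread pool. It uses hypothetical in-memory data, and `ParallelFacets` is not Solr's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelFacets {
    /** Counts one field over the given docs; fieldValues[doc] lists that doc's (distinct) values. */
    static Map<String, Integer> countField(String[][] fieldValues, int[] docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : docs) {
            for (String v : fieldValues[doc]) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Facets every field concurrently: one task per field, at most `threads` at a time. */
    static Map<String, Map<String, Integer>> facetAll(Map<String, String[][]> fields, int[] docs, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<Map<String, Integer>>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, String[][]> field : fields.entrySet()) {
                pending.put(field.getKey(), pool.submit(() -> countField(field.getValue(), docs)));
            }
            Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<Map<String, Integer>>> p : pending.entrySet()) {
                result.put(p.getKey(), p.getValue().get()); // wait for each field's counts
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Passing `Runtime.getRuntime().availableProcessors()` as `threads` corresponds to the "NCores" setting mentioned above. The speed-up only helps when per-field work dominates; a single expensive field is still bounded by one thread.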
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132395#comment-15132395 ] Nikolay Khitrin commented on SOLR-8096:
I can confirm the performance issue with the new Solr 5.4.1 (1725212) on a 40M-doc index with 6.7M unique terms per multivalued field. Faceting takes more than 3 seconds on Solr 5.4.1 versus 190 ms on Solr 4.4.0 over near-identical indexes.
My opinion is that the DocValues API is JIT-unfriendly. LongValues.get is not a monomorphic call: in a single running Solr instance there are at least DirectMonotonicReader$2 and several DirectReader.DirectPackedReader* implementations in use. That is a very good approach in OOP terms, but for faceting we need to read a lot of memory (e.g. from memory-mapped inputs) really fast, and the SortedSetDocValues-LongValues-RandomAccessInput chain should inline and compile into simple memory-read assembly. UnInvertedField itself is a very solid class that the JIT can optimize very aggressively.
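The JIT argument can be illustrated with a toy class hierarchy. All class names below are hypothetical; they only mimic the shape described above: one abstract `get(long)` with several implementations live in the same JVM. HotSpot's C2 compiler typically inlines a virtual call site that has profiled one or two receiver types. Once more types flow through the same site, it becomes megamorphic and is compiled as a real virtual dispatch, which can also block further loop optimizations. The code shows the structure only and measures nothing.

```java
/** Hypothetical reader hierarchy with the same shape as a polymorphic get(long) API. */
abstract class LongReader {
    abstract long get(long index);
}

final class ArrayReader extends LongReader {
    private final long[] values;
    ArrayReader(long[] values) { this.values = values; }
    @Override long get(long index) { return values[(int) index]; }
}

final class LinearReader extends LongReader {      // e.g. monotonic data stored as base + slope
    private final long base;
    private final long slope;
    LinearReader(long base, long slope) { this.base = base; this.slope = slope; }
    @Override long get(long index) { return base + slope * index; }
}

final class ConstantReader extends LongReader {
    private final long value;
    ConstantReader(long value) { this.value = value; }
    @Override long get(long index) { return value; }
}

final class CallSites {
    /** One shared call site: fed all three reader types, it becomes megamorphic. */
    static long sumVirtual(LongReader reader, int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += reader.get(i);
        }
        return sum;
    }

    /** Plain array loop: nothing to dispatch, so it compiles to straight memory reads. */
    static long sumArray(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }
}
```

To observe the effect, one would use a benchmark harness such as JMH and warm up `sumVirtual` with all three reader types before timing it against `sumArray`.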
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065639#comment-15065639 ] Jamie Johnson commented on SOLR-8096:
Sorry if this is the wrong place for this, but I took a stab at implementing support for uninverted-field-based facets in SimpleFacets. I am not sure it's 100% there, but I think it's close. The basic approach was to leverage as much as possible from the JSON Faceting API, since that is the only consumer of UIF that I could find. This meant I had to make some classes public that were previously package-protected (perhaps moving SimpleFacets into the facet package would have been better?). Additionally, I had to make FacetProcessor aware of grouping so that the docset is adjusted appropriately for grouping requests with truncate set to true.
I also added DV as a new FacetMethod and made it the method that triggers using DocValues, instead of FC, which currently triggers it. Perhaps it would be more appropriate to add a new FacetMethod named UIF and leave FC alone? I'm open to suggestions here. The last significant difference from the 4.10.4 implementation is that I didn't attempt to use the Lucene FieldCache at all, since it was made package-protected. The 4.10.4 implementation used it in some cases, but this should be in line with what JSON Facets is doing.
The commits are attached as a patch to this ticket (I'm happy to spawn off a new ticket if that's more appropriate) and are also available at https://github.com/jej2003/lucene-solr
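The counting strategy behind an un-inverted field can be sketched in a few lines. The index is built once: a top-level sorted term dictionary plus, for each document, the ordinals of its terms. Counting then becomes array increments over whatever doc set is passed in, whether grouping-adjusted or filtered by a user's access level. `ToyUninvertedField` is hypothetical. The real UnInvertedField packs ordinals much more compactly and treats very frequent terms specially.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/** Toy un-inverted field: sorted top-level terms plus per-document term ordinals. */
final class ToyUninvertedField {
    private final String[] terms;     // ordinal -> term, sorted
    private final int[][] docToOrds;  // doc -> distinct term ordinals

    ToyUninvertedField(String[][] docValues) {
        TreeSet<String> dict = new TreeSet<>();
        for (String[] values : docValues) {
            dict.addAll(Arrays.asList(values));
        }
        terms = dict.toArray(new String[0]);
        docToOrds = new int[docValues.length][];
        for (int doc = 0; doc < docValues.length; doc++) {
            TreeSet<Integer> ords = new TreeSet<>();
            for (String v : docValues[doc]) {
                ords.add(Arrays.binarySearch(terms, v)); // term -> ordinal
            }
            docToOrds[doc] = ords.stream().mapToInt(Integer::intValue).toArray();
        }
    }

    /** Counts terms over any doc set, e.g. query results already filtered by access rules. */
    Map<String, Integer> count(BitSet docs, int minCount) {
        int[] counts = new int[terms.length];
        for (int doc = docs.nextSetBit(0); doc >= 0 && doc < docToOrds.length; doc = docs.nextSetBit(doc + 1)) {
            for (int ord : docToOrds[doc]) {
                counts[ord]++;
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] >= minCount) {
                result.put(terms[ord], counts[ord]); // ordinal -> term only for reported buckets
            }
        }
        return result;
    }
}
```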
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065185#comment-15065185 ] Jamie Johnson commented on SOLR-8096:
While some (all?) of the performance issues are addressed, would it not still be useful to add an option to support either faceting approach? I understand the benefits of DocValues, but we have a case where the facets need to be calculated based on the user's access level. Simply storing them in a separate field is not an option because the access controls are complex. Given that the JSON Facet API allows developers to choose the faceting method, it would seem reasonable to provide similar functionality here, no?
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948700#comment-14948700 ] Uwe Reh commented on SOLR-8096:
Uwe Schindler wrote: "Please keep in mind that it took about half a year until the first one recognized a problem like this, which makes me think that only few people are using those mostly-static indexes."
I reported the issue on the list, and I disagree with this view on both points.
1) I noticed the problem months ago, but I thought it was a problem with my poorly configured SolrCloud. I had to postpone the project.
2) Nearly all production installations I know are still running Solr 4.x, some even Solr 3.6. The applications were designed years ago, and for all of them there was no need to change a boring but well-running production environment.
Yes, the new faceting features were announced, but I had no time to follow them. Since there was no warning in the release notes, I thought it was a good idea to upgrade first.
Sorry for being a bit off topic
Uwe
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948050#comment-14948050 ] Bill Bell commented on SOLR-8096:
Are we adding it back and adding an option to enable it?
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942371#comment-14942371 ] Mike Murphy commented on SOLR-8096:
Sorry Uwe, you are right as I said to Erick. I will delete the accusations.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909836#comment-14909836 ] Mikhail Khludnev commented on SOLR-8096:
[~ysee...@gmail.com], I suppose benchmarking after SOLR-7730 (5.4.0) shows a smaller gap between DV and UnInvertedField.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907870#comment-14907870 ] Uwe Schindler commented on SOLR-8096: - bq. Use of the highly optimized faceting that Solr had for multi-valued fields over relatively static indexes was secretly removed as part of LUCENE-5666, causing severe performance regressions. Hi, the removal was not "secret". The removal of FieldCache from Lucene (and its replacement by UninvertingReader) was discussed on the issue tracker, although interest from Solr people was small. I think this is the main issue here. It would sometimes be good to have Solr committers take part in discussions on Lucene issues. If you want to make Solr better, you should also help make Lucene better! The old field cache was also moved into a separate module (with the new DocValues-emulating API), because we (Lucene committers) knew that Solr still uses it. Sure, we could have used UninvertingReader on top of SlowCompositeReaderWrapper, but this would have introduced other slowness! So the committers decided to step forward and remove the top-level faceting (which was long overdue). It was announced in several talks about Lucene 5 that FieldCache was removed and that all faceting in Solr was implicitly changed to use only per-segment field caches (e.g., see my talks at FOSDEM 2015, JAX 2015, or Berlin Buzzwords, on one of the last slides). Maybe a changes entry about this should also have been added to Solr's CHANGES.txt, but the first line of the CHANGES.txt entry for this issue does mention that faceting in Solr is involved. Any Solr committer could have looked into the code and raised complaints about those changes in the issue tracker, even after the commit had been made: {quote} * LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc) to use the DocValues API instead of FieldCache.
For FieldCache functionality, use UninvertingReader in lucene/misc (or implement your own FilterReader). UninvertingReader is more efficient: supports multi-valued numeric fields, detects when a multi-valued field is single-valued, reuses caches of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access without insanity). "Insanity" is no longer possible unless you explicitly want it. Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which takes the same selectors. Add helper methods to DocValues.java that are better suited for search code (never return null, etc). (Mike McCandless, Robert Muir) {quote} bq. The people who did this are elasticsearch employees. That is one way to deal with Solr's faster faceting! This is speculation and really bad behaviour on an open source issue tracker. We should discuss technical matters here, not make assumptions about what people intend to do. This statement was posted by a person ([~mmurphy3141]) whom I have never met in person and who has very seldom taken part in Lucene/Solr discussions at all, so I don't think we should give it much weight. It is also bad behaviour to accuse committers of sabotage on Twitter: https://twitter.com/mmurphy3141/status/647254551356162048; please don't do this. I would ask you to remove this tweet, thanks. I was informed about the changes mentioned here and I strongly agree with the committers behind LUCENE-5666. I was always in favour of removing those top-level faceting algorithms, so they still have my strong +1. Among my Solr customers I have seen nobody complain about slow top-level faceting (because I told them long ago to stop using those outdated top-level algorithms if they have dynamic indexes). The right thing for Solr people to do would be to remove that top-level code completely.
This no longer fits the new reader structure (composite and atomic/leaf readers) of Lucene 3 (with API cleanups in Lucene 4 to better reflect the new structure). Lucene 3 has been retired for several years already! So there was plenty of time to move Solr's faceting away from top-level structures. People with static indexes can still force-merge their index and will get the same performance with the new algorithms. Please keep in mind that it took about half a year until anyone noticed a problem like this, which makes me think that only a few people are using such mostly-static indexes. *We should work on this issue to fix the problem, not accuse people, thanks!*
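Uwe's argument depends on the difference between top-level faceting (Solr's UnInvertedField, a structure cached in SolrIndexSearcher) and per-segment faceting. It also depends on his suggestion that static indexes can be force-merged. The following is a minimal toy sketch in Python, not Solr's or Lucene's actual code. The segment layout (a sorted per-segment term list plus per-document lists of local ordinals) and all function names are hypothetical. The sketch illustrates one structural difference. Per-segment counting remaps each segment's local ordinals into a global ordinal space on every request. The top-level structure performs that remapping once, when it is built.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy model of one multi-valued field (not a Lucene/Solr API):
# each segment has a sorted term dictionary ("terms") and, per document,
# the segment-local ordinals of that document's values ("docs").
Segment = Dict[str, list]


def global_ord_maps(segments: List[Segment]) -> Tuple[List[str], List[List[int]]]:
    """Merge all segment dictionaries into one sorted global term list and
    return, per segment, a list mapping local ordinal -> global ordinal."""
    global_terms = sorted({t for seg in segments for t in seg["terms"]})
    index = {t: g for g, t in enumerate(global_terms)}
    return global_terms, [[index[t] for t in seg["terms"]] for seg in segments]


def uninvert_top_level(segments: List[Segment]):
    """Top-level structure, intended to be built once and reused until the
    index changes: top-level doc id -> list of global ordinals."""
    global_terms, maps = global_ord_maps(segments)
    doc_ords = [
        [local_to_global[lo] for lo in doc]
        for seg, local_to_global in zip(segments, maps)
        for doc in seg["docs"]
    ]
    return global_terms, doc_ords


def count_top_level(uninverted, matching: Set[int]) -> Dict[str, int]:
    """Per-request work: count global ordinals of the matching docs directly."""
    global_terms, doc_ords = uninverted
    counts = [0] * len(global_terms)
    for doc in matching:
        for g in doc_ords[doc]:
            counts[g] += 1
    return {t: c for t, c in zip(global_terms, counts) if c}


def count_per_segment(segments: List[Segment], matching: Set[int]) -> Dict[str, int]:
    """Per-segment style: count by local ordinal inside each segment, then
    remap that segment's counts into the global ordinal space."""
    global_terms, maps = global_ord_maps(segments)  # rebuilt per call for brevity
    counts = [0] * len(global_terms)
    doc_base = 0  # top-level id of the current segment's first document
    for seg, local_to_global in zip(segments, maps):
        local_counts = [0] * len(seg["terms"])
        for local_doc, ords in enumerate(seg["docs"]):
            if doc_base + local_doc in matching:
                for lo in ords:
                    local_counts[lo] += 1
        # The extra per-segment remapping step that the top-level structure avoids.
        for lo, c in enumerate(local_counts):
            counts[local_to_global[lo]] += c
        doc_base += len(seg["docs"])
    return {t: c for t, c in zip(global_terms, counts) if c}


if __name__ == "__main__":
    segments = [
        {"terms": ["a", "c"], "docs": [[0], [0, 1], []]},  # top-level docs 0-2
        {"terms": ["b", "c"], "docs": [[1], [0, 1]]},      # top-level docs 3-4
    ]
    matching = {0, 1, 3, 4}
    print(count_top_level(uninvert_top_level(segments), matching))  # {'a': 2, 'b': 1, 'c': 3}
    print(count_per_segment(segments, matching))                    # {'a': 2, 'b': 1, 'c': 3}
```

Both approaches return the same counts. With a single force-merged segment, the local-to-global map becomes the identity, so the remapping step is trivial. That is the intuition behind the force-merge suggestion. The toy rebuilds the ordinal map on every call for brevity. It says nothing about the actual timings reported in this issue.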
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908063#comment-14908063 ] Yonik Seeley commented on SOLR-8096: Once again, UnInvertedField was not part of the Lucene FieldCache. It was a Solr class cached in SolrIndexSearcher (via fieldValueCache), did not implement the DocValues API, etc. The *Lucene* FieldCache was made package-private (an implementation detail), so one would need to access it via DocValues. That's what the issue was about. bq. So the committers decided to step forward and remove the top-level facetting (which was long overdue). Where was this discussion? I see nothing about it on LUCENE-5666. And of course I would have given a -1 to such a change for being dogmatic over practical and not caring about our users. bq. I was informed about the changes mentioned here Where did this discussion take place? I can't find it in any public forum. bq. I was always in favour of removing those top-level facetting algorithms. So they still have my strong +1. With no benchmarking of how the replacement performed? No option to use the old method if a user *wanted* to? Without any public discussion of the impacts? Without any note in Solr's CHANGES? So you were strongly for the change, but you knew I'd most likely be against it, right (based on previous discussions about top-level data structures)?
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907515#comment-14907515 ] Mike Murphy commented on SOLR-8096: --- The people who did this are elasticsearch employees. That is one way to deal with Solr's faster faceting! This smells like the VW pollution scandal for lucene/solr/elasticsearch, except perhaps no consequences for those who pulled it off?
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907568#comment-14907568 ] Yonik Seeley commented on SOLR-8096: bq. Are you sure it was secret and not just a mistake?
Yes.
- This algorithm had been relied upon by many since 2008 (SOLR-475), and completely removing its use and replacing it would obviously warrant discussion, benchmarks, etc.
- This was a massive patch, and relevant changes should be called out, especially if the changes seem unrelated to the issue's description.
- If you search the JIRA issue, "UnInvertedField" *never* appears. (The linked issues mention it now, but those were added by us after the fact.)
- The issue's title is "Add UninvertingReader" and the description had to do with Lucene's FieldCache, which UnInvertedField is not part of.
- There is *no* mention of the issue or changes anywhere in Solr's CHANGES.txt.
- When asked to comment on the impacts of this massive patch, the answer given was "Is the CHANGES.txt entry not good here? The docvalues apis did not change..."
- The CHANGES entry for Lucene made no mention of the change to Solr or UnInvertedField.
- Although the UnInvertedField code was left behind (as dead code), the removal of the use of UnInvertedField was *not* by mistake: you can see this from the test code that was explicitly removed (TestFaceting.java).

Exactly what other conclusion is there to draw? Massive incompetence?
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907552#comment-14907552 ] Mike Murphy commented on SOLR-8096: --- Are you sure it was secret and not just a mistake?
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907587#comment-14907587 ] Mike Murphy commented on SOLR-8096: --- Erick, you're right. I do not know what the motivations were. That said, after looking at it, the evidence that there was a cover-up is compelling. What do you think the motivation was? The fact that it was elasticsearch employees could also have been a coincidence.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907563#comment-14907563 ] Erick Erickson commented on SOLR-8096: -- [~mmurphy3141] Whoa! I don't know whether your comment was meant sarcastically or in any other humorous sense, but it stands a very good chance of being taken seriously no matter what your intent. Let's find out what the background is here before casting aspersions.
[jira] [Commented] (SOLR-8096) Major faceting performance regressions
[ https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907614#comment-14907614 ] Erick Erickson commented on SOLR-8096: -- I'm not going there. Speculating about motives does not have any _useful_ outcome; it just provides fodder for flame wars. Discussing how to fix the issues is far more fruitful. And less painful for me to read.