[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2018-02-19 Thread Nikolay Khitrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369433#comment-16369433
 ] 

Nikolay Khitrin commented on SOLR-8096:
---

Please take a look at LUCENE-8178 patch, I've got up to 2 - 2.5x facetting 
performance boost on real index (35M docs) by DocValues block unpacking and 
position lookup reducing.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-09-05 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153967#comment-16153967
 ] 

Shawn Heisey commented on SOLR-8096:


bq. I never used the so called optimize functionality so far and now realized 
that the index is completely rebuild which means e.g. duplication of disk 
space. Actually we can't do this because our infrastructure isn't designed for 
this.

I think the Solr reference guide may be missing one of the most critical 
recommendations with *any* Lucene-based software:  Always run with enough disk 
space so that your index can triple in size temporarily.  This recommendation 
is not just for running an optimize -- normal segment merging that happens 
during indexing can also double the size of the index temporarily.

There is only one scenario I know of that can actually triple the index size 
(temporarily).  It is a very specific scenario that may be uncommon in 
practice, but does happen in the wild.  Therefore perhaps the recommendation 
should be amended a little bit to read:  "Always run with enough disk space so 
your indexes can double in size temporarily, unless you frequently perform 
reindexes without deleting all the index data first, in which case you should 
allow for the index to triple in size temporarily."


> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-09-05 Thread Guenter Hipler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153322#comment-16153322
 ] 

Guenter Hipler commented on SOLR-8096:
--

[~emaijala] I can understand your argument.

I came across another hurdle. I never used the so called optimize functionality 
so far and now realized that the index is completely rebuild which means e.g. 
duplication of disk space. Actually we can't do this because our infrastructure 
isn't designed for this. Not to talk about the more complicated work flows we 
have to  take into account.
In contrast to yesterday at the moment I think using the conventional master / 
slave model as we are running it by now together with the uif method for facets 
isn't an option for us.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-09-05 Thread Ere Maijala (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153272#comment-16153272
 ] 

Ere Maijala commented on SOLR-8096:
---

[~guenterh], I believe one should be able to use Solr's date range fields as 
intended. Besides, it's not possible to handle multivalued date ranges with 
ints.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-09-04 Thread Guenter Hipler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152866#comment-16152866
 ] 

Guenter Hipler commented on SOLR-8096:
--

I run a lot of tests in the last days (partially I could use old archived 
queries from our productive system based on 4.10 together with the original 
query times so I was able to compare the processing times) My findings:

* using uif method for multifield valued fields but without docvalues (this 
doesn't work at all) seems to solve most of our current use - cases
* [~emaijala] >> Trying to use facet.method=uif with a solr.DateRangeField 
causes the following exception:  <<
we use only Int types for publishing dates - this works for range facets. 
Perhaps a possibility for you?
* all our disks are SSD based - the index is not cached in memory, this 
wouldn't be possible for us with an 110G index
* So in general I think our overdue update from version 4.10 to 6.x now might 
be an option
* the use case described by [~emaijala] where facet buckets > 200 are causing a 
performance penal is from my point of view not very often - so I guess/hope we 
can live with this   
* But I have a great concern:
I think it's problematic if we have to run an aggressive policy for merging 
segments quite often because it's really resource intensive

* my question:
[~yo...@apache.org] Yonik, do you have an idea/plan how to unify (to bring 
together) the diverged developments in the Lucene area (docvalues) with the 
current Solr facet algorithms? I think it's no option to make only some 
optimizations here and there at least in the medium-term view
I would be happy to support this process with hints and metrics from the user 
side

Günter 


> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-09-01 Thread Toke Eskildsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151085#comment-16151085
 ] 

Toke Eskildsen commented on SOLR-8096:
--

[~elyograg] using the Lucene faceting code is discussed in part in SOLR-7296. 
As I have zero experience with that code, I cannot say how hard it would be to 
implement.

[~emaijala] for what it's worth, I find all of your point to be valid. Faceting 
tweaks is too much of a dark art.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-09-01 Thread Ere Maijala (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150525#comment-16150525
 ] 

Ere Maijala commented on SOLR-8096:
---

Chiming in as one of those affected by performance issues with faceting. I've 
been testing with a 57 million record index of bibliographic data. A faceting 
request that used to take around 20ms in Solr 4.10.2 is at least 2600ms in Solr 
6.6.0. While in general I find it fine to change the default behavior to 
something that works better than before for a majority of use cases, there 
should be a way to maintain performance in other cases. 

My main issue at the moment is that even facet.method=uif is slow if you 
request more than a few items. In a smaller test index of 6 million records I 
can get the top 20 results in 4ms, but facet.limit=200 takes ~100ms and 
facet.limit=2000 takes ~1300ms (the facet has 1960 buckets). Params user for 
the query:

q=*:*=0=true=building=1=[20-2000]=true=uif
 

Anyway, here's a list of issues that, for me, seem to be contribute to all the 
confusion around faceting performance:

# As far as I can see, facet.method=uif is completely undocumented apart from a 
short entry in release notes.
# Also undocumented is the fact (as observed during testing) that docValues 
must not be enabled for facet.method=uif to do any good. Otherwise the 
performance can be even worse than with FC.
# There's no proper documentation on what the introduction of docValues means 
in practice. There are several articles about what good it brings but I 
couldn't find much of analysis on any possible downsides.
# facet.method=uif with Solr 6.6.0 is still very slow compared to that in Solr 
4.10.2 if you request more than a few entries.
# There was no way to get back UIF before SOLR-8466.
# Changes in behavior haven't really been documented. This is how the 
introduction of docValues was documented in the release notes of Solr 4.2.0: 
"SOLR-3855, SOLR-4490: Doc values support". That doesn't help a poor developer 
like me to get the big picture. Then I read in 
https://lucidworks.com/2013/04/02/fun-with-docvalues-in-solr-4-2/ that compared 
to what we used to have _"DocValues aim to alleviate both of these problems 
while keeping performance comparable."_ Of course that's just something I read 
on internet, but so far it's the best description of docValues I've read and 
makes it sound like there won't be significant performance differences.
# It should be possible to make an informed decision to go with something that 
uses more JVM memory and is slower to warm up if required by the use-case. This 
is difficult because information is so scattered and the Solr reference guide 
doesn't go into much detail. For instance the effect of docValues is not 
mentioned in the reference guide where facet.method is described.
# Solr'd documentation on DocValues 
(https://lucene.apache.org/solr/guide/6_6/docvalues.html) highlights the 
positive effects it has on performance, memory consumption etc. It starts with 
_"DocValues are a way of recording field values internally that is more 
efficient for some purposes, such as sorting and faceting, than traditional 
indexing."_ That sounds like something you should enable as quickly as possible 
to reap the benefits!
# Discussions about docValues in solr-user list also mostly recomment enabling 
docValues without discussing any caveats.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> 

[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-08-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149190#comment-16149190
 ] 

Yonik Seeley commented on SOLR-8096:


bq. For them, enabling docValues, which is supposed to be the magic bullet for 
faceting performance, causes performance to get even worse.

Yep.  DocValues is a better default because it uses little heap memory compared 
to the FieldCache.  But in general, docValues can be slower than the old 4.x 
fieldCache, and definitely slower than UnInvertedField for multi-valued 
faceting.  For dense fields, the newest iterator-based docValues is also 
somewhat slower than the old docValues.  This isn't just Solr... for example, 
sorting on dense docValues fields is also slower since the cut-over to iterator 
docValues.  Anyway, specific use-cases can pretty much always be sped up, but 
there's no magic bullet and we need to tackle them one at a time.  For example, 
facet.method=uif was added to re-enable access to the UnInvertedField faceting 
method.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-08-31 Thread Patrick Schemitz (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149168#comment-16149168
 ] 

Patrick Schemitz commented on SOLR-8096:


Vielen Dank für Ihre Nachricht.

Ich bin bis auf weiteres nicht im Büro erreichbar. (Vorr. wieder ab 04.09.)
Bitte wenden Sie sich in dringenden Fällen an Isabel Kraut 

Bitte beachten Sie, dass Ihre E-Mail während meiner Abwesenheit nicht 
weitergeleitet wird.

Viele Grüße aus Karlsruhe
Patrick Schemitz

--
Dr. Patrick Schemitz
Senior Scientist

billiger.de
solute gmbh



> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-08-31 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149165#comment-16149165
 ] 

Shawn Heisey commented on SOLR-8096:


Discussing how we got here and who might be to blame is not something to do 
here.

The fact is that current Solr versions have a major performance regression for 
faceting, and probably for other things like grouping.  In the last couple of 
weeks, someone on the solr-user mailing list has encountered very slow results 
with our most recent version (6.6.0 right now) compared to 4.x versions.  For 
them, enabling docValues, which is supposed to be the magic bullet for faceting 
performance, causes performance to get even worse.

If I had any understanding of how this code worked and the precise reasons it 
has become slower, I would be working on a solution.  For those Solr committers 
who *do* know that part of the code: Is there anything a user can do to speed 
this up?  Is there anything we can do in the Solr code to fix the regression?  
Possibly insane idea: Can Solr leverage the faceting code in Lucene itself?


> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: facetcache.diff, simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2017-03-30 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949319#comment-15949319
 ] 

Erick Erickson commented on SOLR-8096:
--

OK, what's the status of this JIRA? Last comment was 9 months ago

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2016-06-15 Thread Alessandro Benedetti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331393#comment-15331393
 ] 

Alessandro Benedetti commented on SOLR-8096:


Yes David,
you know, not the cleanest solution but it's the same approach used for a lot 
of other legacy facet method "bugs" or incompatibility.
The debug for the facet method applied is already in the trunk,part of 
SOLR-9176, will be logged both the method in input by the user and the method 
selected by Solr .

I can contribute a small patch in the afternoon to force UIF when docValues are 
not available .

Cheers

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2016-06-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331106#comment-15331106
 ] 

David Smiley commented on SOLR-8096:


bq. A work-around could be to force UIF if you have selected FC/FCS without 
docValues.

+1.  Then it's just as before (in 4x); no?

Separately it'd be nice if debug output showed which method was chosen.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2016-06-14 Thread Alessandro Benedetti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329683#comment-15329683
 ] 

Alessandro Benedetti commented on SOLR-8096:


Mmmm actually it is still a regression.
If you were using fc/fcs without docValues, you will still see the regression.
A work-around could be to force UIF if you have selected FC/FCS without 
docValues.
But I definitely don't like that much this approach in "hiding" legacy facets 
bugs under forcing of other methods :(

What do you think ?

Cheers

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2016-06-14 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329588#comment-15329588
 ] 

David Smiley commented on SOLR-8096:


Since facet.method=uif SOLR-8466 and now that facet.method=enum works again 
SOLR-9176 is there anything left to do here or should it be closed?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2016-05-31 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308387#comment-15308387
 ] 

Joel Bernstein commented on SOLR-8096:
--

I haven't reviewed the code, but if the enum faceting is actually using FCS 
then this is a bug. It would also explain the regression on enum faceting that 
has been reported.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2016-05-31 Thread Alessandro Benedetti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308099#comment-15308099
 ] 

Alessandro Benedetti commented on SOLR-8096:


We found finally our suspect !

https://issues.apache.org/jira/browse/SOLR-9176

I would like an opinion soon, it seems to me, a mistake, as the code is not 
equivalent and in the commit message there is no reason why we have lost the 
possibility of using the term Enum for single valued numeric fields.
For static indexes I can confirm this causes a visible performance regression ( 
very hidden as you think to use Term Enum while actually Solr uses FCS under 
the hood)

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2016-05-24 Thread Alessandro Benedetti (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298469#comment-15298469
 ] 

Alessandro Benedetti commented on SOLR-8096:


Just adding some additional information as I just incurred on the issue with 
Solr 6.0 :
Static index, around 50 *10^6 docs, 20 fields to facet, 1 of them with high 
cardinality on top of grouping.
Groping was not affecting at all.

All the symptoms are there, Solr 4.10.2 around 150 ms  and Solr 6.0 around 550 
ms .
The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr 6.0.
In Solr 4.10 the 'fieldValueCache' is in heavy use with a cumulative_hitratio 
of 0.96 .
Switching from enum to fc to fcs to uif did not change that much.

Moving to DocValues didn't improve that much the situation ( but I was on an 
optimized index, so I need to try the multi-segmented one according to 
[~mkhludnev] contribution in Solr 5.4.0 ) .

Moving to field collapsing moved down the query to 110-120 ms ( but this is 
normal, we were faceting on 260 /1 million orignal docs)
Adding facet.threads=NCores moved down the queryTime to 100 ms, in combination 
with field collapsing we reached 80-90 ms when warmed.

What are the plan for the future related this ?
Do we want to deprecate the legacy facets implementation and move everything to 
Json facets ( like it happened with the UIF ) ?
So backward compatible but different implementation ?

Cheers

 

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, 6.0
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2016-02-04 Thread Nikolay Khitrin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15132395#comment-15132395
 ] 

Nikolay Khitrin commented on SOLR-8096:
---

I can confirm performance issue for new Solr 5.4.1 1725212 with 40M docs index 
and 6.7M unique terms per multivalued field. 
Facetting takes more than 3 seconds on Solr 5.4.1 and 190ms on Solr 4.4.0 over 
near-identical indexes.

My opinion is that DocValues API is JIT-unfriendly. LongValues.get is not 
monomorphic call and in single running Solr instance there are at least 
DirectMonotonicReader$2 and several DirectReader.DirectPackedReader* 
implementations in use.

It is very good approach in OOP terms, but for facetting we need read a lot of 
memory (for ex. from memory-mapped inputs) really fast and 
SortedSetDocValues-LongValues-RandomAccessInput chain should inline and compile 
into simple memory read assembly.

UnInvertedField, itself, is very solid class and can be optimized by JIT really 
hard.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
> Attachments: simple_facets.diff
>
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-12-19 Thread Jamie Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065639#comment-15065639
 ] 

Jamie Johnson commented on SOLR-8096:
-

Sorry if this is the incorrect place for this, but I took a stab at trying to 
implement supporting uninverted field based facets in SimpleFacets.  i am not 
sure it's 100% there but I think it's close.  The basic approach was to 
leverage as much as possible from the JSON Faceting API since that is the only 
consumer of the UIF that I could find.  This meant I had to make some classes 
public that were previously package protected (Perhaps moving SimpleFacets into 
the facet package would have been better?).  Additionally, I had to make 
FacetProcessor aware of Grouping so that the docset would be adjusted 
appropriately for grouping requests with truncate set to true.

Also I did this by adding DV as a new FacetMethod and made it so this is what 
triggers using DocValues vs FC which currently triggers it.  Perhaps it would 
be more appropriate to add a new FacetMethod named UIF and leave FC alone?  I'm 
open to suggestions here.

Last significant difference from the 4.10.4 implementation is I didn't attempt 
to use the Lucene FieldCache at all since it was made package protected.  The 
4.10.4 implementation used that in cases, but this should be inline with what 
JSON Facets is doing.

The commits are attached as a patch to this ticket (I'm happy to spawn off a 
new ticket if it's more appropriate) and also available at 
https://github.com/jej2003/lucene-solr

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-12-18 Thread Jamie Johnson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065185#comment-15065185
 ] 

Jamie Johnson commented on SOLR-8096:
-

While some (all?) of the performance issues are addressed, would it not still 
be useful to add an option to support either faceting approach?  I understand 
the benefits of DocValues but we have a case where the facets need to be 
calculated based on an access level the user has.  Simply storing in a separate 
field is not an option because the access controls are complex.  Given that the 
JSON Facet API allows developers to choose the faceting method it would seem 
reasonable to provide similar functionality here, no?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-10-08 Thread Uwe Reh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948700#comment-14948700
 ] 

Uwe Reh commented on SOLR-8096:
---

Uwe Schindler wrote:
  "Please keep in mind that it took about half a year until the first one 
recognized a problem like this, which makes me think that only few people are 
using those mostly-static indexes."

I reported the issue in the list and I disagree this view at both points.
1) I noticed the the Problem months ago, but I thought, that it was a problem 
of my poorly configured SolrCloud. I had to postpone the project.
2) Nearly all productive installations I know, are still running on Solr 4.x 
some even with Solr 3.6. The applications have been designed years ago. For all 
of them, there was no need to change a boring but well running production 
environment.

Yes, the new features for faceting have been announced, but I had no time to 
follow. Since there was no warning in the release notes, I thought it's a good 
idea to upgrade first. 

Sorry for being a bit off topic
Uwe

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-10-07 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948050#comment-14948050
 ] 

Bill Bell commented on SOLR-8096:
-

Are we adding it back and adding an option to enable it?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-10-03 Thread Mike Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942371#comment-14942371
 ] 

Mike Murphy commented on SOLR-8096:
---

Sorry Uwe, you are right as I said to Erick.  I will delete the accusations.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-27 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909836#comment-14909836
 ] 

Mikhail Khludnev commented on SOLR-8096:


[~ysee...@gmail.com], I suppose benchmarking post SOLR-7730 (5.4.0) shows fewer 
gain between DV and UnInvertedField.  

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was removed as part of LUCENE-5666, causing 
> severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.
> edit: removed "secret" adverb by request



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-25 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907870#comment-14907870
 ] 

Uwe Schindler commented on SOLR-8096:
-

bq. Use of the highly optimized faceting that Solr had for multi-valued fields 
over relatively static indexes was secretly removed as part of LUCENE-5666, 
causing severe performance regressions.

Hi, the removal was not "secret". Removal of FieldCache from Lucene (and 
replacement by UninvertingReader) was discussed on the Issue tracker, although 
interest by Solr people was small. I think this is the main issue here. 
Sometimes it would be good to have Solr committers taking part of discussions 
on Lucene issues. If you want to make Solr bettre, you should also help in 
making Lucene better!

The old field cache was also put into a separate module (with the new DocValues 
emulating-API), because we (Lucene Committers) knew that Solr still uses it. 
Sure, we could have used UninvertingReader on top of 
SlowCompositeReaderWrapper, but this would bring other slowness! So the 
committers decided to step forward and remove the top-level facetting (which 
was long overdue).

It was announced in several talks about Lucene 5 that FieldCache was removed 
and all facetting in Solr was implicitely changed to only use per segment field 
caches (e.g., see my talk @ focdem 2015, JAX 2015, or berlinbuzzwords - around 
one of the last slides). Maybe there should have been added a changes entry 
also to the Solr CHANGES.txt about this, but 

The CHANGES.txt about this entry was, the first line mentions that facetting in 
Solr is involved. Any Solr committer could have looked into the code and bring 
up complaints about those changes in the issue tracker also after this commit 
has been done:

{quote}
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc)
  to use the DocValues API instead of FieldCache. For FieldCache functionality,
  use UninvertingReader in lucene/misc (or implement your own FilterReader).
  UninvertingReader is more efficient: supports multi-valued numeric fields,
  detects when a multi-valued field is single-valued, reuses caches
  of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access
  without insanity).  "Insanity" is no longer possible unless you explicitly 
want it. 
  Rename FieldCache* and DocTermOrds* classes in the search package to 
DocValues*. 
  Move SortedSetSortField to core and add SortedSetFieldSource to queries/, 
which
  takes the same selectors. Add helper methods to DocValues.java that are 
better 
  suited for search code (never return null, etc).  (Mike McCandless, Robert 
Muir)
{quote}


bq. The people who did this are elasticsearch employees. That is one way to 
deal with Solr's faster faceting!

This is speculation and really a bad behaviour on an Open Source issue tracker. 
We should discuss here about technical stuff, not make any assumptions about 
what people intend to do. This statement was posted by a person 
([~mmurphy3141]) who I never met in person, and who really seldem took place in 
Lucene/Solr discussions at all. So I don't think we should count on that. It is 
also bad behaviour to accuse committers on twitter about sabotage: 
https://twitter.com/mmurphy3141/status/647254551356162048; please don't do 
this. I would ask to remove this tweet, thanks.

I was informed about the changes mentioned here and I strongly agree with the 
committers behind LUCENE-5666. I was always in favour of removing those 
top-level facetting algorithms. So they still have my strong +1. On my Solr 
customers I have seen nobody who complained about slow top-level facetting 
(because I told them long time ago to no longer use those outdated top-level 
algorithms if they have dynamic indexes).

The right thing to do for Solr people would be to remove those top-level stuff 
completely. This is no longer fitting the new reader structure (composite and 
atomic/leaf readers) of Lucene 3 (with API cleanups to better reflect the new 
structure in Lucene 4). Lucene 3 is now several years retired already! So there 
was long time to fix Solr's facetting to go away from top-level. People with 
static indexes can still force merge their index and will have the same 
performance with the new algorithms.

Please keep in mind that it took about half a year until the first one 
recognized a problem like this, which makes me think that only few people are 
using those mostly-static indexes. 

*We should work on this issue to fix the issue, not accuse people, thanks!*

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of 

[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908063#comment-14908063
 ] 

Yonik Seeley commented on SOLR-8096:


Once again, UnInvertedField was not part of the lucene FieldCache.  It was a 
Solr class cached in SolrIndexSearcher (via fieldValueCache), did not implement 
the DocValues API, etc.  The *lucene* FieldCache was made package protected (an 
implementation detail) so one would need to access it via DocValues.  That's 
what the issue was about.

bq. So the committers decided to step forward and remove the top-level 
facetting (which was long overdue).

Where was this discussion?  I see nothing about it on LUCENE-5666
And of course I would have given a -1 to such a change for being dogmatic over 
practical and not caring about our users.

bq. I was informed about the changes mentioned here

Where did this discussion take place? I can't find it in any public forum.

bq. I was always in favour of removing those top-level facetting algorithms. So 
they still have my strong +1.

With no benchmarking of how the replacement performed?  No option to use the 
old method if a user *wanted* to? Without any public discussion of the impacts? 
 Without any note in Solr's CHANGES?

So you were strongly for the change, but you knew I'd most likely be against 
it, right (based on previous discussions about top-level data structures)?



> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was *secretly removed* as part of LUCENE-5666, 
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-24 Thread Mike Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907515#comment-14907515
 ] 

Mike Murphy commented on SOLR-8096:
---

The people who did this are elasticsearch employees.  That is one way to deal 
with Solr's faster faceting!
This smells like the VW pollution scandal for lucene/solr/elasticsearch, except 
perhaps no consequences for those who pulled it off?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was *secretly removed* as part of LUCENE-5666, 
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-24 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907568#comment-14907568
 ] 

Yonik Seeley commented on SOLR-8096:


bq. Are you sure it was secret and not just a mistake?

Yes.

- This algorithm had been relied apon by many since 2008 (SOLR-475), and 
completely removing it's use and replacing it would obviously warrant 
discussion, benchmarks, etc.
- This was a massive patch, and relevant changes should be called out, esp if 
changes seem unrelated to the issue's description.
- If you search the JIRA issue, "UnInvertedField" *never* appears.
  (the linked issues mention it now, but those were added by us after the fact)
- The issue's title is "Add UninvertingReader" and the description had to do 
with Lucene's FieldCache, which UnInvertedField is not part of.
- There is *no* mention of the issue or changes anywhere in Solr's CHANGES.txt
- When asked to comment on impacts of this massive patch, the answer given was 
"Is the CHANGES.txt entry not good here? The docvalues apis did not change..."
- The CHANGES entry for lucene made no mention of the change to Solr or 
UnInvertedField.
- Although the UnInvertedField code was left behind (as dead code), the removal 
of the use
  of UnInvertedField was *not* by mistake - you can see by the test code that 
was explicitly removed.
  (TestFaceting.java)

Exactly what other conclusion is there to draw?  Massive incompetence?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was *secretly removed* as part of LUCENE-5666, 
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-24 Thread Mike Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907552#comment-14907552
 ] 

Mike Murphy commented on SOLR-8096:
---

Are you sure it was secret and not just a mistake?

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was *secretly removed* as part of LUCENE-5666, 
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-24 Thread Mike Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907587#comment-14907587
 ] 

Mike Murphy commented on SOLR-8096:
---

Erik, you're right. I do not know what the motivations were. Although after 
looking at it, the evidence is compelling that there was a coverup. What do you 
think the motivation was?
The fact that it was elasticsearch employees could have also been a coincidence.

> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was *secretly removed* as part of LUCENE-5666, 
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-24 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907563#comment-14907563
 ] 

Erick Erickson commented on SOLR-8096:
--

[~mmurphy3141] Whoa! I don't know whether your comment was meant sarcastically 
or in any other humorous sense, but it stands a very good chance of being 
seriously received no matter what your intent. Let's find out what the 
background is here before casting aspersions.


> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was *secretly removed* as part of LUCENE-5666, 
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8096) Major faceting performance regressions

2015-09-24 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907614#comment-14907614
 ] 

Erick Erickson commented on SOLR-8096:
--

I'm not going there. Speculating about motives does not have any _useful_ 
outcome, it just provides fodder for flame wars. Discussing fixing the issues 
is far more fruitful.

And less painful for me to read.



> Major faceting performance regressions
> --
>
> Key: SOLR-8096
> URL: https://issues.apache.org/jira/browse/SOLR-8096
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.0, 5.1, 5.2, 5.3, Trunk
>Reporter: Yonik Seeley
>Priority: Critical
>
> Use of the highly optimized faceting that Solr had for multi-valued fields 
> over relatively static indexes was *secretly removed* as part of LUCENE-5666, 
> causing severe performance regressions.
> Here are some quick benchmarks to gauge the damage, on a 5M document index, 
> with each field having between 0 and 5 values per document.  *Higher numbers 
> represent worse 5x performance*.
> Solr 5.4_dev faceting time as a percent of Solr 4.10.3 faceting time  
> ||...|| Percent of index being faceted
> ||num_unique_values|| 10% || 50% || 90% ||
> |10   | 351.17%   | 1587.08%  | 3057.28% |
> |100  | 158.10%   | 203.61%   | 1421.93% |
> |1000 | 143.78%   | 168.01%   | 1325.87% |
> |1| 137.98%   | 175.31%   | 1233.97% |
> |10   | 142.98%   | 159.42%   | 1252.45% |
> |100  | 255.15%   | 165.17%   | 1236.75% |
> For example, a field with 1000 unique values in the whole index, faceting 
> with 5x took 143% of the 4x time, when ~10% of the docs in the index were 
> faceted.
> One user who brought the performance problem to our attention: 
> http://markmail.org/message/ekmqh4ocbkwxv3we
> "faceting is unusable slow since upgrade to 5.3.0" (from 4.10.3)
> The disabling of the UnInvertedField algorithm was previously discovered in 
> SOLR-7190, but we didn't know just how bad the problem was at that time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org