Re: Faceting and Grouping Performance Degradation in Solr 5

2017-02-06 Thread Solr User
I am pleased to report that we are now in production on Solr 5.5.3 with
performance comparable to Solr 4.8.1, thanks to facet.method=uif as well as
https://issues.apache.org/jira/browse/SOLR-9176.  Thanks to everyone who
worked on these!
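
For anyone who wants to try the same setting, here is a minimal sketch of a
request that selects facet.method=uif.  Only the facet.method=uif parameter
comes from this thread; the host, the collection name ("products") and the
field names are placeholders chosen for illustration.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, collection, facet_fields, method="uif"):
    """Return a /select URL that facets on the given fields with facet.method set."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),              # only the facet counts are of interest
        ("facet", "true"),
        ("facet.method", method),   # uif, enum, fc, ...
    ]
    # One facet.field parameter per field (hypothetical field names below).
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = build_facet_query("http://localhost:8983/solr", "products", ["category", "brand"])
print(url)
```

Passing method="enum" instead gives the variant used for Scenario #2.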

On Mon, Oct 3, 2016 at 3:55 PM, Solr User  wrote:

> Below is some further testing.  This was done in an environment that had
> no other queries or updates during testing.  We ran through several
> scenarios, so I pasted the results with HTML formatting below so that you
> may view them as a table.  Sorry if you have to pull this out into a
> different file for viewing, but I did not want the formatting to be messed
> up.  The times are average times in milliseconds.  The test methodology is
> the same as above, except that there was a 5 minute warmup and a 15 minute
> test.
>
> Note that the segment and deletion counts were recorded from only 1 of the
> 2 shards, so we cannot extrapolate a function between them and the
> outcome.  In other words, just view them as "non-optimized" versus
> "optimized" and "has deletions" versus "no deletions".  The only exceptions
> are that the 0 deletes, 1 segment, and 8 segment cases were true for both
> shards.  A few of the tests were repeated as well.
>
> The only conclusion that I could draw is that the number of segments and
> the number of deletes appear to greatly influence the response times, at
> least more than any difference in Solr version.  There also appears to be
> some external contributor to variance, maybe network, etc.
>
> Thoughts?
>
>
> Run  Date       Solr Version  Deleted Docs  Segment Count  facet.method=uif  Scenario #1  Scenario #2
> 1    9/29/2016  5.5.2         57873         34             YES               198          92
> 2    9/29/2016  5.5.2         57873         34             YES               210          88
> 3    9/29/2016  4.8.1         176958        18             N/A               145          59
> 4    9/30/2016  4.8.1         593694        27             N/A               186          62
> 5    9/30/2016  4.8.1         593694        27             N/A               190          58
> 6    9/30/2016  5.5.2         57873         34             YES               208          72
> 7    9/30/2016  5.5.2         57873         34             YES               209          70
> 8    9/30/2016  5.5.2         57873         34             NO                210          77
> 9    9/30/2016  5.5.2         57873         34             NO                206          74
> 10   9/30/2016  5.5.2         0             8              NO                109          68
> 11   9/30/2016  5.5.2         0             8              YES               142          73
> 12   9/30/2016  5.5.2         0             1              YES               73           63
> 13   9/30/2016  5.5.2         0             1              NO                70           61
> 14   10/3/2016  4.8.1         -             8              N/A               160          66
> 15   10/3/2016  4.8.1         -             8              N/A               109          54
> 16   10/3/2016  4.8.1         -             1              N/A               83           52
> 17   10/3/2016  4.8.1         -             1              N/A               85           51
>
> Deleted Docs for runs 10-17: only four 0 entries survive in the archived
> message.  They are listed against runs 10-13; "-" marks values that could
> not be recovered.
>
>
>
>
> On Wed, Sep 28, 2016 at 4:44 PM, Solr User  wrote:
>
>> I plan to re-test this in a separate environment that I have more control
>> over and will share the results when I can.
>>
>> On Wed, Sep 28, 2016 at 3:37 PM, Solr User  wrote:
>>
>>> Certainly.  And I would of course welcome anyone else to test this
>>> for themselves, especially with facet.method=uif, to see if that has
>>> indeed bridged the gap between Solr 4 and Solr 5.  I would be very
>>> happy if my testing is invalid due to variance, a problem in the
>>> process, etc.  One thing I was pondering is whether I should force
>>> merge the index to a certain number of segments, because indexing
>>> yields a random number of segments and deletions.  The only thing
>>> stopping me from doing that was the observation of longer Solr 4
>>> times even with more deletions and a similar number of segments.
>>>
>>> We use Soasta as our testing tool.  Before testing, load is sent for
>>> 10-15 minutes to make sure any Solr caches have stabilized.  Then the
>>> test is run for 30 minutes of steady volume, with Scenario #1 tested
>>> at 15 req/sec and Scenario #2 tested at 100 req/sec.  Each request is
>>> different, with input being pulled from data files.  The requests are
>>> repeatable from test to test.
>>>
>>> The numbers posted above are average response times as reported by
>>> Soasta.  However, the respective time differences are corroborated by
>>> Splunk, which indexes the Solr logs, and by Dynatrace, which is
>>> instrumented on one of the JVMs.
>>>
>>> The versions are deployed to the same machines, thereby overlaying
>>> the previous installation.  Going from Solr 4 to Solr 5, full
>>> indexing is run with the same input data.  Because we are in
>>> SolrCloud mode, the full indexing consists of indexing all documents
>>> and then deleting any that were not touched.  Going from Solr 5 back
>>> to Solr 4, the snapshot is restored, since Solr 4 will not load a
>>> Solr 5 index.  Testing Solr 4 after reverting yields the same results
>>> as the previous Solr 4 test.
>>>
>>>
>>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
>>> wrote:
>>>
>>>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>>>> > Further testing indicates that any performance difference is not due
>>>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>>>> > deletes.
>>>>
>>>> Sanity check: Could you describe how you test?
>>>>
>>>> * How many queries do you issue for each test?
>>>> * Is each query a new one or do you re-use the same query?
>>>> * Do you discard the first X calls?
>>>> * Are the numbers averages, medians or something else?
>>>> * What do you do about disk cache?
>>>> * Are both Solr instances on the same machine?
>>>> * Do they

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-10-03 Thread Solr User
Below is some further testing.  This was done in an environment that had no
other queries or updates during testing.  We ran through several scenarios,
so I pasted the results with HTML formatting below so that you may view them
as a table.  Sorry if you have to pull this out into a different file for
viewing, but I did not want the formatting to be messed up.  The times are
average times in milliseconds.  The test methodology is the same as above,
except that there was a 5 minute warmup and a 15 minute test.

Note that the segment and deletion counts were recorded from only 1 of the 2
shards, so we cannot extrapolate a function between them and the outcome.
In other words, just view them as "non-optimized" versus "optimized" and
"has deletions" versus "no deletions".  The only exceptions are that the 0
deletes, 1 segment, and 8 segment cases were true for both shards.  A few of
the tests were repeated as well.

The only conclusion that I could draw is that the number of segments and
the number of deletes appear to greatly influence the response times, at
least more than any difference in Solr version.  There also appears to be
some external contributor to variance, maybe network, etc.

Thoughts?


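Run  Date       Solr Version  Deleted Docs  Segment Count  facet.method=uif  Scenario #1  Scenario #2
1    9/29/2016  5.5.2         57873         34             YES               198          92
2    9/29/2016  5.5.2         57873         34             YES               210          88
3    9/29/2016  4.8.1         176958        18             N/A               145          59
4    9/30/2016  4.8.1         593694        27             N/A               186          62
5    9/30/2016  4.8.1         593694        27             N/A               190          58
6    9/30/2016  5.5.2         57873         34             YES               208          72
7    9/30/2016  5.5.2         57873         34             YES               209          70
8    9/30/2016  5.5.2         57873         34             NO                210          77
9    9/30/2016  5.5.2         57873         34             NO                206          74
10   9/30/2016  5.5.2         0             8              NO                109          68
11   9/30/2016  5.5.2         0             8              YES               142          73
12   9/30/2016  5.5.2         0             1              YES               73           63
13   9/30/2016  5.5.2         0             1              NO                70           61
14   10/3/2016  4.8.1         -             8              N/A               160          66
15   10/3/2016  4.8.1         -             8              N/A               109          54
16   10/3/2016  4.8.1         -             1              N/A               83           52
17   10/3/2016  4.8.1         -             1              N/A               85           51

Deleted Docs for runs 10-17: only four 0 entries survive in the archived
message.  They are listed against runs 10-13; "-" marks values that could
not be recovered.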
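
To make the segment-count effect easier to see, here is a small sketch that
averages the reported times per segment count.  The run data is transcribed
from the table in this message; the grouping itself is only an illustrative
summary, and since the counts were recorded from one shard it should be read
as a rough comparison, not a fitted relationship.

```python
from collections import defaultdict
from statistics import mean

# (Solr version, segment count, Scenario #1 ms, Scenario #2 ms), in posting order
RUNS = [
    ("5.5.2", 34, 198, 92), ("5.5.2", 34, 210, 88), ("4.8.1", 18, 145, 59),
    ("4.8.1", 27, 186, 62), ("4.8.1", 27, 190, 58), ("5.5.2", 34, 208, 72),
    ("5.5.2", 34, 209, 70), ("5.5.2", 34, 210, 77), ("5.5.2", 34, 206, 74),
    ("5.5.2", 8, 109, 68), ("5.5.2", 8, 142, 73), ("5.5.2", 1, 73, 63),
    ("5.5.2", 1, 70, 61), ("4.8.1", 8, 160, 66), ("4.8.1", 8, 109, 54),
    ("4.8.1", 1, 83, 52), ("4.8.1", 1, 85, 51),
]


def mean_by_segments(runs, column):
    """Average one timing column (2 = Scenario #1, 3 = Scenario #2) per segment count."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[1]].append(run[column])
    return {segments: round(mean(times), 2) for segments, times in sorted(groups.items())}


scenario_1 = mean_by_segments(RUNS, 2)
scenario_2 = mean_by_segments(RUNS, 3)
print(scenario_1)
print(scenario_2)
```

Both versions are pooled within each segment count here, which matches the
observation that segment count matters more than the Solr version.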




On Wed, Sep 28, 2016 at 4:44 PM, Solr User  wrote:

> I plan to re-test this in a separate environment that I have more control
> over and will share the results when I can.
>
> On Wed, Sep 28, 2016 at 3:37 PM, Solr User  wrote:
>
>> Certainly.  And I would of course welcome anyone else to test this
>> for themselves, especially with facet.method=uif, to see if that has
>> indeed bridged the gap between Solr 4 and Solr 5.  I would be very
>> happy if my testing is invalid due to variance, a problem in the
>> process, etc.  One thing I was pondering is whether I should force
>> merge the index to a certain number of segments, because indexing
>> yields a random number of segments and deletions.  The only thing
>> stopping me from doing that was the observation of longer Solr 4
>> times even with more deletions and a similar number of segments.
>>
>> We use Soasta as our testing tool.  Before testing, load is sent for
>> 10-15 minutes to make sure any Solr caches have stabilized.  Then the
>> test is run for 30 minutes of steady volume, with Scenario #1 tested
>> at 15 req/sec and Scenario #2 tested at 100 req/sec.  Each request is
>> different, with input being pulled from data files.  The requests are
>> repeatable from test to test.
>>
>> The numbers posted above are average response times as reported by
>> Soasta.  However, the respective time differences are corroborated by
>> Splunk, which indexes the Solr logs, and by Dynatrace, which is
>> instrumented on one of the JVMs.
>>
>> The versions are deployed to the same machines, thereby overlaying
>> the previous installation.  Going from Solr 4 to Solr 5, full
>> indexing is run with the same input data.  Because we are in
>> SolrCloud mode, the full indexing consists of indexing all documents
>> and then deleting any that were not touched.  Going from Solr 5 back
>> to Solr 4, the snapshot is restored, since Solr 4 will not load a
>> Solr 5 index.  Testing Solr 4 after reverting yields the same results
>> as the previous Solr 4 test.
>>
>>
>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
>> wrote:
>>
>>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>>> > Further testing indicates that any performance difference is not due
>>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>>> > deletes.
>>>
>>> Sanity check: Could you describe how you test?
>>>
>>> * How many queries do you issue for each test?
>>> * Is each query a new one or do you re-use the same query?
>>> * Do you discard the first X calls?
>>> * Are the numbers averages, medians or something else?
>>> * What do you do about disk cache?
>>> * Are both Solr instances on the same machine?
>>> * Do they use the same index?
>>> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>>>
>>> - Toke Eskildsen, State and University Library, Denmark
>>>
>>
>>
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-28 Thread Solr User
I plan to re-test this in a separate environment that I have more control
over and will share the results when I can.

On Wed, Sep 28, 2016 at 3:37 PM, Solr User  wrote:

> Certainly.  And I would of course welcome anyone else to test this
> for themselves, especially with facet.method=uif, to see if that has
> indeed bridged the gap between Solr 4 and Solr 5.  I would be very
> happy if my testing is invalid due to variance, a problem in the
> process, etc.  One thing I was pondering is whether I should force
> merge the index to a certain number of segments, because indexing
> yields a random number of segments and deletions.  The only thing
> stopping me from doing that was the observation of longer Solr 4
> times even with more deletions and a similar number of segments.
>
> We use Soasta as our testing tool.  Before testing, load is sent for
> 10-15 minutes to make sure any Solr caches have stabilized.  Then the
> test is run for 30 minutes of steady volume, with Scenario #1 tested
> at 15 req/sec and Scenario #2 tested at 100 req/sec.  Each request is
> different, with input being pulled from data files.  The requests are
> repeatable from test to test.
>
> The numbers posted above are average response times as reported by
> Soasta.  However, the respective time differences are corroborated by
> Splunk, which indexes the Solr logs, and by Dynatrace, which is
> instrumented on one of the JVMs.
>
> The versions are deployed to the same machines, thereby overlaying
> the previous installation.  Going from Solr 4 to Solr 5, full
> indexing is run with the same input data.  Because we are in
> SolrCloud mode, the full indexing consists of indexing all documents
> and then deleting any that were not touched.  Going from Solr 5 back
> to Solr 4, the snapshot is restored, since Solr 4 will not load a
> Solr 5 index.  Testing Solr 4 after reverting yields the same results
> as the previous Solr 4 test.
>
>
> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
> wrote:
>
>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
>> > Further testing indicates that any performance difference is not due
>> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
>> > deletes.
>>
>> Sanity check: Could you describe how you test?
>>
>> * How many queries do you issue for each test?
>> * Is each query a new one or do you re-use the same query?
>> * Do you discard the first X calls?
>> * Are the numbers averages, medians or something else?
>> * What do you do about disk cache?
>> * Are both Solr instances on the same machine?
>> * Do they use the same index?
>> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>>
>> - Toke Eskildsen, State and University Library, Denmark
>>
>
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-28 Thread Solr User
Certainly.  And I would of course welcome anyone else to test this for
themselves, especially with facet.method=uif, to see if that has indeed
bridged the gap between Solr 4 and Solr 5.  I would be very happy if my
testing is invalid due to variance, a problem in the process, etc.  One
thing I was pondering is whether I should force merge the index to a certain
number of segments, because indexing yields a random number of segments and
deletions.  The only thing stopping me from doing that was the observation
of longer Solr 4 times even with more deletions and a similar number of
segments.
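
For reference, a force merge to a fixed segment count can be requested
through the update handler.  The sketch below only builds the request URL;
optimize, maxSegments and waitSearcher are standard update parameters, while
the host and the collection name ("products") are placeholders.

```python
from urllib.parse import urlencode


def force_merge_url(base_url, collection, max_segments):
    """Build an update request that force merges the index down to max_segments."""
    params = {
        "optimize": "true",
        "maxSegments": max_segments,   # e.g. 1 or 8, as in the optimized test runs
        "waitSearcher": "true",        # return only once the new searcher is open
    }
    return f"{base_url}/{collection}/update?{urlencode(params)}"


one_segment = force_merge_url("http://localhost:8983/solr", "products", 1)
eight_segments = force_merge_url("http://localhost:8983/solr", "products", 8)
print(one_segment)
print(eight_segments)
```

Merging both versions to the same segment count before each run removes one
source of variation between tests.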

We use Soasta as our testing tool.  Before testing, load is sent for 10-15
minutes to make sure any Solr caches have stabilized.  Then the test is run
for 30 minutes of steady volume, with Scenario #1 tested at 15 req/sec and
Scenario #2 tested at 100 req/sec.  Each request is different, with input
being pulled from data files.  The requests are repeatable from test to test.
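
As a quick check on how many queries each measured run covers, the rates and
duration above work out as follows.  This is plain arithmetic on the stated
figures and assumes the rate is held constant for the full 30 minutes.

```python
def requests_sent(rate_per_sec, minutes):
    """Number of requests issued at a steady rate over the given duration."""
    return rate_per_sec * minutes * 60


measured = {
    "Scenario #1": requests_sent(15, 30),    # 15 req/sec for 30 minutes
    "Scenario #2": requests_sent(100, 30),   # 100 req/sec for 30 minutes
}
print(measured)
```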

The numbers posted above are average response times as reported by Soasta.
However, the respective time differences are corroborated by Splunk, which
indexes the Solr logs, and by Dynatrace, which is instrumented on one of the
JVMs.

The versions are deployed to the same machines, thereby overlaying the
previous installation.  Going from Solr 4 to Solr 5, full indexing is run
with the same input data.  Because we are in SolrCloud mode, the full
indexing consists of indexing all documents and then deleting any that were
not touched.  Going from Solr 5 back to Solr 4, the snapshot is restored,
since Solr 4 will not load a Solr 5 index.  Testing Solr 4 after reverting
yields the same results as the previous Solr 4 test.


On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen 
wrote:

> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> > Further testing indicates that any performance difference is not due
> > to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> > deletes.
>
> Sanity check: Could you describe how you test?
>
> * How many queries do you issue for each test?
> * Is each query a new one or do you re-use the same query?
> * Do you discard the first X calls?
> * Are the numbers averages, medians or something else?
> * What do you do about disk cache?
> * Are both Solr instances on the same machine?
> * Do they use the same index?
> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>
> - Toke Eskildsen, State and University Library, Denmark
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-28 Thread Toke Eskildsen
On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> Further testing indicates that any performance difference is not due
> to deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> deletes.

Sanity check: Could you describe how you test?

* How many queries do you issue for each test?
* Is each query a new one or do you re-use the same query?
* Do you discard the first X calls?
* Are the numbers averages, medians or something else?
* What do you do about disk cache?
* Are both Solr instances on the same machine?
* Do they use the same index?
* Do you alternate between testing 4.8.1 and 5.5.2 first?

- Toke Eskildsen, State and University Library, Denmark
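
The averages-versus-medians question matters because a few slow outliers can
dominate a mean.  A minimal sketch with made-up timings (the sample values
are hypothetical, not measurements from this thread):

```python
from statistics import mean, median


def summarize(times_ms):
    """Return the mean and median of a list of response times in milliseconds."""
    return {"mean": mean(times_ms), "median": median(times_ms)}


# Hypothetical sample: five fast responses and one slow outlier.
sample_ms = [40, 42, 38, 41, 39, 400]
summary = summarize(sample_ms)
print(summary)
```

Here the single 400 ms response pulls the mean far above the median, so
reporting both makes cold-cache or GC outliers easier to spot.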


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-27 Thread Solr User
Further testing indicates that any performance difference is not due to
deletes.  Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes.
The times appear to converge on an optimized index.  Below are the
details.  Not sure what else to make of this at this point other than
moving forward with an upgrade with an optimized index wherever possible.

Scenario #1:  Using facet.method=uif with faceting on several multi-valued
fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
4.8.1 (without deletes): 104 ms
5.5.2 (without deletes): 125 ms
4.8.1 (1 segment without deletes): 55 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2:  Using facet.method=enum with faceting on several multi-valued
fields.  These fields are different from those in Scenario #1 and perform
much better with enum, hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
4.8.1 (without deletes): 35 ms
5.5.2 (without deletes): 42 ms
4.8.1 (1 segment without deletes): 28 ms
5.5.2 (1 segment without deletes): 34 ms
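
To put the figures above on a common scale, the sketch below computes the
percentage change of 5.5.2 relative to 4.8.1 for each case.  The timings are
copied from this message; the helper name is just for illustration.

```python
def pct_change(baseline_ms, candidate_ms):
    """Percentage change of candidate relative to baseline (positive = slower)."""
    return round((candidate_ms - baseline_ms) / baseline_ms * 100, 1)


# (4.8.1 ms, 5.5.2 ms) pairs taken from the figures above
RESULTS = {
    "Scenario #1": {
        "with deletes": (115, 155),
        "without deletes": (104, 125),
        "1 segment without deletes": (55, 44),
    },
    "Scenario #2": {
        "with deletes": (38, 49),
        "without deletes": (35, 42),
        "1 segment without deletes": (28, 34),
    },
}

changes = {
    scenario: {case: pct_change(old, new) for case, (old, new) in cases.items()}
    for scenario, cases in RESULTS.items()
}
for scenario, cases in changes.items():
    for case, pct in cases.items():
        print(f"{scenario}, {case}: {pct:+.1f}%")
```

Only the single-segment Scenario #1 case comes out faster on 5.5.2; the other
cases remain roughly 20-35% slower.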

On Tue, Sep 27, 2016 at 3:45 AM, Alessandro Benedetti  wrote:

> Hi!
> At the time we didn't investigate the implications of deletions at all.
> This could be interesting.
> If you proceed with your investigations and discover what changed in the
> deletion approach, I would be more than happy to help!
>
> Cheers
>
> On Mon, Sep 26, 2016 at 10:59 PM, Solr User  wrote:
>
> > Thanks again for your work on honoring the facet.method.  I have an
> > observation that I would like to share and get your feedback on if
> > possible.
> >
> > I performance tested Solr 5.5.2 with various facet queries and the only
> > way I get comparable results to Solr 4.8.1 is when I expungeDeletes.
> > Is it possible that Solr 5 is not as efficiently ignoring deletes as
> > Solr 4?  Here are the details.
> >
> > Scenario #1:  Using facet.method=uif with faceting on several
> > multi-valued fields.
> > 4.8.1 (with deletes): 115 ms
> > 5.5.2 (with deletes): 155 ms
> > 5.5.2 (without deletes): 125 ms
> > 5.5.2 (1 segment without deletes): 44 ms
> >
> > Scenario #2:  Using facet.method=enum with faceting on several
> > multi-valued fields.  These fields are different from those in
> > Scenario #1 and perform much better with enum, hence that method is
> > used instead.
> > 4.8.1 (with deletes): 38 ms
> > 5.5.2 (with deletes): 49 ms
> > 5.5.2 (without deletes): 42 ms
> > 5.5.2 (1 segment without deletes): 34 ms
> >
> >
> >
> > On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
> > abenede...@apache.org> wrote:
> >
> > > Interesting developments:
> > >
> > > https://issues.apache.org/jira/browse/SOLR-9176
> > >
> > > I think we found why term enum seems slower in recent Solr!
> > > In our case it is likely to be related to the commit I mention in the
> > > Jira.
> > > Have a look, Joel!
> > >
> > > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> > > abenede...@apache.org> wrote:
> > >
> > > > I am investigating this scenario right now.
> > > > I can confirm that the enum slowness is in Solr 6.0 as well.
> > > > And I agree with Joel, it seems to be unrelated to the famous
> > > > faceting regression :(
> > > >
> > > > Furthermore, with the legacy facet approach, if you set
> > > > docValues for the field you are not going to be able to try the
> > > > enum approach anymore.
> > > >
> > > > org/apache/solr/request/SimpleFacets.java:448
> > > >
> > > > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> > > >   // only fc can handle docvalues types
> > > >   method = FacetMethod.FC;
> > > > }
> > > >
> > > >
> > > > I got really horrible regressions simply using term enum in both
> > > > Solr 4 and Solr 6.
> > > >
> > > > And even the most optimized fcs approach with docValues and
> > > > facet.threads=nCore does not perform as well as the simple enum
> > > > in Solr 4.
> > > >
> > > > For example, for some sample queries I have 40 ms vs 160 ms and
> > > > similar...
> > > > I think we should open an issue if we can confirm it is not
> > > > related to the other one.
> > > > A lot of people will continue using the legacy approach for a
> > > > while...
> > > >
> > > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein
> > > > wrote:
> > > >
> > > >> The enum slowness is interesting. It would appear on the surface
> > > >> to not be related to the FieldCache issue. I don't think the
> > > >> main emphasis of the JSON facet API has been the enum approach.
> > > >> You may find using the JSON facet API and eliminating the use of
> > > >> enum meets your performance needs.
> > > >>
> > > >> With the CollapsingQParserPlugin top_fc is definitely faster
> > > >> during queries. The tradeoff is slower warming times and
> > > >> increased memory usage if the collapse fields are used in
> > > >> faceting, as faceting will load the field into a different
> > > >> cache.
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > 

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-27 Thread Alessandro Benedetti
Hi!
At the time we didn't investigate the implications of deletions at all.
This could be interesting.
If you proceed with your investigations and discover what changed in the
deletion approach, I would be more than happy to help!

Cheers

On Mon, Sep 26, 2016 at 10:59 PM, Solr User  wrote:

> Thanks again for your work on honoring the facet.method.  I have an
> observation that I would like to share and get your feedback on if
> possible.
>
> I performance tested Solr 5.5.2 with various facet queries and the only way
> I get comparable results to Solr 4.8.1 is when I expungeDeletes.  Is it
> possible that Solr 5 is not as efficiently ignoring deletes as Solr 4?
> Here are the details.
>
> Scenario #1:  Using facet.method=uif with faceting on several multi-valued
> fields.
> 4.8.1 (with deletes): 115 ms
> 5.5.2 (with deletes): 155 ms
> 5.5.2 (without deletes): 125 ms
> 5.5.2 (1 segment without deletes): 44 ms
>
> Scenario #2:  Using facet.method=enum with faceting on several
> multi-valued fields.  These fields are different from those in
> Scenario #1 and perform much better with enum, hence that method is used
> instead.
> 4.8.1 (with deletes): 38 ms
> 5.5.2 (with deletes): 49 ms
> 5.5.2 (without deletes): 42 ms
> 5.5.2 (1 segment without deletes): 34 ms
>
>
>
> On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
>
> > Interesting developments:
> >
> > https://issues.apache.org/jira/browse/SOLR-9176
> >
> > I think we found why term enum seems slower in recent Solr!
> > In our case it is likely to be related to the commit I mention in the
> > Jira.
> > Have a look, Joel!
> >
> > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> > abenede...@apache.org> wrote:
> >
> > > I am investigating this scenario right now.
> > > I can confirm that the enum slowness is in Solr 6.0 as well.
> > > And I agree with Joel, it seems to be unrelated to the famous
> > > faceting regression :(
> > >
> > > Furthermore, with the legacy facet approach, if you set
> > > docValues for the field you are not going to be able to try the
> > > enum approach anymore.
> > >
> > > org/apache/solr/request/SimpleFacets.java:448
> > >
> > > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> > >   // only fc can handle docvalues types
> > >   method = FacetMethod.FC;
> > > }
> > >
> > >
> > > I got really horrible regressions simply using term enum in both
> > > Solr 4 and Solr 6.
> > >
> > > And even the most optimized fcs approach with docValues and
> > > facet.threads=nCore does not perform as well as the simple enum
> > > in Solr 4.
> > >
> > > For example, for some sample queries I have 40 ms vs 160 ms and
> > > similar...
> > > I think we should open an issue if we can confirm it is not
> > > related to the other one.
> > > A lot of people will continue using the legacy approach for a
> > > while...
> > >
> > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein 
> > > wrote:
> > >
> > >> The enum slowness is interesting. It would appear on the surface
> > >> to not be related to the FieldCache issue. I don't think the
> > >> main emphasis of the JSON facet API has been the enum approach.
> > >> You may find using the JSON facet API and eliminating the use of
> > >> enum meets your performance needs.
> > >>
> > >> With the CollapsingQParserPlugin top_fc is definitely faster
> > >> during queries. The tradeoff is slower warming times and
> > >> increased memory usage if the collapse fields are used in
> > >> faceting, as faceting will load the field into a different
> > >> cache.
> > >>
> > >> Joel Bernstein
> > >> http://joelsolr.blogspot.com/
> > >>
> > >> On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:
> > >>
> > >> > Joel,
> > >> >
> > >> > Thank you for taking the time to respond to my question.  I
> > >> > tried the JSON Facet API for one query that uses
> > >> > facet.method=enum (since this one has a ton of unique values and
> > >> > performed better with enum) but this was way slower than even
> > >> > the slower Solr 5 times.  I did not try the new API with the
> > >> > non-enum queries though so I will give that a go.  It looks like
> > >> > Solr 5.5.1 also has a facet.method=uif which will be interesting
> > >> > to try.
> > >> >
> > >> > If these do not prove helpful, it looks like I will need to wait
> > >> > for SOLR-8096 to be resolved before upgrading.
> > >> >
> > >> > Thanks also for your comment on top_fc for the
> > >> > CollapsingQParser.  I use collapse/expand for some queries but
> > >> > traditional grouping for others due to performance.  It will be
> > >> > interesting to see if those grouping queries perform better now
> > >> > using CollapsingQParser with top_fc.
> > >> >
> > >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <
> > >> > joels...@gmail.com> wrote:
> > >> >
> > >> > > Yes, SOLR-8096 is the issue here.
> > >> > >
> > >> > > I don't believe indexing with docValues is going to help too 

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-26 Thread Solr User
Thanks again for your work on honoring the facet.method.  I have an
observation that I would like to share and get your feedback on if possible.

I performance tested Solr 5.5.2 with various facet queries and the only way
I get comparable results to Solr 4.8.1 is when I expungeDeletes.  Is it
possible that Solr 5 is not as efficiently ignoring deletes as Solr 4?
Here are the details.

Scenario #1:  Using facet.method=uif with faceting on several multi-valued
fields.
4.8.1 (with deletes): 115 ms
5.5.2 (with deletes): 155 ms
5.5.2 (without deletes): 125 ms
5.5.2 (1 segment without deletes): 44 ms

Scenario #2:  Using facet.method=enum with faceting on several multi-valued
fields.  These fields are different from those in Scenario #1 and perform
much better with enum, hence that method is used instead.
4.8.1 (with deletes): 38 ms
5.5.2 (with deletes): 49 ms
5.5.2 (without deletes): 42 ms
5.5.2 (1 segment without deletes): 34 ms
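
For anyone repeating the "without deletes" runs, deleted documents can be
purged with a commit that sets expungeDeletes.  The sketch below only builds
the request URL; commit and expungeDeletes are standard update parameters,
while the host and the collection name ("products") are placeholders.

```python
from urllib.parse import urlencode


def expunge_deletes_url(base_url, collection):
    """Build a commit request that also merges away segments holding deleted docs."""
    params = {"commit": "true", "expungeDeletes": "true"}
    return f"{base_url}/{collection}/update?{urlencode(params)}"


url = expunge_deletes_url("http://localhost:8983/solr", "products")
print(url)
```

Unlike a full optimize, this does not force the index down to one segment, so
the "without deletes" and "1 segment without deletes" cases stay distinct.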



On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> Interesting developments:
>
> https://issues.apache.org/jira/browse/SOLR-9176
>
> I think we found why term enum seems slower in recent Solr!
> In our case it is likely to be related to the commit I mention in the Jira.
> Have a look, Joel!
>
> On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
> abenede...@apache.org> wrote:
>
> > I am investigating this scenario right now.
> > I can confirm that the enum slowness is in Solr 6.0 as well.
> > And I agree with Joel, it seems to be unrelated to the famous
> > faceting regression :(
> >
> > Furthermore, with the legacy facet approach, if you set
> > docValues for the field you are not going to be able to try the
> > enum approach anymore.
> >
> > org/apache/solr/request/SimpleFacets.java:448
> >
> > if (method == FacetMethod.ENUM && sf.hasDocValues()) {
> >   // only fc can handle docvalues types
> >   method = FacetMethod.FC;
> > }
> >
> >
> > I got really horrible regressions simply using term enum in both
> > Solr 4 and Solr 6.
> >
> > And even the most optimized fcs approach with docValues and
> > facet.threads=nCore does not perform as well as the simple enum
> > in Solr 4.
> >
> > For example, for some sample queries I have 40 ms vs 160 ms and
> > similar...
> > I think we should open an issue if we can confirm it is not
> > related to the other one.
> > A lot of people will continue using the legacy approach for a
> > while...
> >
> > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein 
> > wrote:
> >
> >> The enum slowness is interesting. It would appear on the surface
> >> to not be related to the FieldCache issue. I don't think the
> >> main emphasis of the JSON facet API has been the enum approach.
> >> You may find using the JSON facet API and eliminating the use of
> >> enum meets your performance needs.
> >>
> >> With the CollapsingQParserPlugin top_fc is definitely faster
> >> during queries. The tradeoff is slower warming times and
> >> increased memory usage if the collapse fields are used in
> >> faceting, as faceting will load the field into a different
> >> cache.
> >>
> >> Joel Bernstein
> >> http://joelsolr.blogspot.com/
> >>
> >> On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:
> >>
> >> > Joel,
> >> >
> >> > Thank you for taking the time to respond to my question.  I
> >> > tried the JSON Facet API for one query that uses
> >> > facet.method=enum (since this one has a ton of unique values and
> >> > performed better with enum) but this was way slower than even
> >> > the slower Solr 5 times.  I did not try the new API with the
> >> > non-enum queries though so I will give that a go.  It looks like
> >> > Solr 5.5.1 also has a facet.method=uif which will be interesting
> >> > to try.
> >> >
> >> > If these do not prove helpful, it looks like I will need to wait
> >> > for SOLR-8096 to be resolved before upgrading.
> >> >
> >> > Thanks also for your comment on top_fc for the
> >> > CollapsingQParser.  I use collapse/expand for some queries but
> >> > traditional grouping for others due to performance.  It will be
> >> > interesting to see if those grouping queries perform better now
> >> > using CollapsingQParser with top_fc.
> >> >
> >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein 
> >> > wrote:
> >> >
> >> > > Yes, SOLR-8096 is the issue here.
> >> > >
> >> > > I don't believe indexing with docValues is going to help too
> >> > > much with this. The enum slowness may not be related, but I'm
> >> > > not positive about that.
> >> > >
> >> > > The major slowdowns are likely due to the removal of the top
> >> > > level FieldCache from general use and the removal of the
> >> > > FieldValuesCache which was used for multi-value field faceting.
> >> > >
> >> > > The JSON facet API covers all the functionality in the
> >> > > traditional faceting, and it has been developed to be very
> >> > > performant.
> >> > >
> >> > > You may also want to see if Collapse/Expand can meet your
> 

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-31 Thread Alessandro Benedetti
Interesting developments:

https://issues.apache.org/jira/browse/SOLR-9176

I think we found why term enum seems slower in recent Solr!
In our case it is likely to be related to the commit I mention in the Jira.
Have a look, Joel!

On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> I am investigating this scenario right now.
> I can confirm that the enum slowness is in Solr 6.0 as well.
> And I agree with Joel, it seems to be unrelated to the famous
> faceting regression :(
>
> Furthermore, with the legacy facet approach, if you set
> docValues for the field you are not going to be able to try the
> enum approach anymore.
>
> org/apache/solr/request/SimpleFacets.java:448
>
> if (method == FacetMethod.ENUM && sf.hasDocValues()) {
>   // only fc can handle docvalues types
>   method = FacetMethod.FC;
> }
>
>
> I got really horrible regressions simply using term enum in both
> Solr 4 and Solr 6.
>
> And even the most optimized fcs approach with docValues and
> facet.threads=nCore does not perform as well as the simple enum
> in Solr 4.
>
> For example, for some sample queries I have 40 ms vs 160 ms and
> similar...
> I think we should open an issue if we can confirm it is not
> related to the other one.
> A lot of people will continue using the legacy approach for a
> while...
>
> On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein 
> wrote:
>
>> The enum slowness is interesting. It would appear on the surface to not be
>> related to the FieldCache issue. I don't think the main emphasis of the
>> JSON facet API has been the enum approach. You may find using the JSON
>> facet API and eliminating the use of enum meets your performance needs.
>>
>> With the CollapsingQParserPlugin top_fc is definitely faster during
>> queries. The tradeoff is slower warming times and increased memory usage
>> if
>> the collapse fields are used in faceting, as faceting will load the field
>> into a different cache.
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:
>>
>> > Joel,
>> >
>> > Thank you for taking the time to respond to my question.  I tried the
>> JSON
>> > Facet API for one query that uses facet.method=enum (since this one has
>> a
>> > ton of unique values and performed better with enum) but this was way
>> > slower than even the slower Solr 5 times.  I did not try the new API
>> with
>> > the non-enum queries though so I will give that a go.  It looks like
>> Solr
>> > 5.5.1 also has a facet.method=uif which will be interesting to try.
>> >
>> > If these do not prove helpful, it looks like I will need to wait for
>> > SOLR-8096 to be resolved before upgrading.
>> >
>> > Thanks also for your comment on top_fc for the CollapsingQParser.  I use
>> > collapse/expand for some queries but traditional grouping for others
>> due to
>> > performance.  It will be interesting to see if those grouping queries
>> > perform better now using CollapsingQParser with top_fc.
>> >
>> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein 
>> > wrote:
>> >
>> > > Yes, SOLR-8096 is the issue here.
>> > >
>> > > I don't believe indexing with docValues is going to help too much with
>> > > this. The enum slowness may not be related, but I'm not positive about
>> > > that.
>> > >
>> > > The major slowdowns are likely due to the removal of the top level
>> > > FieldCache from general use and the removal of the FieldValuesCache
>> which
>> > > was used for multi-value field faceting.
>> > >
>> > > The JSON facet API covers all the functionality in the traditional
>> > > faceting, and it has been developed to be very performant.
>> > >
>> > > You may also want to see if Collapse/Expand can meet your application's
>> > > needs rather than Grouping. It allows you to specify using a top level
>> > > FieldCache if performance is a blocker without it.
>> > >
>> > >
>> > >
>> > >
>> > > Joel Bernstein
>> > > http://joelsolr.blogspot.com/
>> > >
>> > > On Wed, May 18, 2016 at 10:42 AM, Solr User 
>> wrote:
>> > >
>> > > > Does anyone know the answer to this?
>> > > >
>> > > > On Wed, May 4, 2016 at 2:19 PM, Solr User 
>> wrote:
>> > > >
>> > > > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1
>> > but
>> > > > had
>> > > > > to abort due to average response times degraded from a baseline
>> > volume
>> > > > > performance test.  The affected queries involved faceting (both
>> enum
>> > > > method
>> > > > > and default) and grouping.  There is a critical bug
>> > > > > https://issues.apache.org/jira/browse/SOLR-8096 currently open
>> > which I
>> > > > > gather is the cause of the slower response times.  One concern I
>> have
>> > > is
>> > > > > that discussions around the issue offer the suggestion of indexing
>> > with
>> > > > > docValues which alleviated the problem in at least that one
>> reported
>> > > > case.
>> > > > > However, indexing with docValues 

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-25 Thread Alessandro Benedetti
I am investigating this scenario right now.
I can confirm that the enum slowness is in Solr 6.0 as well.
And I agree with Joel, it seems to be unrelated to the famous faceting
regression :(

Furthermore with the legacy facet approach, if you set docValues for the
field you are not going to be able to try the enum approach anymore.

org/apache/solr/request/SimpleFacets.java:448

if (method == FacetMethod.ENUM && sf.hasDocValues()) {
  // only fc can handle docvalues types
  method = FacetMethod.FC;
}
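
To make the effect of that check concrete, here is a minimal self-contained sketch. The enum, the class name and the helper resolveMethod are hypothetical stand-ins written for illustration, not Solr's actual classes; only the condition itself mirrors the fragment quoted above.

```java
// Hypothetical stand-in for the method selection quoted above; not Solr's real classes.
class FacetMethodSketch {
    enum FacetMethod { ENUM, FC, FCS, UIF }

    // Mirrors the quoted condition: an ENUM request on a field with
    // docValues is silently rewritten to FC; anything else is left alone.
    static FacetMethod resolveMethod(FacetMethod requested, boolean fieldHasDocValues) {
        if (requested == FacetMethod.ENUM && fieldHasDocValues) {
            return FacetMethod.FC;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(resolveMethod(FacetMethod.ENUM, true));   // prints FC
        System.out.println(resolveMethod(FacetMethod.ENUM, false));  // prints ENUM
    }
}
```

In other words, under this check a request for facet.method=enum on a docValues field never actually exercises the enum code path, so timings taken that way are measuring fc.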


I got really horrible regressions simply using term enum in both Solr 4 and
Solr 6.

And even the most optimized fcs approach with docValues and
facet.threads=nCore does not perform as well as the simple enum in Solr 4.

For example, for some sample queries I have 40 ms vs 160 ms and similar...
I think we should open an issue if we can confirm it is not related to
the other one.
A lot of people will continue using the legacy approach for a while...

On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein  wrote:

> The enum slowness is interesting. It would appear on the surface to not be
> related to the FieldCache issue. I don't think the main emphasis of the
> JSON facet API has been the enum approach. You may find using the JSON
> facet API and eliminating the use of enum meets your performance needs.
>
> With the CollapsingQParserPlugin top_fc is definitely faster during
> queries. The tradeoff is slower warming times and increased memory usage if
> the collapse fields are used in faceting, as faceting will load the field
> into a different cache.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:
>
> > Joel,
> >
> > Thank you for taking the time to respond to my question.  I tried the
> JSON
> > Facet API for one query that uses facet.method=enum (since this one has a
> > ton of unique values and performed better with enum) but this was way
> > slower than even the slower Solr 5 times.  I did not try the new API with
> > the non-enum queries though so I will give that a go.  It looks like Solr
> > 5.5.1 also has a facet.method=uif which will be interesting to try.
> >
> > If these do not prove helpful, it looks like I will need to wait for
> > SOLR-8096 to be resolved before upgrading.
> >
> > Thanks also for your comment on top_fc for the CollapsingQParser.  I use
> > collapse/expand for some queries but traditional grouping for others due
> to
> > performance.  It will be interesting to see if those grouping queries
> > perform better now using CollapsingQParser with top_fc.
> >
> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein 
> > wrote:
> >
> > > Yes, SOLR-8096 is the issue here.
> > >
> > > I don't believe indexing with docValues is going to help too much with
> > > this. The enum slowness may not be related, but I'm not positive about
> > > that.
> > >
> > > The major slowdowns are likely due to the removal of the top level
> > > FieldCache from general use and the removal of the FieldValuesCache
> which
> > > was used for multi-value field faceting.
> > >
> > > The JSON facet API covers all the functionality in the traditional
> > > faceting, and it has been developed to be very performant.
> > >
> > > > You may also want to see if Collapse/Expand can meet your application's
> > > > needs rather than Grouping. It allows you to specify using a top level
> > > FieldCache if performance is a blocker without it.
> > >
> > >
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > > On Wed, May 18, 2016 at 10:42 AM, Solr User  wrote:
> > >
> > > > Does anyone know the answer to this?
> > > >
> > > > On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:
> > > >
> > > > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1
> > but
> > > > had
> > > > > to abort due to average response times degraded from a baseline
> > volume
> > > > > performance test.  The affected queries involved faceting (both
> enum
> > > > method
> > > > > and default) and grouping.  There is a critical bug
> > > > > https://issues.apache.org/jira/browse/SOLR-8096 currently open
> > which I
> > > > > gather is the cause of the slower response times.  One concern I
> have
> > > is
> > > > > that discussions around the issue offer the suggestion of indexing
> > with
> > > > > docValues which alleviated the problem in at least that one
> reported
> > > > case.
> > > > > However, indexing with docValues did not improve the performance in
> > my
> > > > case.
> > > > >
> > > > > Can someone please confirm or correct my understanding that this
> > issue
> > > > has
> > > > > no path forward at this time and specifically that it is already
> > known
> > > > that
> > > > > docValues does not necessarily solve this?
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>



-- 
--

Benedetti Alessandro

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Joel Bernstein
The enum slowness is interesting. It would appear on the surface to not be
related to the FieldCache issue. I don't think the main emphasis of the
JSON facet API has been the enum approach. You may find using the JSON
facet API and eliminating the use of enum meets your performance needs.

With the CollapsingQParserPlugin top_fc is definitely faster during
queries. The tradeoff is slower warming times and increased memory usage if
the collapse fields are used in faceting, as faceting will load the field
into a different cache.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 18, 2016 at 5:28 PM, Solr User  wrote:

> Joel,
>
> Thank you for taking the time to respond to my question.  I tried the JSON
> Facet API for one query that uses facet.method=enum (since this one has a
> ton of unique values and performed better with enum) but this was way
> slower than even the slower Solr 5 times.  I did not try the new API with
> the non-enum queries though so I will give that a go.  It looks like Solr
> 5.5.1 also has a facet.method=uif which will be interesting to try.
>
> If these do not prove helpful, it looks like I will need to wait for
> SOLR-8096 to be resolved before upgrading.
>
> Thanks also for your comment on top_fc for the CollapsingQParser.  I use
> collapse/expand for some queries but traditional grouping for others due to
> performance.  It will be interesting to see if those grouping queries
> perform better now using CollapsingQParser with top_fc.
>
> On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein 
> wrote:
>
> > Yes, SOLR-8096 is the issue here.
> >
> > I don't believe indexing with docValues is going to help too much with
> > this. The enum slowness may not be related, but I'm not positive about
> > that.
> >
> > The major slowdowns are likely due to the removal of the top level
> > FieldCache from general use and the removal of the FieldValuesCache which
> > was used for multi-value field faceting.
> >
> > The JSON facet API covers all the functionality in the traditional
> > faceting, and it has been developed to be very performant.
> >
> > > You may also want to see if Collapse/Expand can meet your application's
> > > needs rather than Grouping. It allows you to specify using a top level
> > FieldCache if performance is a blocker without it.
> >
> >
> >
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Wed, May 18, 2016 at 10:42 AM, Solr User  wrote:
> >
> > > Does anyone know the answer to this?
> > >
> > > On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:
> > >
> > > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1
> but
> > > had
> > > > to abort due to average response times degraded from a baseline
> volume
> > > > performance test.  The affected queries involved faceting (both enum
> > > method
> > > > and default) and grouping.  There is a critical bug
> > > > https://issues.apache.org/jira/browse/SOLR-8096 currently open
> which I
> > > > gather is the cause of the slower response times.  One concern I have
> > is
> > > > that discussions around the issue offer the suggestion of indexing
> with
> > > > docValues which alleviated the problem in at least that one reported
> > > case.
> > > > However, indexing with docValues did not improve the performance in
> my
> > > case.
> > > >
> > > > Can someone please confirm or correct my understanding that this
> issue
> > > has
> > > > no path forward at this time and specifically that it is already
> known
> > > that
> > > > docValues does not necessarily solve this?
> > > >
> > > > Thanks in advance!
> > > >
> > > >
> > > >
> > >
> >
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Solr User
Joel,

Thank you for taking the time to respond to my question.  I tried the JSON
Facet API for one query that uses facet.method=enum (since this one has a
ton of unique values and performed better with enum) but this was way
slower than even the slower Solr 5 times.  I did not try the new API with
the non-enum queries though so I will give that a go.  It looks like Solr
5.5.1 also has a facet.method=uif which will be interesting to try.
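
For anyone who wants to try the two options mentioned above side by side, here is a minimal sketch of the request shapes. The field name "category", the facet label, and the class and helper names are assumptions made up for illustration; facet.method=uif and the JSON Facet API "terms" facet type are the only parts taken from Solr itself. It only builds strings and does not talk to a server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: builds request parameters, does not contact Solr.
class FacetRequestSketch {
    // Legacy facet parameters for one field with an explicit facet.method
    // (for example "enum", "fc", "fcs" or "uif").
    static Map<String, String> legacyFacetParams(String field, String method) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("rows", "0");
        p.put("facet", "true");
        p.put("facet.field", field);
        p.put("facet.method", method);
        return p;
    }

    // Body for the json.facet parameter: a terms facet on the same field.
    // The "<field>_facet" label is an arbitrary name chosen for this sketch.
    static String jsonFacetBody(String field) {
        return "{\"" + field + "_facet\":{\"type\":\"terms\",\"field\":\"" + field + "\"}}";
    }

    // URL-encodes the parameters into a query string, preserving insertion order.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(legacyFacetParams("category", "uif")));
        System.out.println(jsonFacetBody("category"));
    }
}
```

Running the same volume test once per facet.method value, and once with the json.facet body, should make it easier to compare the options on equal terms.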

If these do not prove helpful, it looks like I will need to wait for
SOLR-8096 to be resolved before upgrading.

Thanks also for your comment on top_fc for the CollapsingQParser.  I use
collapse/expand for some queries but traditional grouping for others due to
performance.  It will be interesting to see if those grouping queries
perform better now using CollapsingQParser with top_fc.

On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein  wrote:

> Yes, SOLR-8096 is the issue here.
>
> I don't believe indexing with docValues is going to help too much with
> this. The enum slowness may not be related, but I'm not positive about
> that.
>
> The major slowdowns are likely due to the removal of the top level
> FieldCache from general use and the removal of the FieldValuesCache which
> was used for multi-value field faceting.
>
> The JSON facet API covers all the functionality in the traditional
> faceting, and it has been developed to be very performant.
>
> You may also want to see if Collapse/Expand can meet your application's
> needs rather than Grouping. It allows you to specify using a top level
> FieldCache if performance is a blocker without it.
>
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, May 18, 2016 at 10:42 AM, Solr User  wrote:
>
> > Does anyone know the answer to this?
> >
> > On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:
> >
> > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but
> > had
> > > to abort due to average response times degraded from a baseline volume
> > > performance test.  The affected queries involved faceting (both enum
> > method
> > > and default) and grouping.  There is a critical bug
> > > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
> > > gather is the cause of the slower response times.  One concern I have
> is
> > > that discussions around the issue offer the suggestion of indexing with
> > > docValues which alleviated the problem in at least that one reported
> > case.
> > > However, indexing with docValues did not improve the performance in my
> > case.
> > >
> > > Can someone please confirm or correct my understanding that this issue
> > has
> > > no path forward at this time and specifically that it is already known
> > that
> > > docValues does not necessarily solve this?
> > >
> > > Thanks in advance!
> > >
> > >
> > >
> >
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Joel Bernstein
Yes, SOLR-8096 is the issue here.

I don't believe indexing with docValues is going to help too much with
this. The enum slowness may not be related, but I'm not positive about
that.

The major slowdowns are likely due to the removal of the top level
FieldCache from general use and the removal of the FieldValuesCache which
was used for multi-value field faceting.

The JSON facet API covers all the functionality in the traditional
faceting, and it has been developed to be very performant.

You may also want to see if Collapse/Expand can meet your application's
needs rather than Grouping. It allows you to specify using a top level
FieldCache if performance is a blocker without it.
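
As a rough sketch of what that switch looks like at the request level, the snippet below builds the parameters for traditional grouping and for Collapse/Expand with the top-level FieldCache hint. The field name "group_id" and the class and helper names are hypothetical; the parameter names (group, group.field, expand, and the {!collapse ... hint=top_fc} filter) are the standard Solr ones. Note that the two features do not return results in the same shape, so this is not a drop-in replacement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: builds request parameters, does not contact Solr.
class GroupingVsCollapseSketch {
    // Traditional result grouping on a single field.
    static Map<String, String> groupingParams(String field) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("group", "true");
        p.put("group.field", field);
        return p;
    }

    // Local-params filter for the CollapsingQParserPlugin; the optional
    // hint=top_fc asks it to use a top-level FieldCache.
    static String collapseFilter(String field, boolean topLevelFieldCache) {
        StringBuilder fq = new StringBuilder("{!collapse field=").append(field);
        if (topLevelFieldCache) {
            fq.append(" hint=top_fc");
        }
        return fq.append('}').toString();
    }

    // Collapse on the field and ask the expand component for the other group members.
    static Map<String, String> collapseExpandParams(String field, boolean topLevelFieldCache) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "*:*");
        p.put("fq", collapseFilter(field, topLevelFieldCache));
        p.put("expand", "true");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(groupingParams("group_id"));
        System.out.println(collapseExpandParams("group_id", true));
    }
}
```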




Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, May 18, 2016 at 10:42 AM, Solr User  wrote:

> Does anyone know the answer to this?
>
> On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:
>
> > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but
> had
> > to abort due to average response times degraded from a baseline volume
> > performance test.  The affected queries involved faceting (both enum
> method
> > and default) and grouping.  There is a critical bug
> > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
> > gather is the cause of the slower response times.  One concern I have is
> > that discussions around the issue offer the suggestion of indexing with
> > docValues which alleviated the problem in at least that one reported
> case.
> > However, indexing with docValues did not improve the performance in my
> case.
> >
> > Can someone please confirm or correct my understanding that this issue
> has
> > no path forward at this time and specifically that it is already known
> that
> > docValues does not necessarily solve this?
> >
> > Thanks in advance!
> >
> >
> >
>


Re: Faceting and Grouping Performance Degradation in Solr 5

2016-05-18 Thread Solr User
Does anyone know the answer to this?

On Wed, May 4, 2016 at 2:19 PM, Solr User  wrote:

> I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had
> to abort due to average response times degraded from a baseline volume
> performance test.  The affected queries involved faceting (both enum method
> and default) and grouping.  There is a critical bug
> https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
> gather is the cause of the slower response times.  One concern I have is
> that discussions around the issue offer the suggestion of indexing with
> docValues which alleviated the problem in at least that one reported case.
> However, indexing with docValues did not improve the performance in my case.
>
> Can someone please confirm or correct my understanding that this issue has
> no path forward at this time and specifically that it is already known that
> docValues does not necessarily solve this?
>
> Thanks in advance!
>
>
>


Faceting and Grouping Performance Degradation in Solr 5

2016-05-04 Thread Solr User
I recently attempted to upgrade from Solr 4.8.1 to Solr 5.4.1 but had to
abort because average response times degraded relative to a baseline volume
performance test.  The affected queries involved faceting (both enum method
and default) and grouping.  There is a critical bug
https://issues.apache.org/jira/browse/SOLR-8096 currently open which I
gather is the cause of the slower response times.  One concern I have is
that discussions around the issue offer the suggestion of indexing with
docValues which alleviated the problem in at least that one reported case.
However, indexing with docValues did not improve the performance in my case.

Can someone please confirm or correct my understanding that this issue has
no path forward at this time and specifically that it is already known that
docValues does not necessarily solve this?

Thanks in advance!