[jira] [Commented] (SOLR-6312) CloudSolrServer doesn't honor updatesToLeaders constructor argument

2017-10-17 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208362#comment-16208362
 ] 

Jeff Wartes commented on SOLR-6312:
---

As of 7.0.1, three years later, yes, I think this is still an open issue.
CloudSolrClient.Builder (the 7.x successor to CloudSolrServer) has two methods that 
have no effect: sendUpdatesOnlyToShardLeaders and sendUpdatesToAllReplicasInShard. 
They are not marked deprecated, and their javadoc implies they work.
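
For illustration, here is roughly how those builder flags get used. This is a sketch 
only, assuming the 7.x CloudSolrClient.Builder API; the ZooKeeper address is a 
placeholder:

{code}
import org.apache.solr.client.solrj.impl.CloudSolrClient;

// Sketch only: the builder accepts the flag, but per this report it does not
// actually change how updates are routed.
CloudSolrClient client = new CloudSolrClient.Builder()
    .withZkHost("localhost:2181")
    .sendUpdatesOnlyToShardLeaders()   // reportedly has no effect
    .build();
{code}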


> CloudSolrServer doesn't honor updatesToLeaders constructor argument
> ---
>
> Key: SOLR-6312
> URL: https://issues.apache.org/jira/browse/SOLR-6312
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.9
>Reporter: Steve Davids
> Fix For: 4.10
>
> Attachments: SOLR-6312.patch
>
>
> The CloudSolrServer doesn't use the updatesToLeaders property - all SolrJ 
> requests are being sent to the shard leaders.






[jira] [Commented] (SOLR-7269) ZK as truth for SolrCloud

2017-01-25 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838581#comment-15838581
 ] 

Jeff Wartes commented on SOLR-7269:
---

Any life still here? I've always thought it was strange that Solr effectively 
has two sources of truth (disk and ZK).

> ZK as truth for SolrCloud
> -
>
> Key: SOLR-7269
> URL: https://issues.apache.org/jira/browse/SOLR-7269
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
>
> We have been wanting to do this for a long time. 
> Mark listed out what all should go into this here - 
> https://issues.apache.org/jira/browse/SOLR-7248?focusedCommentId=14363441&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14363441
> The best approach as Mark suggested would be to work on these under 
> legacyCloud=false and once we are confident switch over to it as default.






[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2017-01-10 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816053#comment-15816053
 ] 

Jeff Wartes commented on SOLR-5170:
---

Well, yes, I'm interested. I've got enough other work projects going at the 
moment that I'm not sure I'll be able to dedicate much time in the next month or 
two, but I wouldn't mind trying to chip away at it.

I don't want to pollute this issue, so if you have a few minutes, and could 
drop me an email with any pointers about the code areas involved, or references 
to any prior art you're aware of, I expect that'd accelerate things a lot. 
Thanks.

> Spatial multi-value distance sort via DocValues
> ---
>
> Key: SOLR-5170
> URL: https://issues.apache.org/jira/browse/SOLR-5170
> Project: Solr
>  Issue Type: New Feature
>  Components: spatial
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
> SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
> SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt
>
>
> The attached patch implements spatial multi-value distance sorting.  In other 
> words, a document can have more than one point per field, and using a 
> provided function query, it will return the distance to the closest point.  
> The data goes into binary DocValues, and as-such it's pretty friendly to 
> realtime search requirements, and it only uses 8 bytes per point.






[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2017-01-09 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812871#comment-15812871
 ] 

Jeff Wartes commented on SOLR-5170:
---

It's coming up on two years, and I'm aware there have been some significant 
changes to areas like docvalues and geospatial since the last update to this 
issue. 

What's the state of the world now? 
If you have entities with multiple locations, and you want to filter and sort, 
is this patch still the highest-performance option available? I'm more willing 
to give up on the real-time-friendliness these days, if that changes the answer.

> Spatial multi-value distance sort via DocValues
> ---
>
> Key: SOLR-5170
> URL: https://issues.apache.org/jira/browse/SOLR-5170
> Project: Solr
>  Issue Type: New Feature
>  Components: spatial
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
> SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
> SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt
>
>
> The attached patch implements spatial multi-value distance sorting.  In other 
> words, a document can have more than one point per field, and using a 
> provided function query, it will return the distance to the closest point.  
> The data goes into binary DocValues, and as-such it's pretty friendly to 
> realtime search requirements, and it only uses 8 bytes per point.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-12-12 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743325#comment-15743325
 ] 

Jeff Wartes commented on SOLR-4735:
---

Understood, and not all cores are part of a collection. But if it matches the 
SolrCloud convention, it would be pretty nice to use it (and the node name if it 
doesn't). I could've sworn I saw an existing function for picking a node name 
apart somewhere, but I can't seem to find it now - maybe it was in a patch I 
read or something.
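
If no such utility turns up, the split is simple enough to hand-roll. A 
hypothetical helper (not an existing Solr function) might look like:

{code}
// Hypothetical helper, not an existing Solr utility: split a SolrCloud node
// name of the form "host:port_context" (e.g. "10.0.0.1:8983_solr") into the
// host and port pieces for use as metric name components.
static String[] splitNodeName(String nodeName) {
  int underscore = nodeName.indexOf('_');
  String hostPort = underscore >= 0 ? nodeName.substring(0, underscore) : nodeName;
  int colon = hostPort.indexOf(':');
  return colon >= 0
      ? new String[] { hostPort.substring(0, colon), hostPort.substring(colon + 1) }
      : new String[] { hostPort, "" };
}
{code}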

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch, screenshot-1.png
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-12-12 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743207#comment-15743207
 ] 

Jeff Wartes commented on SOLR-4735:
---

That's almost perfect. Can we replace those underscores with dots? That would 
mean the dashboard doesn't need to regex the "name" in order to group similar 
metrics.
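
For illustration (the names here are made up): Dropwizard's MetricRegistry.name() 
already joins its parts with dots, so the dotted form comes almost for free, and a 
dashboard can then group with a simple wildcard instead of a regex:

{code}
import com.codahale.metrics.MetricRegistry;

// Illustrative names only: a flat underscore name needs a regex to group,
// while MetricRegistry.name() produces the dotted form directly.
String flat   = "solr_collection1_shard1_replica1_requestTimes_p95";
String dotted = MetricRegistry.name("solr", "collection1_shard1_replica1",
                                    "requestTimes", "p95");
// dotted == "solr.collection1_shard1_replica1.requestTimes.p95"
{code}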

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch, screenshot-1.png
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-12-12 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743201#comment-15743201
 ] 

Jeff Wartes commented on SOLR-4735:
---

Oh, one thing just occurred to me though. There are essentially two classes of 
request to a collection - the top-level request, and the per-shard fan-out 
requests. I guess you can sort of derive the metrics of the top-level request 
from the per-core metrics, but that requires knowing the number of shards, and 
it still only works if the two classes of request are not mixed together.
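
One way to keep them separate (a sketch only, not from any patch here) is to key 
the timer name on the standard "isShard" parameter that fan-out requests carry:

{code}
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;
import org.apache.solr.common.params.SolrParams;

// Sketch only: time top-level distributed requests and per-shard fan-out
// requests under different names so the two never mix in one metric.
static Timer requestTimer(MetricRegistry metrics, SolrParams params) {
  boolean isShard = params.getBool("isShard", false);
  return metrics.timer(isShard ? "requests.shard" : "requests.distrib");
}
{code}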


> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch, screenshot-1.png
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Comment Edited] (SOLR-4735) Improve Solr metrics reporting

2016-12-12 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742740#comment-15742740
 ] 

Jeff Wartes edited comment on SOLR-4735 at 12/12/16 6:38 PM:
-

I've fallen behind keeping up with your changes, but for what it's worth, I 
agree with this. Collection-level metrics are at the cluster level, in 
aggregate. It's up to the thing you're reporting the metrics into to do the 
aggregation. For example, what I really want on my dashboard in grafana is a 
line, something like:

AVG(solr.[all nodes].[all cores belonging to a particular 
collection].latency.p95)

Then I can drill into a particular node, or core, in my reporting tool if I 
want. There's a requirement that the metric namespaces being reported allow 
for aggregation like this, which might mean a core needs to know the collection 
to which it belongs, but I don't think the node itself should need to report 
collection metrics.



was (Author: jwartes):
I've fallen behind keeping up with your changes, but for what it's worth, I 
agree with this. Collection-level metrics are at the cluster level, in 
aggregate. It's up to the thing you're reporting the metrics into to do the 
aggregation. For example, what I really want on my dashboard in grafana is a 
line, something like:

AVG(solr.{all nodes}.{all cores belonging to a particular 
collection}.latency.p95)

Then I can drill into a particular node, or core, in my reporting tool if I 
want. There's a requirement that the metric namespaces being reported allow 
for aggregation like this, which might mean a core needs to know the collection 
to which it belongs, but I don't think the node itself should need to report 
collection metrics.


> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch, screenshot-1.png
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-12-12 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742740#comment-15742740
 ] 

Jeff Wartes commented on SOLR-4735:
---

I've fallen behind keeping up with your changes, but for what it's worth, I 
agree with this. Collection-level metrics are at the cluster level, in 
aggregate. It's up to the thing you're reporting the metrics into to do the 
aggregation. For example, what I really want on my dashboard in grafana is a 
line, something like:

AVG(solr.{all nodes}.{all cores belonging to a particular 
collection}.latency.p95)

Then I can drill into a particular node, or core, in my reporting tool if I 
want. There's a requirement that the metric namespaces being reported allow 
for aggregation like this, which might mean a core needs to know the collection 
to which it belongs, but I don't think the node itself should need to report 
collection metrics.


> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch, screenshot-1.png
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-12-02 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716772#comment-15716772
 ] 

Jeff Wartes commented on SOLR-4735:
---

That seems pretty viable too. As I mentioned, the memory overhead of a registry 
is pretty low, just a concurrent map and a list. Plus, the actual metric 
objects in the map would be shared by both registries, so I'd be more concerned 
about the work involved in keeping them synchronized than with just having 
multiple registries.

I confess though, I don't have a clear idea whether that's more or less 
overhead than multiple identically-configured reporters. It feels like most of 
the possible performance issues here are linear, so it may not matter. Two 
reporters iterating through 10 metrics each sounds pretty much the same as one 
reporter iterating over 20 to me, all else being equal. 

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-12-02 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716247#comment-15716247
 ] 

Jeff Wartes commented on SOLR-4735:
---

Yeah, I get that. I like this line of thought because it means we can create as 
many registries as make sense (cores, collections, logical code sections, etc.) 
without worrying about how to get everything reported. We only have to pick 
some names.

What about a class that extends MetricRegistry and also implements 
MetricRegistryListener? Call that a ListeningMetricRegistry or something. When 
the configuration asks for a reporter on some set of (registry) names, we 
create a new, perhaps non-shared ListeningMetricRegistry, use registerAll to 
scoop the metrics in the desired registries into it, and then call addListener 
on all the desired registries with the ListeningMetricRegistry so everything 
stays in sync?

So that could still mean a single registry with a ton of metrics, but only in 
cases where there's been an explicit request for a reporter on a ton of 
metrics. 
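
A rough sketch of that class (the name and wiring are illustrative, not from a 
patch; note that in Metrics 3.x, addListener() already replays the existing 
metrics to the new listener, so a separate registerAll() pass isn't strictly 
needed):

{code}
import com.codahale.metrics.*;

// Sketch: a registry that mirrors one or more source registries and stays in
// sync with them by listening for later additions and removals.
public class ListeningMetricRegistry extends MetricRegistry
    implements MetricRegistryListener {

  /** Mirror a source registry now and keep tracking its future changes. */
  public void attachTo(MetricRegistry source) {
    source.addListener(this); // replays existing metrics, then streams changes
  }

  @Override public void onGaugeAdded(String name, Gauge<?> gauge)     { register(name, gauge); }
  @Override public void onGaugeRemoved(String name)                   { remove(name); }
  @Override public void onCounterAdded(String name, Counter counter)  { register(name, counter); }
  @Override public void onCounterRemoved(String name)                 { remove(name); }
  @Override public void onHistogramAdded(String name, Histogram hist) { register(name, hist); }
  @Override public void onHistogramRemoved(String name)               { remove(name); }
  @Override public void onMeterAdded(String name, Meter meter)        { register(name, meter); }
  @Override public void onMeterRemoved(String name)                   { remove(name); }
  @Override public void onTimerAdded(String name, Timer timer)        { register(name, timer); }
  @Override public void onTimerRemoved(String name)                   { remove(name); }
}
{code}

As noted above, this only works cleanly if metric names are unique across the 
source registries, since register() rejects duplicates.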

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-12-02 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15715880#comment-15715880
 ] 

Jeff Wartes commented on SOLR-4735:
---

`MetricRegistry` is really just a bunch of convenience methods and 
thread-safety around a `MetricSet`. There isn't much overhead difference 
between the two. But really, when I think of a MetricRegistry, I think of it as 
"a set of metrics I want to attach a reporter to", nothing more. 
It's a bit disappointing that reporters take a Registry instead of a MetricSet, 
since a Registry is itself a MetricSet.

With that in mind, one strategy would be to have every logical grouping of metrics 
use its own dedicated (probably shared) registry, and then bind the 
reporter-registry concept together at reporter definition time. 

That is, create a non-shared registry explicitly for the purpose of attaching a 
reporter to it, and only when asked to define a reporter. The reporter 
definition would then include the names of the registries to be reported. Under 
the hood, a new registry would be created as the union of the requested 
registries, and the reporter instantiated and attached to that. We'd have to 
make sure the namespace of all the metrics in the metric groups is unique, so 
that arbitrary groups can be combined without conflict, but that sounds 
desirable regardless.
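
A sketch of what that binding could look like at reporter-definition time (the 
registry names and Graphite endpoint are illustrative; note that registerAll() 
copies a snapshot, so metrics added to a source registry later wouldn't show up 
without the listener idea discussed elsewhere in this thread):

{code}
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.SharedMetricRegistries;
import com.codahale.metrics.graphite.Graphite;
import com.codahale.metrics.graphite.GraphiteReporter;

// Sketch: union the requested shared registries into a throwaway registry
// created just for this reporter, then attach the reporter to the union.
MetricRegistry union = new MetricRegistry();
for (String name : new String[] { "solr.core.collection1", "solr.jetty" }) {
  union.registerAll(SharedMetricRegistries.getOrCreate(name));
}
GraphiteReporter.forRegistry(union)
    .prefixedWith("solr")
    .build(new Graphite(new InetSocketAddress("graphite.example.com", 2003)))
    .start(60, TimeUnit.SECONDS);
{code}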


> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-11-29 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707000#comment-15707000
 ] 

Jeff Wartes commented on SOLR-4735:
---

Heh, I wondered whether something like that would happen if I commented on 
github. Should I constrain myself to talking in Jira?

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-11-25 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696479#comment-15696479
 ] 

Jeff Wartes commented on SOLR-4735:
---

I had a scheme for collapsable namespaced registries in my original PR for 
SOLR-8785.

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-11-24 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693874#comment-15693874
 ] 

Jeff Wartes commented on SOLR-4735:
---

From what I see, SolrMetricManager only needs the SolrCore for the 
config-based reporter instantiation, but that's a pretty nice thing to have.

How about SolrMetricManager taking, as an optional second parameter to the 
constructor, the name of a SharedMetricRegistry? If absent, it creates a new, 
isolated registry. With a name, though, the config-based reporters you attach 
are actually attached to the shared registry, pulling in whatever happens to be 
in there too. 
Of course, the core unregister action then needs to be careful to only 
replace/reset the metrics that it had added to the registry, instead of all of 
them as currently written. It could remove/replace the reporters with no real 
issue on every core reload (aside from possibly a blip in the reporting 
interval), though.
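
Something like this shape, roughly (a hedged sketch; the class name is illustrative 
and the SolrCore parameter is omitted, so this is not the actual SolrMetricManager 
API):

{code}
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.SharedMetricRegistries;

// Sketch: no registry name means an isolated per-core registry; a name means
// the manager (and any config-based reporters it attaches) works against the
// shared registry of that name, picking up whatever else lives there.
public class MetricManagerSketch {
  private final MetricRegistry registry;

  public MetricManagerSketch() {
    this.registry = new MetricRegistry();
  }

  public MetricManagerSketch(String sharedRegistryName) {
    this.registry = SharedMetricRegistries.getOrCreate(sharedRegistryName);
  }

  public MetricRegistry registry() {
    return registry;
  }
}
{code}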

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki 
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-11-23 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15690992#comment-15690992
 ] 

Jeff Wartes commented on SOLR-4735:
---

For what it's worth, this looks like really great stuff to me. 
I'm still unconvinced that metrics should always get reset on core reload, 
which is a source of some complexity, but doing so is certainly consistent with 
the prior behavior, so I can hardly complain. 
I think I can see a path to providing reportable metrics outside of the 
RequestHandler. I'd be interested in Kelvin's thoughts on that subject though, 
since he chose not to use SharedMetricRegistries.

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-8785) Use Metrics library for core metrics

2016-11-21 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15684725#comment-15684725
 ] 

Jeff Wartes commented on SOLR-8785:
---

Understood, I'm all for incremental change, and I don't see "how to make a 
Reporter" as part of this issue. I will be slightly disappointed though, if we 
convert to the library without also providing a recommended access path for the 
use of that library. Gathering metrics you can't report on is useless, and one 
of the things I liked about the original patch was this:

{code}
if (this.pluginInfo == null) {
  // if a request handler has a name, use a persistent, reportable timer under that name
  if (pluginInfo.name != null)
    requestTimes = Metrics.namedTimer(Metrics.mkName(this.getClass(), pluginInfo.name), REGISTRY_NAME);
  this.pluginInfo = pluginInfo;
}
{code}

This meant that I automatically got access to all the relevant metrics for any 
named request handler, using any Reporters (Log, JMX, Graphite, whatever) I 
cared to attach. This, in turn, was only possible because all those metrics 
were in a well-defined and accessible location.
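
For example (the registry name and reporter choices here are illustrative, 
assuming stock Dropwizard reporters):

{code}
import java.util.concurrent.TimeUnit;
import com.codahale.metrics.JmxReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.SharedMetricRegistries;
import com.codahale.metrics.Slf4jReporter;

// Sketch: because the timers land in a well-known shared registry, any
// reporter can be pointed at it from anywhere in the process.
MetricRegistry registry = SharedMetricRegistries.getOrCreate("solr-registry");

JmxReporter.forRegistry(registry).build().start();   // expose over JMX

Slf4jReporter.forRegistry(registry)
    .build()
    .start(5, TimeUnit.MINUTES);                      // or log periodically
{code}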


> Use Metrics library for core metrics
> 
>
> Key: SOLR-8785
> URL: https://issues.apache.org/jira/browse/SOLR-8785
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.1
>Reporter: Jeff Wartes
>  Labels: patch, patch-available
> Attachments: SOLR-8785-increment.patch, SOLR-8785.patch, 
> SOLR-8785.patch
>
>
> The Metrics library (https://dropwizard.github.io/metrics/3.1.0/) is a 
> well-known way to track metrics about applications. 
> In SOLR-1972, latency percentile tracking was added. The comment list is 
> long, so here’s my synopsis:
> 1. An attempt was made to use the Metrics library
> 2. That attempt failed due to a memory leak in Metrics v2.1.1
> 3. Large parts of Metrics were then copied wholesale into the 
> org.apache.solr.util.stats package space and that was used instead.
> Copy/pasting Metrics code into Solr may have been the correct solution at the 
> time, but I submit that it isn’t correct any more. 
> The leak in Metrics was fixed even before SOLR-1972 was released, and by 
> copy/pasting a subset of the functionality, we miss access to other important 
> things that the Metrics library provides, particularly the concept of a 
> Reporter. (https://dropwizard.github.io/metrics/3.1.0/manual/core/#reporters)
> Further, Metrics v3.0.2 is already packaged with Solr anyway, because it’s 
> used in two contrib modules (map-reduce and morphlines-core).
> I’m proposing that:
> 1. Metrics as bundled with Solr be upgraded to the current v3.1.2
> 2. Most of the org.apache.solr.util.stats package space be deleted outright, 
> or gutted and replaced with simple calls to Metrics. Due to the copy/paste 
> origin, the concepts should mostly map 1:1.
> I’d further recommend a usage pattern like:
> SharedMetricRegistries.getOrCreate(System.getProperty(“solr.metrics.registry”,
>  “solr-registry”))
> There are all kinds of areas in Solr that could benefit from metrics tracking 
> and reporting. This pattern allows diverse areas of code to track metrics 
> within a single, named registry. This well-known-name then becomes a handle 
> you can use to easily attach a Reporter and ship all of those metrics off-box.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2016-11-01 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15626442#comment-15626442
 ] 

Jeff Wartes commented on SOLR-4735:
---

I have, and still am, by instantiating a SharedMetricRegistry and GraphiteReporter 
directly in jetty.xml. (Which is hacky but, absent SOLR-8785, works fine.) 
I'm also using the logging and JVM metrics plugins quite happily.



> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch
>
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-8785) Use Metrics library for core metrics

2016-10-21 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595671#comment-15595671
 ] 

Jeff Wartes commented on SOLR-8785:
---

For the record, it looks like I wrote this patch against master, at around 
version 6.1.
I recall I had some concern at the time that the metrics namespace generation 
was too flexible (complicated), so that's something to look at.

> Use Metrics library for core metrics
> 
>
> Key: SOLR-8785
> URL: https://issues.apache.org/jira/browse/SOLR-8785
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.1
>Reporter: Jeff Wartes
>  Labels: patch, patch-available
>
> The Metrics library (https://dropwizard.github.io/metrics/3.1.0/) is a 
> well-known way to track metrics about applications. 
> In SOLR-1972, latency percentile tracking was added. The comment list is 
> long, so here’s my synopsis:
> 1. An attempt was made to use the Metrics library
> 2. That attempt failed due to a memory leak in Metrics v2.1.1
> 3. Large parts of Metrics were then copied wholesale into the 
> org.apache.solr.util.stats package space and that was used instead.
> Copy/pasting Metrics code into Solr may have been the correct solution at the 
> time, but I submit that it isn’t correct any more. 
> The leak in Metrics was fixed even before SOLR-1972 was released, and by 
> copy/pasting a subset of the functionality, we miss access to other important 
> things that the Metrics library provides, particularly the concept of a 
> Reporter. (https://dropwizard.github.io/metrics/3.1.0/manual/core/#reporters)
> Further, Metrics v3.0.2 is already packaged with Solr anyway, because it’s 
> used in two contrib modules (map-reduce and morphlines-core).
> I’m proposing that:
> 1. Metrics as bundled with Solr be upgraded to the current v3.1.2
> 2. Most of the org.apache.solr.util.stats package space be deleted outright, 
> or gutted and replaced with simple calls to Metrics. Due to the copy/paste 
> origin, the concepts should mostly map 1:1.
> I’d further recommend a usage pattern like:
> SharedMetricRegistries.getOrCreate(System.getProperty(“solr.metrics.registry”,
>  “solr-registry”))
> There are all kinds of areas in Solr that could benefit from metrics tracking 
> and reporting. This pattern allows diverse areas of code to track metrics 
> within a single, named registry. This well-known-name then becomes a handle 
> you can use to easily attach a Reporter and ship all of those metrics off-box.






[jira] [Updated] (SOLR-4449) Enable backup requests for the internal solr load balancer

2016-10-21 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-4449:
--
Labels: patch patch-available  (was: patch-available)

> Enable backup requests for the internal solr load balancer
> --
>
> Key: SOLR-4449
> URL: https://issues.apache.org/jira/browse/SOLR-4449
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
>  Labels: patch, patch-available
> Attachments: SOLR-4449.patch, SOLR-4449.patch, SOLR-4449.patch, 
> patch-4449.txt, solr-back-request-lb-plugin.jar
>
>
> Add the ability to configure the built-in solr load balancer such that it 
> submits a backup request to the next server in the list if the initial 
> request takes too long. Employing such an algorithm could improve the latency 
> of the 9xth percentile albeit at the expense of increasing overall load due 
> to additional requests. 






[jira] [Updated] (SOLR-8785) Use Metrics library for core metrics

2016-10-21 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-8785:
--
Labels: patch patch-available  (was: patch-available)

> Use Metrics library for core metrics
> 
>
> Key: SOLR-8785
> URL: https://issues.apache.org/jira/browse/SOLR-8785
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.1
>Reporter: Jeff Wartes
>  Labels: patch, patch-available
>
> The Metrics library (https://dropwizard.github.io/metrics/3.1.0/) is a 
> well-known way to track metrics about applications. 
> In SOLR-1972, latency percentile tracking was added. The comment list is 
> long, so here’s my synopsis:
> 1. An attempt was made to use the Metrics library
> 2. That attempt failed due to a memory leak in Metrics v2.1.1
> 3. Large parts of Metrics were then copied wholesale into the 
> org.apache.solr.util.stats package space and that was used instead.
> Copy/pasting Metrics code into Solr may have been the correct solution at the 
> time, but I submit that it isn’t correct any more. 
> The leak in Metrics was fixed even before SOLR-1972 was released, and by 
> copy/pasting a subset of the functionality, we miss access to other important 
> things that the Metrics library provides, particularly the concept of a 
> Reporter. (https://dropwizard.github.io/metrics/3.1.0/manual/core/#reporters)
> Further, Metrics v3.0.2 is already packaged with Solr anyway, because it’s 
> used in two contrib modules (map-reduce and morphlines-core).
> I’m proposing that:
> 1. Metrics as bundled with Solr be upgraded to the current v3.1.2
> 2. Most of the org.apache.solr.util.stats package space be deleted outright, 
> or gutted and replaced with simple calls to Metrics. Due to the copy/paste 
> origin, the concepts should mostly map 1:1.
> I’d further recommend a usage pattern like:
> SharedMetricRegistries.getOrCreate(System.getProperty(“solr.metrics.registry”,
>  “solr-registry”))
> There are all kinds of areas in Solr that could benefit from metrics tracking 
> and reporting. This pattern allows diverse areas of code to track metrics 
> within a single, named registry. This well-known-name then becomes a handle 
> you can use to easily attach a Reporter and ship all of those metrics off-box.






[jira] [Updated] (SOLR-4449) Enable backup requests for the internal solr load balancer

2016-10-21 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-4449:
--
Labels: patch-available  (was: )

> Enable backup requests for the internal solr load balancer
> --
>
> Key: SOLR-4449
> URL: https://issues.apache.org/jira/browse/SOLR-4449
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
>  Labels: patch-available
> Attachments: SOLR-4449.patch, SOLR-4449.patch, SOLR-4449.patch, 
> patch-4449.txt, solr-back-request-lb-plugin.jar
>
>
> Add the ability to configure the built-in solr load balancer such that it 
> submits a backup request to the next server in the list if the initial 
> request takes too long. Employing such an algorithm could improve the latency 
> of the 9xth percentile albeit at the expense of increasing overall load due 
> to additional requests. 






[jira] [Updated] (SOLR-8785) Use Metrics library for core metrics

2016-10-21 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-8785:
--
Labels: patch-available  (was: )

> Use Metrics library for core metrics
> 
>
> Key: SOLR-8785
> URL: https://issues.apache.org/jira/browse/SOLR-8785
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.1
>Reporter: Jeff Wartes
>  Labels: patch-available
>
> The Metrics library (https://dropwizard.github.io/metrics/3.1.0/) is a 
> well-known way to track metrics about applications. 
> In SOLR-1972, latency percentile tracking was added. The comment list is 
> long, so here’s my synopsis:
> 1. An attempt was made to use the Metrics library
> 2. That attempt failed due to a memory leak in Metrics v2.1.1
> 3. Large parts of Metrics were then copied wholesale into the 
> org.apache.solr.util.stats package space and that was used instead.
> Copy/pasting Metrics code into Solr may have been the correct solution at the 
> time, but I submit that it isn’t correct any more. 
> The leak in Metrics was fixed even before SOLR-1972 was released, and by 
> copy/pasting a subset of the functionality, we miss access to other important 
> things that the Metrics library provides, particularly the concept of a 
> Reporter. (https://dropwizard.github.io/metrics/3.1.0/manual/core/#reporters)
> Further, Metrics v3.0.2 is already packaged with Solr anyway, because it’s 
> used in two contrib modules. (map-reduce and morphlines-core)
> I’m proposing that:
> 1. Metrics as bundled with Solr be upgraded to the current v3.1.2
> 2. Most of the org.apache.solr.util.stats package space be deleted outright, 
> or gutted and replaced with simple calls to Metrics. Due to the copy/paste 
> origin, the concepts should mostly map 1:1.
> I’d further recommend a usage pattern like:
> SharedMetricRegistries.getOrCreate(System.getProperty(“solr.metrics.registry”,
>  “solr-registry”))
> There are all kinds of areas in Solr that could benefit from metrics tracking 
> and reporting. This pattern allows diverse areas of code to track metrics 
> within a single, named registry. This well-known-name then becomes a handle 
> you can use to easily attach a Reporter and ship all of those metrics off-box.
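
For illustration, a minimal sketch of the proposed usage pattern, assuming
Dropwizard Metrics 3.x on the classpath; ConsoleReporter stands in here for
whatever off-box reporter (Graphite, Ganglia, JMX) you would actually attach.

{code}
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.SharedMetricRegistries;
import com.codahale.metrics.Timer;
import java.util.concurrent.TimeUnit;

public class MetricsRegistrySketch {
  public static void main(String[] args) {
    // Any area of the code can look up the same well-known registry by name.
    MetricRegistry registry = SharedMetricRegistries.getOrCreate(
        System.getProperty("solr.metrics.registry", "solr-registry"));

    // Track a latency metric somewhere in request handling.
    Timer requestTimes = registry.timer("requests.select");
    try (Timer.Context ignored = requestTimes.time()) {
      // ... handle the request ...
    }

    // Elsewhere, attach a Reporter to ship everything in that registry off-box.
    ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
        .convertRatesTo(TimeUnit.SECONDS)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build();
    reporter.report();
  }
}
{code}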



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6581) Efficient DocValues support and numeric collapse field implementations for Collapse and Expand

2016-07-25 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392504#comment-15392504
 ] 

Jeff Wartes commented on SOLR-6581:
---

For what it's worth, I recall having a bad experience with that hint in a Solr 
5.4 cluster late last year. I never did dig into why though.
I had a similar case where I was collapsing on a highly distinct field, and as 
Joel indicates, the memory allocation rate was bad enough I had to give up on 
the whole thing. Joel and I discussed this a little in SOLR-9125 if you're 
curious.

> Efficient DocValues support and numeric collapse field implementations for 
> Collapse and Expand
> --
>
> Key: SOLR-6581
> URL: https://issues.apache.org/jira/browse/SOLR-6581
> Project: Solr
>  Issue Type: Bug
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 5.0, 6.0
>
> Attachments: SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
> SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
> SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
> SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, SOLR-6581.patch, 
> renames.diff
>
>
> The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent 
> are optimized to work with a top level FieldCache. Top level FieldCaches have 
> a very fast docID to top-level ordinal lookup. Fast access to the top-level 
> ordinals allows for very high performance field collapsing on high 
> cardinality fields. 
> LUCENE-5666 unified the DocValues and FieldCache api's so that the top level 
> FieldCache is no longer in regular use. Instead all top level caches are 
> accessed through MultiDocValues. 
> This ticket does the following:
> 1) Optimizes Collapse and Expand to use MultiDocValues and makes this the 
> default approach when collapsing on String fields
> 2) Provides an option to use a top level FieldCache if the performance of 
> MultiDocValues is a blocker. The mechanism for switching to the FieldCache is 
> a new "hint" parameter. If the hint parameter is set to "top_fc" then the 
> top-level FieldCache would be used for both Collapse and Expand.
> Example syntax:
> {code}
> fq={!collapse field=x hint=TOP_FC}
> {code}
> 3)  Adds numeric collapse field implementations.
> 4) Resolves issue SOLR-6066
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9335) Move solr stats collections to use LongAdder

2016-07-25 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392297#comment-15392297
 ] 

Jeff Wartes commented on SOLR-9335:
---

fwiw, SOLR-8241 involves cache implementations that (among other improvements) 
use LongAdder, and the author has been having trouble getting committer 
attention.

> Move solr stats collections to use LongAdder
> 
>
> Key: SOLR-9335
> URL: https://issues.apache.org/jira/browse/SOLR-9335
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>Assignee: Varun Thacker
>Priority: Minor
> Fix For: 6.2, master (7.0)
>
> Attachments: SOLR-9335.patch, SOLR-9335.patch
>
>
> With Java 8 we can use LongAdder which has more throughput under high 
> contention.
> These classes of Solr should benefit from LongAdder:
> - Caches ( ConcurrentLRUCache / LRUCache )
> - Searches ( RequestHandlerBase )
> - Updates ( DirectUpdateHandler2 )
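
A minimal sketch of the kind of change being proposed; the class and counter
names are illustrative, not the actual Solr code.

{code}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

/** LongAdder spreads contended increments across internal cells, so many
 *  request threads bumping the same counter don't all CAS on a single memory
 *  location the way AtomicLong does. */
public class StatsCounterSketch {
  // Before: one hot CAS target shared by every request thread.
  private final AtomicLong requestsAtomic = new AtomicLong();
  // After: increments stay cheap under contention; sum() is paid only on read.
  private final LongAdder requestsAdder = new LongAdder();

  public void onRequest() {
    requestsAtomic.incrementAndGet();
    requestsAdder.increment();
  }

  public long snapshot() {
    return requestsAdder.sum(); // may be slightly stale while writers are active
  }
}
{code}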



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9133) UUID FieldType shouldn't be stored as a String

2016-05-19 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-9133:
-

 Summary: UUID FieldType shouldn't be stored as a String
 Key: SOLR-9133
 URL: https://issues.apache.org/jira/browse/SOLR-9133
 Project: Solr
  Issue Type: Improvement
Reporter: Jeff Wartes


This came up in passing on SOLR-6741 last year, but as far as I can tell, the 
Solr UUIDField still indexes those UUIDs as strings, not as a 128-bit number.

So really, the only point of the UUIDField instead of using a StringField is 
that there's some validation and the possibility of a newly-generated value. 
Seems a little misleading.

From what I can tell, Lucene has added a bunch of support for arbitrary sized 
numbers and binary primitives (LUCENE-7043?), so it seems like the Solr UUID 
field should save some space and actually index UUIDs as what they are.

Of course, since this would change the encoding of an existing field type, it 
might take the form of a new "CompressedUUIDField" or something instead.
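
For illustration, a sketch of the size argument: a UUID is just two longs
(16 bytes), versus 36 characters in its canonical string form. This is not the
UUIDField code, only the encoding idea such a field could index.

{code}
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytesSketch {
  // Pack the two 64-bit halves of the UUID into 16 bytes.
  public static byte[] toBytes(UUID uuid) {
    return ByteBuffer.allocate(16)
        .putLong(uuid.getMostSignificantBits())
        .putLong(uuid.getLeastSignificantBits())
        .array();
  }

  public static UUID fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    return new UUID(buf.getLong(), buf.getLong());
  }

  public static void main(String[] args) {
    UUID id = UUID.randomUUID();
    System.out.println(id + " -> " + toBytes(id).length + " bytes vs "
        + id.toString().length() + " chars");
  }
}
{code}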




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based

2016-05-17 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287339#comment-15287339
 ] 

Jeff Wartes commented on SOLR-9125:
---

Isn't there a chicken-and-egg situation there? You'd need the set of matching 
docs to figure out the HLL.cardinality, but you'd need that cardinality to 
specify the initial size of the map you're going to save the set of matching 
docs in.

Or maybe collect() would just throw every doc in the FBS, and finish() would do 
all the finding group heads and collapsing?

> CollapseQParserPlugin allocations are index based, not query based
> --
>
> Key: SOLR-9125
> URL: https://issues.apache.org/jira/browse/SOLR-9125
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Jeff Wartes
>Priority: Minor
>  Labels: collapsingQParserPlugin
>
> Among other things, CollapsingQParserPlugin’s OrdScoreCollector allocates 
> space per-query for: 
> 1 int (doc id) per ordinal
> 1 float (score) per ordinal
> 1 bit (FixedBitSet) per document in the index
>  
> So the higher the cardinality of the thing you’re grouping on, and the more 
> documents in the index, the more memory gets consumed per query. Since high 
> cardinality and large indexes are the use-cases CollapseQParserPlugin was 
> designed for, I thought I'd point this out.
> My real issue is that this does not vary based on the number of results in 
> the query, either before or after collapsing, so a query that results in one 
> doc consumes the same amount of memory as one that returns all of them. All 
> of the Collectors suffer from this to some degree, but I think OrdScore is 
> the worst offender.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based

2016-05-17 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286940#comment-15286940
 ] 

Jeff Wartes commented on SOLR-9125:
---

I messed around a little bit, but I don't have a solution for this. I thought 
I'd file the issue anyway just to shine some light.

I had attempted to use CollapseQParserPlugin on a very large index using a 
collapse on a field whose cardinality was about 1/7th the doc count... it 
didn't go well. Worse, the issue didn't come up until pretty late in the game, 
because at low query rate and/or on smaller indexes, the problem isn't evident. 
I abandoned the attempt.

Some stuff I tried:

- I thought about replacing the FBS with a DocIdSetBuilder, but 
DelegatingCollector.finish() gets called twice, and you can't 
DocIdSetBuilder.build() twice on the same builder. We'd need to save the first 
build() result and use it to initialize a new builder for the second, but I 
wasn't convinced I understood the distinction between the two passes.
- I did one quick test where I replaced the "ords" and "scores" arrays with an 
IntIntScatterMap and an IntFloatScatterMap, thinking those would work better for small 
result sets. That ended up being worse (from a total allocations standpoint) 
for the queries I was trying, probably due to the map resizing necessary. It 
might be possible to set initial size values from statistics and help this case 
that way. It would also be possible to encode the docId/score into a long and 
just use one IntLongScatterMap, but I didn't try that.
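
For illustration, a sketch of that docId/score packing idea: the doc id goes in
the high 32 bits and the float score's raw bits in the low 32, so a single
int-to-long map keyed by collapse ordinal could replace the separate
ords/scores arrays. This is not code from the patch, just the encoding.

{code}
public class DocScorePacking {
  static long pack(int docId, float score) {
    return ((long) docId << 32) | (Float.floatToRawIntBits(score) & 0xFFFFFFFFL);
  }

  static int docId(long packed) {
    return (int) (packed >>> 32);      // high 32 bits
  }

  static float score(long packed) {
    return Float.intBitsToFloat((int) packed); // low 32 bits
  }

  public static void main(String[] args) {
    long packed = pack(12345, 3.14f);
    System.out.println(docId(packed) + " / " + score(packed));
  }
}
{code}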

> CollapseQParserPlugin allocations are index based, not query based
> --
>
> Key: SOLR-9125
> URL: https://issues.apache.org/jira/browse/SOLR-9125
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Jeff Wartes
>Priority: Minor
>  Labels: collapsingQParserPlugin
>
> Among other things, CollapsingQParserPlugin’s OrdScoreCollector allocates 
> space per-query for: 
> 1 int (doc id) per ordinal
> 1 float (score) per ordinal
> 1 bit (FixedBitSet) per document in the index
>  
> So the higher the cardinality of the thing you’re grouping on, and the more 
> documents in the index, the more memory gets consumed per query. Since high 
> cardinality and large indexes are the use-cases CollapseQParserPlugin was 
> designed for, I thought I'd point this out.
> My real issue is that this does not vary based on the number of results in 
> the query, either before or after collapsing, so a query that results in one 
> doc consumes the same amount of memory as one that returns all of them. All 
> of the Collectors suffer from this to some degree, but I think OrdScore is 
> the worst offender.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based

2016-05-17 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-9125:
-

 Summary: CollapseQParserPlugin allocations are index based, not 
query based
 Key: SOLR-9125
 URL: https://issues.apache.org/jira/browse/SOLR-9125
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Jeff Wartes
Priority: Minor


Among other things, CollapsingQParserPlugin’s OrdScoreCollector allocates space 
per-query for: 
1 int (doc id) per ordinal
1 float (score) per ordinal
1 bit (FixedBitSet) per document in the index
 
So the higher the cardinality of the thing you’re grouping on, and the more 
documents in the index, the more memory gets consumed per query. Since high 
cardinality and large indexes are the use-cases CollapseQParserPlugin was 
designed for, I thought I'd point this out.

My real issue is that this does not vary based on the number of results in the 
query, either before or after collapsing, so a query that results in one doc 
consumes the same amount of memory as one that returns all of them. All of the 
Collectors suffer from this to some degree, but I think OrdScore is the worst 
offender.
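
To make the cost concrete, here is a back-of-the-envelope calculation with
illustrative numbers (100M docs, 14M distinct collapse values); these are not
measured figures.

{code}
public class OrdScoreCost {
  public static void main(String[] args) {
    long maxDoc = 100_000_000L;
    long ordinals = 14_000_000L;             // distinct collapse values

    long docIds = ordinals * Integer.BYTES;  // 1 int (doc id) per ordinal
    long scores = ordinals * Float.BYTES;    // 1 float (score) per ordinal
    long bitSet = maxDoc / 8;                // FixedBitSet: 1 bit per doc in the index

    System.out.printf("per-query allocation ~ %.1f MB%n",
        (docIds + scores + bitSet) / 1_000_000.0);
    // ~ 56 MB + 56 MB + 12.5 MB = about 125 MB per query, regardless of hit count.
  }
}
{code}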




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8697) Fix LeaderElector issues

2016-05-13 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282958#comment-15282958
 ] 

Jeff Wartes commented on SOLR-8697:
---

Does this fix SOLR-6498?

> Fix LeaderElector issues
> 
>
> Key: SOLR-8697
> URL: https://issues.apache.org/jira/browse/SOLR-8697
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.4.1
>Reporter: Scott Blum
>Assignee: Mark Miller
>  Labels: patch, reliability, solrcloud
> Fix For: 5.5.1, 6.0
>
> Attachments: OverseerTestFail.log, SOLR-8697-followup.patch, 
> SOLR-8697.patch
>
>
> This patch is still somewhat WIP for a couple of reasons:
> 1) Still debugging test failures.
> 2) This will need more scrutiny from knowledgeable folks!
> There are some subtle bugs with the current implementation of LeaderElector, 
> best demonstrated by the following test:
> 1) Start up a small single-node solrcloud.  it should be become Overseer.
> 2) kill -9 the solrcloud process and immediately start a new one.
> 3) The new process won't become overseer.  The old process's ZK leader elect 
> node has not yet disappeared, and the new process fails to set appropriate 
> watches.
> NOTE: this is only reproducible if the new node is able to start up and join 
> the election quickly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-05-02 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15267176#comment-15267176
 ] 

Jeff Wartes commented on LUCENE-7258:
-

Ok, yeah, that’s a reasonable thing to assume. We usually think of it in terms 
of cpu work, but filter caches would be an equally great way to mitigate 
allocations. But a cache is really only useful when you’ve got non-uniform 
query distributions, or enough time-locality at your query rate that your rare 
queries haven’t faced a cache eviction yet. 

I’m indexing address-type data. Not uncommon. I think that if my typical 
geospatial search were based on some hyper-local phone location, we’d be done 
talking, since a filter cache would be useless.  

So maybe we should assume I’m not doing that.

Let’s assume I can get away with something coarse. Let’s assume I can convert 
all location based queries to the center point of a city. Let’s further assume 
that I only care about one radius per city. Finally, let’s assume I’m only 
searching in the US. There are some 40,000 cities in the US, so those 
assumptions yield 40,000 possible queries. That’s not too bad. 

With a 100M-doc core, I think that's about 12.5MB per filter cache entry. It 
could be less, I think, particularly with the changes in SOLR-8922, but since 
we’re only going with coarse queries, it’s reasonable to assume there’s going 
to be a lot of hits. 
I don’t need every city in the cache, of course, so maybe… 5%? That’s only some 
25G of heap. 
Doable, especially since it saves allocation size and you could probably trade 
in more of the eden space. (Although this would make warmup more of a pain) I’d 
probably have to cross the CompressedOops boundary at 32G of heap to do that 
too though, so add another 16G to get back to baseline.

Fortunately, the top 5% of cities probably maps to more than 5% of queries. 
More populated cities are also more likely targets for searching in most query 
corpuses. So assuming it’s the biggest 5% that are in the cache, maybe we can 
assume a 15% hit rate? 20%?

Ok, so now I’ve spent something like 41G of heap, and I’ve reduced allocations 
by 20%. Is this pretty good?

I suppose it’s worth noting that this also assumes a perfect cache eviction 
policy, (I’m pretty interested in SOLR-8241) and that there’s no other filter 
cache pressure. (At the least, I’m using facets - SOLR-8171)
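
For reference, the arithmetic behind the numbers above, spelled out. It assumes
each cached filter is a full bitset over the index, which is the conservative
case.

{code}
public class FilterCacheMath {
  public static void main(String[] args) {
    long maxDoc = 100_000_000L;
    long bytesPerEntry = maxDoc / 8;              // 12.5 MB per cached filter

    int cities = 40_000;
    int cached = (int) (cities * 0.05);           // cache the top 5% of cities

    double heapGb = cached * bytesPerEntry / 1e9; // 2,000 * 12.5 MB = 25 GB
    System.out.printf("%d entries * %.1f MB = %.0f GB of filter cache%n",
        cached, bytesPerEntry / 1e6, heapGb);
  }
}
{code}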


> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-05-01 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265829#comment-15265829
 ] 

Jeff Wartes commented on LUCENE-7258:
-

There are actually three threads going on this ticket right now: there's the 
“what threshold and expansion to use for geospatial” question that I'd originally 
intended and provided a patch for, there's the “what expansion for 
DocIdSetBuilder is generically optimal” question, and there's the “FBS is 50% of 
my allocation rate, can we pool” conversation.

I think the latter is a worthy conversation, and I don’t have a better place 
for it, so I’m going to continue to respond to the comments along those lines, 
(with apologies for the book I’m writing here) but I wanted to point out the 
divergence.

So, I certainly understand a knee-jerk reaction around using object pools of 
any kind. Yes, this IS what the JVM is for. It’s easier and simpler and lower 
maintenance to just use what’s provided. But I could also argue that 
Arrays.sort has all those same positive attributes, and that hasn't stopped 
several hand-written sort algorithms from getting into this codebase. The question is 
actually whether the easy and simple thing is good enough, or whether the 
harder thing has a sufficient offsetting benefit. Everyone on this thread is a 
highly experienced programmer, we all know this.

In this case, that means the question is actually whether the allocation rate 
is “good enough” or if there's a sufficiently offsetting opportunity for 
improvement, and arguments should ideally come from that analysis. 

I can empirically state that for my large Solr index, that GC pause is the 
single biggest detriment to my 90+th percentile query latency. Put another way, 
Lucene is fantastically fast, at least when the JVM isn’t otherwise occupied. 
Because of shard fan-out, a per-shard p90 latency very quickly becomes a p50 
latency for queries overall. (Even with mitigations like SOLR-4449) 
I don’t think there’s anything particularly unique to my use-case in anything I 
just said, except possibly the word “large”.

As such, I consider this an opportunity for improvement, so I’ve suggested a 
mitigation strategy. It clearly has some costs. I’d be delighted to entertain 
any alternative strategies.

Actually, [~dsmiley] did bring up one alternative suggestion for improvement, 
so let’s talk about -Xmn:

First, let’s assume that Lucene’s policy on G1 hasn’t changed, and we’re still 
talking about ParNew/CMS. Second, with the exception of a few things like 
cache, most of the allocations in a Solr/Lucene index are very short-lived. So 
it follows that given a young generation of sufficient size, the tenured 
generation would actually see very little activity.

The major disadvantage to just using a huge young generation then is that there 
aren’t any concurrent young-generation collectors. The bigger it is, the less 
frequently you need to collect, but the longer the stop-the-world GC pause when 
you do.
On the other end of the scale, a very small young space means shorter pauses, 
but far more frequent. Since almost all garbage is short-lived, maybe now 
you're doing young-collections so often that you’ve got the tenured collector 
doing a bunch of the work cleaning up short-lived objects too. (This can 
actually be a good thing, since the CMS collector is mostly concurrent)

There’s some theoretical size that optimizes frequency vs pause for averaged 
latency. Perhaps even by deliberately allowing some premature overflow into 
tenured simply because tenured can be collected concurrently. This kind of 
thing is extremely delicate to tune for though, especially since query rate 
(and query type distribution) can fluctuate. It’s easy to get it wrong, such 
that a sudden large-allocation slams past the rate CMS was expecting and 
triggers a full-heap stop-the-world pause.

I’m focusing on FBS here because: 1. _Fifty Percent_. 2. These are generally 
larger objects, so mitigating those allocations seemed like a good way to 
mitigate unexpected changes in allocation rate and allow more stable tuning.

There’s probably also at least one Jira issue around looking at object count 
allocation rate (vs size) since I suspect the single biggest factor in 
collector pause is the object count. Certainly I can point to objects that get 
allocated (by count) in orders of magnitude greater frequency than then next 
highest count. But since I don’t have a good an understanding of the use cases, 
let alone have any suggestions yet, I’ve left that for another time.


> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> 

[jira] [Commented] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-04-29 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264309#comment-15264309
 ] 

Jeff Wartes commented on LUCENE-7258:
-

I'm not sure I understand how the dangers of large FBS size would be any 
different with a pooling mechanism than they are right now. If a query needs 
several of them, then it needs several of them, whether they're freshly 
allocated or not. The only real difference I see might be whether that memory 
exists in the tenured space, rather than thrashing the eden space every time. 

I don't think it'd need to be per-thread. I don't mind points of 
synchronization if they're tight and well understood. Allocation rate by count 
is generally lower here. One thought:
https://gist.github.com/randomstatistic/87caefdea8435d6af4ad13a3f92d2698

To anticipate some objections, there are likely lockless data structures you 
could use, and yes, you might prefer to control size in terms of memory instead 
of count. I can think of a dozen improvements per minute I spend looking at 
this. But you get the idea. Anyone anywhere who knows for *sure* they're done 
with a FBS can offer it up for reuse, and anyone can potentially get some reuse 
by just changing their "new" to "request". 
If everybody does this, you end up with a fairly steady pool of FBS instances 
large enough for most uses. If only some places use it, there's no chance of an 
unbounded leak, you might get some gain, and worst-case you haven't lost much. 
If nobody uses it, you've lost nothing.
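
To make that concrete, here is a minimal sketch of the kind of loose, opt-in
pool being described. This is not the gist's contents; sizing and eviction
policy are deliberately ignored.

{code}
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.lucene.util.FixedBitSet;

/** Callers that know they're done with a FixedBitSet can offer it back;
 *  callers that would have said "new" can say "request" instead. */
public class FixedBitSetPool {
  private static final int MAX_POOLED = 8;   // tiny, bounded pool
  private final ConcurrentLinkedQueue<FixedBitSet> pool = new ConcurrentLinkedQueue<>();

  /** Reuse a pooled instance if one is big enough, otherwise allocate. */
  public FixedBitSet request(int numBits) {
    for (FixedBitSet bits : pool) {
      if (bits.length() >= numBits && pool.remove(bits)) {
        bits.clear(0, bits.length());        // hand back a zeroed set
        return bits;
      }
    }
    return new FixedBitSet(numBits);
  }

  /** Only call this if you are certain nothing else still references the bitset. */
  public void release(FixedBitSet bits) {
    if (pool.size() < MAX_POOLED) {
      pool.offer(bits);
    }
  }
}
{code}

A larger pooled instance can serve a smaller request, which is what makes the
worst case (pool miss) no worse than today's behavior.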

Last I checked, something like a full 50% of (my) allocations by size were 
FixedBitSets despite a low allocation rate by count, or I wouldn't be harping 
on the subject. As a matter of principle, I'd gladly pay heap to reduce GC. The 
fastest search algorithm in the world doesn't help me if I'm stuck waiting for 
the collector to finish all the time.


> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-04-27 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260960#comment-15260960
 ] 

Jeff Wartes commented on LUCENE-7258:
-

I'd be interested in trying TimSort, or something like [~yo...@apache.org] 
suggested where an ExpandingIntArray-style array of arrays is fed directly into 
the Radix sort, but I'm not sure I'm going to be able to commit much more time 
to this for a bit.

That said, in the process of thinking about this, I do have a few git stashes 
saved off with sketches for things like using TimSort and using 
ExpandingIntArray that I could try to clean and post if anyone is interested. 

I also have one sketch I started for using a loose pool mechanism to front 
acquiring a FixedBitSet, but I didn't get deep enough to be able to tell with 
confidence that a FBS was actually not being used anymore. Things like the 
public FixedBitSet.getBits() method make it scary, although I'm convinced even 
a very small pool of large FixedBitSets could be extremely advantageous. There 
aren't that many in use at any given time, and a large FBS can still be used 
for a small use-case. If anyone has some pointers around the lifecycle here, 
I'd love to hear them.

> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-04-27 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated LUCENE-7258:

Attachment: LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch

After 24 hours, I found I could discern a CPU penalty on my patched node. I 
removed the change in sort algorithm, and that seems to have resolved it 
without significantly reducing the allocation savings.

> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-04-26 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258929#comment-15258929
 ] 

Jeff Wartes commented on LUCENE-7258:
-

Random aside: I did do one test run where I changed all usages of 
ArrayUtil.oversize to use an expansion of 2x. I recall this increased overall 
allocations on my test query corpus by about 4%, when compared to the 256th/2x 
applied to only the IntersectsPrefixTreeQuery.

> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-04-26 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated LUCENE-7258:

Attachment: allocation_plot.jpg

Attaching the graph directly.

> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch, 
> allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-04-26 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258848#comment-15258848
 ] 

Jeff Wartes commented on LUCENE-7258:
-

I put this patch on a production node this morning; it looks like allocation rate 
went down about 10%, which I think is pretty good considering only about 15% of 
my queries even have a geospatial component.

CPU usage has not changed enough for me to notice.

> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-04-26 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated LUCENE-7258:

Attachment: LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch

This patch does the following:

1. Moves the FBS threshold from 1/128th to 1/256th for 
IntersectsPrefixTreeQuery.
2. Changes the expansion policy to 2x when used by IntersectsPrefixTreeQuery.
3. Changes the sort algorithm in DocIdSetBuilder (for ALL usages) to 
InPlaceMergeSorter, since LSBRadixSorter requires allocating a new array of 
size N.
4. In order to do #1 & #2, I had to add parameter support for the threshold and 
expansion policies.

Justifications: 
1. Since Geospatial data is typically non-uniform, a smaller threshold seemed 
reasonable.
2. A more aggressive expansion policy results in less wasted allocations, 
particularly for short-lived data structures.
3. This one might be controversial since it affects more than just geospatial 
search, but I thought I'd see what happened if I saved the memory. I also 
considered TimSort, which has a configurable memory cost, but LUCENE-5140 gave 
me some pause. 
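
As a rough illustration of justification 2, here is a simplified model of the
cumulative allocation needed to grow an int[] buffer to ~1.5M entries under a
1/8th oversize policy versus plain doubling. It ignores ArrayUtil's exact
rounding; the numbers are illustrative only.

{code}
public class GrowthPolicyCost {
  public static void main(String[] args) {
    final long target = 1_500_000;   // roughly maxDoc/64 for a 100M-doc index

    for (double factor : new double[] {1.125, 2.0}) {
      long size = 1_024;
      long totalInts = size;         // every grow allocates a fresh array
      while (size < target) {
        size = Math.max(size + 1, (long) (size * factor));
        totalInts += size;
      }
      System.out.printf("growth x%.3f: %d ints allocated (%.1f MB) to reach %d%n",
          factor, totalInts, totalInts * 4 / 1e6, size);
    }
  }
}
{code}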

> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
> Attachments: 
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7258) Tune DocIdSetBuilder allocation rate

2016-04-26 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258834#comment-15258834
 ] 

Jeff Wartes commented on LUCENE-7258:
-

The "eia" label represents using the ExpandingIntArray approach from SOLR-8922. 
It suffered somewhat in my plot because I accounted for the fact that when 
you're done collecting, you need to convert it to a single array for sorting 
purposes. (if you haven't overflowed into a FBS, anyway, and want to use the 
usual Sorters.)


> Tune DocIdSetBuilder allocation rate
> 
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial
>Reporter: Jeff Wartes
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to 
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies 
> to see if I could tune things more. 
> See here: http://i.imgur.com/7sXLAYv.jpg 
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up 
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index 
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
> terrible from an allocation standpoint if you're doing a lot of expansions, 
> and is especially terrible when used to build a short-lived data structure 
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory 
> for the buffer as it would have needed for just the FBS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-7258) Tune Spatial RPT Intersects allocation rate

2016-04-26 Thread Jeff Wartes (JIRA)
Jeff Wartes created LUCENE-7258:
---

 Summary: Tune Spatial RPT Intersects allocation rate
 Key: LUCENE-7258
 URL: https://issues.apache.org/jira/browse/LUCENE-7258
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: Jeff Wartes


LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but 
didn't actually reduce garbage generation for my Solr index.

Since something like 40% of my garbage (by space) is now attributed to 
DocIdSetBuilder.growBuffer, I charted a few different allocation strategies to 
see if I could tune things more. 

See here: http://i.imgur.com/7sXLAYv.jpg 
The jump-then-flatline at the right would be where DocIdSetBuilder gives up and 
allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index curve/cutoff 
looked similar)

Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is 
terrible from an allocation standpoint if you're doing a lot of expansions, and 
is especially terrible when used to build a short-lived data structure like 
this one.
By the time it goes with the FBS, it's allocated around twice as much memory 
for the buffer as it would have needed for just the FBS.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8944) Improve geospatial garbage generation

2016-04-12 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237853#comment-15237853
 ] 

Jeff Wartes commented on SOLR-8944:
---

Results from applying this patch were quite positive, but for more subtle 
reasons than I'd expected.

To my surprise, the quantity of garbage generated (by size) over my test run 
was mostly unchanged, as was the frequency of collections. However, the garbage 
collector (ParNew) seemed to have a *much* easier time with what was being 
generated. Avg GC pause went down 45%, and max GC pause for the run was cut in 
half. 

I'm not sure I can even speculate on what makes for easier work within ParNew.

From an allocation rate standpoint, I'm guessing that my test run sits near 
the edge of where the DocIdSetBuilder's buffer remains efficient from an 
allocation size perspective. Naively that looks like about a hit rate 
threshold of 25%, but I suspect it's a lot more complicated than that, since 
DocIdSetBuilder grows the buffer in 1/8th increments and throws away the old 
allocations, which generates more garbage. (By contrast, SOLR-8922 uses 1/64 
as the threshold instead of 1/128, but allocates additional space in 2x 
increments, and doesn't throw away what's already been allocated)

Looking at some before/after memory snapshots, the allocation size attributed 
to long[] in FixedBitSet is indeed down, but mostly replaced by lots of int[] 
allocations attributed to DocIdSetBuilder.growBuffer, as we might expect given 
that overall allocation size didn't change much.

In general, this is a desirable enough patch for my index that I'd be willing 
to move it into a Lucene issue just on its face, but it still feels like there 
is some room for improvement. I suppose I should have made this a Lucene issue 
in the first place, but given that I'm running with and testing with Solr I 
wasn't sure how that fit.



> Improve geospatial garbage generation
> -
>
> Key: SOLR-8944
> URL: https://issues.apache.org/jira/browse/SOLR-8944
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
>  Labels: spatialrecursiveprefixtreefieldtype
> Attachments: 
> SOLR-8944-Use-DocIdSetBuilder-instead-of-FixedBitSet.patch
>
>
> I’ve been continuing some analysis into JVM garbage sources in my Solr index. 
> (5.4, 86M docs/core, 56k 99.9th percentile hit count with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal 
> order of magnitude (by size) is the long[] allocated by FixedBitSet. From the 
> backtraces, it appears the biggest source of FixedBitSet creation in my case 
> (by two orders of magnitude) is my use of queries that involve geospatial 
> filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, 
> which presumably changes less frequently than queries are issued. If an 
> existing FixedBitSet were not available from a pool, the worst case (create a 
> new one) would be no worse than the current behavior. The complication would 
> be enforcement around when to return the object to the pool, but it looks 
> like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts 
> considerable effort into allocating smaller chunks only as necessary. Is this 
> not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little 
> more data around the current choices before choosing an approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8944) Improve geospatial garbage generation

2016-04-12 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-8944:
--
Attachment: SOLR-8944-Use-DocIdSetBuilder-instead-of-FixedBitSet.patch

[~dsmiley]'s suggestion was almost too trivial a change to create a patch for, 
but here it is. This was against 5.4. The path of the class has changed in 
master, but the contents have not, so the patch should apply there too.

> Improve geospatial garbage generation
> -
>
> Key: SOLR-8944
> URL: https://issues.apache.org/jira/browse/SOLR-8944
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
>  Labels: spatialrecursiveprefixtreefieldtype
> Attachments: 
> SOLR-8944-Use-DocIdSetBuilder-instead-of-FixedBitSet.patch
>
>
> I’ve been continuing some analysis into JVM garbage sources in my Solr index. 
> (5.4, 86M docs/core, 56k 99.9th percentile hit count with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal 
> order of magnitude (by size) is the long[] allocated by FixedBitSet. From the 
> backtraces, it appears the biggest source of FixedBitSet creation in my case 
> (by two orders of magnitude) is my use of queries that involve geospatial 
> filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, 
> which presumably changes less frequently than queries are issued. If an 
> existing FixedBitSet were not available from a pool, the worst case (create a 
> new one) would be no worse than the current behavior. The complication would 
> be enforcement around when to return the object to the pool, but it looks 
> like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts 
> considerable effort into allocating smaller chunks only as necessary. Is this 
> not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little 
> more data around the current choices before choosing an approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8241) Evaluate W-TinyLfu cache

2016-04-11 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235935#comment-15235935
 ] 

Jeff Wartes commented on SOLR-8241:
---

Since Solr requires Java 8 as of 6.0, it seems like this patch could be applied 
pretty easily now?

> Evaluate W-TinyLfu cache
> 
>
> Key: SOLR-8241
> URL: https://issues.apache.org/jira/browse/SOLR-8241
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Reporter: Ben Manes
>Priority: Minor
> Attachments: SOLR-8241.patch
>
>
> SOLR-2906 introduced an LFU cache and in-progress SOLR-3393 makes it O(1). 
> The discussions seem to indicate that the higher hit rate (vs LRU) is offset 
> by the slower performance of the implementation. An original goal appeared to 
> be to introduce ARC, a patented algorithm that uses ghost entries to retain 
> history information.
> My analysis of Window TinyLfu indicates that it may be a better option. It 
> uses a frequency sketch to compactly estimate an entry's popularity. It uses 
> LRU to capture recency and operate in O(1) time. When using available 
> academic traces the policy provides a near optimal hit rate regardless of the 
> workload.
> I'm getting ready to release the policy in Caffeine, which Solr already has a 
> dependency on. But, the code is fairly straightforward and a port into Solr's 
> caches instead is a pragmatic alternative. More interesting is what the 
> impact would be in Solr's workloads and feedback on the policy's design.
> https://github.com/ben-manes/caffeine/wiki/Efficiency



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-04-06 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229260#comment-15229260
 ] 

Jeff Wartes commented on SOLR-8922:
---

I stumbled onto SJK recently, which provides me a more lightweight way to 
measure allocation rate on my production nodes, and also eliminate startup 
noise from the measurement. 
According to this tool, the node with this patch is allocating heap space at 
roughly 60% of the rate that the others are.
That's reasonably consistent with my other measurements, and a pretty big 
improvement.

If anyone decides to pull this in, I'd appreciate it getting applied to the 5.5 
branch as well, in case there's a 5.5.1 release.

> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8944) Improve geospatial garbage generation

2016-04-05 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226948#comment-15226948
 ] 

Jeff Wartes commented on SOLR-8944:
---

I hadn't refreshed and didn't see this comment before I added mine, but thanks 
for the info, I appreciate the references and context. I'll take a look at what 
would be involved with DocIdSetBuilder.

I also feel like I should mention though, that class will be the third case of 
a hardcoded magic fraction of maxDoc I've come across in the context of 
investigating allocations this last week. It might be worth considering whether 
the gyrations around avoiding the creation of these BitSets are more or less 
complicated than managing a pool would be.

> Improve geospatial garbage generation
> -
>
> Key: SOLR-8944
> URL: https://issues.apache.org/jira/browse/SOLR-8944
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
>  Labels: spatialrecursiveprefixtreefieldtype
>
> I’ve been continuing some analysis into JVM garbage sources in my Solr index. 
> (5.4, 86M docs/core, 56k 99.9th percentile hit count with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal 
> order of magnitude (by size) is the long[] allocated by FixedBitSet. From the 
> backtraces, it appears the biggest source of FixedBitSet creation in my case 
> (by two orders of magnitude) is my use of queries that involve geospatial 
> filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, 
> which presumably changes less frequently than queries are issued. If an 
> existing FixedBitSet were not available from a pool, the worst case (create a 
> new one) would be no worse than the current behavior. The complication would 
> be enforcement around when to return the object to the pool, but it looks 
> like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts 
> considerable effort into allocating smaller chunks only as necessary. Is this 
> not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little 
> more data around the current choices before choosing an approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8944) Improve geospatial garbage generation

2016-04-05 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226709#comment-15226709
 ] 

Jeff Wartes commented on SOLR-8944:
---

It was an easy test, so I tried simply using a SparseFixedBitSet instead. That 
only bought me about a 5% overall reduction in allocation rate. (Again, this is 
after applying SOLR-8922) 
Since I don't have any data on the performance impact (cpu/latency) of 
SparseFixedBitSet vs FixedBitSet, the relatively low difference in allocation 
rate makes it feel like an object pool approach might be worth the extra work.
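
For reference, a minimal sketch of the kind of one-line swap I tested, assuming 
the collecting code only needs the shared BitSet.set(int) API plus a BitDocIdSet 
wrapper; this is illustrative, not the actual Lucene/Solr change.
{code}
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.util.BitDocIdSet;
import org.apache.lucene.util.BitSet;
import org.apache.lucene.util.FixedBitSet;
import org.apache.lucene.util.SparseFixedBitSet;

public class BitSetChoice {
  static DocIdSet collect(int maxDoc, int[] matchingDocs, boolean sparse) {
    // FixedBitSet allocates maxDoc/64 longs up front; SparseFixedBitSet
    // allocates its blocks lazily as documents are actually set.
    BitSet bits = sparse ? new SparseFixedBitSet(maxDoc) : new FixedBitSet(maxDoc);
    for (int doc : matchingDocs) {
      bits.set(doc);
    }
    return new BitDocIdSet(bits);
  }
}
{code}
The lazy block allocation is where the (modest) allocation savings came from.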

> Improve geospatial garbage generation
> -
>
> Key: SOLR-8944
> URL: https://issues.apache.org/jira/browse/SOLR-8944
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
>  Labels: spatialrecursiveprefixtreefieldtype
>
> I’ve been continuing some analysis into JVM garbage sources in my Solr index. 
> (5.4, 86M docs/core, 56k 99.9th percentile hit count with my query corpus)
> After applying SOLR-8922, I find my biggest source of garbage by a literal 
> order of magnitude (by size) is the long[] allocated by FixedBitSet. From the 
> backtraces, it appears the biggest source of FixedBitSet creation in my case 
> (by two orders of magnitude) is my use of queries that involve geospatial 
> filtering.
> Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
> https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60
> Has this been considered for optimization? I can think of a few paths:
> 1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, 
> which presumably changes less frequently than queries are issued. If an 
> existing FixedBitSet were not available from a pool, the worst case (create a 
> new one) would be no worse than the current behavior. The complication would 
> be enforcement around when to return the object to the pool, but it looks 
> like this has some lifecycle hooks already.
> 2. I note that a thing called a SparseFixedBitSet already exists, and puts 
> considerable effort into allocating smaller chunks only as necessary. Is this 
> not usable for this purpose? How significant is the performance difference?
> I'd be happy to spend some time on a patch, but I was hoping for a little 
> more data around the current choices before choosing an approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8944) Improve geospatial garbage generation

2016-04-04 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-8944:
-

 Summary: Improve geospatial garbage generation
 Key: SOLR-8944
 URL: https://issues.apache.org/jira/browse/SOLR-8944
 Project: Solr
  Issue Type: Improvement
Reporter: Jeff Wartes


I’ve been continuing some analysis into JVM garbage sources in my Solr index. 
(5.4, 86M docs/core, 56k 99.9th percentile hit count with my query corpus)

After applying SOLR-8922, I find my biggest source of garbage by a literal 
order of magnitude (by size) is the long[] allocated by FixedBitSet. From the 
backtraces, it appears the biggest source of FixedBitSet creation in my case (by 
two orders of magnitude) is my use of queries that involve geospatial filtering.

Specifically, IntersectsPrefixTreeQuery.getDocIdSet, here:
https://github.com/apache/lucene-solr/blob/569b6ca9ca439ee82734622f35f6b6342c0e9228/lucene/spatial-extras/src/java/org/apache/lucene/spatial/prefix/IntersectsPrefixTreeQuery.java#L60

Has this been considered for optimization? I can think of a few paths:

1. Persistent Object pools - FixedBitSet size is allocated based on maxDoc, 
which presumably changes less frequently than queries are issued. If an 
existing FixedBitSet were not available from a pool, the worst case (create a 
new one) would be no worse than the current behavior. The complication would be 
enforcement around when to return the object to the pool, but it looks like 
this has some lifecycle hooks already.
2. I note that a thing called a SparseFixedBitSet already exists, and puts 
considerable effort into allocating smaller chunks only as necessary. Is this 
not usable for this purpose? How significant is the performance difference?

I'd be happy to spend some time on a patch, but I was hoping for a little more 
data around the current choices before choosing an approach.
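
To make option 1 concrete, here is a minimal sketch of a pool keyed by the 
searcher's maxDoc. The class and method names (FixedBitSetPool, acquire, release) 
are hypothetical and not existing Solr API; it only illustrates the worst-case 
argument above.
{code}
import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.lucene.util.FixedBitSet;

public class FixedBitSetPool {
  private final int numBits;
  private final ConcurrentLinkedQueue<FixedBitSet> pool = new ConcurrentLinkedQueue<>();

  public FixedBitSetPool(int numBits) { // numBits would track the searcher's maxDoc
    this.numBits = numBits;
  }

  public FixedBitSet acquire() {
    FixedBitSet bits = pool.poll();
    // Worst case (empty pool) is exactly the current behavior: allocate a new one.
    return bits != null ? bits : new FixedBitSet(numBits);
  }

  public void release(FixedBitSet bits) {
    bits.clear(0, numBits); // zero the bits before the next checkout
    pool.offer(bits);
  }
}
{code}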




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-31 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220342#comment-15220342
 ] 

Jeff Wartes commented on SOLR-8922:
---

Both of those appear to add capacity by declaring a new array and doing a 
System.arraycopy.
Wouldn't that just result in more space allocated and then thrown away?

> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-31 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220299#comment-15220299
 ] 

Jeff Wartes commented on SOLR-8922:
---

With some tweaking, I was able to get G1 pause to about the same ballpark as I 
get with ParNew/CMS. But without a compelling difference, the Lucene 
recommendation against G1 keeps me away.

This issue is more about garbage generation though. Less garbage should be a 
benefit regardless of the collector you choose.

> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-31 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220136#comment-15220136
 ] 

Jeff Wartes commented on SOLR-8922:
---

Ok, so after a little more than 12 hours on one of my production nodes, there 
was no noticeable change in CPU usage. 
Running before/after GC logs through GCViewer, it's a little hard to compare 
rate, since the logs were for different intervals and the "after" log included 
startup. That said, "Freed mem/minute" was down by 44%, and "Throughput" went 
from 87% to 93%. I also see noticeably reduced average pause time, and 
increased average pause interval. All positive signs. 

The only irritation I'm finding here is that it looks like the CMS collector is 
running more often. I expect that's simply because I changed the footing of a 
fairly tuned set of GC parameters though.



> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-31 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220052#comment-15220052
 ] 

Jeff Wartes commented on SOLR-8922:
---

Incidentally, I had one or two other findings from my garbage analysis. 
Solutions are less obvious there though, and probably involve some 
conversation. Is Jira the right place for that, or is there another medium more 
appropriate?

> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-31 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220045#comment-15220045
 ] 

Jeff Wartes commented on SOLR-8922:
---

Absolutely. Memory pools were my first thought, between when I saw that 60% and 
when I looked at my hit rates and realized the allocation size could just 
be changed. I had started poking around the internet for terms like "slab 
allocators" and "direct byte buffers", but even an on-heap persistent pool 
sounded good to me. Or, if you had persistent tracking of hit rates for the 
optimization, perhaps the size of the scratch array could optimize itself over 
time. All of that would be more complicated, of course.

I did look one other place worth mentioning though. In Heliosearch the way the 
DocSetCollector handles the "scratch" array isn't any different, but it's 
interesting because it added a lifecycle with a close() method to the class, to 
support the native bitset implementation. Knowing that it's possible to impose 
a lifecycle on the class, checking things out and back into a persistent memory 
pool should be easy.
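
As a rough illustration of that check-out/check-in idea, assuming a close() hook 
like the Heliosearch variant; the names here are hypothetical, not actual Solr or 
Heliosearch code.
{code}
import java.util.concurrent.ConcurrentLinkedQueue;

public class PooledScratchCollector implements AutoCloseable {
  private static final ConcurrentLinkedQueue<int[]> POOL = new ConcurrentLinkedQueue<>();
  private final int[] scratch;

  public PooledScratchCollector(int smallSetSize) {
    // Check a scratch array out of the pool, or allocate one if none fits.
    int[] reused = POOL.poll();
    this.scratch = (reused != null && reused.length >= smallSetSize)
        ? reused : new int[smallSetSize];
  }

  @Override
  public void close() {
    POOL.offer(scratch); // check the scratch array back in when collection finishes
  }
}
{code}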

> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-30 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219066#comment-15219066
 ] 

Jeff Wartes commented on SOLR-8922:
---

Not yet. The major risk area would be the new ExpandingIntArray class, but it 
looked reasonable. It expands along powers of two, and although the add() and 
copyTo() calls are certainly more work than simple array assignment/retrieval, 
it still all looks like pretty simple stuff. A few ArrayList calls and some 
simple numeric comparisons mostly. 
I'm more worried about bugs in there than performance; I don't know how well 
[~steff1193] tested this, although I got the impression he was using it in 
production at the time.

There may be better approaches, but this one was handy and I'm excited enough 
that I'm going to be doing a production test. I'll have more info in a day or 
two.

As a side note, I got a similar garbage-related improvement on an earlier test 
by simply hard-coding the smallSetSize to 10 - the expanding arrays 
approach only bought me another 3%. But of course, that 10 is very index 
and query set dependent, so I didn't want to offer it as a general case.
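
For anyone skimming, a minimal sketch of the chunked-growth idea behind the 
ExpandingIntArray approach, assuming chunks are kept in a list rather than copied 
into ever-larger arrays; the names are illustrative, not the patch's actual code.
{code}
import java.util.ArrayList;
import java.util.List;

public class ExpandingIntBuffer {
  private final List<int[]> chunks = new ArrayList<>();
  private int[] current;
  private int indexInCurrent;
  private int size;

  public void add(int value) {
    if (current == null || indexInCurrent == current.length) {
      // New chunks roughly double total capacity (powers of two), but old chunks
      // are kept and read back by copyTo(), so nothing is thrown away on growth.
      current = new int[Math.max(16, size)];
      chunks.add(current);
      indexInCurrent = 0;
    }
    current[indexInCurrent++] = value;
    size++;
  }

  public void copyTo(int[] dest) {
    int pos = 0;
    for (int[] chunk : chunks) {
      int len = Math.min(chunk.length, size - pos);
      System.arraycopy(chunk, 0, dest, pos, len);
      pos += len;
      if (pos >= size) break;
    }
  }
}
{code}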

> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-30 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218974#comment-15218974
 ] 

Jeff Wartes commented on SOLR-8922:
---

For my index (86M-doc shards and a per-shard 99.9th percentile query hit count 
of 56k), this reduced total garbage generation by 33%, which naturally also 
brought significant improvements in gc pause and frequency.


> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-30 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-8922:
--
Attachment: SOLR-8922.patch

This is essentially the same patch as in SOLR-5444, but applies cleanly against 
(at least) 5.4 where I did some GC testing, and master.

> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
> Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-30 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15218910#comment-15218910
 ] 

Jeff Wartes commented on SOLR-8922:
---

SOLR-5444 had a patch to help with this 
(SOLR-5444_ExpandingIntArray_DocSetCollector_4_4_0.patch), but it was mixed in 
with some other things, and didn't get picked up with the other parts of the 
issue.

> DocSetCollector can allocate massive garbage on large indexes
> -
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
>  Issue Type: Improvement
>Reporter: Jeff Wartes
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8922) DocSetCollector can allocate massive garbage on large indexes

2016-03-30 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-8922:
-

 Summary: DocSetCollector can allocate massive garbage on large 
indexes
 Key: SOLR-8922
 URL: https://issues.apache.org/jira/browse/SOLR-8922
 Project: Solr
  Issue Type: Improvement
Reporter: Jeff Wartes


After reaching a point of diminishing returns tuning the GC collector, I 
decided to take a look at where the garbage was coming from. To my surprise, it 
turned out that for my index and query set, almost 60% of the garbage was 
coming from this single line:

https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49

This is due to the simple fact that I have 86M documents in my shards. 
Allocating a scratch array big enough to track a result set 1/64th of my index 
(1.3M) is also almost certainly excessive, considering my 99.9th percentile hit 
count is less than 56k.
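
To put numbers on that, a small back-of-the-envelope sketch, assuming the scratch 
size is derived as maxDoc >> 6 (1/64th of the index) by the caller; illustrative 
only.
{code}
public class ScratchSizing {
  public static void main(String[] args) {
    int maxDoc = 86_000_000;          // ~86M docs per shard
    int smallSetSize = maxDoc >> 6;   // ~1.34M ints allocated per uncached query
    long bytesPerQuery = 4L * smallSetSize;
    System.out.println(smallSetSize + " ints, ~" + (bytesPerQuery / (1024 * 1024)) + " MB per collector");
  }
}
{code}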



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7887) Upgrade Solr to use log4j2 -- log4j 1 now officially end of life

2016-03-25 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212210#comment-15212210
 ] 

Jeff Wartes commented on SOLR-7887:
---

SOLR-7698 and SOLR-6377 both have attempts at logback watchers, I think.

I'll also second the desire for async appenders, and I'd go further to suggest 
it as the default. Solr does a lot of logging, and this gets that work out of 
the critical path for query latency. 

> Upgrade Solr to use log4j2 -- log4j 1 now officially end of life
> 
>
> Key: SOLR-7887
> URL: https://issues.apache.org/jira/browse/SOLR-7887
> Project: Solr
>  Issue Type: Task
>Affects Versions: 5.2.1
>Reporter: Shawn Heisey
>
> The logging services project has officially announced the EOL of log4j 1:
> https://blogs.apache.org/foundation/entry/apache_logging_services_project_announces
> In the official binary jetty deployment, we use use log4j 1.2 as our final 
> logging destination, so the admin UI has a log watcher that actually uses 
> log4j and java.util.logging classes.  That will need to be extended to add 
> log4j2.  I think that might be the largest pain point to this upgrade.
> There is some crossover between log4j2 and slf4j.  Figuring out exactly which 
> jars need to be in the lib/ext directory will take some research.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8785) Use Metrics library for core metrics

2016-03-03 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-8785:
-

 Summary: Use Metrics library for core metrics
 Key: SOLR-8785
 URL: https://issues.apache.org/jira/browse/SOLR-8785
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.1
Reporter: Jeff Wartes


The Metrics library (https://dropwizard.github.io/metrics/3.1.0/) is a 
well-known way to track metrics about applications. 

In SOLR-1972, latency percentile tracking was added. The comment list is long, 
so here’s my synopsis:

1. An attempt was made to use the Metrics library
2. That attempt failed due to a memory leak in Metrics v2.1.1
3. Large parts of Metrics were then copied wholesale into the 
org.apache.solr.util.stats package space and that was used instead.

Copy/pasting Metrics code into Solr may have been the correct solution at the 
time, but I submit that it isn’t correct any more. 
The leak in Metrics was fixed even before SOLR-1972 was released, and by 
copy/pasting a subset of the functionality, we miss access to other important 
things that the Metrics library provides, particularly the concept of a 
Reporter. (https://dropwizard.github.io/metrics/3.1.0/manual/core/#reporters)

Further, Metrics v3.0.2 is already packaged with Solr anyway, because it’s used 
in two contrib modules (map-reduce and morphlines-core).

I’m proposing that:

1. Metrics as bundled with Solr be upgraded to the current v3.1.2
2. Most of the org.apache.solr.util.stats package space be deleted outright, or 
gutted and replaced with simple calls to Metrics. Due to the copy/paste origin, 
the concepts should mostly map 1:1.

I’d further recommend a usage pattern like:
SharedMetricRegistries.getOrCreate(System.getProperty("solr.metrics.registry", "solr-registry"))

There are all kinds of areas in Solr that could benefit from metrics tracking 
and reporting. This pattern allows diverse areas of code to track metrics 
within a single, named registry. This well-known name then becomes a handle you 
can use to easily attach a Reporter and ship all of those metrics off-box.
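
To illustrate, here's a minimal sketch of that usage pattern with Metrics 3.1.x, 
reusing the "solr.metrics.registry" property naming from above; the timer name 
and the ConsoleReporter are just for demonstration.
{code}
import java.util.concurrent.TimeUnit;
import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.SharedMetricRegistries;
import com.codahale.metrics.Timer;

public class MetricsExample {
  public static void main(String[] args) throws Exception {
    // Any component can grab the same registry by its well-known name.
    MetricRegistry registry = SharedMetricRegistries.getOrCreate(
        System.getProperty("solr.metrics.registry", "solr-registry"));

    Timer queryTimer = registry.timer("select.requestTimes");
    try (Timer.Context ignored = queryTimer.time()) {
      Thread.sleep(5); // stand-in for handling a request
    }

    // A Reporter attached to that registry ships everything off-box;
    // ConsoleReporter here only for demonstration.
    ConsoleReporter reporter = ConsoleReporter.forRegistry(registry)
        .convertDurationsTo(TimeUnit.MILLISECONDS)
        .build();
    reporter.report();
  }
}
{code}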



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8725) Invalid name error with core names with hyphens

2016-02-26 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169572#comment-15169572
 ] 

Jeff Wartes commented on SOLR-8725:
---

Well, I'm glad I haven't tried moving to 5.5 yet; this would have been an 
unpleasant migration discovery.
I use hyphens in both collection names and alias names. (although not as a 
leading character) 
Generally, I prefer to avoid using underscores anyplace that ends up in a URL.


> Invalid name error with core names with hyphens
> ---
>
> Key: SOLR-8725
> URL: https://issues.apache.org/jira/browse/SOLR-8725
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.5
>Reporter: Chris Beer
>
> In SOLR-8642, hyphens are no longer considered valid identifiers for cores 
> (and collections?). Our solr instance was successfully using hyphens in our 
> core names, and our affected cores now error with:
> marc-profiler_shard1_replica1: 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
> Invalid name: 'marc-profiler_shard1_replica1' Identifiers must consist 
> entirely of periods, underscores and alphanumerics
> Before starting to rename all of our collections, I wonder if this decision 
> could be revisited to be backwards compatible with previously created 
> collections.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8531) ZK leader path changed in 5.4

2016-01-26 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117568#comment-15117568
 ] 

Jeff Wartes commented on SOLR-8531:
---

Looks like this was resolved in SOLR-8561

> ZK leader path changed in 5.4
> -
>
> Key: SOLR-8531
> URL: https://issues.apache.org/jira/browse/SOLR-8531
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.4
>Reporter: Jeff Wartes
>
> While doing a rolling upgrade from 5.3 to 5.4 of a solrcloud cluster, I 
> observed that upgraded nodes would not register their shards as active unless 
> they were elected the leader for the shard.
> There were no errors, the shards were fully up and responsive, but would not  
> publish any change from the "down" state.
> This appears to be because the recovery process never happens, because the ZK 
> node containing the current leader can't be found, because the ZK path has 
> changed.
> Specifically, the leader data node changed from:
> /leaders/
> to
> /leaders//leader
> It looks to me like this happened during SOLR-7844, perhaps accidentally. 
> At the least, the "Migrating to Solr 5.4" section of the README should get 
> updated with this info, since it means a rolling upgrade of a collection with 
> multiple replicas will suffer serious degradation in the number of active 
> replicas as nodes are upgraded. It's entirely possible this will reduce some 
> shards to a single active replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-8531) ZK leader path changed in 5.4

2016-01-26 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes resolved SOLR-8531.
---
   Resolution: Fixed
Fix Version/s: 5.4.1

> ZK leader path changed in 5.4
> -
>
> Key: SOLR-8531
> URL: https://issues.apache.org/jira/browse/SOLR-8531
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.4
>Reporter: Jeff Wartes
> Fix For: 5.4.1
>
>
> While doing a rolling upgrade from 5.3 to 5.4 of a solrcloud cluster, I 
> observed that upgraded nodes would not register their shards as active unless 
> they were elected the leader for the shard.
> There were no errors, the shards were fully up and responsive, but would not  
> publish any change from the "down" state.
> This appears to be because the recovery process never happens, because the ZK 
> node containing the current leader can't be found, because the ZK path has 
> changed.
> Specifically, the leader data node changed from:
> /leaders/
> to
> /leaders//leader
> It looks to me like this happened during SOLR-7844, perhaps accidentally. 
> At the least, the "Migrating to Solr 5.4" section of the README should get 
> updated with this info, since it means a rolling upgrade of a collection with 
> multiple replicas will suffer serious degradation in the number of active 
> replicas as nodes are upgraded. It's entirely possible this will reduce some 
> shards to a single active replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-8531) ZK leader path changed in 5.4

2016-01-26 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes closed SOLR-8531.
-

> ZK leader path changed in 5.4
> -
>
> Key: SOLR-8531
> URL: https://issues.apache.org/jira/browse/SOLR-8531
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.4
>Reporter: Jeff Wartes
> Fix For: 5.4.1
>
>
> While doing a rolling upgrade from 5.3 to 5.4 of a solrcloud cluster, I 
> observed that upgraded nodes would not register their shards as active unless 
> they were elected the leader for the shard.
> There were no errors, the shards were fully up and responsive, but would not  
> publish any change from the "down" state.
> This appears to be because the recovery process never happens, because the ZK 
> node containing the current leader can't be found, because the ZK path has 
> changed.
> Specifically, the leader data node changed from:
> /leaders/
> to
> /leaders//leader
> It looks to me like this happened during SOLR-7844, perhaps accidentally. 
> At the least, the "Migrating to Solr 5.4" section of the README should get 
> updated with this info, since it means a rolling upgrade of a collection with 
> multiple replicas will suffer serious degradation in the number of active 
> replicas as nodes are upgraded. It's entirely possible this will reduce some 
> shards to a single active replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8531) ZK leader path changed in 5.4

2016-01-10 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091427#comment-15091427
 ] 

Jeff Wartes commented on SOLR-8531:
---

I just looked again, and 5.4 is indeed writing the leader data to both places. 
Perhaps 5.4 is only looking in the new place?
This is speculation, but if so, a possible upgrade path might have been to try 
to get the first 5.4 node for each shard to be the leader (preferredLeader 
property?), and then the rest of the rollout would work.
As I mentioned, I didn't check what happened when I restarted a 5.3 node while 
5.4 was leader though.

> ZK leader path changed in 5.4
> -
>
> Key: SOLR-8531
> URL: https://issues.apache.org/jira/browse/SOLR-8531
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.4
>Reporter: Jeff Wartes
>
> While doing a rolling upgrade from 5.3 to 5.4 of a solrcloud cluster, I 
> observed that upgraded nodes would not register their shards as active unless 
> they were elected the leader for the shard.
> There were no errors, the shards were fully up and responsive, but would not  
> publish any change from the "down" state.
> This appears to be because the recovery process never happens, because the ZK 
> node containing the current leader can't be found, because the ZK path has 
> changed.
> Specifically, the leader data node changed from:
> /leaders/
> to
> /leaders//leader
> It looks to me like this happened during SOLR-7844, perhaps accidentally. 
> At the least, the "Migrating to Solr 5.4" section of the README should get 
> updated with this info, since it means a rolling upgrade of a collection with 
> multiple replicas will suffer serious degradation in the number of active 
> replicas as nodes are upgraded. It's entirely possible this will reduce some 
> shards to a single active replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8531) ZK leader path changed in 5.4

2016-01-09 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090732#comment-15090732
 ] 

Jeff Wartes commented on SOLR-8531:
---

1. A fully upgraded cluster behaves normally.
2. The problem only occurs for collections with replicationFactor > 1, but 
by definition, this means you only have problems if you're trying an HA upgrade.

Upgraded nodes got in line for leader election as normal, but could not figure 
out the current leader on start, so they never executed replication recovery and 
never became active. If I restarted 5.3 nodes for a given shard, the 5.4 shard would 
eventually get elected leader, and publish active state without intervention, 
but restarting the 5.4 shard again would mean a 5.3 shard got elected, and the 
5.4 node would be stuck in 'down' state again. I did not test restarting a 5.3 
shard while the 5.4 shard was leader.

In my case I had sufficient production capacity to upgrade half my cluster, 
create a new collection in 5.4, copy the data into it, and then upgrade the 
rest of the cluster, so I did that. 
As mentioned, taking downtime and upgrading the whole cluster at once would 
also have worked.


> ZK leader path changed in 5.4
> -
>
> Key: SOLR-8531
> URL: https://issues.apache.org/jira/browse/SOLR-8531
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.4
>Reporter: Jeff Wartes
>
> While doing a rolling upgrade from 5.3 to 5.4 of a solrcloud cluster, I 
> observed that upgraded nodes would not register their shards as active unless 
> they were elected the leader for the shard.
> There were no errors, the shards were fully up and responsive, but would not  
> publish any change from the "down" state.
> This appears to be because the recovery process never happens, because the ZK 
> node containing the current leader can't be found, because the ZK path has 
> changed.
> Specifically, the leader data node changed from:
> /leaders/
> to
> /leaders//leader
> It looks to me like this happened during SOLR-7844, perhaps accidentally. 
> At the least, the "Migrating to Solr 5.4" section of the README should get 
> updated with this info, since it means a rolling upgrade of a collection with 
> multiple replicas will suffer serious degradation in the number of active 
> replicas as nodes are upgraded. It's entirely possible this will reduce some 
> shards to a single active replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8531) ZK leader path changed in 5.4

2016-01-09 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15090706#comment-15090706
 ] 

Jeff Wartes commented on SOLR-8531:
---

I was imagining a note at 
https://lucene.apache.org/solr/5_4_0/changes/Changes.html#v5.4.0.upgrading_from_solr_5.3
But I could understand that being driven off of an immutable release tag.

I haven't fully read the SOLR-7844 patch for comprehension, but the change to 
ZkStateReader.java looks like the reason:
https://github.com/apache/lucene-solr/commit/65cb72631b0833f8ddcf34dfa3d4a91f2c5091c4#diff-8f54b814c3da916328992910b1ad9163

I don't immediately see the change being necessary, so I suspect it could be 
reverted or made reverse-compatible without too much trouble.

If it's the former, then I'll presumably hit the same issue again in reverse 
moving from 5.4 to 5.4.1, which could be ok now that I know to expect it.


> ZK leader path changed in 5.4
> -
>
> Key: SOLR-8531
> URL: https://issues.apache.org/jira/browse/SOLR-8531
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.4
>Reporter: Jeff Wartes
>
> While doing a rolling upgrade from 5.3 to 5.4 of a solrcloud cluster, I 
> observed that upgraded nodes would not register their shards as active unless 
> they were elected the leader for the shard.
> There were no errors, the shards were fully up and responsive, but would not  
> publish any change from the "down" state.
> This appears to be because the recovery process never happens, because the ZK 
> node containing the current leader can't be found, because the ZK path has 
> changed.
> Specifically, the leader data node changed from:
> /leaders/
> to
> /leaders//leader
> It looks to me like this happened during SOLR-7844, perhaps accidentally. 
> At the least, the "Migrating to Solr 5.4" section of the README should get 
> updated with this info, since it means a rolling upgrade of a collection with 
> multiple replicas will suffer serious degradation in the number of active 
> replicas as nodes are upgraded. It's entirely possible this will reduce some 
> shards to a single active replica.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8531) ZK leader path changed in 5.4

2016-01-08 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-8531:
-

 Summary: ZK leader path changed in 5.4
 Key: SOLR-8531
 URL: https://issues.apache.org/jira/browse/SOLR-8531
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.4
Reporter: Jeff Wartes



While doing a rolling upgrade from 5.3 to 5.4 of a solrcloud cluster, I 
observed that upgraded nodes would not register their shards as active unless 
they were elected the leader for the shard.
There were no errors, the shards were fully up and responsive, but would not  
publish any change from the "down" state.

This appears to be because the recovery process never happens, because the ZK 
node containing the current leader can't be found, because the ZK path has 
changed.

Specifically, the leader data node changed from:
/leaders/
to
/leaders//leader

It looks to me like this happened during SOLR-7844, perhaps accidentally. 

At the least, the "Migrating to Solr 5.4" section of the README should get 
updated with this info, since it means a rolling upgrade of a collection with 
multiple replicas will suffer serious degradation in the number of active 
replicas as nodes are upgraded. It's entirely possible this will reduce some 
shards to a single active replica.






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4449) Enable backup requests for the internal solr load balancer

2015-12-11 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053711#comment-15053711
 ] 

Jeff Wartes commented on SOLR-4449:
---

For what it's worth, if I were a solr committer, I probably wouldn't just merge 
this issue as-is. BackupRequestLBHttpSolrClient still has a certain amount of 
copy/paste code from the parent LBHttpSolrClient class that'd become extra 
long-term maintenance load. (As it will be every time I update this issue for a 
new solr version)

Instead, I'd do something like:
1. Pull the asynchronous ExecutorCompletionService-based query approach into 
the LBHttpSolrClient itself. This would be interesting and useful functionality 
in it's own right. 
2. Add the concept of a shardTimeout. (Distinct from timeAllowed)
3. Add extendable support for how to handle a shardTimeout. If a strategy ends 
up making a request to another server in the list, that request must be 
submitted to the same ExecutorCompletionService so that in all cases, 
LBHttpSolrClient would return the first response among the submitted requests. 
4. The backup-request functionality could still then be a class extending 
LBHttpSolrClient, but the only real code there would be defining the 
shardTimeout for a given request, and how to handle a shardTimeout if there was 
one.

I'd probably audit the access restrictions in LBHttpSolrClient while I was at 
it though, since solrconfig.xml provides such an easy way to use alternate 
implementations of that class. A lot of the existing code in 
BackupRequestLBHttpSolrClient is only necessary due to not having sufficient 
access to the parent class. (isTimeExceeded/getTimeAllowedInNanos seem 
generally useful to have, for example, and I'm not sure why doRequest is 
protected)
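
As a rough sketch of the first-response-wins mechanics from points 1-3 above, 
assuming plain Callables stand in for the per-server requests; this is not the 
actual LBHttpSolrClient or BackupRequestLBHttpSolrClient code.
{code}
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BackupRequestSketch {
  static <T> T firstOf(List<Callable<T>> primaryThenBackups, long shardTimeoutMs) throws Exception {
    ExecutorService pool = Executors.newCachedThreadPool();
    ExecutorCompletionService<T> ecs = new ExecutorCompletionService<>(pool);
    try {
      ecs.submit(primaryThenBackups.get(0));
      // If the primary hasn't answered within the shardTimeout, submit a backup
      // to the same completion service and keep waiting on whichever finishes first.
      Future<T> done = ecs.poll(shardTimeoutMs, TimeUnit.MILLISECONDS);
      int next = 1;
      while (done == null && next < primaryThenBackups.size()) {
        ecs.submit(primaryThenBackups.get(next++));
        done = ecs.poll(shardTimeoutMs, TimeUnit.MILLISECONDS);
      }
      // All candidates submitted: block for the first completion if none yet.
      return (done != null ? done : ecs.take()).get();
    } finally {
      pool.shutdownNow(); // abandon the losing requests
    }
  }
}
{code}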


> Enable backup requests for the internal solr load balancer
> --
>
> Key: SOLR-4449
> URL: https://issues.apache.org/jira/browse/SOLR-4449
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
> Attachments: SOLR-4449.patch, SOLR-4449.patch, SOLR-4449.patch, 
> patch-4449.txt, solr-back-request-lb-plugin.jar
>
>
> Add the ability to configure the built-in solr load balancer such that it 
> submits a backup request to the next server in the list if the initial 
> request takes too long. Employing such an algorithm could improve the latency 
> of the 9xth percentile albeit at the expense of increasing overall load due 
> to additional requests. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4449) Enable backup requests for the internal solr load balancer

2015-12-11 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053665#comment-15053665
 ] 

Jeff Wartes commented on SOLR-4449:
---

I looked around a bit, and unless I'm missing something, it looks like 
solr-core doesn't really use metrics-core. At the end of SOLR-1972, the 
necessary classes were just copy/pasted into the solr codeline. It sounds like 
this was mostly due to being nervous after encountering some problems in the 
metrics-core version at the time, and an aversion to a global registry 
approach. 

Unfortunately, this means that although requesthandlers have statistics, they 
cannot be attached to a metrics Reporter, and instead you have to develop 
something to interrogate JMX or some such. 

Solr does include metrics-core 3.0.1, but there's only a few places it actually 
gets used, and only in contrib modules.

I didn't have the negative experience with metrics-core. In fact, my 
experiences with 3.1.2 over the last year and a half have been universally 
positive. So when I added backup-percentile support to this issue I relied 
heavily on the global SharedMetricRegistries and the assumption that the library 
was threadsafe in general. My scattershot code reviews of the metrics library 
have generally reinforced my opinion that this is ok, and I'm using my version of 
this issue in production now. Initializing a well-known-named shared registry 
with an attached reporter in jetty.xml has yielded all kinds of great 
performance data.

This might be a useful point of information if anyone gets back to SOLR-4735. 
I'll mention here if I do encounter any metrics-core related issues in the 
future.

> Enable backup requests for the internal solr load balancer
> --
>
> Key: SOLR-4449
> URL: https://issues.apache.org/jira/browse/SOLR-4449
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
> Attachments: SOLR-4449.patch, SOLR-4449.patch, SOLR-4449.patch, 
> patch-4449.txt, solr-back-request-lb-plugin.jar
>
>
> Add the ability to configure the built-in solr load balancer such that it 
> submits a backup request to the next server in the list if the initial 
> request takes too long. Employing such an algorithm could improve the latency 
> of the 9xth percentile albeit at the expense of increasing overall load due 
> to additional requests. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8171) Facet query filterCache usage is psychic

2015-10-19 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-8171:
-

 Summary: Facet query filterCache usage is psychic
 Key: SOLR-8171
 URL: https://issues.apache.org/jira/browse/SOLR-8171
 Project: Solr
  Issue Type: Bug
  Components: faceting
Affects Versions: 5.3
Reporter: Jeff Wartes


From this thread:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201510.mbox/%3cd23998fc.6fa32%25jwar...@whitepages.com%3E

There's really a few points here, which may be different issues:

1. Either facet queries aren't using the filterCache correctly, or the stats 
don't reflect actual usage. (Or it's psychic.) Somehow, "lookups" only ever 
gets incremented when "hits" does, yielding a 100% cache hit rate at all times.
2. Facet queries appear to use the filterCache as a queryResultCache. Meaning, 
only identical facet queries cause filterCache "hits" to increase. 
Interestingly, disabling the queryResultCache still results in facet queries 
doing *inserts* into the filterCache, but no longer allows stats-reported 
*usage* of those entries.

If the stats are right and facet queries *aren't* actually using the 
filterCache for anything except possible future searches, then there should be 
a mechanism for disabling facet query filterCache usage to avoid filling the 
filterCache with low usage queries. Honestly though, that sounds more like 
something for the queryResultCache than filterCache anyway.

If facet queries *are* using the filterCache for performance within a single 
query, I'd suggest that facet queries should have their own named cache 
specifically for that use, rather than try to share a task load (size, 
regenerator) with the generic filterCache.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4449) Enable backup requests for the internal solr load balancer

2015-10-08 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14949141#comment-14949141
 ] 

Jeff Wartes commented on SOLR-4449:
---

Ah, great. Yeah, same thing. I knew it had been discussed in SOLR-4735, but 
since that hadn't been merged, I didn't even bother checking if it already 
existed.

Thanks for the reference. After reading through the comment history in 
SOLR-1972, it seems like I should look closely at that integration and see if I 
can leverage anything existing there.

> Enable backup requests for the internal solr load balancer
> --
>
> Key: SOLR-4449
> URL: https://issues.apache.org/jira/browse/SOLR-4449
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
> Attachments: SOLR-4449.patch, SOLR-4449.patch, SOLR-4449.patch, 
> patch-4449.txt, solr-back-request-lb-plugin.jar
>
>
> Add the ability to configure the built-in solr load balancer such that it 
> submits a backup request to the next server in the list if the initial 
> request takes too long. Employing such an algorithm could improve the latency 
> of the 9xth percentile albeit at the expense of increasing overall load due 
> to additional requests. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4449) Enable backup requests for the internal solr load balancer

2015-10-07 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947724#comment-14947724
 ] 

Jeff Wartes commented on SOLR-4449:
---

I've added performance tracking to this, so that you can trigger a backup 
request at (say) the 95th percentile latency for a given performance class of 
query.

I'm likely going to continue on this path, but this adds a dependency on 
metrics-core, so I've dropped a tag (5.3_port_complete) just prior to those 
changes. Anyone interested in merging something like this may prefer to work 
from that.

> Enable backup requests for the internal solr load balancer
> --
>
> Key: SOLR-4449
> URL: https://issues.apache.org/jira/browse/SOLR-4449
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
> Attachments: SOLR-4449.patch, SOLR-4449.patch, SOLR-4449.patch, 
> patch-4449.txt, solr-back-request-lb-plugin.jar
>
>
> Add the ability to configure the built-in solr load balancer such that it 
> submits a backup request to the next server in the list if the initial 
> request takes too long. Employing such an algorithm could improve the latency 
> of the 9xth percentile albeit at the expense of increasing overall load due 
> to additional requests. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8059) NPE distributed DebugComponent

2015-09-28 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933631#comment-14933631
 ] 

Jeff Wartes commented on SOLR-8059:
---

I've seen this too. I assumed it was related to 
https://issues.apache.org/jira/browse/SOLR-1880, but I've never investigated.


> NPE distributed DebugComponent
> --
>
> Key: SOLR-8059
> URL: https://issues.apache.org/jira/browse/SOLR-8059
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 5.3
>Reporter: Markus Jelsma
>Assignee: Shalin Shekhar Mangar
> Fix For: 5.4
>
>
> The following URL select?debug=true=*:*=id,score yields
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.solr.handler.component.DebugComponent.finishStage(DebugComponent.java:229)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:416)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:210)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>   at org.eclipse.jetty.server.Server.handle(Server.java:499)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>   at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> I can reproduce it every time. Strangely enough, fl=*,score, or any other content 
> field, does not trigger it! I have seen this happening in the Highlighter as well, 
> on the same code path. It makes little sense: how would fl influence that piece of 
> code? The id is requested in fl after all.
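
For reference, a minimal SolrJ reproduction sketch along the lines of the URL above. 
The zkHost and collection name are placeholders, the constructor form assumes the 5.x 
SolrJ line, and the collection needs more than one shard so the distributed debug path 
is exercised:
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class Solr8059Repro {
  public static void main(String[] args) throws Exception {
    // Placeholders: point at a real ZK ensemble and a multi-shard collection.
    try (CloudSolrClient client = new CloudSolrClient("zkhost:2181")) {
      client.setDefaultCollection("mycollection");
      SolrQuery q = new SolrQuery("*:*");
      q.set("debug", "true");
      q.setFields("id", "score");  // fl=id,score triggers the NPE; fl=*,score reportedly does not
      System.out.println(client.query(q));
    }
  }
}
{code}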



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4449) Enable backup requests for the internal solr load balancer

2015-09-21 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901555#comment-14901555
 ] 

Jeff Wartes commented on SOLR-4449:
---

I pulled this patch out into a freestanding jar and ported it to Solr 5.3. 

I tried to pull in all the things that had changed since they were copied from 
the parent class in 4.4, and added per-request backup time support. 
Sadly, there were still a few places where package-protected restrictions got 
in the way (Rsp.server and LBHttpSolrClient.doRequest in particular), so even 
as a separate jar, this must be loaded by the same classloader as 
LBHttpSolrClient, not via solr's lib inclusion mechanism.

After this long, it feels unlikely this feature will get merged, but if there's 
any interest, it should still be pretty simple to copy the files back into the 
solr source tree: I didn't change any paths or package names, and I'd be happy 
to upload another patch file.

My version can be found here:
https://github.com/whitepages/SOLR-4449

For those who were wondering about the effect of this stuff, in one test today 
I cut my median query response time in half, at a cost of about 15% more 
cluster-wide cpu, simply by using this and setting the backupRequestDelay to 
half my observed ParNew GC pause. The next logical step would be 
performance-aware backup request settings, like "issue a backup request when 
you exceed your 95th percentile latency for a given requestHandler or 
queryPerformanceClass".

My thanks to [~phloy] for authoring this.

> Enable backup requests for the internal solr load balancer
> --
>
> Key: SOLR-4449
> URL: https://issues.apache.org/jira/browse/SOLR-4449
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
> Attachments: SOLR-4449.patch, SOLR-4449.patch, SOLR-4449.patch, 
> patch-4449.txt, solr-back-request-lb-plugin.jar
>
>
> Add the ability to configure the built-in solr load balancer such that it 
> submits a backup request to the next server in the list if the initial 
> request takes too long. Employing such an algorithm could improve the latency 
> of the 9xth percentile albeit at the expense of increasing overall load due 
> to additional requests. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7698) solr alternative logback contrib

2015-08-19 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703723#comment-14703723
 ] 

Jeff Wartes commented on SOLR-7698:
---

Also see SOLR-6377

 solr alternative logback  contrib
 -

 Key: SOLR-7698
 URL: https://issues.apache.org/jira/browse/SOLR-7698
 Project: Solr
  Issue Type: New Feature
Affects Versions: 5.2.1
Reporter: Linbin Chen
  Labels: logback
 Fix For: 5.3

 Attachments: SOLR-7698.patch


 Alternative logging support using logback.
 solr.xml would look like:
 {code:xml}
 <solr>
   <!-- ... -->
   <logging>
     <str name="class">org.apache.solr.logging.logback.LogbackWatcher</str>
     <bool name="enabled">true</bool>
     <watcher>
       <int name="size">50</int>
       <str name="threshold">WARN</str>
     </watcher>
   </logging>
   <!-- ... -->
 </solr>
 {code}
 In solr-X.X.X/server/lib/ext, remove:
  * log4j-1.2.X.jar
  * slf4j-log4j12-1.7.X.jar
 and add:
  * log4j-over-slf4j-1.7.7.jar
  * logback-classic-1.1.3.jar
  * logback-core-1.1.3.jar
 Example: https://github.com/chenlb/vootoo/wiki/Logback-for-solr-logging



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally

2015-04-30 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-7493:
-

 Summary: Requests aren't distributed evenly if the collection 
isn't present locally
 Key: SOLR-7493
 URL: https://issues.apache.org/jira/browse/SOLR-7493
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
Reporter: Jeff Wartes


I had a SolrCloud cluster where every node is behind a simple round-robin load 
balancer.
This cluster had two collections (A, B), and the slices of each were 
partitioned such that one collection (A) used two thirds of the nodes, and the 
other collection (B) used the remaining third of the nodes.

I observed that every request for collection B that the load balancer sent to a 
node with (only) slices for collection A got proxied to one *specific* node 
hosting a slice for collection B. This node started running pretty hot, for 
obvious reasons.

This meant that one specific node was handling the fan-out for slightly more 
than two-thirds of the requests against collection B.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally

2015-04-30 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-7493:
--
Labels:   (was: pat)

 Requests aren't distributed evenly if the collection isn't present locally
 --

 Key: SOLR-7493
 URL: https://issues.apache.org/jira/browse/SOLR-7493
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
Reporter: Jeff Wartes
 Attachments: SOLR-7493.patch


 I had a SolrCloud cluster where every node is behind a simple round-robin 
 load balancer.
 This cluster had two collections (A, B), and the slices of each were 
 partitioned such that one collection (A) used two thirds of the nodes, and 
 the other collection (B) used the remaining third of the nodes.
 I observed that every request for collection B that the load balancer sent to 
 a node with (only) slices for collection A got proxied to one *specific* node 
 hosting a slice for collection B. This node started running pretty hot, for 
 obvious reasons.
 This meant that one specific node was handling the fan-out for slightly more 
 than two-thirds of the requests against collection B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally

2015-04-30 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-7493:
--
Attachment: SOLR-7493.patch

It looks like this happens because SolrDispatchFilter's getRemoteCoreURL 
eventually takes the first viable entry from a HashMap.values list of cores. 

HashMap.values ordering is always the same if you load the HashMap with the 
same data in the same order. So if the list from ZK is presented in the same 
order on every node, every node will use the same ordering on every request.

There might be a better solution, but this patch would randomize that ordering 
per-request. 
My environment is a bit messed up at the moment, so I haven't done much more 
than verify this compiles.
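
As a hedged sketch of the idea in the patch (the class and method names here are 
illustrative, not Solr's actual internals):
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Pick the core to proxy to from a per-request shuffled copy of the candidates,
// instead of relying on HashMap.values() ordering, which is stable for the same
// insertion order and so makes every node pick the same core every time.
public class RandomizedCorePicker {

  public String pickRemoteCoreUrl(Collection<String> candidateCoreUrls) {
    List<String> shuffled = new ArrayList<>(candidateCoreUrls);
    Collections.shuffle(shuffled);          // new order on every request
    for (String url : shuffled) {
      if (isViable(url)) {                  // hypothetical liveness/role check
        return url;
      }
    }
    return null;                            // no viable core found
  }

  // Placeholder for whatever viability check the real code performs.
  private boolean isViable(String url) {
    return url != null && !url.isEmpty();
  }
}
{code}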

 Requests aren't distributed evenly if the collection isn't present locally
 --

 Key: SOLR-7493
 URL: https://issues.apache.org/jira/browse/SOLR-7493
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
Reporter: Jeff Wartes
 Attachments: SOLR-7493.patch


 I had a SolrCloud cluster where every node is behind a simple round-robin 
 load balancer.
 This cluster had two collections (A, B), and the slices of each were 
 partitioned such that one collection (A) used two thirds of the nodes, and 
 the other collection (B) used the remaining third of the nodes.
 I observed that every request for collection B that the load balancer sent to 
 a node with (only) slices for collection A got proxied to one *specific* node 
 hosting a slice for collection B. This node started running pretty hot, for 
 obvious reasons.
 This meant that one specific node was handling the fan-out for slightly more 
 than two-thirds of the requests against collection B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-7493) Requests aren't distributed evenly if the collection isn't present locally

2015-04-30 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-7493:
--
Labels: pat  (was: )

 Requests aren't distributed evenly if the collection isn't present locally
 --

 Key: SOLR-7493
 URL: https://issues.apache.org/jira/browse/SOLR-7493
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
Reporter: Jeff Wartes
  Labels: pat
 Attachments: SOLR-7493.patch


 I had a SolrCloud cluster where every node is behind a simple round-robin 
 load balancer.
 This cluster had two collections (A, B), and the slices of each were 
 partitioned such that one collection (A) used two thirds of the nodes, and 
 the other collection (B) used the remaining third of the nodes.
 I observed that every request for collection B that the load balancer sent to 
 a node with (only) slices for collection A got proxied to one *specific* node 
 hosting a slice for collection B. This node started running pretty hot, for 
 obvious reasons.
 This meant that one specific node was handling the fan-out for slightly more 
 than two-thirds of the requests against collection B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2015-04-23 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509521#comment-14509521
 ] 

Jeff Wartes commented on SOLR-5170:
---


I got tired of maintaining a custom solr build process for the sole purpose of 
this patch at my work, especially given the deployment changes in Solr 5.0.
Since this patch really just adds new classes, I pulled those files out into a 
freestanding repository that builds a jar, copied the necessary infrastructure 
to allow the tests to run, and posted that here:

https://github.com/randomstatistic/SOLR-5170

This repo contains the necessary API changes to the patch to support Solr 5.0. 
I have not bothered to update the patch in Jira here with those changes, and 
going forward, I'll probably continue to only push changes to that repo unless 
someone asks otherwise.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
 SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
 SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt


 The attached patch implements spatial multi-value distance sorting.  In other 
 words, a document can have more than one point per field, and using a 
 provided function query, it will return the distance to the closest point.  
 The data goes into binary DocValues, and as-such it's pretty friendly to 
 realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-06-30 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-5170:
--

Attachment: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch

Updated to work with Solr 4.9 & LUCENE-5703.

Any chance of realtime-friendly multi-value distance sorting getting into the 
mainline anytime soon? I've been building with this patch for getting close to 
a year now.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
 SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
 SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt


 The attached patch implements spatial multi-value distance sorting.  In other 
 words, a document can have more than one point per field, and using a 
 provided function query, it will return the distance to the closest point.  
 The data goes into binary DocValues, and as-such it's pretty friendly to 
 realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5917) Allow dismax wildcard field specifications

2014-03-26 Thread Jeff Wartes (JIRA)
Jeff Wartes created SOLR-5917:
-

 Summary: Allow dismax wildcard field specifications
 Key: SOLR-5917
 URL: https://issues.apache.org/jira/browse/SOLR-5917
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Jeff Wartes
Priority: Minor


The dynamic field schema specification is handy for when you want a bunch of 
fields of a given type, but don't know how many there will be.

You do currently need to know how many there will be (and the exact names) if 
you want to query them, however.

If edismax supported a similar wildcard specification like qf=dynfield_*, 
this would allow easy search across a given field type. It would also provide a 
convenient alternative to multi-value fields without the fieldNorm implications 
of having multiple values in a single field.
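
If such a wildcard spec were supported, usage from SolrJ might look like the 
following. This is purely illustrative of the proposal, not an existing feature, 
and the dynfield_* field names are made up:
{code:java}
import org.apache.solr.client.solrj.SolrQuery;

public class WildcardQfExample {
  public static SolrQuery buildQuery(String userInput) {
    SolrQuery q = new SolrQuery(userInput);
    q.set("defType", "edismax");
    // Proposed: expand to every dynamic field matching the pattern,
    // e.g. dynfield_title, dynfield_body, ... (hypothetical field names).
    q.set("qf", "dynfield_*");
    return q;
  }
}
{code}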



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Wartes updated SOLR-5170:
--

Attachment: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt

Adds recipDistance scoring; lat/long is now a single param.

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
 SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt


 The attached patch implements spatial multi-value distance sorting.  In other 
 words, a document can have more than one point per field, and using a 
 provided function query, it will return the distance to the closest point.  
 The data goes into binary DocValues, and as-such it's pretty friendly to 
 realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864738#comment-13864738
 ] 

Jeff Wartes commented on SOLR-5170:
---

I've been using this patch with some minor tweaks and solr 4.3.1 in production 
for about six months now. Since I was applying it again against 4.6 this 
morning, I figured I should attach my tweaks, and mention it passes tests 
against 4.6.

This does NOT address the design issues David raises in the initial comment. 
The changes vs the initial patch file allow it to be applied against a greater 
range of solr versions, and bring it a little closer to feeling the same as 
geofilt's params.
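
As a rough illustration of the geofilt-style parameter usage the tweaks aim for 
(the sort function name below is a placeholder; the patch's actual function query 
name and parameters may differ):
{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;

public class MultiValueDistanceSortExample {
  public static SolrQuery buildQuery() {
    SolrQuery q = new SolrQuery("*:*");
    q.set("sfield", "store_locations");  // hypothetical multi-valued point field
    q.set("pt", "47.61,-122.33");        // query point, lat,lon
    // "mindist()" stands in for the patch's closest-point distance function.
    q.setSort("mindist()", ORDER.asc);
    return q;
  }
}
{code}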

 Spatial multi-value distance sort via DocValues
 ---

 Key: SOLR-5170
 URL: https://issues.apache.org/jira/browse/SOLR-5170
 Project: Solr
  Issue Type: New Feature
  Components: spatial
Reporter: David Smiley
Assignee: David Smiley
 Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch


 The attached patch implements spatial multi-value distance sorting.  In other 
 words, a document can have more than one point per field, and using a 
 provided function query, it will return the distance to the closest point.  
 The data goes into binary DocValues, and as-such it's pretty friendly to 
 realtime search requirements, and it only uses 8 bytes per point.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org