Re: How to make SolrCloud more elastic
Hi Matt,

See:
http://search-lucene.com/?q=query+routing&fc_project=Solr
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Thu, Feb 12, 2015 at 2:09 PM, Matt Kuiper wrote:
> [...]
> The time-based collections may be an option for this project. I am not
> familiar with query routing; can you point me to any documentation on how
> this might be implemented?
> [...]
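Here is a minimal sketch of what query routing looks like with SolrCloud's default compositeId router. It assumes a hypothetical collection `archive` on `localhost:8983` and a hypothetical route key `customerA`. Documents whose IDs share the prefix before `!` land in the same shard, or in a small set of shards after splits. The `_route_` request parameter restricts a query to the shard(s) that own that prefix. The code only builds URLs and does not contact Solr.

```python
from urllib.parse import urlencode

# Hypothetical cluster location and collection name; replace with your own.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "archive"


def routed_doc_id(route_key, doc_id):
    """compositeId router: the part before '!' decides which shard stores the document."""
    return f"{route_key}!{doc_id}"


def routed_query_url(q, route_key, rows=10):
    """Select URL whose _route_ parameter limits the query to route_key's shard(s)."""
    params = {"q": q, "_route_": f"{route_key}!", "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(routed_doc_id("customerA", "doc42"))  # customerA!doc42
    print(routed_query_url("title:solr", "customerA"))
```

This only pays off if most queries really are scoped to one route key. Queries without `_route_` still fan out to every shard.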
RE: How to make SolrCloud more elastic
Matt Kuiper [matt.kui...@issinc.com] wrote:
> Thanks for your reply. Yes, I believe I will be working with a write
> once archive. However, my understanding is that all shards are
> defined up front, with the option to split later.

Our situation might be a bit special, as a few minutes of downtime now and then (preferably at off-peak hours) is acceptable.

We basically maintain a SolrCloud with static shards and use a completely separate builder to generate new shards, one at a time. When the builder has finished a shard, we add it to the cloud the hard way (re-configuration and restarting, hence the downtime). There's a description at https://sbdevel.wordpress.com/net-archive-search/

To avoid too much ZooKeeper hassle, we keep a bunch of empty shards ready to be swapped with newly built ones. We have contemplated making the shard under construction part of the SolrCloud, but have yet to experiment with that setup.

Static shards, optimized down to a single segment and using DocValues for faceting, are a very potent mix:

A Solr serving a non-static index needs more memory, as it must be able to have more than one version of the index open at a time, plus handle the indexing itself.

Faceting on many unique values is more efficient with a single segment, as there is no need for an internal structure mapping the terms between the segments.

- Toke Eskildsen
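The builder step of optimizing a finished shard down to a single segment can be sketched as one update request against the builder's standalone core. The host `builder`, the core name and the long timeout are hypothetical. `optimize`, `maxSegments` and `waitSearcher` are standard update request parameters. Since an optimize rewrites the whole index, the sketch assumes it runs on the builder and not on the live cloud.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base, core, max_segments=1):
    """Update-handler URL that merges the core's index down to max_segments segments."""
    params = {"optimize": "true", "maxSegments": max_segments, "waitSearcher": "true"}
    return f"{base}/{core}/update?{urlencode(params)}"


def optimize(base, core, timeout=3600):
    """Send the optimize request; merging a large shard to one segment can take a long time."""
    with urlopen(optimize_url(base, core), timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical builder host and core name.
    print(optimize_url("http://builder:8983/solr", "shard_2015_02"))
```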
RE: How to make SolrCloud more elastic
Toke,

Thanks for your reply. Yes, I believe I will be working with a write-once archive. However, my understanding is that all shards are defined up front, with the option to split later. Can you describe, or point me to documentation on, how to create shards one at a time?

Thanks,
Matt

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Wednesday, February 11, 2015 11:47 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make SolrCloud more elastic

[...]
If your corpus only contains static content (e.g. log files or a write-once archive), you can create shards one at a time and optimize them. This lowers the requirements for your searchers.
RE: How to make SolrCloud more elastic
Otis,

Thanks for your reply. I see your point about too many shards and search efficiency. I also agree that I need to get a better handle on customer requirements and expected loads.

Initially I figured that with the shard splitting option, I would need to double my Solr nodes every time I split (as I would want to split every shard within the collection). Actually, only the number of shards would double, and I would then have the opportunity to rebalance the shards over the existing Solr nodes plus however many new nodes make sense at the time. This may be preferable to defining many micro shards up front.

The time-based collections may be an option for this project. I am not familiar with query routing; can you point me to any documentation on how this might be implemented?

Thanks,
Matt

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
Sent: Wednesday, February 11, 2015 9:13 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make SolrCloud more elastic

[...]
Of course, if your data and queries are such that newer documents are queried more, you should look into time-based collections... and if your queries only need to hit a subset of the data, you should look into query routing.
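Matt's observation can be made concrete. Splitting every shard once doubles the shard count but not the node count, and the resulting sub-shards can then be spread over old and new nodes. The sketch below assumes a hypothetical collection `archive` and hypothetical node names. It uses an illustrative round-robin placement policy that Solr itself does not apply. SPLITSHARD and ADDREPLICA are Collections API actions. A "move" is modelled as adding a replica on the target node and later removing the old copy with DELETEREPLICA (not shown). The code only builds URLs and a plan.

```python
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical Collections API endpoint and collection name.
ADMIN = "http://localhost:8983/solr/admin/collections"
COLLECTION = "archive"


def admin_url(params):
    return f"{ADMIN}?{urlencode(params)}"


def split_url(shard):
    """SPLITSHARD request; Solr names the two sub-shards <shard>_0 and <shard>_1."""
    return admin_url({"action": "SPLITSHARD", "collection": COLLECTION, "shard": shard})


def split_all(shards):
    """Shard names after splitting every shard once: the count doubles, the nodes do not."""
    return [f"{s}_{i}" for s in shards for i in (0, 1)]


def spread(shards, nodes):
    """Round-robin placement plan over existing plus new nodes (illustrative policy only)."""
    plan = {node: [] for node in nodes}
    for shard, node in zip(shards, cycle(nodes)):
        plan[node].append(shard)
    return plan


def add_replica_url(shard, node):
    """First half of a move: add a replica of shard on node, then DELETEREPLICA the old one."""
    return admin_url({"action": "ADDREPLICA", "collection": COLLECTION,
                      "shard": shard, "node": node})


if __name__ == "__main__":
    new_shards = split_all(["shard1", "shard2"])
    nodes = ["nodeA:8983_solr", "nodeB:8983_solr", "nodeC:8983_solr"]  # nodeC is new
    for node, assigned in spread(new_shards, nodes).items():
        print(node, assigned)
```

Sub-shards are typically created alongside their parent, so the spreading step is separate from the split itself.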
RE: How to make SolrCloud more elastic
Thanks Alex. Per your recommendation I checked out the presentation, and it was very informative. While my problem space will not reach the scale addressed in this talk, some of the topics may be helpful, namely the improvements to shard splitting and the new 'migrate' API.

Thanks,
Matt

Matt Kuiper - Software Engineer
Intelligent Software Solutions
p. 719.452.7721 | matt.kui...@issinc.com
www.issinc.com | LinkedIn: intelligent-software-solutions

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Wednesday, February 11, 2015 2:31 PM
To: solr-user
Subject: Re: How to make SolrCloud more elastic

Did you have a look at the presentations from the recent SolrRevolution?
[...]
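The 'migrate' API mentioned here is the Collections API MIGRATE action. It moves the documents that share a compositeId route key from one collection into another. Below is a sketch of the request, assuming hypothetical collections `archive` and `archive_tenantA` and a hypothetical route key `tenantA`. `split.key` and `target.collection` are the action's parameter names; everything else is illustrative.

```python
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/collections"  # hypothetical endpoint


def migrate_url(source, target, route_key):
    """MIGRATE the documents whose compositeId prefix is route_key from source to target."""
    params = {
        "action": "MIGRATE",
        "collection": source,
        "split.key": f"{route_key}!",  # the '!'-terminated prefix used in document IDs
        "target.collection": target,
    }
    return f"{ADMIN}?{urlencode(params)}"


if __name__ == "__main__":
    print(migrate_url("archive", "archive_tenantA", "tenantA"))
```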
Re: How to make SolrCloud more elastic
On Wed, 2015-02-11 at 21:32 +0100, Matt Kuiper wrote:
> I am starting a new project and one of the requirements is that Solr
> must scale to handle increasing load (both search performance and index
> size).

[...]

> Before I got too deep, I wondered if anyone has any tips or warnings on
> these approaches, or has scaled Solr in a different manner.

If your corpus only contains static content (e.g. log files or a write-once archive), you can create shards one at a time and optimize them. This lowers the requirements for your searchers.

- Toke Eskildsen, State and University Library, Denmark
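Toke's own setup adds finished shards by reconfiguring and restarting. A different way to add shards one at a time, not the one he describes, is a collection created with the `implicit` router. That router lets you add named shards later through the Collections API's CREATESHARD action. With the implicit router and no `router.field`, the `_route_` parameter on an update names the target shard. The sketch only builds request URLs. It assumes a hypothetical collection `archive`, a hypothetical config set `archive_conf`, and month-named shards.

```python
from urllib.parse import urlencode

# Hypothetical cluster location.
SOLR = "http://localhost:8983/solr"
ADMIN = f"{SOLR}/admin/collections"


def admin_url(params):
    return f"{ADMIN}?{urlencode(params)}"


def create_collection_url(name, config, first_shard):
    """CREATE with the implicit router: shards are named explicitly, not hash ranges."""
    return admin_url({
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": first_shard,  # a comma-separated list of shard names is also accepted
        "collection.configName": config,
    })


def create_shard_url(collection, shard):
    """CREATESHARD adds one more named shard; it is only valid for implicit-router collections."""
    return admin_url({"action": "CREATESHARD", "collection": collection, "shard": shard})


def update_url(collection, shard):
    """Send documents straight to one named shard (implicit router, no router.field)."""
    return f"{SOLR}/{collection}/update?{urlencode({'_route_': shard})}"


if __name__ == "__main__":
    print(create_collection_url("archive", "archive_conf", "s2015_01"))
    print(create_shard_url("archive", "s2015_02"))
    print(update_url("archive", "s2015_02"))
```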
Re: How to make SolrCloud more elastic
Hi Matt,

You could create extra shards up front, but if your queries are fanned out to all of them, you can run into situations where there are too many concurrent queries per node, causing lots of context switching and ultimately being less efficient than if you had fewer shards.

So while this is an approach you could take, I'd personally first run tests to see how much a single node can handle in terms of volume, expected query rates, and target latency. Then I'd use monitoring/alerting/whatever-helps tools to keep an eye on the cluster, so that when you start approaching the target limits you are ready with additional nodes and shard splitting if needed.

Of course, if your data and queries are such that newer documents are queried more, you should look into time-based collections... and if your queries only need to hit a subset of the data, you should look into query routing.

Otis

On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper wrote:
> I am starting a new project and one of the requirements is that Solr must
> scale to handle increasing load (both search performance and index size).
>
> My understanding is that one way to address search performance is by
> adding more replicas.
>
> I am more concerned about handling a growing index size. I have already
> been given some good input on this topic and am considering a shard
> splitting approach, but am more focused on a rebalancing approach that
> includes defining many shards up front and then moving these existing
> shards on to new Solr servers as needed. Plan to experiment with this
> approach first.
>
> Before I got too deep, I wondered if anyone has any tips or warnings on
> these approaches, or has scaled Solr in a different manner.
>
> Thanks,
> Matt
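If newer documents are queried more, the time-based collections suggestion can be sketched as one collection per month plus an alias covering the most recent months. That way the common queries touch only a few small collections. The naming scheme `logs_YYYY_MM`, the alias name `recent` and the three-month window are hypothetical. CREATEALIAS with `name` and `collections` is the Collections API action, and sending it again with the same alias name repoints the alias.

```python
from datetime import date
from typing import List
from urllib.parse import urlencode

ADMIN = "http://localhost:8983/solr/admin/collections"  # hypothetical endpoint


def monthly_collections(today: date, months: int, prefix: str = "logs") -> List[str]:
    """Collection names for the current month and the months before it, newest first."""
    names = []
    year, month = today.year, today.month
    for _ in range(months):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
    return names


def alias_url(alias: str, collections: List[str]) -> str:
    """CREATEALIAS request; reissuing it with the same alias name repoints the alias."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": ",".join(collections)}
    return f"{ADMIN}?{urlencode(params)}"


if __name__ == "__main__":
    window = monthly_collections(date(2015, 2, 12), 3)
    print(window)  # ['logs_2015_02', 'logs_2015_01', 'logs_2014_12']
    print(alias_url("recent", window))
```

Queries that need the full history can still list every collection, or use a second, wider alias.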
Re: How to make SolrCloud more elastic
Did you have a look at the presentations from the recent SolrRevolution? E.g. https://www.youtube.com/watch?v=nxRROble76A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP

Regards,
   Alex.
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 11 February 2015 at 15:32, Matt Kuiper wrote:
> [...]
> Before I got too deep, I wondered if anyone has any tips or warnings on these
> approaches, or has scaled Solr in a different manner.