Re: How to make SolrCloud more elastic

2015-02-13 Thread Otis Gospodnetic
Hi Matt,

See:
http://search-lucene.com/?q=query+routing&fc_project=Solr
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
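For the query-routing part, here is a minimal sketch. It assumes a collection created with the default compositeId router and uses a hypothetical tenant-style prefix as the route key. Documents are indexed with IDs of the form `prefix!id`, and searches pass the same prefix in the `_route_` parameter so that only the owning shard is queried. The helper names are illustrative. Only the `prefix!id` convention and `_route_` come from Solr itself.

```python
from urllib.parse import urlencode


def routed_doc_id(route_key: str, doc_id: str) -> str:
    """Build a compositeId-style ID such as 'tenantA!doc42'.

    Under the compositeId router, documents that share the prefix before
    '!' are hashed into the same shard. This sketch handles one routing
    level only, so the key itself may not contain '!'.
    """
    if "!" in route_key:
        raise ValueError("route_key must not contain '!'")
    return f"{route_key}!{doc_id}"


def routed_query(q: str, route_key: str) -> str:
    """Query string that limits a search to the shard(s) owning route_key.

    The trailing '!' marks the value as a route prefix, not a full ID.
    """
    return urlencode({"q": q, "_route_": f"{route_key}!"})


if __name__ == "__main__":
    print(routed_doc_id("tenantA", "doc42"))      # tenantA!doc42
    print(routed_query("title:solr", "tenantA"))  # q=title%3Asolr&_route_=tenantA%21
```

Routing only helps when most queries naturally target one prefix, such as a single customer or source. Queries without `_route_` still fan out to every shard.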

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Feb 12, 2015 at 2:09 PM, Matt Kuiper  wrote:

> Otis,
>
> Thanks for your reply.  I see your point about too many shards and search
> efficiency.  I also agree that I need to get a better handle on customer
> requirements and expected loads.
>
> Initially I figured that with the shard splitting option, I would need to
> double my Solr nodes every time I split (as I would want to split every
> shard within the collection).  In fact, only the number of shards
> would double, and I would then have the opportunity to rebalance the shards
> over the existing Solr nodes plus however many new nodes make sense at
> the time.  This may be preferable to defining many micro shards up front.
>
> Time-based collections may be an option for this project.  I am not
> familiar with query routing; can you point me to any documentation on how
> it might be implemented?
>
> Thanks,
> Matt
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
> Sent: Wednesday, February 11, 2015 9:13 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to make SolrCloud more elastic
>
> Hi Matt,
>
> You could create extra shards up front, but if your queries are fanned out
> to all of them, you can run into situations where there are too many
> concurrent queries per node causing lots of context switching and
> ultimately being less efficient than if you had fewer shards.  So while
> this is an approach to take, I'd personally first try to run tests to see
> how much a single node can handle in terms of volume, expected query rates,
> and target latency, and then use monitoring/alerting/whatever-helps tools
> to keep an eye on the cluster so that when you start approaching the target
> limits you are ready with additional nodes and shard splitting if needed.
>
> Of course, if your data and queries are such that newer documents are
> queried more, you should look into time-based collections... and if your
> queries can only query a subset of data you should look into query routing.
>
> Otis
>
>
> On Wed, Feb 11, 2015 at 3:32 PM, Matt Kuiper 
> wrote:
>
> > I am starting a new project and one of the requirements is that Solr
> > must scale to handle increasing load (both search performance and index
> > size).
> >
> > My understanding is that one way to address search performance is by
> > adding more replicas.
> >
> > I am more concerned about handling a growing index size.  I have
> > already been given some good input on this topic and am considering a
> > shard splitting approach, but am more focused on a rebalancing
> > approach that includes defining many shards up front and then moving
> > these existing shards onto new Solr servers as needed.  I plan to
> > experiment with this approach first.
> >
> > Before I get too deep, I wonder if anyone has any tips or warnings
> > on these approaches, or has scaled Solr in a different manner.
> >
> > Thanks,
> > Matt
> >
>
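Matt's original plan was to define many shards up front and move them onto new servers as load grows. That plan can be sketched as a small greedy planner. This is a hypothetical helper, not a Solr feature. It only decides which shards to move. In SolrCloud, the move itself would typically be an ADDREPLICA on the new node, followed by a DELETEREPLICA of the old copy once the new replica is active. Node and shard names below are placeholders.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

Move = Tuple[str, str, str]  # (shard, from_node, to_node)


def plan_moves(layout: Dict[str, List[str]]) -> List[Move]:
    """Greedy plan that evens out shard counts across nodes.

    `layout` maps each node to the shards it hosts; newly added, empty
    nodes are listed with an empty list. The input is not modified.
    """
    work = {node: list(shards) for node, shards in layout.items()}
    moves: List[Move] = []
    while True:
        busiest = max(work, key=lambda n: len(work[n]))
        idlest = min(work, key=lambda n: len(work[n]))
        # Stop once no node holds more than one shard above any other.
        if len(work[busiest]) - len(work[idlest]) <= 1:
            return moves
        shard = work[busiest].pop()
        work[idlest].append(shard)
        moves.append((shard, busiest, idlest))


def add_replica_url(base_url: str, collection: str, shard: str, node: str) -> str:
    """Collections API ADDREPLICA request placing a copy of `shard` on `node`."""
    params = urlencode({"action": "ADDREPLICA", "collection": collection,
                        "shard": shard, "node": node})
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


if __name__ == "__main__":
    layout = {"node1": ["s1", "s2", "s3", "s4"],
              "node2": ["s5", "s6", "s7", "s8"],
              "node3": []}
    for shard, src, dst in plan_moves(layout):
        print(f"move {shard}: {src} -> {dst}")
        print(add_replica_url("http://localhost:8983/solr", "archive", shard, dst))
```

Adding a third, empty node to two nodes with four shards each yields two moves. This illustrates Otis's point below: with many small shards, rebalancing is easy, but every query still fans out to all of them.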


RE: How to make SolrCloud more elastic

2015-02-12 Thread Toke Eskildsen
Matt Kuiper [matt.kui...@issinc.com] wrote:
> Thanks for your reply.  Yes, I believe I will be working with a
> write-once archive.  However, my understanding is that all shards are
> defined up front, with the option to split later.

Our situation might be a bit special, as a few minutes of downtime (preferably at
off-peak hours) now and then is acceptable.

We basically maintain a SolrCloud with static shards and use a completely 
separate builder to generate new shards, one at a time. When the builder has 
finished a shard, we add it to the cloud the hard way (re-configuration and 
restarting, hence the downtime). There's a description at 
https://sbdevel.wordpress.com/net-archive-search/

To avoid too much ZooKeeper hassle, we have a bunch of empty shards, ready to
be swapped with newly built ones. We have contemplated making the shard under
construction part of the SolrCloud, but have yet to experiment with that
setup.

Static shards, optimized down to a single segment and using DocValues for
faceting, are a very potent mix: a Solr serving a non-static index needs more
memory, as it must be able to keep more than one version of the index open
at a time, on top of the indexing itself. Faceting on fields with many unique
values is also more efficient with a single segment, as there is no need for
an internal structure mapping terms between the segments.
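Toke's workflow can be sketched in two pieces, under stated assumptions. The first piece is the optimize request for a finished shard. The `optimize` and `maxSegments` update parameters are real; the `/solr/<core>/update` path and core names are placeholders. The second piece is hypothetical bookkeeping for the pool of empty placeholder shards. It stands in for the actual re-configuration and restart described on the linked page.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urlencode


def optimize_url(base_url: str, core: str, max_segments: int = 1) -> str:
    """Update request that merges a finished, static core down to max_segments."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


@dataclass
class PlaceholderPool:
    """Hypothetical bookkeeping for shards registered in the cloud up front.

    Maps each shard slot to the built index it serves; None marks a slot
    that still holds an empty placeholder index.
    """
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def swap_in(self, built_index: str) -> str:
        """Give a newly built index the first empty slot (by slot name)."""
        for slot in sorted(self.slots):
            if self.slots[slot] is None:
                self.slots[slot] = built_index
                return slot
        raise RuntimeError("no empty placeholder shards left; time to add more")

    def free_slots(self) -> int:
        """Number of slots still holding an empty placeholder."""
        return sum(1 for index in self.slots.values() if index is None)
```

When `free_slots()` reaches zero, more placeholders have to be registered. That is the ZooKeeper work the empty shards are meant to batch up.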

- Toke Eskildsen


RE: How to make SolrCloud more elastic

2015-02-12 Thread Matt Kuiper
Toke,

Thanks for your reply.  Yes, I believe I will be working with a write-once 
archive.  However, my understanding is that all shards are defined up front, 
with the option to split later.

Can you describe how to create shards one at a time, or point me to 
documentation on it?

Thanks,
Matt





RE: How to make SolrCloud more elastic

2015-02-12 Thread Matt Kuiper
Otis,

Thanks for your reply.  I see your point about too many shards and search 
efficiency.  I also agree that I need to get a better handle on customer 
requirements and expected loads.  

Initially I figured that with the shard splitting option, I would need to 
double my Solr nodes every time I split (as I would want to split every shard 
within the collection).  In fact, only the number of shards would double, and 
I would then have the opportunity to rebalance the shards over the existing 
Solr nodes plus however many new nodes make sense at the time.  This may be 
preferable to defining many micro shards up front.
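Splitting doubles shards, not nodes. By default, SPLITSHARD divides a shard's hash range into two halves and names the sub-shards `shard1_0` and `shard1_1`. The sketch below models only that arithmetic, plus a naive round-robin placement over a hypothetical three-node cluster. It does not call Solr. In a real cluster, placement is done through the Collections API.

```python
from typing import Dict, List, Tuple

HashRange = Tuple[int, int]  # inclusive (low, high) in the signed 32-bit hash space


def split_range(r: HashRange) -> Tuple[HashRange, HashRange]:
    """Cut one hash range into two equal, contiguous halves."""
    low, high = r
    mid = low + (high - low) // 2
    return (low, mid), (mid + 1, high)


def split_all(shards: Dict[str, HashRange]) -> Dict[str, HashRange]:
    """Split every shard once, naming children <parent>_0 and <parent>_1."""
    children: Dict[str, HashRange] = {}
    for name, r in shards.items():
        children[f"{name}_0"], children[f"{name}_1"] = split_range(r)
    return children


def place(shards: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Spread shards round-robin over whatever nodes exist right now."""
    layout: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, shard in enumerate(sorted(shards)):
        layout[nodes[i % len(nodes)]].append(shard)
    return layout


if __name__ == "__main__":
    two = {"shard1": (-2**31, -1), "shard2": (0, 2**31 - 1)}
    four = split_all(two)
    print(four)
    print(place(list(four), ["node1", "node2", "node3"]))
```

Going from two shards to four does not force going from two nodes to four. Here the four sub-shards land on three nodes, and the node count can be chosen independently at each split.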

Time-based collections may be an option for this project.  I am not familiar 
with query routing; can you point me to any documentation on how it might be 
implemented?

Thanks,
Matt



RE: How to make SolrCloud more elastic

2015-02-12 Thread Matt Kuiper
Thanks Alex. Per your recommendation, I watched the presentation and found it 
very informative.

While my problem space will not reach the scale addressed in this talk, some of 
the topics may be helpful, in particular the improvements to shard splitting 
and the new 'migrate' API.
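The two operations mentioned above are both Collections API actions. The minimal sketch below only builds the request URLs. It assumes a node at a hypothetical `http://localhost:8983/solr` and hypothetical collection names. Check the reference guide for your Solr version before relying on any parameter.

```python
from typing import Dict
from urllib.parse import urlencode


def collections_api_url(base_url: str, action: str, params: Dict[str, str]) -> str:
    """Build a Collections API request URL (no request is sent)."""
    query = urlencode({"action": action, **params})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def split_shard_url(base_url: str, collection: str, shard: str) -> str:
    # SPLITSHARD: split `shard` into two sub-shards covering halves of its hash range.
    return collections_api_url(base_url, "SPLITSHARD",
                               {"collection": collection, "shard": shard})


def migrate_url(base_url: str, source: str, target: str, split_key: str) -> str:
    # MIGRATE: move documents whose compositeId prefix matches split_key
    # (e.g. 'tenantA!') from the source collection into the target collection.
    return collections_api_url(base_url, "MIGRATE", {
        "collection": source,
        "split.key": split_key,
        "target.collection": target,
    })
```

MIGRATE is only useful when documents carry a compositeId route prefix. Without one, there is no `split.key` to select them by.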

Thanks,
Matt

Matt Kuiper - Software Engineer
Intelligent Software Solutions
p. 719.452.7721 | matt.kui...@issinc.com 
www.issinc.com | LinkedIn: intelligent-software-solutions



Re: How to make SolrCloud more elastic

2015-02-11 Thread Toke Eskildsen
On Wed, 2015-02-11 at 21:32 +0100, Matt Kuiper wrote:
> I am starting a new project and one of the requirements is that Solr
> must scale to handle increasing load (both search performance and index
> size).

[...]

> Before I got too deep, I wondered if anyone has any tips or warnings on
> these approaches, or has scaled Solr in a different manner.

If your corpus only contains static content (e.g. log files or a
write-once archive), you can create shards one at a time and optimize
them. This lowers requirements for your searchers.

- Toke Eskildsen, State and University Library, Denmark




Re: How to make SolrCloud more elastic

2015-02-11 Thread Otis Gospodnetic
Hi Matt,

You could create extra shards up front, but if your queries are fanned out
to all of them, you can run into situations where there are too many
concurrent queries per node causing lots of context switching and
ultimately being less efficient than if you had fewer shards.  So while
this is an approach to take, I'd personally first try to run tests to see
how much a single node can handle in terms of volume, expected query rates,
and target latency, and then use monitoring/alerting/whatever-helps tools
to keep an eye on the cluster so that when you start approaching the target
limits you are ready with additional nodes and shard splitting if needed.
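The advice to measure first, then size and monitor, reduces to simple arithmetic. In this sketch, the per-node figures are whatever your own load tests measure. The 70% headroom and 80% alert thresholds are hypothetical defaults, not Solr recommendations. Integer math avoids floating-point rounding at the boundaries.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def nodes_needed(total_docs: int, docs_per_node: int,
                 peak_qps: int, qps_per_node: int, headroom_pct: int = 70) -> int:
    """Nodes required so that neither index size nor query rate exceeds
    headroom_pct percent of what one node handled in load tests."""
    usable_docs = docs_per_node * headroom_pct // 100
    usable_qps = qps_per_node * headroom_pct // 100
    return max(ceil_div(total_docs, usable_docs), ceil_div(peak_qps, usable_qps))


def should_alert(current: int, tested_limit: int, alert_pct: int = 80) -> bool:
    """True once a monitored metric reaches alert_pct percent of its tested limit."""
    return current * 100 >= tested_limit * alert_pct
```

Take a node tested at 50M documents and 100 queries/s. Serving 200M documents at a 300 queries/s peak then needs `nodes_needed(200_000_000, 50_000_000, 300, 100) == 6`. Index size, not query load, is the binding limit here; load alone would need 5.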

Of course, if your data and queries are such that newer documents are
queried more, you should look into time-based collections... and if your
queries can only query a subset of data you should look into query routing.
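Here is one common way to set up time-based collections, under these assumptions:

- There is one collection per month, using a hypothetical `logs_YYYY_MM` naming pattern.
- New documents are indexed only into the current month's collection.
- Queries for recent data go through a collection alias (here the hypothetical `logs_recent`).

Re-issuing CREATEALIAS with an existing alias name repoints that alias. This lets the window slide as months roll over, and old collections can then be dropped whole.

```python
from datetime import date
from typing import List
from urllib.parse import urlencode


def monthly_collections(prefix: str, today: date, months: int) -> List[str]:
    """Names of the `months` most recent monthly collections, newest first."""
    names: List[str] = []
    year, month = today.year, today.month
    for _ in range(months):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return names


def create_alias_url(base_url: str, alias: str, collections: List[str]) -> str:
    """CREATEALIAS request pointing `alias` at the given collections."""
    query = urlencode({"action": "CREATEALIAS", "name": alias,
                       "collections": ",".join(collections)})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"
```

For example, `monthly_collections("logs", date(2015, 2, 13), 3)` yields `["logs_2015_02", "logs_2015_01", "logs_2014_12"]`. Queries against the alias then touch three small collections instead of the whole archive.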

Otis


Re: How to make SolrCloud more elastic

2015-02-11 Thread Alexandre Rafalovitch
Did you have a look at the presentations from the recent
SolrRevolution? E.g.
https://www.youtube.com/watch?v=nxRROble76A&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

