Re: Programmatic restructuring of a Solr cloud

2011-05-06 Thread Sergey Sazonov

Hello Jan,

Thank you very much for the answer. Unfortunately, we don't use Amazon,
and I doubt we will be able to persuade the customer to switch to it.
Moreover, the amount of data will not allow us to store everything on a
single master. However, having considered your design I am starting to
see the problem in a new light, so maybe it will still prove helpful ;)

In the meantime, I'm still looking for other solutions...

Best regards,
Sergey Sazonov.

On 05/05/11 15:07, Jan Høydahl wrote:

Hi,

One approach, if you're using Amazon, is to use Elastic Beanstalk:

* Create one master with 12 cores, named jan, feb, mar etc.
* Every month, you clear the current month's index and switch indexing to it.
  You will only have one master, because you're only indexing to one month at
  a time.
* For each of the 12 months, set up an Amazon Beanstalk instance with a Solr
  replica pointing to its master.
  This way, Amazon will spin up replicas as needed.
  NOTE: Your replica could still be located at /solr/select even if it
  replicates from /solr/may/replication
* You only query the replicas, and the client will control whether to query one
  or more shards:

shards=jan.elasticbeanstalk.com/solr,feb.elasticbeanstalk.com/solr,mar.elasticbeanstalk.com/solr

Once this is set up, you have zero config to worry about :)
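
A rough sketch of the client side of this scheme, added for illustration and not part of the original message: one helper picks the core that receives documents for a given date, and another builds the `shards` value from the host pattern shown above. The function names and the `domain` default are assumptions.

```python
from datetime import date

# Core names as suggested above: one core per calendar month.
MONTH_CORES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]


def core_for(day: date) -> str:
    """Return the name of the core that receives documents for `day`."""
    return MONTH_CORES[day.month - 1]


def shards_param(months, domain="elasticbeanstalk.com"):
    """Build the value of the `shards` parameter for the given month cores."""
    return ",".join(f"{m}.{domain}/solr" for m in months)
```

A client that only needs the current month would pass a single core name, so the `shards` value contains just one entry.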

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5 May 2011, at 14:03, Sergey Sazonov wrote:

Dear Solr Experts,

First of all, I would like to thank you for your patience when answering
questions of those who are less experienced.

And now to the main topic: I would like to learn whether it is possible to
restructure a Solr cloud programmatically.

Let me describe the system we are designing to make the requirements clear. The
indexed documents are certain log entries. We are planning to shard them by
month, and only keep the last 12 months in the index. We are going to replicate
each shard across several servers.

Now, the user is always required to search within a single month (= shard). Most
importantly, we expect an absolute majority of the requests to query the current month,
with only a minor load on the previous months. In order to utilise the cluster most
efficiently, we would like a majority of the servers to contain replicas of the current
month data, and have only one or two servers per older month. To this end, we are
planning to have a set of slaves that migrate from master to master,
depending on which master holds the data for the current month. When a new month starts,
those slaves have to be reconfigured to hold the new shard and to replicate from the new
master (their old master now holding the data for the previous month).

Since this operation has to be done every month, we are naturally considering
automating it. So my question is whether anyone has faced a similar problem
before, and what is the best way to solve it. We are not committed to any
solution, or even architecture, so feel free to propose different solutions.
The only requirement is that a majority of the servers should be able to serve
requests to the current month at any given moment.

Thank you in advance for your answers.

Best regards,
Sergey Sazonov.

Programmatic restructuring of a Solr cloud

2011-05-05 Thread Sergey Sazonov


Dear Solr Experts,

First of all, I would like to thank you for your patience when answering 
questions of those who are less experienced.


And now to the main topic: I would like to learn whether it is possible 
to restructure a Solr cloud programmatically.


Let me describe the system we are designing to make the requirements 
clear. The indexed documents are certain log entries. We are planning to 
shard them by month, and only keep the last 12 months in the index. We 
are going to replicate each shard across several servers.


Now, the user is always required to search within a single month (= 
shard). Most importantly, we expect an absolute majority of the requests 
to query the current month, with only a minor load on the previous 
months. In order to utilise the cluster most efficiently, we would like 
a majority of the servers to contain replicas of the current month data, 
and have only one or two servers per older month. To this end, we are 
planning to have a set of slaves that migrate from master to master, 
depending on which master holds the data for the current month. When a 
new month starts, those slaves have to be reconfigured to hold the new 
shard and to replicate from the new master (their old master now holding 
the data for the previous month).
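
One possible way to script the monthly switch, sketched here as an illustration rather than a tested procedure: Solr's ReplicationHandler accepts `command=fetchindex` with an optional `masterUrl` parameter, which triggers a one-time pull from the given master. The snippet below only builds those URLs; the host names are hypothetical, and because the pull is one-time, the `masterUrl` in each slave's solrconfig.xml would still need to be updated separately for regular polling to follow the new master.

```python
from urllib.parse import urlencode


def fetchindex_url(slave_base: str, master_base: str) -> str:
    """URL asking one slave to pull the index from the given master once."""
    query = urlencode({
        "command": "fetchindex",
        "masterUrl": f"{master_base}/replication",
    })
    return f"{slave_base}/replication?{query}"


def repoint_urls(slaves, new_master):
    """One fetchindex URL per slave that should move to the new master."""
    return [fetchindex_url(s, new_master) for s in slaves]


# Hypothetical hosts: two slaves moving to the master for the new month.
urls = repoint_urls(
    ["http://slave1:8983/solr", "http://slave2:8983/solr"],
    "http://master-jun:8983/solr",
)
```

Each URL would then be requested with any HTTP client at the start of the month, for example from a cron job.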


Since this operation has to be done every month, we are naturally 
considering automating it. So my question is whether anyone has faced a 
similar problem before, and what is the best way to solve it. We are not 
committed to any solution, or even architecture, so feel free to propose 
different solutions. The only requirement is that a majority of the 
servers should be able to serve requests to the current month at any 
given moment.
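
To make the majority requirement concrete, here is a minimal sketch of an allocation plan, assuming a fixed number of servers per older month and all remaining servers on the current month. The function name, the `per_old_month` parameter and the server labels are illustrative assumptions.

```python
def allocate(servers, months, per_old_month=1):
    """Assign servers to month shards; `months` is ordered oldest to newest.

    Each older month gets `per_old_month` servers; all remaining servers
    serve the current (last) month.
    """
    *old, current = months
    needed = per_old_month * len(old)
    # Reject plans in which the current month would not hold a strict majority.
    if len(servers) - needed <= len(servers) / 2:
        raise ValueError("not enough servers to keep a majority on the current month")
    plan = {}
    remaining = iter(servers)
    for month in old:
        plan[month] = [next(remaining) for _ in range(per_old_month)]
    plan[current] = list(remaining)
    return plan
```

With 11 older months and one server each, the check only passes from 23 servers upwards, which gives a quick lower bound on the cluster size under these assumptions.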


Thank you in advance for your answers.

Best regards,
Sergey Sazonov.
