Re: Programmatic restructuring of a Solr cloud

2011-05-06 Thread Sergey Sazonov

Hello Jan,

Thank you very much for the answer. Unfortunately, we don't use Amazon, 
and I doubt we will be able to persuade the customer to switch to it. 
Moreover, the amount of data will not allow us to store everything on a 
single master. However, having considered your design I am starting to 
see the problem in a new light, so maybe it will still prove helpful ;)


In the meantime, I'm still looking for other solutions...

Best regards,
Sergey Sazonov.

On 05/05/11 15:07, Jan Høydahl wrote:

Hi,

One approach, if you're using Amazon, is to use Elastic Beanstalk:

* Create one master with 12 cores, named jan, feb, mar, etc.
* Every month, clear the current month's index and switch indexing to it.
   You only need one master, because you only index into one month at a time.
* For each of the 12 months, set up an Amazon Elastic Beanstalk instance with a
   Solr replica pointing to its master core.
   This way, Amazon will spin up replicas as needed.
   NOTE: Your replica can still be located at /solr/select even if it
   replicates from /solr/may/replication
* Query only the replicas; the client controls whether to query one or more
   shards, e.g.
   shards=jan.elasticbeanstalk.com/solr,feb.elasticbeanstalk.com/solr,mar.elasticbeanstalk.com/solr
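
As a rough illustration of the client side of the layout above, here is a minimal Python sketch that picks the core for a given date and builds the shards parameter. The helper names (core_for, shards_param) and the default domain and path are assumptions taken from the example hostnames in this mail, not part of Solr itself.

```python
from datetime import date

# Core names follow the "jan, feb, mar etc" convention from the list above.
MONTH_CORES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]


def core_for(day: date) -> str:
    """Return the month core that a document dated `day` belongs to."""
    return MONTH_CORES[day.month - 1]


def shards_param(months, domain="elasticbeanstalk.com", path="/solr"):
    """Build the value of Solr's `shards` request parameter.

    Each entry is host/path without a URL scheme, joined by commas.
    The domain and path defaults are illustrative only.
    """
    return ",".join(f"{month}.{domain}{path}" for month in months)


if __name__ == "__main__":
    print(core_for(date(2011, 5, 5)))
    print(shards_param(["jan", "feb", "mar"]))
```

A client that only ever searches a single month would pass a one-element list, so the request goes to exactly one replica group.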

After this is set up, you have zero configuration to worry about :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5 May 2011, at 14:03, Sergey Sazonov wrote:


Dear Solr Experts,

First of all, I would like to thank you for your patience in answering 
questions from those who are less experienced.

And now to the main topic: I would like to learn whether it is possible to 
restructure a Solr cloud programmatically.

Let me describe the system we are designing to make the requirements clear. The 
indexed documents are certain log entries. We are planning to shard them by 
month, and only keep the last 12 months in the index. We are going to replicate 
each shard across several servers.

Now, the user is always required to search within a single month (= shard). Most 
importantly, we expect an absolute majority of the requests to query the current month, 
with only a minor load on the previous months. In order to utilise the cluster most 
efficiently, we would like a majority of the servers to contain replicas of the current 
month data, and have only one or two servers per older month. To this end, we are 
planning to have a set of slaves that migrate from master to master, 
depending on which master holds the data for the current month. When a new month starts, 
those slaves have to be reconfigured to hold the new shard and to replicate from the new 
master (their old master now holding the data for the previous month).

Since this operation has to be done every month, we are naturally considering 
automating it. So my question is whether anyone has faced a similar problem 
before, and what is the best way to solve it. We are not committed to any 
solution, or even architecture, so feel free to propose different solutions. 
The only requirement is that a majority of the servers should be able to serve 
requests to the current month at any given moment.

Thank you in advance for your answers.

Best regards,
Sergey Sazonov.




Programmatic restructuring of a Solr cloud

2011-05-05 Thread Sergey Sazonov

Dear Solr Experts,

First of all, I would like to thank you for your patience in answering 
questions from those who are less experienced.


And now to the main topic: I would like to learn whether it is possible 
to restructure a Solr cloud programmatically.


Let me describe the system we are designing to make the requirements 
clear. The indexed documents are certain log entries. We are planning to 
shard them by month, and only keep the last 12 months in the index. We 
are going to replicate each shard across several servers.


Now, the user is always required to search within a single month (= 
shard). Most importantly, we expect an absolute majority of the requests 
to query the current month, with only a minor load on the previous 
months. In order to utilise the cluster most efficiently, we would like 
a majority of the servers to contain replicas of the current month data, 
and have only one or two servers per older month. To this end, we are 
planning to have a set of slaves that migrate from master to master, 
depending on which master holds the data for the current month. When a 
new month starts, those slaves have to be reconfigured to hold the new 
shard and to replicate from the new master (their old master now holding 
the data for the previous month).
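
To make the monthly switch concrete, here is a minimal Python sketch of the planning step only. It assigns a fixed number of servers to each older month and gives the rest to the current month, then builds the URL that asks a slave for a one-off pull from a given master. The function names, host names, and the per_old_month parameter are hypothetical. The fetchindex command with a masterUrl parameter is, to my knowledge, what the Solr ReplicationHandler accepts over HTTP, but please check it against your Solr version; it also does not change the slave's configured master, so regular polling would still need to be reconfigured separately.

```python
from urllib.parse import urlencode


def assign_replicas(servers, months, current, per_old_month=1):
    """Map each month to the servers that should hold its replica.

    Older months get `per_old_month` servers each; the current month
    gets all remaining servers, which must be a strict majority.
    """
    pool = list(servers)
    plan = {}
    for month in months:
        if month == current:
            continue
        if len(pool) < per_old_month:
            raise ValueError("not enough servers for the older months")
        plan[month] = [pool.pop() for _ in range(per_old_month)]
    if len(pool) * 2 <= len(servers):
        raise ValueError("current month would not get a majority of servers")
    plan[current] = pool
    return plan


def fetchindex_url(slave_base, master_replication_url):
    """URL asking a slave to pull the index once from the given master.

    Assumes the slave exposes its replication handler at
    <slave_base>/replication (hypothetical layout).
    """
    query = urlencode({"command": "fetchindex",
                       "masterUrl": master_replication_url})
    return f"{slave_base}/replication?{query}"


if __name__ == "__main__":
    servers = [f"s{i}" for i in range(7)]
    plan = assign_replicas(servers, ["mar", "apr", "may"], current="may")
    for server in plan["may"]:
        print(fetchindex_url(f"http://{server}:8983/solr",
                             "http://master:8983/solr/may/replication"))
```

A monthly cron job could compute the new plan, compare it with the previous one, and issue requests only to the servers whose month changed.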


Since this operation has to be done every month, we are naturally 
considering automating it. So my question is whether anyone has faced a 
similar problem before, and what is the best way to solve it. We are not 
committed to any solution, or even architecture, so feel free to propose 
different solutions. The only requirement is that a majority of the 
servers should be able to serve requests to the current month at any 
given moment.


Thank you in advance for your answers.

Best regards,
Sergey Sazonov.