Master/Slave Architecture 3.6.1
Hi,

Our current architecture is as follows:

- Single server, on which we do both indexing and searching
- Solr version 3.6.1, multicore
- We have several indexes, small and big, as cores within one webapp
- Indexing to the individual cores happens via an index queue, so at any given time we are indexing to only one or at most two cores
- We also process our PDF and HTML files externally into text files before feeding them to Solr

We are planning to move to AWS, still on 3.6.1, and would want to:

- Separate indexing and searching onto separate servers as master/slave, mainly so that the two activities do not compete for resources
- Use Tika to process PDF and HTML files directly via Solr, which might increase the CPU load

But if I set things up so that all indexing requests go to one server sequentially, and each core on the slave polls the master core for index changes and then issues a commit to load a new index reader, then all this activity might happen in parallel, which would spike CPU activity on the slave and hence degrade search performance.

Is this assumption correct? If so, is there any advantage other than availability to this architecture? Any advice on this?

Regards,
Sujatha
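One way to keep the slave cores from replicating in parallel is to turn off their built-in polling and trigger replication from outside, one core at a time. The Python sketch below illustrates that idea under stated assumptions: it uses the replication handler's HTTP `fetchindex` command, while the host name, core names, and pause length are hypothetical. `fetchindex` may return before the index copy has finished, so the fixed pause is only a crude spacer, not a guarantee of strict serialization.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url, core, command):
    """Build the URL of one core's replication handler for a single command."""
    query = urlencode({"command": command})
    return "{}/{}/replication?{}".format(base_url.rstrip("/"), core, query)


def http_get(url):
    """Issue a GET request and return the response body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def replicate_one_at_a_time(base_url, cores, pause_seconds=60,
                            get=http_get, sleep=time.sleep):
    """Ask each slave core in turn to fetch the index from its master.

    A pause is inserted between cores (not after the last one) so that
    the copy and cache warming of one core are less likely to overlap
    with the next. Returns the list of URLs that were requested.
    """
    requested = []
    for position, core in enumerate(cores):
        url = replication_url(base_url, core, "fetchindex")
        get(url)
        requested.append(url)
        if position < len(cores) - 1:
            sleep(pause_seconds)
    return requested
```

To use it, one would first send `disablepoll` to each slave core (for example `http_get(replication_url(base, core, "disablepoll"))`) and then call `replicate_one_at_a_time(base, cores)` from a scheduler, where `base` is something like `http://slave-host:8983/solr` (a made-up address).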
Re: Master/Slave Architecture 3.6.1
Hi,

You are going in the right direction and your assumptions are correct. In short, if the performance hit is too big then you simply need more EC2 instances (some are strong on CPU, some on memory, some on disk I/O; pick wisely).

Otis
Solr ElasticSearch Support http://sematext.com/

On Jan 10, 2013 4:44 AM, Sujatha Arun suja.a...@gmail.com wrote:
> [...] Is this assumption correct? Then is there any advantage other than availability to this architecture, any advice on this?
Re: Master/Slave Architecture 3.6.1
Thanks, Otis.

But then what exactly is the advantage of a master/slave architecture for multicore, when replication has the same effect as a commit, and if I am going to get worse performance by moving to master/slave than with a single server with sequential indexing? Am I missing anything?

Would it make sense to have each server act as both master and slave, and load-balance the indexing and searching requests across both servers?

Regards,
Sujatha

On Thu, Jan 10, 2013 at 8:41 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
> You are going in the right direction and your assumptions are correct. [...]
Re: Master/Slave Architecture 3.6.1
In the end, the best advice is to try it.

With this master/slave setup you'll save the slaves the effort of indexing, but you'll still need to warm your caches on each slave, which is a reasonable portion of the work done on a commit.

However, with a master/slave setup you get the option to go to two slaves, or three, and so on as demand increases. You can put them all behind an elastic load balancer and scale easily.

You may have multiple cores on your Solr system, but note that servers have multiple CPUs, so two simultaneous replication requests needn't be a disaster.

Upayavira

On Thu, Jan 10, 2013, at 05:50 PM, Sujatha Arun wrote:
> [...] what exactly is the advantage for a master slave architecture for multicore, when replication has the same effect as that of a commit [...]
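The split described above (all updates to the master, searches spread over however many slaves exist) can be illustrated with a small client-side router. This is only a sketch in Python: in the setup discussed here an elastic load balancer would do the spreading, and the class name, host names, and round-robin policy are assumptions made for illustration. It relies only on the standard `/update` and `/select` handler paths.

```python
from itertools import cycle


class SolrRouter:
    """Send updates to the master and rotate searches across the slaves."""

    def __init__(self, master_url, slave_urls):
        if not slave_urls:
            raise ValueError("at least one slave URL is required")
        self.master_url = master_url.rstrip("/")
        # cycle() yields the slaves in order, forever: simple round robin.
        self._slaves = cycle([url.rstrip("/") for url in slave_urls])

    def update_url(self, core):
        """URL for indexing requests; always the master."""
        return "{}/{}/update".format(self.master_url, core)

    def select_url(self, core):
        """URL for the next search request; the next slave in rotation."""
        return "{}/{}/select".format(next(self._slaves), core)
```

Adding a third slave then means adding one more URL to the list, which mirrors the scaling argument: search capacity grows with the number of slaves while the indexing load stays on the master.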