Master/Slave Architecture 3.6.1

2013-01-10 Thread Sujatha Arun
Hi,

Our current architecture is as follows:

   - Single server (on which we do both indexing and searching)
   - Solr version 3.6.1, multicore
   - We have several small and big indexes as cores within a webapp
   - Indexing to the individual cores happens via an index queue, so at
   any given time we are indexing to only one or at most two cores
   - We also process our PDF and HTML files externally into text files
   before feeding them to Solr


We are planning to move to AWS using 3.6.1 and would want to:

   - Separate indexing and searching onto separate servers as
   master/slave. This is mainly so that the two activities are not
   competing for resources.
   - Use Tika to process PDF and HTML files directly via Solr, which
   might increase the CPU load.
   - But if I set things up so that all indexing requests go to one server
   sequentially, and each core on the slave polls the master core for index
   changes and then issues a commit to load a new index reader, then all
   this activity might happen in parallel, which would spike the CPU
   activity on the slave and hence degrade search performance?

Is this assumption correct? If so, is there any advantage to this
architecture other than availability? Any advice on this?

Regards
Sujatha


Re: Master/Slave Architecture 3.6.1

2013-01-10 Thread Otis Gospodnetic
Hi,

You are going in the right direction and your assumptions are correct. In
short, if the performance hit is too big then you simply need more EC2
instances (some have high CPU, some memory, some disk IO ... pick wisely).

Otis
Solr & ElasticSearch Support
http://sematext.com/



Re: Master/Slave Architecture 3.6.1

2013-01-10 Thread Sujatha Arun
Thanks, Otis.

But then what exactly is the advantage of a master/slave architecture
for multicore, when replication has the same effect as a commit, and if I
am going to get worse performance by moving to master/slave than with
a single server with sequential indexing? Am I missing anything?

Would it make sense to have each server act as both master and slave and
load-balance the indexing and searching requests across both servers?

Regards,
Sujatha



Re: Master/Slave Architecture 3.6.1

2013-01-10 Thread Upayavira
In the end, the best advice is to try it.

With this master/slave setup the slaves are spared the effort of
indexing, but you'll still need to warm your caches on each slave, which
is a reasonable portion of the work done on a commit. However, with a
master/slave setup, you get the option to go to two slaves, or three,
etc., as demand increases - and you can put them all behind an elastic
load balancer, and scale easily.

You may have multiple cores on your Solr system, but note that servers
have multiple CPUs, so two simultaneous replication requests needn't be
a disaster.
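
If simultaneous replication across many cores does turn out to hurt, one
option is to leave automatic polling off on the slave (no pollInterval
configured) and trigger replication one core at a time through the
ReplicationHandler's fetchindex command. What follows is only a minimal
sketch under stated assumptions: the host name, port, core names and the
helper functions are hypothetical, and the handler is assumed to be
registered at /replication on each core. Note that fetchindex returns
before the index copy has finished, so the pause between calls is just a
crude way of spacing the work out, not a guarantee that the previous core
is done.

```python
import time
from urllib.parse import quote
from urllib.request import urlopen


def fetchindex_urls(slave_base, cores):
    """Build one replication 'fetchindex' URL per core on the slave.

    slave_base is the Solr webapp root on the slave (hypothetical here),
    e.g. "http://slave.example.com:8983/solr".
    """
    base = slave_base.rstrip("/")
    return [
        f"{base}/{quote(core)}/replication?command=fetchindex"
        for core in cores
    ]


def replicate_sequentially(slave_base, cores, pause_seconds=30, opener=urlopen):
    """Ask each slave core to pull from its master, one core at a time.

    The pause only spaces the requests out; it does not wait for the
    replication itself to complete.
    """
    for url in fetchindex_urls(slave_base, cores):
        with opener(url, timeout=10) as response:
            response.read()  # drain the handler's short status response
        time.sleep(pause_seconds)
```

Something like replicate_sequentially("http://slave.example.com:8983/solr",
["core0", "core1"]) could be run from the same queue that drives indexing,
so that a core is replicated only after it has actually been updated.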

Upayavira
