RE: Replication Issue with Repeater Please help

2014-08-16 Thread waqas sarwar


 Date: Thu, 14 Aug 2014 06:51:02 -0600
 From: s...@elyograg.org
 To: solr-user@lucene.apache.org
 Subject: Re: Replication Issue with Repeater Please help
 
 On 8/14/2014 2:09 AM, waqas sarwar wrote:
  Thanks Shawn. What I understood is that circular replication is totally
  impossible and Solr fails in a distributed environment. Then why does the
  Solr documentation say to configure REPEATER for a distributed
  architecture, given that a REPEATER behaves like a master and a slave at
  the same time?
  Can I configure SolrCloud on a LAN, or do I have to configure zookeeper
  myself? Please suggest a solution for distributed servers on a LAN. If
  zookeeper is the only solution, then please give me a link on configuring
  it, so that I avoid going in the wrong direction.
 
 The repeater config is designed to avoid master overload from many
 slaves.  So instead of configuring ten slaves to replicate from one
 master, you configure two slaves to replicate directly from your master,
 and then you configure those as repeaters.  The other eight slaves are
 configured so that four of them replicate from each of the repeaters
 instead of the true master, reducing the load.
 
 SolrCloud is the easiest way to build a fully distributed and redundant
 solution.  It is designed for a LAN.  You configure three machines as
 your zookeeper ensemble, using the zookeeper download and instructions
 for a clustered setup:
 
 http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_zkMulitServerSetup
 
 The way to start Solr in cloud mode is to give it a zkHost system
 property.  That informs Solr about all of your ZK servers.  If you have
 another way of setting that property, you can use that instead.  I
 strongly recommend using a chroot with the zkHost parameter, but that is
 not required.  Search the zookeeper page linked above for chroot to
 find a link to additional documentation about chroot.
 
 You can use the same servers for ZK as you do for Solr, but be aware
 that if Solr puts a large I/O load on the disks, you may want the ZK
 database to be on its own disk(s) so that it responds quickly.
 Separate servers is even better, but not strictly required unless the
 servers are under extreme load.
 
 https://cwiki.apache.org/confluence/display/solr/SolrCloud
 
 You will find a Getting Started link on the page above.  Note that the
 Getting Started page talks about a zkRun option, which starts an
 embedded zookeeper as part of Solr.  I strongly recommend that you do
 NOT take this route, except for *initial* testing.  SolrCloud works much
 better if the Zookeeper ensemble is in its own process, separate from Solr.
 
 Thanks,
 Shawn
 
 Thank you so much. You helped a lot. One more question: can I use only
 one zookeeper server to manage 3 Solr servers, or do I have to configure 3
 zookeeper servers, one for each? And should the zookeeper servers be
 standalone, or is it better to run them on the same machines as Solr?
 Best Regards,
 Waqas


Re: Replication Issue with Repeater Please help

2014-08-16 Thread Erick Erickson
It Depends (tm).

 One ZooKeeper is a single point of failure. It goes away and your SolrCloud 
 cluster is kinda hosed. OTOH, with only 3 servers, the chance that one of 
 them is going down is low anyway. How lucky do you feel?

 I would be cautious about running your ZK instances embedded, 
 super-especially if there's only one ZK instance. That couples your ZK 
 instances with your Solr instances. So if for any reason you want  to 
 stop/start Solr, you will stop/start ZK as well and it's easy to fall below a 
 quorum. It's perfectly viable to run them embedded, especially on a very 
 small cluster. You do have to think a bit more about sequencing Solr nodes 
 going up/down is all.

Best,
Erick



Re: Replication Issue with Repeater Please help

2014-08-16 Thread Shawn Heisey
On 8/16/2014 8:11 AM, waqas sarwar wrote:
 Thank you so much. You helped a lot. One more question: can I use only
 one zookeeper server to manage 3 Solr servers, or do I have to configure 3
 zookeeper servers, one for each? And should the zookeeper servers be
 standalone, or is it better to run them on the same machines as Solr?
 Best Regards,
 Waqas


I think Erick basically said the same thing as this, in a slightly
different way:

If you want zookeeper to be fault tolerant, you must have at least three
servers running it.  One zookeeper will work, but if it goes down,
SolrCloud doesn't function properly.  Three are needed for full
redundancy.  If one of the three goes down, the other two will still
function as a quorum.
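
The arithmetic behind that can be sketched in a few lines of Python. This is only an illustration of the majority rule, not anything taken from ZooKeeper's code; the function names are made up for the example.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (illustrative helper)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare a few ensemble sizes: servers, quorum needed, failures tolerated.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that two servers tolerate no more failures than one, which is why three is the smallest ensemble that survives the loss of a server.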

You can use the same servers for Zookeeper and Solr.  This *can* be a
source of performance problems, but that will usually only be a problem
if you put a major load on your SolrCloud.  If you do put them on the
same server, I would recommend putting the zk database on a separate
disk or disks -- the CPU requirements for Zookeeper are very small, but
it relies on extremely responsive I/O to/from its database.

As Erick said, we strongly recommend that you don't use the embedded ZK
-- this starts up a zookeeper server in the same Java process as Solr.
If Solr is stopped or goes down, you also lose zookeeper.

Thanks,
Shawn



RE: Replication Issue with Repeater Please help

2014-08-14 Thread waqas sarwar


 Date: Wed, 13 Aug 2014 07:19:58 -0600
 From: s...@elyograg.org
 To: solr-user@lucene.apache.org
 Subject: Re: Replication Issue with Repeater Please help
 
 On 8/13/2014 12:49 AM, waqas sarwar wrote:
  Hi, I'm using Solr and need a little assistance. I am a bit stuck with
  Solr replication. Before discussing the issue, let me give a brief
  description.

  Scenario:- I want to set up Solr in a distributed architecture, starting
  with the smallest number of nodes (say 3). How can I replicate the data
  of each node to the 2 others, and vice versa?

  My Solution:- I set up “REPEATER” on all nodes, so that each node is a
  master to the others, and configured circular replication.

  Issue I'm facing:- All nodes work fine replicating data from the other
  nodes, but when node1 replicates data from node2, node1 loses its own
  data. I think node1 should at least keep its own data and merge in the
  new data.

  I think the question is now pretty simple and clear: I want to set up
  Solr in a distributed architecture where each node is a replica of the
  others. How can I achieve that? Is there any way other than repeaters and
  circular replication to replicate the data of each node to all the
  others?

  Environment:- LAN, Solr (3.6 to 4.9), Redhat
 
 With master-slave replication, there must be a clear master, from which
 slaves replicate.  You can't set up fully circular replication, or the
 master will replicate from the empty slave and your data will be gone.
 This form of replication does not merge data -- it makes the slave index
 identical to the master by copying the actual files on disk for the index.
 
 I think you'll want to use SolrCloud.  You have three machines, so you
 have the minimum number for a redundant zookeeper ensemble.  SolrCloud
 relies on zookeeper to handle cluster functions.  SolrCloud is a true
 cluster -- no replication, no master.
 
 https://cwiki.apache.org/confluence/display/solr/SolrCloud
 
 Thanks,
 Shawn


Thanks Shawn. What I understood is that circular replication is totally
impossible and Solr fails in a distributed environment. Then why does the
Solr documentation say to configure REPEATER for a distributed
architecture, given that a REPEATER behaves like a master and a slave at
the same time?
Can I configure SolrCloud on a LAN, or do I have to configure zookeeper
myself? Please suggest a solution for distributed servers on a LAN. If
zookeeper is the only solution, then please give me a link on configuring
it, so that I avoid going in the wrong direction.
Regards,
Waqas

Re: Replication Issue with Repeater Please help

2014-08-14 Thread Shawn Heisey
On 8/14/2014 2:09 AM, waqas sarwar wrote:
 Thanks Shawn. What I understood is that circular replication is totally
 impossible and Solr fails in a distributed environment. Then why does the
 Solr documentation say to configure REPEATER for a distributed
 architecture, given that a REPEATER behaves like a master and a slave at
 the same time?
 Can I configure SolrCloud on a LAN, or do I have to configure zookeeper
 myself? Please suggest a solution for distributed servers on a LAN. If
 zookeeper is the only solution, then please give me a link on configuring
 it, so that I avoid going in the wrong direction.

The repeater config is designed to avoid master overload from many
slaves.  So instead of configuring ten slaves to replicate from one
master, you configure two slaves to replicate directly from your master,
and then you configure those as repeaters.  The other eight slaves are
configured so that four of them replicate from each of the repeaters
instead of the true master, reducing the load.
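
To make the load reduction concrete, here is a small Python sketch that counts how many nodes pull directly from each source in the flat layout versus the repeater layout described above. The node names are hypothetical, and this only models the topology, not Solr's replication handler.

```python
def direct_clients(topology: dict) -> dict:
    """Map each replication source to the number of nodes polling it."""
    return {source: len(pullers) for source, pullers in topology.items()}


# Flat layout: all ten slaves replicate straight from the master.
flat = {"master": [f"slave{i}" for i in range(1, 11)]}

# Repeater layout: two slaves act as repeaters, four slaves behind each.
tiered = {
    "master": ["repeater1", "repeater2"],
    "repeater1": ["slave1", "slave2", "slave3", "slave4"],
    "repeater2": ["slave5", "slave6", "slave7", "slave8"],
}

print(direct_clients(flat))
print(direct_clients(tiered))
```

In the tiered layout the master serves two clients instead of ten, and no node serves more than four.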

SolrCloud is the easiest way to build a fully distributed and redundant
solution.  It is designed for a LAN.  You configure three machines as
your zookeeper ensemble, using the zookeeper download and instructions
for a clustered setup:

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_zkMulitServerSetup

The way to start Solr in cloud mode is to give it a zkHost system
property.  That informs Solr about all of your ZK servers.  If you have
another way of setting that property, you can use that instead.  I
strongly recommend using a chroot with the zkHost parameter, but that is
not required.  Search the zookeeper page linked above for chroot to
find a link to additional documentation about chroot.
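
As a rough illustration of what a zkHost value with a chroot looks like, the Python sketch below assembles one. The hostnames and the /solr chroot are placeholders, and 2181 is simply ZooKeeper's default client port; substitute whatever your ensemble actually uses. The chroot is appended once, after the last server, not after each one.

```python
def build_zk_host(servers, chroot=""):
    """Join (host, port) pairs into a zkHost string, with an optional chroot."""
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    hosts = ",".join(f"{host}:{port}" for host, port in servers)
    return hosts + chroot


# Hypothetical three-node ensemble; replace with your own hosts.
servers = [
    ("zk1.example.com", 2181),
    ("zk2.example.com", 2181),
    ("zk3.example.com", 2181),
]
zk_host = build_zk_host(servers, "/solr")

# The value would then be handed to Solr as a system property, for example
# on the java command line (assumption: a typical Solr 4.x start.jar setup).
print(f"-DzkHost={zk_host}")
```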

You can use the same servers for ZK as you do for Solr, but be aware
that if Solr puts a large I/O load on the disks, you may want the ZK
database to be on its own disk(s) so that it responds quickly.
Separate servers is even better, but not strictly required unless the
servers are under extreme load.

https://cwiki.apache.org/confluence/display/solr/SolrCloud

You will find a Getting Started link on the page above.  Note that the
Getting Started page talks about a zkRun option, which starts an
embedded zookeeper as part of Solr.  I strongly recommend that you do
NOT take this route, except for *initial* testing.  SolrCloud works much
better if the Zookeeper ensemble is in its own process, separate from Solr.

Thanks,
Shawn



Re: Replication Issue with Repeater Please help

2014-08-13 Thread Shawn Heisey
On 8/13/2014 12:49 AM, waqas sarwar wrote:
 Hi, I'm using Solr and need a little assistance. I am a bit stuck with
 Solr replication. Before discussing the issue, let me give a brief
 description.

 Scenario:- I want to set up Solr in a distributed architecture, starting
 with the smallest number of nodes (say 3). How can I replicate the data
 of each node to the 2 others, and vice versa?

 My Solution:- I set up “REPEATER” on all nodes, so that each node is a
 master to the others, and configured circular replication.

 Issue I'm facing:- All nodes work fine replicating data from the other
 nodes, but when node1 replicates data from node2, node1 loses its own
 data. I think node1 should at least keep its own data and merge in the
 new data.

 I think the question is now pretty simple and clear: I want to set up
 Solr in a distributed architecture where each node is a replica of the
 others. How can I achieve that? Is there any way other than repeaters and
 circular replication to replicate the data of each node to all the
 others?

 Environment:- LAN, Solr (3.6 to 4.9), Redhat

With master-slave replication, there must be a clear master, from which
slaves replicate.  You can't set up fully circular replication, or the
master will replicate from the empty slave and your data will be gone.
This form of replication does not merge data -- it makes the slave index
identical to the master by copying the actual files on disk for the index.
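
A toy model in Python may make the "copy, not merge" behaviour easier to see. It treats an index as a plain set of document ids and ignores everything Solr really does around index versions and polling, so take it as a sketch of the idea only; the names are invented for the example.

```python
def replicate(source_index: set, target_index: set) -> set:
    """Return the target's index after a pull: a copy of the source.

    The target's previous contents are discarded, not merged.
    """
    return set(source_index)


node1 = {"doc-a", "doc-b"}   # node1 holds data
node2 = set()                # node2 is still empty

# In a circular setup node1 also pulls from node2 ...
node1 = replicate(node2, node1)

# ... and ends up with node2's (empty) index instead of a merged one.
print(node1)
```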

I think you'll want to use SolrCloud.  You have three machines, so you
have the minimum number for a redundant zookeeper ensemble.  SolrCloud
relies on zookeeper to handle cluster functions.  SolrCloud is a true
cluster -- no replication, no master.

https://cwiki.apache.org/confluence/display/solr/SolrCloud

Thanks,
Shawn