Re: Replication Issue with Repeater Please help

2014-08-16 Thread Shawn Heisey
On 8/16/2014 8:11 AM, waqas sarwar wrote:
> Thank you so much. You helped a lot. One more question: can I use only
> one ZooKeeper server to manage 3 Solr servers, or do I have to configure
> 3 ZooKeeper servers, one for each? And should the ZooKeeper servers be
> standalone, or is it better to use the same machines as Solr?
>
> Best Regards,
> Waqas

I think Erick basically said the same thing as this, in a slightly
different way:

If you want zookeeper to be fault tolerant, you must have at least three
servers running it.  One zookeeper will work, but if it goes down,
SolrCloud doesn't function properly.  Three are needed for full
redundancy.  If one of the three goes down, the other two will still
function as a quorum.

You can use the same servers for Zookeeper and Solr.  This *can* be a
source of performance problems, but that will usually only be a problem
if you put a major load on your SolrCloud.  If you do put them on the
same server, I would recommend putting the zk database on a separate
disk or disks -- the CPU requirements for Zookeeper are very small, but
it relies on extremely responsive I/O to/from its database.

As Erick said, we strongly recommend that you don't use the embedded ZK
-- this starts up a zookeeper server in the same Java process as Solr.
If Solr is stopped or goes down, you also lose zookeeper.

Thanks,
Shawn



Re: Replication Issue with Repeater Please help

2014-08-16 Thread Erick Erickson
It Depends (tm).

One ZooKeeper is a single point of failure. It goes away and your SolrCloud
cluster is kinda hosed. OTOH, with only 3 servers, the chance that one of
them is going down is low anyway. How lucky do you feel?

I would be cautious about running your ZK instances embedded,
super-especially if there's only one ZK instance. That couples your ZK
instances with your Solr instances. So if for any reason you want to
stop/start Solr, you will stop/start ZK as well and it's easy to fall below a
quorum. It's perfectly viable to run them embedded, especially on a very
small cluster. You do have to think a bit more about sequencing Solr nodes
going up/down is all.

Best,
Erick

On Sat, Aug 16, 2014 at 7:11 AM, waqas sarwar wrote:
> Thank you so much. You helped a lot. One more question: can I use only
> one ZooKeeper server to manage 3 Solr servers, or do I have to configure
> 3 ZooKeeper servers, one for each? And should the ZooKeeper servers be
> standalone, or is it better to use the same machines as Solr?
>
> Best Regards,
> Waqas


RE: Replication Issue with Repeater Please help

2014-08-16 Thread waqas sarwar


Thank you so much. You helped a lot. One more question: can I use only
one ZooKeeper server to manage 3 Solr servers, or do I have to configure
3 ZooKeeper servers, one for each? And should the ZooKeeper servers be
standalone, or is it better to use the same machines as Solr?

Best Regards,
Waqas

Re: Replication Issue with Repeater Please help

2014-08-14 Thread Shawn Heisey
On 8/14/2014 2:09 AM, waqas sarwar wrote:
> Thanks Shawn. What I understood is that circular replication is totally
> impossible and Solr fails in a distributed environment. Then why does
> the Solr documentation say to configure a "REPEATER" for a distributed
> architecture, since a "REPEATER" behaves like a master and a slave at
> the same time?
> Can I configure SolrCloud on a LAN, or do I have to configure ZooKeeper
> myself? Please suggest a solution for servers distributed across a LAN.
> If ZooKeeper is the only solution, please point me to a guide for
> configuring it, so I can avoid going in the wrong direction.

The repeater config is designed to avoid master overload from many
slaves.  So instead of configuring ten slaves to replicate from one
master, you configure two slaves to replicate directly from your master,
and then you configure those as repeaters.  The other eight slaves are
configured so that four of them replicate from each of the repeaters
instead of the true master, reducing the load.
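The fan-out described above can be modeled as a map from each node to the node it replicates from. This is only a sketch of the topology with hypothetical host names, not Solr configuration; in solrconfig.xml a repeater is a core whose ReplicationHandler has both a master and a slave section.

```python
from collections import Counter
from typing import Dict, List


def repeater_topology(master: str, repeaters: List[str],
                      slaves: List[str]) -> Dict[str, str]:
    """Map each node to the node it replicates (polls) from.

    Repeaters replicate from the master; the remaining slaves are
    spread round-robin across the repeaters.
    """
    if not repeaters:
        raise ValueError("need at least one repeater")
    source = {r: master for r in repeaters}
    for i, slave in enumerate(slaves):
        source[slave] = repeaters[i % len(repeaters)]
    return source


def polls_per_node(source: Dict[str, str]) -> Counter:
    """How many nodes replicate directly from each upstream node."""
    return Counter(source.values())


if __name__ == "__main__":
    repeaters = ["slave1", "slave2"]               # hypothetical names
    others = [f"slave{i}" for i in range(3, 11)]   # slave3 .. slave10
    topo = repeater_topology("master", repeaters, others)
    print(polls_per_node(topo))
```

With two repeaters and eight remaining slaves, the master is polled by 2 nodes instead of 10, and each repeater serves 4.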

SolrCloud is the easiest way to build a fully distributed and redundant
solution.  It is designed for a LAN.  You configure three machines as
your zookeeper ensemble, using the zookeeper download and instructions
for a clustered setup:

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_zkMulitServerSetup
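For reference, the clustered setup in that guide comes down to an identical zoo.cfg on every ZooKeeper server. The sketch below renders one; the hostnames (zk1, zk2, zk3), paths and timing values are assumptions modeled on typical examples, and the optional dataLogDir puts the transaction log on a separate disk, which matters when ZK shares a machine with Solr:

```python
from typing import List, Optional


def render_zoo_cfg(hosts: List[str], data_dir: str,
                   data_log_dir: Optional[str] = None,
                   client_port: int = 2181) -> str:
    """Render a zoo.cfg for a replicated (multi-server) ZooKeeper ensemble.

    The same file goes on every server; each server also needs a 'myid'
    file in data_dir containing its own number (1, 2, 3, ...).
    """
    lines = [
        "tickTime=2000",
        "initLimit=5",
        "syncLimit=2",
        f"dataDir={data_dir}",
    ]
    if data_log_dir:
        # Transaction log on its own disk, so ZK writes are not slowed
        # down by Solr's index I/O.
        lines.append(f"dataLogDir={data_log_dir}")
    lines.append(f"clientPort={client_port}")
    for n, host in enumerate(hosts, start=1):
        # 2888: followers connect to the leader; 3888: leader election.
        lines.append(f"server.{n}={host}:2888:3888")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical hostnames and paths.
    print(render_zoo_cfg(["zk1", "zk2", "zk3"], "/var/lib/zookeeper",
                         data_log_dir="/zk-txlog"))
```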

The way to start Solr in cloud mode is to give it a zkHost system
property.  That informs Solr about all of your ZK servers.  If you have
another way of setting that property, you can use that instead.  I
strongly recommend using a chroot with the zkHost parameter, but that is
not required.  Search the zookeeper page linked above for "chroot" to
find a link to additional documentation about chroot.
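As an illustration of the zkHost format with a chroot: the value is a comma-separated list of host:port pairs, with the chroot path appended once at the very end, not after each host. A small sketch with hypothetical hostnames and a hypothetical /solr chroot:

```python
from typing import List, Optional


def zk_host(servers: List[str], chroot: Optional[str] = None,
            port: int = 2181) -> str:
    """Build a zkHost value such as 'zk1:2181,zk2:2181,zk3:2181/solr'.

    Servers given without a port get the default client port. The
    chroot applies to the whole connection string, so it is appended
    once after the last host:port pair.
    """
    hosts = ",".join(s if ":" in s else f"{s}:{port}" for s in servers)
    root = chroot.strip("/") if chroot else ""
    if root:
        hosts += "/" + root
    return hosts


if __name__ == "__main__":
    print(zk_host(["zk1", "zk2", "zk3"], "solr"))
```

The resulting string would then be supplied as the zkHost system property, for example `-DzkHost=zk1:2181,zk2:2181,zk3:2181/solr` on the Java command line that starts Solr.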

You can use the same servers for ZK as you do for Solr, but be aware
that if Solr puts a large I/O load on the disks, you may want the ZK
database to be on its own disk(s) so that it responds quickly.
Separate servers is even better, but not strictly required unless the
servers are under extreme load.

https://cwiki.apache.org/confluence/display/solr/SolrCloud

You will find a "Getting Started" link on the page above.  Note that the
"Getting Started" page talks about a zkRun option, which starts an
embedded zookeeper as part of Solr.  I strongly recommend that you do
NOT take this route, except for *initial* testing.  SolrCloud works much
better if the Zookeeper ensemble is in its own process, separate from Solr.

Thanks,
Shawn



RE: Replication Issue with Repeater Please help

2014-08-14 Thread waqas sarwar




Thanks Shawn. What I understood is that circular replication is totally
impossible and Solr fails in a distributed environment. Then why does the
Solr documentation say to configure a "REPEATER" for a distributed
architecture, since a "REPEATER" behaves like a master and a slave at the
same time?
Can I configure SolrCloud on a LAN, or do I have to configure ZooKeeper
myself? Please suggest a solution for servers distributed across a LAN. If
ZooKeeper is the only solution, please point me to a guide for configuring
it, so I can avoid going in the wrong direction.
Regards,
Waqas

Re: Replication Issue with Repeater Please help

2014-08-13 Thread Shawn Heisey
On 8/13/2014 12:49 AM, waqas sarwar wrote:
> Hi, I'm using Solr and need a little assistance. I am a bit stuck with
> Solr replication. Before discussing the issue, let me give a brief
> description.
>
> Scenario: I want to set up Solr in a distributed architecture, starting
> with the minimum number of nodes (say 3). How can I replicate the data
> of each node to the other 2, and vice versa?
>
> My solution: I set up “REPEATER” on all nodes, so each node is a master
> to the others, and configured circular replication.
>
> Issue I'm facing: All nodes replicate data from the other nodes fine,
> but when node1 replicates data from node2, node1 loses its own data. I
> think node1 should at least not lose its own data, and should merge in
> the new data.
>
> I think the question is now pretty simple and clear: I want to set up
> Solr in a distributed architecture where each node is a replica of the
> others. How can I achieve that? Is there any other way, besides a
> repeater and circular replication using repeaters, to replicate the data
> of each node to all the others?
>
> Environment: LAN, Solr (3.6 to 4.9), Red Hat

With master-slave replication, there must be a clear master, from which
slaves replicate.  You can't set up fully circular replication, or the
master will replicate from the empty slave and your data will be gone.
This form of replication does not merge data -- it makes the slave index
identical to the master by copying the actual files on disk for the index.
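A toy model (not Solr code; node names and document ids are hypothetical) of why circular replication destroys data: if replication replaces the slave's index with a copy of the master's instead of merging, then in a ring of three nodes, whatever a node had is gone as soon as it is overwritten before anyone else copied it:

```python
from typing import Dict, Set


def replicate(indexes: Dict[str, Set[str]], slave: str, master: str) -> None:
    """Model master-slave replication: the slave's index is replaced
    by a copy of the master's index. Nothing is merged."""
    indexes[slave] = set(indexes[master])


if __name__ == "__main__":
    # Each node indexed its own documents.
    indexes = {
        "node1": {"doc-a"},
        "node2": {"doc-b"},
        "node3": {"doc-c"},
    }
    # Circular setup: node1 <- node2, node2 <- node3, node3 <- node1
    replicate(indexes, "node1", "node2")
    replicate(indexes, "node2", "node3")
    replicate(indexes, "node3", "node1")
    print(indexes)  # doc-a no longer exists on any node
```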

I think you'll want to use SolrCloud.  You have three machines, so you
have the minimum number for a redundant zookeeper ensemble.  SolrCloud
relies on zookeeper to handle cluster functions.  SolrCloud is a true
cluster -- no replication, no master.

https://cwiki.apache.org/confluence/display/solr/SolrCloud

Thanks,
Shawn



Replication Issue with Repeater Please help

2014-08-12 Thread waqas sarwar
Hi, I'm using Solr and need a little assistance. I am a bit stuck with
Solr replication. Before discussing the issue, let me give a brief
description.

Scenario: I want to set up Solr in a distributed architecture, starting
with the minimum number of nodes (say 3). How can I replicate the data of
each node to the other 2, and vice versa?

My solution: I set up “REPEATER” on all nodes, so each node is a master to
the others, and configured circular replication.

Issue I'm facing: All nodes replicate data from the other nodes fine, but
when node1 replicates data from node2, node1 loses its own data. I think
node1 should at least not lose its own data, and should merge in the new
data.

I think the question is now pretty simple and clear: I want to set up Solr
in a distributed architecture where each node is a replica of the others.
How can I achieve that? Is there any other way, besides a repeater and
circular replication using repeaters, to replicate the data of each node
to all the others?

Environment: LAN, Solr (3.6 to 4.9), Red Hat