Re: how do you replicate solr-cloud between datacenters?
Yes, thank you. On Tue, Mar 31, 2015 at 9:54 AM, Davis, Daniel (NIH/NLM) [C] daniel.da...@nih.gov wrote: I got the answer to my most recent question without even asking it! Thanks -Original Message- From: Jack Krupansky [mailto:jack.krupan...@gmail.com] Sent: Monday, March 30, 2015 6:40 PM To: solr-user@lucene.apache.org Subject: Re: how do you replicate solr-cloud between datacenters? That's an open issue. See: https://issues.apache.org/jira/browse/SOLR-6273 -- Jack Krupansky On Mon, Mar 30, 2015 at 5:45 PM, Timothy Ehlers ehle...@gmail.com wrote: Can you use /replication ??? How would you do this between datacenters? -- Tim Ehlers -- Tim Ehlers
RE: how do you replicate solr-cloud between datacenters?
I got the answer to my most recent question without even asking it! Thanks -Original Message- From: Jack Krupansky [mailto:jack.krupan...@gmail.com] Sent: Monday, March 30, 2015 6:40 PM To: solr-user@lucene.apache.org Subject: Re: how do you replicate solr-cloud between datacenters? That's an open issue. See: https://issues.apache.org/jira/browse/SOLR-6273 -- Jack Krupansky On Mon, Mar 30, 2015 at 5:45 PM, Timothy Ehlers ehle...@gmail.com wrote: Can you use /replication ??? How would you do this between datacenters? -- Tim Ehlers
how do you replicate solr-cloud between datacenters?
Can you use /replication ??? How would you do this between datacenters? -- Tim Ehlers
Re: how do you replicate solr-cloud between datacenters?
That's an open issue. See: https://issues.apache.org/jira/browse/SOLR-6273 -- Jack Krupansky On Mon, Mar 30, 2015 at 5:45 PM, Timothy Ehlers ehle...@gmail.com wrote: Can you use /replication ??? How would you do this between datacenters? -- Tim Ehlers
Re: Replicate Solr Cloud
You'll have to provide some more details on your problem. What do you mean by location A and B: two different machines? By default, SolrCloud shards can have replicas hosted on different machines. That gives you redundancy: if one of your machines dies, your search system will still be up as long as the other machine(s) are up and running. - Thanks, Michael
Replicate Solr Cloud
Hi, I want to create a Solr Cloud setup like this: one Solr Cloud in location A and another Solr Cloud in location B. How can I make the Solr Cloud in location B a replica of the Solr Cloud in location A, so that if all nodes in Solr Cloud A die, Solr Cloud B is still working, and vice versa? Does anybody know how to set this up? Thanks.
Re: how to replicate Solr Cloud
On the lengthy TODO list is making SolrCloud nodes rack aware that should help with this, but it's not real high in the priority queue as I recall. The current architecture sends updates and requests all over the cluster, so there are lots of messages that go across the presumably expensive pipe between data centers. Not to mention the Zookeeper quorum problem. Hmmm, Zookeeper Quorum problem. Say 1 ZK node is in DC1 and 2 are in DC2. If DC2 goes down, DC1 will not accept updates because there is no available ZK quorum. I've seen one proposal where you use 3 DCs, each with a ZK node to ameliorate this. But all this is an issue only if the communications link between the datacenters is expensive where that term can mean that it literally costs more, that it is slow, whatever. Best Erick On Tue, Jun 25, 2013 at 12:14 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Uh, I remember that email, but can't recall where we did it will try to recall it some more and reply if I can manage to dig it out of my brain... Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jun 25, 2013 at 2:24 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: Otis, I did actually stumble upon this link. http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/74870 This was from you. You were attempting to replicate data from SolrCloud to some other slaves for heavy-duty queries. You said that you accomplished this. Can you provide a few pointers on how you did this? Thanks. On Tue, Jun 25, 2013 at 10:25 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: I think what is needed is a Leader that, while being a Leader for its own Slice in its local Cluster and Collection (I think I'm using all the latest terminology correctly here), is at the same time a Replica of its own Leader counterpart in the Primary Cluster. Not currently possible, AFAIK. Or maybe there is a better way? 
Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jun 25, 2013 at 1:07 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we should add documents to each cluster separately. For various reasons, this may not be ideal for us. Instead, we are playing around with the idea of always indexing to one datacenter and then having that replicate to the other datacenter. This is where I am having some trouble on how to proceed. The nice thing about SolrCloud is that there are no masters and slaves. Each node is equal, has the same configs, etc. But in this case, I want to have a node in one datacenter poll for changes in another datacenter. Before SolrCloud, I would have used slave/master replication. But in the SolrCloud world, I am not sure how to configure this setup. Or are there any better ideas on how to use replication to push or pull data from one datacenter to another? In my case, NRT is not a requirement. I will also be dealing with about 3 collections and 5 or 6 shards. Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614
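Erick's ZooKeeper quorum point comes down to majority arithmetic, so a small sketch may help. This is only an illustration of the reasoning in the message above, not anything taken from Solr or ZooKeeper code; the function names and the DC1/DC2/DC3 labels are made up for the example, and it assumes every ZK node is a voting member of one ensemble.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble of the given size."""
    return ensemble_size // 2 + 1


def has_quorum(nodes_per_dc: dict, failed_dc: str) -> bool:
    """True if the surviving data centers still hold a majority of ZK nodes."""
    total = sum(nodes_per_dc.values())
    surviving = total - nodes_per_dc.get(failed_dc, 0)
    return surviving >= quorum_size(total)


# Erick's example: 1 ZK node in DC1 and 2 in DC2.
two_dcs = {"DC1": 1, "DC2": 2}
# The proposal he mentions: 3 data centers with 1 ZK node each.
three_dcs = {"DC1": 1, "DC2": 1, "DC3": 1}

print(has_quorum(two_dcs, "DC2"))   # False: DC1 alone holds 1 of 3 nodes
print(has_quorum(two_dcs, "DC1"))   # True: DC2 alone holds 2 of 3 nodes
print(all(has_quorum(three_dcs, dc) for dc in three_dcs))  # True
```

With two data centers, one of them always holds the minority, so losing the other one stops updates. With three data centers and one node each, any single data center can fail and the remaining two still form a majority.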
how to replicate Solr Cloud
We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we should add documents to each cluster separately. For various reasons, this may not be ideal for us. Instead, we are playing around with the idea of always indexing to one datacenter and then having that replicate to the other datacenter. This is where I am having some trouble on how to proceed. The nice thing about SolrCloud is that there are no masters and slaves. Each node is equal, has the same configs, etc. But in this case, I want to have a node in one datacenter poll for changes in another datacenter. Before SolrCloud, I would have used slave/master replication. But in the SolrCloud world, I am not sure how to configure this setup. Or are there any better ideas on how to use replication to push or pull data from one datacenter to another? In my case, NRT is not a requirement. I will also be dealing with about 3 collections and 5 or 6 shards. Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614
Re: how to replicate Solr Cloud
I think what is needed is a Leader that, while being a Leader for its own Slice in its local Cluster and Collection (I think I'm using all the latest terminology correctly here), is at the same time a Replica of its own Leader counterpart in the Primary Cluster. Not currently possible, AFAIK. Or maybe there is a better way? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jun 25, 2013 at 1:07 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we should add documents to each cluster separately. For various reasons, this may not be ideal for us. Instead, we are playing around with the idea of always indexing to one datacenter. And then having that replicate to the other datacenter. And this is where I am having some trouble on how to proceed. The nice thing about SolrCloud is that there is no masters and slaves. Each node is equals, has the same configs, etc. But in this case, I want to have a node in one datacenter poll for changes in another data center. Before SolrCloud, I would have used slave/master replication. But in the SolrCloud world, I am not sure how to configure this setup? Or is there any better ideas on how to use replication to push or pull data from one datacenter to another? In my case, NRT is not a requirement. And I will also be dealing with about 3 collections and 5 or 6 shards. Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614 [image: CNET Content Solutions]
Re: how to replicate Solr Cloud
Kevin, I can imagine this working if you treat your second data center as a pure slave of your SolrCloud cluster. I haven't tried it, but I don't see why the solrconfig.xml can't identify as a master, allowing you to call any of your cores in the cluster to replicate out. That being said, this idea doesn't facilitate a SolrCloud cluster in the second data center… just a slave that could be a repeater. You say that sending the data in both directions is not ideal, but it works and is conceptually very simple. What is the reasoning behind wanting to get away from that approach? Jason On Jun 25, 2013, at 10:07 AM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we should add documents to each cluster separately. For various reasons, this may not be ideal for us. Instead, we are playing around with the idea of always indexing to one datacenter and then having that replicate to the other datacenter. This is where I am having some trouble on how to proceed. The nice thing about SolrCloud is that there are no masters and slaves. Each node is equal, has the same configs, etc. But in this case, I want to have a node in one datacenter poll for changes in another datacenter. Before SolrCloud, I would have used slave/master replication. But in the SolrCloud world, I am not sure how to configure this setup. Or are there any better ideas on how to use replication to push or pull data from one datacenter to another? In my case, NRT is not a requirement. I will also be dealing with about 3 collections and 5 or 6 shards. Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614
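For readers who want to try Jason's idea, here is a rough sketch of how a slave core in the second data center could be told to pull from a core in the SolrCloud cluster. It only builds the request URL for the replication handler's `fetchindex` command with a `masterUrl` parameter; it does not send anything. Jason says he has not tried this, and nothing in this thread confirms that it works against a SolrCloud core. The host names, core names, and the helper name `fetchindex_url` are hypothetical, and the exact form expected for `masterUrl` may differ between Solr versions.

```python
from urllib.parse import urlencode


def fetchindex_url(slave_core_url: str, master_replication_url: str) -> str:
    """Build the URL that asks a slave core to fetch the index from a master.

    slave_core_url: base URL of the slave core (hypothetical host/core).
    master_replication_url: URL of the master core's replication handler
    (hypothetical; passed through unchanged as the masterUrl parameter).
    """
    params = urlencode({
        "command": "fetchindex",
        "masterUrl": master_replication_url,
    })
    return slave_core_url.rstrip("/") + "/replication?" + params


# Hypothetical example: one slave core in DC2 pulling from one shard core in DC1.
url = fetchindex_url(
    "http://dc2-solr:8983/solr/products/",
    "http://dc1-solr:8983/solr/products_shard1_replica1/replication",
)
print(url)
```

With several collections and shards, as in Kevin's setup, one such request would be needed per slave core, each pointing at the matching shard core in the primary cluster. A script run from cron would be enough for the polling, since NRT is not a requirement.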
Re: how to replicate Solr Cloud
Otis, I did actually stumble upon this link. http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/74870 This was from you. You were attempting to replicate data from SolrCloud to some other slaves for heavy-duty queries. You said that you accomplished this. Can you provide a few pointers on how you did this? Thanks. On Tue, Jun 25, 2013 at 10:25 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: I think what is needed is a Leader that, while being a Leader for its own Slice in its local Cluster and Collection (I think I'm using all the latest terminology correctly here), is at the same time a Replica of its own Leader counterpart in the Primary Cluster. Not currently possible, AFAIK. Or maybe there is a better way? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jun 25, 2013 at 1:07 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we should add documents to each cluster separately. For various reasons, this may not be ideal for us. Instead, we are playing around with the idea of always indexing to one datacenter. And then having that replicate to the other datacenter. And this is where I am having some trouble on how to proceed. The nice thing about SolrCloud is that there is no masters and slaves. Each node is equals, has the same configs, etc. But in this case, I want to have a node in one datacenter poll for changes in another data center. Before SolrCloud, I would have used slave/master replication. But in the SolrCloud world, I am not sure how to configure this setup? Or is there any better ideas on how to use replication to push or pull data from one datacenter to another? In my case, NRT is not a requirement. And I will also be dealing with about 3 collections and 5 or 6 shards. 
Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614
Re: how to replicate Solr Cloud
Jason, My initial reluctance to indexing directly to both data centers is that we are doing a lot of bulk loading through CSV handler. We never get just 1 document at a time. It comes in large batch updates. And now we would have to send the batch updates twice. That is not to say that we won't go this way. But I am exploring other solutions as well. On Tue, Jun 25, 2013 at 11:21 AM, Jason Hellman jhell...@innoventsolutions.com wrote: Kevin, I can imagine this working if you consider your second data center a pure slave relationship to your SolrCloud cluster. I haven't tried it, but I don't see why the solrconfig.xml can't identify as a master allowing you to call any of your cores in the cluster to replicate out. That being said, this idea doesn't facilitate a SolrCloud cluster in the second data center…just a slave that could be a repeater. You say that sending the data in both directions is not idea, but it works and is conceptually very simple. What is the reasoning behind wanting to get away from that approach? Jason On Jun 25, 2013, at 10:07 AM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we should add documents to each cluster separately. For various reasons, this may not be ideal for us. Instead, we are playing around with the idea of always indexing to one datacenter. And then having that replicate to the other datacenter. And this is where I am having some trouble on how to proceed. The nice thing about SolrCloud is that there is no masters and slaves. Each node is equals, has the same configs, etc. But in this case, I want to have a node in one datacenter poll for changes in another data center. Before SolrCloud, I would have used slave/master replication. But in the SolrCloud world, I am not sure how to configure this setup? 
Or are there any better ideas on how to use replication to push or pull data from one datacenter to another? In my case, NRT is not a requirement. I will also be dealing with about 3 collections and 5 or 6 shards. Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614
Re: how to replicate Solr Cloud
Also, you have to track two sets of batches, failures, and retries. --wunder On Jun 25, 2013, at 11:30 AM, Kevin Osborn wrote: Jason, My initial reluctance to indexing directly to both data centers is that we are doing a lot of bulk loading through CSV handler. We never get just 1 document at a time. It comes in large batch updates. And now we would have to send the batch updates twice. That is not to say that we won't go this way. But I am exploring other solutions as well. On Tue, Jun 25, 2013 at 11:21 AM, Jason Hellman jhell...@innoventsolutions.com wrote: Kevin, I can imagine this working if you consider your second data center a pure slave relationship to your SolrCloud cluster. I haven't tried it, but I don't see why the solrconfig.xml can't identify as a master allowing you to call any of your cores in the cluster to replicate out. That being said, this idea doesn't facilitate a SolrCloud cluster in the second data center…just a slave that could be a repeater. You say that sending the data in both directions is not idea, but it works and is conceptually very simple. What is the reasoning behind wanting to get away from that approach? Jason On Jun 25, 2013, at 10:07 AM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we should add documents to each cluster separately. For various reasons, this may not be ideal for us. Instead, we are playing around with the idea of always indexing to one datacenter. And then having that replicate to the other datacenter. And this is where I am having some trouble on how to proceed. The nice thing about SolrCloud is that there is no masters and slaves. Each node is equals, has the same configs, etc. But in this case, I want to have a node in one datacenter poll for changes in another data center. 
Before SolrCloud, I would have used slave/master replication. But in the SolrCloud world, I am not sure how to configure this setup. Or are there any better ideas on how to use replication to push or pull data from one datacenter to another? In my case, NRT is not a requirement. I will also be dealing with about 3 collections and 5 or 6 shards. Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614 -- Walter Underwood wun...@wunderwood.org
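Walter's point about tracking two sets of batches, failures, and retries can be made concrete with a small sketch of the "index to both data centers" approach. This is an illustration only: the `senders` are placeholder callables that return True on success (in practice each might wrap an HTTP POST of the CSV batch to one cluster), and the names `BatchLog`, `send_to_all`, `dc1_ok`, and `dc2_flaky` are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BatchLog:
    """Per-cluster record of which batches succeeded and which gave up."""
    done: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def send_to_all(batch_id: str,
                payload: bytes,
                senders: Dict[str, Callable[[bytes], bool]],
                logs: Dict[str, BatchLog],
                max_attempts: int = 3) -> None:
    """Send one batch to every cluster, retrying each cluster independently."""
    for cluster, send in senders.items():
        log = logs.setdefault(cluster, BatchLog())
        for _ in range(max_attempts):
            if send(payload):
                log.done.append(batch_id)
                break
        else:
            # All attempts failed: this cluster is now missing the batch.
            log.failed.append(batch_id)


# Fake senders standing in for the two data centers.
calls = {"dc2": 0}


def dc1_ok(payload: bytes) -> bool:
    return True


def dc2_flaky(payload: bytes) -> bool:
    calls["dc2"] += 1
    return calls["dc2"] >= 5  # the first four calls fail


logs: Dict[str, BatchLog] = {}
senders = {"dc1": dc1_ok, "dc2": dc2_flaky}
send_to_all("batch-1", b"id,name\n1,a\n", senders, logs)
send_to_all("batch-2", b"id,name\n2,b\n", senders, logs)

print(logs["dc1"])  # both batches done
print(logs["dc2"])  # batch-1 failed after 3 attempts, batch-2 done
```

After this run the two clusters have diverged: dc1 holds both batches while dc2 is missing batch-1. Something has to notice that and replay the batch later, which is the extra bookkeeping Walter is describing.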
Re: how to replicate Solr Cloud
Uh, I remember that email, but can't recall where we did it. I will try to recall it some more and reply if I can manage to dig it out of my brain... Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jun 25, 2013 at 2:24 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: Otis, I did actually stumble upon this link. http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/74870 This was from you. You were attempting to replicate data from SolrCloud to some other slaves for heavy-duty queries. You said that you accomplished this. Can you provide a few pointers on how you did this? Thanks. On Tue, Jun 25, 2013 at 10:25 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: I think what is needed is a Leader that, while being a Leader for its own Slice in its local Cluster and Collection (I think I'm using all the latest terminology correctly here), is at the same time a Replica of its own Leader counterpart in the Primary Cluster. Not currently possible, AFAIK. Or maybe there is a better way? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jun 25, 2013 at 1:07 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: We are going to have two datacenters, each with their own SolrCloud and ZooKeeper quorums. The end result will be that they should be replicas of each other. One method that has been mentioned is that we should add documents to each cluster separately. For various reasons, this may not be ideal for us. Instead, we are playing around with the idea of always indexing to one datacenter and then having that replicate to the other datacenter. This is where I am having some trouble on how to proceed. The nice thing about SolrCloud is that there are no masters and slaves. Each node is equal, has the same configs, etc. But in this case, I want to have a node in one datacenter poll for changes in another datacenter.
Before SolrCloud, I would have used slave/master replication. But in the SolrCloud world, I am not sure how to configure this setup. Or are there any better ideas on how to use replication to push or pull data from one datacenter to another? In my case, NRT is not a requirement. I will also be dealing with about 3 collections and 5 or 6 shards. Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614