Re: solrcloud shards backup/restoration
I would also like to know how to do this.

-- View this message in context: http://lucene.472066.n3.nabble.com/solrcloud-shards-backup-restoration-tp4088447p4138358.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solrcloud shards backup/restoration
We've had some success restoring existing (backed-up) indexes into SolrCloud, and even building the indexes offline and dropping the Lucene files into the directories that Solr expects. The general steps we follow are:

1) Round up your files. It doesn't matter whether you pull from a master or a slave, so long as you've committed and get a consistent copy of the data.
2) Use the Collections API to create a collection in Solr. The collection you're creating must have the same number of shards as the collection you backed up and are restoring.
3) Stop all Solr nodes.
4) Remove the index_name/data/ directory from the shards you're going to make the leaders. In our case we've got 6 shards and a replication factor of 3 on a 6-node cluster, so each server/JVM has three shards on it. Conveniently, the shards are all either even or odd per JVM.
5) Populate the index_name/data/ directories on your intended leaders. As mentioned above, since we've got six shards and one even plus one odd JVM together contain the entire index, we only populate the data on two servers.
6) Start up *JUST* the servers that you've just populated. The goal here is to make the servers you've populated the leaders for the new collection, so that they hold the official full copy of the index. Upon startup you might have to wait $leaderVoteWait for previously non-leader servers to time out and become leaders.
7) Once you've got at least one core up in each shard of your collection, go ahead and start the others up.

I think Aditya was failing by removing all the ZooKeeper data and starting everything up at once. If you force Solr's hand a bit to pick leaders with the data that you want, you'll have success when it replicates out to the other nodes.

It might also be possible to do this online: don't stop Solr after creating the empty collection, copy the files into place on the leaders, and issue a RELOAD to pick up the changed indexes. I'm not sure how replicas would handle that, though.
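As a rough illustration of steps 2 and the online RELOAD idea above, the Collections API calls could be built along these lines. This is only a sketch: the host and port are borrowed from elsewhere in this thread, the collection name `index_name` mirrors the placeholder used in the steps, the helper `collections_api_url` is hypothetical, and `maxShardsPerNode=3` is an assumption based on the "three shards per JVM" layout described.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Collections API URL (hypothetical helper, not part of Solr)."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


# Step 2: create an empty collection with the same shard count as the backup.
# Host, port and collection name are placeholders; adjust to your cluster.
create_url = collections_api_url(
    "http://host1:8893/solr",
    "CREATE",
    name="index_name",
    numShards=6,
    replicationFactor=3,
    maxShardsPerNode=3,  # assumption: 18 cores spread over 6 nodes
)

# Online variant: after copying index files into place on the leaders.
reload_url = collections_api_url(
    "http://host1:8893/solr", "RELOAD", name="index_name"
)

print(create_url)
print(reload_url)
```

The URLs are only printed here; issuing them (for example with curl or `urllib.request.urlopen`) is left to the operator, since the CREATE call changes cluster state.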
Thanks,
Greg

On Jan 24, 2014, at 12:47 AM, Allan Mascarenhas allan.mascarenhas1...@gmail.com wrote:

Any update on this? I am also stuck with the same problem: I want to install a snapshot of the master Solr server in my local environment, but I couldn't. I have spent almost 2 days trying to figure out how. Please help!
Re: solrcloud shards backup/restoration
Any update on this? I am also stuck with the same problem: I want to install a snapshot of the master Solr server in my local environment, but I couldn't. I have spent almost 2 days trying to figure out how. Please help!
Re: solrcloud shards backup/restoration
Hi,

Sorry for the late follow-up on this. Let me put in more details here.

*The problem:*
I cannot successfully restore the index backed up with '/replication?command=backup'. The backup was generated as *snapshot.mmdd*.

*My setup and steps:*
6 solrcloud instances
7 zookeeper instances

Steps:
1. Take a snapshot using *http://host1:8893/solr/replication?command=backup*, on one host only. Move *snapshot.mmdd* to some reliable storage.
2. Stop all 6 solr instances and all 7 zk instances.
3. Delete ../collectionname/data/* on all solrcloud nodes, i.e. delete the index data completely.
4. Delete zookeeper/data/version*/* on all zookeeper nodes.
5. Copy the index back from the backup to one of the nodes:
   cp *snapshot.mmdd/* *../collectionname/data/index/*
6. Restart all zk instances. Restart all solrcloud instances.

*Outcome:*
All solr instances are up. However, *num of docs = 0* for all nodes. Looking at the node where the index was restored, a new index.yymmddhhmmss directory has been created and index.properties points to it. That explains why no documents are reported.

How do I get solrcloud to pick up data from the index directory on a restart?

Thanks in advance,
Aditya

On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja aditya.sakh...@gmail.com wrote:

Thanks Shalin and Mark for your responses. I am on the same page about the conventions for taking the backup. However, I am less sure about the restoration of the index. Let's say we have 3 shards across 3 solrcloud servers.

1. I am assuming we should take a backup from each of the shard leaders to get a complete collection. Do you think that will get the complete index (not worrying about what is not hard committed at the time of backup)?
2. How do we go about restoring the index in a fresh solrcloud cluster? From the structure of the snapshot I took, I did not see any replication.properties or index.properties, which I normally see on healthy solrcloud cluster nodes.
If I have the snapshot named snapshot.20130905, does snapshot.20130905/* go into data/index?

Thanks,
Aditya

On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller markrmil...@gmail.com wrote:

Phone typing. The end should not say don't hard commit - it should say do a hard commit and take a snapshot.

Mark
Sent from my iPhone

On Sep 6, 2013, at 7:26 AM, Mark Miller markrmil...@gmail.com wrote:

I don't know that it's too bad though - it's always been the case that if you do a backup while indexing, it's just going to get up to the last hard commit. With SolrCloud that will still be the case. So just make sure you do a hard commit right before taking the backup - yes, it might miss a few docs in the tran log, but if you are taking a backup while indexing, you don't have great precision in any case - you will roughly get a snapshot for around that time - even without SolrCloud, if you are worried about precision and getting every update into that backup, you want to stop indexing and commit first. But if you just want a rough snapshot for around that time, in both cases you can still just don't hard commit and take a snapshot.

Mark
Sent from my iPhone

On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

The replication handler's backup command was built for pre-SolrCloud. It takes a snapshot of the index but it is unaware of the transaction log, which is a key component in SolrCloud. Hence, unless you stop updates, commit your changes and then take a backup, you will likely miss some updates.

That being said, I'm curious to see how peer sync behaves when you try to restore from a snapshot. When you say that you haven't been successful in restoring, what exactly is the behaviour you observed?

On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja aditya.sakh...@gmail.com wrote:

Hello,

I was looking for a good backup / recovery solution for the solrcloud indexes. I am mostly interested in restoring the indexes from the index snapshot, which can be taken using the replicationHandler's backup command. I am looking for something that works with solrcloud 4.3 eventually, but it is still relevant if you tested with a previous version.

I haven't been successful in having the restored index replicate across the new replicas after I restart all the nodes with one node having the restored index. Is restoring the indexes on all the nodes the best way to do it?

-- Regards, -Aditya Sakhuja
-- Regards, Shalin Shekhar Mangar.
-- Regards, -Aditya Sakhuja
-- Regards, -Aditya Sakhuja
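The outcome reported in this thread (a new index.yymmddhhmmss directory, with index.properties pointing to it) suggests checking which directory a core is actually using before copying snapshot files into data/index. The sketch below assumes that index.properties is a Java-style properties file whose `index` key names the active directory, and that data/index is used when the file or key is absent; that matches my understanding of Solr 4.x but should be verified against your version. The function name and the sample file contents are hypothetical.

```python
def active_index_dir(properties_text, default="index"):
    """Return the directory named by the 'index' key, or the default.

    Assumes a simple key=value properties format; comment lines start
    with '#' or '!'.
    """
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "index":
            return value.strip()
    return default


# Hypothetical file contents, for illustration only.
sample = "#index.properties\nindex=index.20130919100000\n"
print(active_index_dir(sample))
```

If the returned name is not `index`, restored files copied into data/index would not be the ones the core opens, which would be consistent with the zero document count described above.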
Re: solrcloud shards backup/restoration
How does one recover from index corruption? That's what I am ultimately trying to tackle here.

Thanks,
Aditya

On Thursday, September 19, 2013, Aditya Sakhuja wrote:

Hi,

Sorry for the late follow-up on this. Let me put in more details here.

*The problem:*
I cannot successfully restore the index backed up with '/replication?command=backup'. The backup was generated as *snapshot.mmdd*.

*My setup and steps:*
6 solrcloud instances
7 zookeeper instances

Steps:
1. Take a snapshot using *http://host1:8893/solr/replication?command=backup*, on one host only. Move *snapshot.mmdd* to some reliable storage.
2. Stop all 6 solr instances and all 7 zk instances.
3. Delete ../collectionname/data/* on all solrcloud nodes, i.e. delete the index data completely.
4. Delete zookeeper/data/version*/* on all zookeeper nodes.
5. Copy the index back from the backup to one of the nodes:
   cp *snapshot.mmdd/* *../collectionname/data/index/*
6. Restart all zk instances. Restart all solrcloud instances.

*Outcome:*
All solr instances are up. However, *num of docs = 0* for all nodes. Looking at the node where the index was restored, a new index.yymmddhhmmss directory has been created and index.properties points to it. That explains why no documents are reported.

How do I get solrcloud to pick up data from the index directory on a restart?

Thanks in advance,
Aditya
Re: solrcloud shards backup/restoration
The replication handler's backup command was built for pre-SolrCloud. It takes a snapshot of the index but it is unaware of the transaction log, which is a key component in SolrCloud. Hence, unless you stop updates, commit your changes and then take a backup, you will likely miss some updates.

That being said, I'm curious to see how peer sync behaves when you try to restore from a snapshot. When you say that you haven't been successful in restoring, what exactly is the behaviour you observed?

-- Regards, Shalin Shekhar Mangar.
Re: solrcloud shards backup/restoration
I don't know that it's too bad though - it's always been the case that if you do a backup while indexing, it's just going to get up to the last hard commit. With SolrCloud that will still be the case. So just make sure you do a hard commit right before taking the backup - yes, it might miss a few docs in the tran log, but if you are taking a backup while indexing, you don't have great precision in any case - you will roughly get a snapshot for around that time - even without SolrCloud, if you are worried about precision and getting every update into that backup, you want to stop indexing and commit first. But if you just want a rough snapshot for around that time, in both cases you can still just don't hard commit and take a snapshot.

Mark
Sent from my iPhone
Re: solrcloud shards backup/restoration
Phone typing. The end should not say don't hard commit - it should say do a hard commit and take a snapshot.

Mark
Sent from my iPhone
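Mark's corrected advice in this thread (do a hard commit, then take the snapshot) could be sketched as two requests per core, issued in order. The base URL reuses the host and port quoted earlier in the thread; the helper name `backup_urls` is hypothetical, and the sketch only builds the URLs rather than calling a live server. Per Shalin's caveat, updates arriving between the commit and the snapshot may still be missed unless indexing is stopped first.

```python
from urllib.parse import urlencode


def backup_urls(core_url):
    """Return (commit_url, backup_url), to be requested in that order.

    core_url is the base URL of one core, e.g. a shard leader.
    """
    commit = f"{core_url}/update?{urlencode({'commit': 'true'})}"
    backup = f"{core_url}/replication?{urlencode({'command': 'backup'})}"
    return commit, backup


# Placeholder host taken from the thread; repeat for each shard leader.
commit_url, snapshot_url = backup_urls("http://host1:8893/solr")
print(commit_url)
print(snapshot_url)
```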
Re: solrcloud shards backup/restoration
Thanks Shalin and Mark for your responses. I am on the same page about the conventions for taking the backup. However, I am less sure about the restoration of the index. Let's say we have 3 shards across 3 solrcloud servers.

1. I am assuming we should take a backup from each of the shard leaders to get a complete collection. Do you think that will get the complete index (not worrying about what is not hard committed at the time of backup)?
2. How do we go about restoring the index in a fresh solrcloud cluster? From the structure of the snapshot I took, I did not see any replication.properties or index.properties, which I normally see on healthy solrcloud cluster nodes. If I have the snapshot named snapshot.20130905, does snapshot.20130905/* go into data/index?

Thanks,
Aditya
Re: solrcloud shards backup/restoration
I wouldn't say I love this idea, but wouldn't it be safe to LVM snapshot the Solr index? I think this may even work on a live server, depending on some file I/O details. Has anyone tried this?

An in-Solr solution sounds more elegant, but considering the tlog concern Shalin mentioned, I think this may work as an interim solution.

Cheers!
Tim
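For the LVM snapshot idea raised in this thread, the command an operator might run could be assembled as below. This is a sketch only: the volume path `/dev/vg0/solr_data`, the snapshot name and the 1G copy-on-write size are hypothetical values, and whether a snapshot of a live index is consistent is exactly the open question Tim raises. The command is printed, not executed.

```python
import shlex


def lvm_snapshot_command(origin_lv, snapshot_name, size="1G"):
    """Build an lvcreate argv list for a copy-on-write snapshot."""
    return [
        "lvcreate",
        "--snapshot",
        "--size", size,          # space reserved for changed blocks
        "--name", snapshot_name,
        origin_lv,               # logical volume holding the Solr data dir
    ]


# Hypothetical volume and snapshot names, for illustration only.
cmd = lvm_snapshot_command("/dev/vg0/solr_data", "solr_snap")
print(shlex.join(cmd))
```

Issuing a hard commit just before the snapshot, as suggested elsewhere in the thread, would presumably narrow the window of uncommitted updates.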
solrcloud shards backup/restoration
Hello,

I was looking for a good backup / recovery solution for the solrcloud indexes. I am mostly interested in restoring the indexes from the index snapshot, which can be taken using the replicationHandler's backup command. I am looking for something that works with solrcloud 4.3 eventually, but it is still relevant if you tested with a previous version.

I haven't been successful in having the restored index replicate across the new replicas after I restart all the nodes with one node having the restored index. Is restoring the indexes on all the nodes the best way to do it?

-- Regards, -Aditya Sakhuja