Re: SolrCloud Collection Backup - Solr 5.5.4
On 6/4/2018 5:36 AM, Greenhorn Techie wrote:
> 1. In the SolrCloud, as a single host can have information about multiple
> shards (either leader or replica), how does the backup API handle the
> underlying data copy? I presume it will simply copy the data across ALL the
> shards (both leader and replicas) for the specified collection.

The Collections API backup would indeed work this way. I see this line of code in the patch for SOLR-5750:

    log.debug("Sent backup requests to all shard leaders for snapshotName={}", backupName);

So it sounds like the leader replica will write the backup for each shard.

> 2. If I am invoking the backup command periodically to backup the data and
> then invoke restore command later (possibly due to cluster shutdown and
> create a fresh SolrCloud cluster), I presume I don't need to tinker with
> the hash values as long as the default settings have been used in both
> backup and restore situations?

The Collections API restore capability creates a new collection from the backup. The backup includes information gathered from ZK. The restored collection should have all the same hash ranges found in the original collection.

Thanks,
Shawn
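For readers on 6.1 or later, here is a minimal sketch of how the Collections API backup and restore requests discussed above could be issued. The host, the backup name (nightly) and the location (/mnt/solr-backups) are placeholders, not values from this thread, and the location is assumed to be a path that every node in the cluster can reach. The functions only build URLs; nothing is sent until you pass one to curl.

```shell
#!/bin/sh
# Sketch only: SOLR_URL, collection names, backup name and location are
# illustrative placeholders. Requires Solr 6.1+ (SOLR-5750).
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the Collections API BACKUP request URL.
# $1 = collection to back up, $2 = backup name, $3 = backup location
collection_backup_url() {
  printf '%s/admin/collections?action=BACKUP&name=%s&collection=%s&location=%s\n' \
    "$SOLR_URL" "$2" "$1" "$3"
}

# Build the Collections API RESTORE request URL.
# $1 = name of the NEW collection to create, $2 = backup name, $3 = location
collection_restore_url() {
  printf '%s/admin/collections?action=RESTORE&name=%s&collection=%s&location=%s\n' \
    "$SOLR_URL" "$2" "$1" "$3"
}

# Example use against a running 6.1+ cluster (not executed here):
#   curl "$(collection_backup_url gettingstarted nightly /mnt/solr-backups)"
#   curl "$(collection_restore_url gettingstarted_restored nightly /mnt/solr-backups)"
```

Because restore creates a new collection from the backup, the sketch restores into a different collection name than the one that was backed up.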
Re: SolrCloud Collection Backup - Solr 5.5.4
Thanks, Shawn, for your detailed reply. It has helped improve my understanding. Below is my summarised understanding.

In a SolrCloud setup with a version older than 6.1, there is no 'elegant' way of handling collection backup and restore. Instead, one has to use the manual backup and restore APIs of the replication handler. However, as these APIs were primarily designed for standalone Solr installations, each call can only back up the data stored on a single Solr host for a particular core. Hence, in order to get the complete data backed up for a SolrCloud collection, the backup API should be invoked on all the nodes belonging to the SolrCloud cluster, and the ZooKeeper clusterstate should then be backed up manually, with possible tweaking needed to ensure hash value consistency.

A few follow-up questions:

1. In the SolrCloud, as a single host can have information about multiple shards (either leader or replica), how does the backup API handle the underlying data copy? I presume it will simply copy the data across ALL the shards (both leader and replicas) for the specified collection.

2. If I am invoking the backup command periodically to backup the data and then invoke restore command later (possibly due to cluster shutdown and create a fresh SolrCloud cluster), I presume I don't need to tinker with the hash values as long as the default settings have been used in both backup and restore situations?

Thanks

On 2 June 2018 at 08:59:26, Shawn Heisey (apa...@elyograg.org) wrote:

On 6/2/2018 1:50 AM, Shawn Heisey wrote:
> If you provide a location parameter, it will write a new backup
> directory in that location.
>
> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups
>
> I verified that this parameter is in the 5.5 docs too, I would suggest
> you download that version in PDF format if you want a full reference.
Re: SolrCloud Collection Backup - Solr 5.5.4
On 6/2/2018 1:50 AM, Shawn Heisey wrote:
> If you provide a location parameter, it will write a new backup
> directory in that location.
>
> https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups
>
> I verified that this parameter is in the 5.5 docs too, I would suggest
> you download that version in PDF format if you want a full reference.

A followup: I suspect that if you try to use the restore functionality on the replication handler and have multiple shard replicas, SolrCloud would not replicate things properly. I could be wrong about that, but I think that restoring from replication handler backups to SolrCloud could get a little messy.

Thanks,
Shawn
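As a rough illustration of the per-core approach on 5.5.x, the sketch below builds replication handler backup URLs that use the location parameter mentioned above, with a separate directory per core. The core names, the base directory and the snapshot name are hypothetical, and the sketch assumes each target directory already exists and is writable by the Solr process on the node hosting that core. It captures index data only; nothing from ZooKeeper is included.

```shell
#!/bin/sh
# Sketch only: SOLR_URL, core names, base directory and snapshot name are
# illustrative placeholders, not values taken from this thread.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build a replication handler backup URL for one core.
# $1 = core name, $2 = base backup directory, $3 = snapshot name
# The location is "<base>/<core>" so each core gets its own directory.
core_backup_url() {
  printf '%s/%s/replication?command=backup&location=%s&name=%s\n' \
    "$SOLR_URL" "$1" "$2/$1" "$3"
}

# Print one curl command per core so the plan can be reviewed before running.
# $1 = base backup directory, $2 = snapshot name, remaining args = core names
print_backup_commands() {
  base="$1"; snap="$2"; shift 2
  for core in "$@"; do
    printf 'curl "%s"\n' "$(core_backup_url "$core" "$base" "$snap")"
  done
}

# Example (hypothetical core names; run against the node that hosts each core):
#   print_backup_commands /mnt/solr-backups nightly \
#     gettingstarted_shard1_replica1 gettingstarted_shard2_replica1
```

Each request has to go to the node that actually hosts the core, so in a multi-node cluster SOLR_URL would need to change per core.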
Re: SolrCloud Collection Backup - Solr 5.5.4
On 6/1/2018 7:23 AM, Greenhorn Techie wrote:
> We are running SolrCloud with version 5.5.4. As I understand, Solr
> Collection Backup and Restore API are only supported from version 6
> onwards. So wondering what is the best mechanism to get our collections
> backed-up on older Solr version.

That functionality was added in 6.1.

https://issues.apache.org/jira/browse/SOLR-5750

> When I ran backup command on a particular node (curl
> http://localhost:8983/solr/gettingstarted/replication?command=backup) it
> seems it only creates a snapshot for the collection data stored on that
> particular node. Does that mean, if I run this command for every node
> hosting my SolrCloud collection, I will be getting the required backup?
> Will this backup the metadata as well from ZK? I presume not.

If you provide a location parameter, it will write a new backup directory in that location.

https://lucene.apache.org/solr/guide/6_6/making-and-restoring-backups.html#standalone-mode-backups

I verified that this parameter is in the 5.5 docs too; I would suggest you download that version in PDF format if you want a full reference. It would probably be a good idea to create a separate directory for each core that you work on.

If the backup is done on all the right cores, you will get all the index data, but you will have no info from ZK. If the collection has more than one shard and uses the compositeId router, then you will need the info from the collection's clusterstate about shard hash ranges, and those would have to be verified and possibly adjusted on the new collection before you start putting the data back in. If the new collection uses different hash ranges than the one you backed up, then the restored collection would not function correctly.

> If so, what
> are the best possible approaches to get the same. Is there something made
> available by Solr for the same?

If you can do it, upgrading to the latest 6.x or 7.x version would be a good idea, to have full SolrCloud backup and restore functionality.

--

You asked me some questions via IRC when I wasn't around, then were logged off by the time I got back to IRC. I don't know when you might come back online there. Here's some info for those questions:

The reason that 'ant server' isn't working is that you're at the top level of the source. It should work if you change to the solr directory first.

Similar to what you've encountered, I can't get eclipse to work properly when using a downloaded 6.6.2 source package (solr-6.6.2-src.tgz). But if I use these commands instead, then import into eclipse, it works:

    git clone https://git-wip-us.apache.org/repos/asf/lucene-solr.git
    cd lucene-solr
    git checkout refs/tags/releases/lucene-solr/6.6.2
    ant clean clean-jars clean-eclipse eclipse

The clean targets are not strictly necessary with a fresh clone, but including them means the sequence also works when the tree isn't fresh.

I've never had very good luck with the downloadable source packages. Some of the build system functionality *only* works when the source is obtained with git, so I prefer that.

Thanks,
Shawn
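One possible way to do the hash-range verification described earlier in this message is to record the shard ranges of the original collection and compare them with those of the new collection before loading data. The sketch below assumes the Collections API CLUSTERSTATUS action is available on your version and that its JSON output contains "range" entries per shard. The collection names and file names are placeholders, and the grep/sed extraction is a crude text match rather than a real JSON parser.

```shell
#!/bin/sh
# Sketch only: SOLR_URL and collection names are illustrative placeholders.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build a CLUSTERSTATUS request URL for one collection (JSON output).
# $1 = collection name
clusterstatus_url() {
  printf '%s/admin/collections?action=CLUSTERSTATUS&collection=%s&wt=json\n' \
    "$SOLR_URL" "$1"
}

# Read cluster state JSON on stdin and print each shard hash range on its
# own line, sorted. Assumes entries look like "range":"80000000-ffffffff".
extract_ranges() {
  grep -o '"range"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort
}

# Example comparison (not executed here; collection names are hypothetical):
#   curl -s "$(clusterstatus_url gettingstarted)"     | extract_ranges > old.txt
#   curl -s "$(clusterstatus_url gettingstarted_new)" | extract_ranges > new.txt
#   diff old.txt new.txt && echo "hash ranges match"
```

If the two lists differ, the new collection's ranges would need adjusting before the backed-up index data is put in place.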
SolrCloud Collection Backup - Solr 5.5.4
Hi,

We are running SolrCloud with version 5.5.4. As I understand, the Solr Collection Backup and Restore APIs are only supported from version 6 onwards. So I am wondering what the best mechanism is to get our collections backed up on an older Solr version.

When I ran the backup command on a particular node (curl http://localhost:8983/solr/gettingstarted/replication?command=backup), it seems it only creates a snapshot of the collection data stored on that particular node. Does that mean that if I run this command on every node hosting my SolrCloud collection, I will get the required backup? Will this back up the metadata from ZK as well? I presume not. If so, what are the best possible approaches to achieve this? Is there something made available by Solr for this?

Thanks