[jira] [Comment Edited] (SOLR-12523) Confusing error reporting if backup attempted on non-shared FS
[ https://issues.apache.org/jira/browse/SOLR-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527505#comment-16527505 ]

Jan Høydahl edited comment on SOLR-12523 at 6/29/18 11:57 AM:
--------------------------------------------------------------

Tested the patch, and it passes precommit.

Here's the new error text from the API when attempting a backup across two nodes that do NOT share the backup drive. The same error will also be logged on both nodes. Will commit soon.

{noformat}
{
  "responseHeader": {
    "status": 500,
    "QTime": 135
  },
  "failure": {
    "10.5.0.5:8983_solr": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://10.5.0.5:8983/solr: Failed to backup core=coll2_shard1_replica_n2 because org.apache.solr.common.SolrException: Directory to contain snapshots doesn't exist: file:///back/myback2. Backup folder must already exist. Note also that Backup/Restore of a SolrCloud collection requires a shared file system mounted at the same path on all nodes!"
  },
  "Operation backup caused exception:": "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not backup all replicas",
  "exception": {
    "msg": "Could not backup all replicas",
    "rspCode": 500
  },
  "error": {
    "metadata": [
      "error-class", "org.apache.solr.common.SolrException",
      "root-error-class", "org.apache.solr.common.SolrException"
    ],
    "msg": "Could not backup all replicas",
    "trace": "org.apache.solr.common.SolrException: Could not backup all replicas\n\tat org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat [...snip...]
  }
}
{noformat}

One unrelated observation here. Part of the error response says: *Could not backup all replicas*. While we might call any core a "replica", in the context of a collection backup it would perhaps be more precise to say *Could not backup all shards*? [~markrmil...@gmail.com] it's your code line :)

> Confusing error reporting if backup attempted on non-shared FS
> --------------------------------------------------------------
>
> Key: SOLR-12523
> URL: https://issues.apache.org/jira/browse/SOLR-12523
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Backup/Restore
> Affects Versions: 7.3.1
> Reporter: Timothy Potter
> Assignee: Jan Høydahl
> Priority: Minor
> Fix For: master (8.0), 7.5
>
> Attachments: SOLR-12523.patch
>
>
> So I have a large collection with 4 shards across 2 nodes. When I try to back
> it up with:
> {code}
> curl
> "http://localhost:8984/solr/admin/collections?action=BACKUP=sigs=foo_signals=5=backups;
> {code}
> I either get:
> {code}
> "5170256188349065":{
> "responseHeader":{
> "status":0,
> "QTime":0},
> "STATUS":"failed",
> "Response":"Failed to backup core=foo_signals_shard1_replica_n2 because
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't
> exist: file:///vol1/cloud84/backups/sigs"},
> "5170256187999044":{
> "responseHeader":{
> "status":0,
> "QTime":0},
> "STATUS":"failed",
> "Response":"Failed to backup
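
For anyone scripting around the BACKUP call, here is a minimal sketch of how the per-node messages in the "failure" map of the new response quoted in the comment above could be pulled out. It only assumes the JSON shape shown in that response; the function names (backup_failures, mentions_shared_fs) are hypothetical helpers, the sample body is an abbreviated copy of the quoted response, and matching on the message wording is a heuristic rather than a stable contract.

```python
import json


def backup_failures(response_text):
    """Return {node_name: message} from the "failure" map, or {} if there is none."""
    body = json.loads(response_text)
    failures = body.get("failure") or {}
    return {node: str(message) for node, message in failures.items()}


def mentions_shared_fs(message):
    # Heuristic: relies on the wording of the patched error text (an assumption).
    return "shared file system" in message


# Abbreviated copy of the response quoted above (stack trace and exception prefix omitted).
sample = json.dumps({
    "responseHeader": {"status": 500, "QTime": 135},
    "failure": {
        "10.5.0.5:8983_solr": (
            "Failed to backup core=coll2_shard1_replica_n2 because "
            "org.apache.solr.common.SolrException: Directory to contain "
            "snapshots doesn't exist: file:///back/myback2. Backup folder "
            "must already exist. Note also that Backup/Restore of a SolrCloud "
            "collection requires a shared file system mounted at the same "
            "path on all nodes!"
        )
    },
    "exception": {"msg": "Could not backup all replicas", "rspCode": 500},
})

for node, message in backup_failures(sample).items():
    hint = " (check the shared backup mount)" if mentions_shared_fs(message) else ""
    print(f"{node}: {message}{hint}")
```

A response without a "failure" key yields an empty dict, so the same loop can be run against successful responses as well.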
[jira] [Comment Edited] (SOLR-12523) Confusing error reporting if backup attempted on non-shared FS
[ https://issues.apache.org/jira/browse/SOLR-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527029#comment-16527029 ]

Hrishikesh Gadre edited comment on SOLR-12523 at 6/29/18 1:51 AM:
------------------------------------------------------------------

{quote}So for me, separating the concerns of creating the snapshot for each shard (Solr's job) and moving big files out to cloud storage (Solr needs to do much better in this regard or punt) is what I'm looking for.
{quote}

[~thelabdude] this is the exact use case for which we added the snapshots mechanism (Ref: SOLR-9038). As part of Cloudera Search, we use this functionality to provide backup and disaster recovery for Solr:
[https://blog.cloudera.com/blog/2017/05/how-to-backup-and-disaster-recovery-for-apache-solr-part-i/]

When a user creates a snapshot, Solr associates the user-specified snapshot name with the latest commit point for each core in the given collection. Once the snapshot is created, Solr ensures that the files belonging to that commit point are not deleted (e.g. as part of an optimize operation). It also records the snapshot metadata in ZooKeeper and provides access to it via the Collections API. Now you are free to use any mechanism to copy these index files to a remote location (e.g. in our case we use DistCp, a tool specifically designed for large-scale data copy which also works well with cloud object stores).

I agree with your point about the slow restore operation. Maybe we can extend the snapshot API to restore in place? E.g. create the index.xxx directory automatically and copy the files. Once this is done, we can just switch the index directory on the fly (just the way we do at the time of full replication as part of core recovery).

> Confusing error reporting if backup attempted on non-shared FS
> --------------------------------------------------------------
>
> Key: SOLR-12523
> URL: https://issues.apache.org/jira/browse/SOLR-12523
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Backup/Restore
> Affects Versions: 7.3.1
> Reporter: Timothy Potter
> Assignee: Jan Høydahl
> Priority: Minor
> Fix For: master (8.0), 7.5
>
> Attachments: SOLR-12523.patch
>
>
> So I have a large collection with 4 shards across 2 nodes. When I try to back
> it up with:
> {code}
> curl
> "http://localhost:8984/solr/admin/collections?action=BACKUP=sigs=foo_signals=5=backups;
> {code}
> I either get:
> {code}
> "5170256188349065":{
> "responseHeader":{
> "status":0,
> "QTime":0},
> "STATUS":"failed",
> "Response":"Failed to backup core=foo_signals_shard1_replica_n2 because
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't
> exist: file:///vol1/cloud84/backups/sigs"},
> "5170256187999044":{
> "responseHeader":{
> "status":0,
> "QTime":0},
> "STATUS":"failed",
> "Response":"Failed to backup core=foo_signals_shard3_replica_n10 because
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't
> exist: file:///vol1/cloud84/backups/sigs"},
> {code}
> or if I create the directory, then I get:
> {code}
> {
>