[jira] [Comment Edited] (SOLR-12523) Confusing error reporting if backup attempted on non-shared FS

2018-06-29 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527505#comment-16527505
 ] 

Jan Høydahl edited comment on SOLR-12523 at 6/29/18 11:57 AM:
--

Tested the patch, and it passes precommit. Here's the new error text from the 
API when attempting a backup across two nodes that do NOT share the backup 
drive. The same error is also logged on both nodes. Will commit soon.
{noformat}
{
  "responseHeader": {
    "status": 500,
    "QTime": 135
  },
  "failure": {
    "10.5.0.5:8983_solr": "org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://10.5.0.5:8983/solr: Failed to backup core=coll2_shard1_replica_n2 because org.apache.solr.common.SolrException: Directory to contain snapshots doesn't exist: file:///back/myback2. Backup folder must already exist. Note also that Backup/Restore of a SolrCloud collection requires a shared file system mounted at the same path on all nodes!"
  },
  "Operation backup caused exception:": "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not backup all replicas",
  "exception": {
    "msg": "Could not backup all replicas",
    "rspCode": 500
  },
  "error": {
    "metadata": [
      "error-class", "org.apache.solr.common.SolrException",
      "root-error-class", "org.apache.solr.common.SolrException"
    ],
    "msg": "Could not backup all replicas",
    "trace": "org.apache.solr.common.SolrException: Could not backup all replicas\n\tat org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat
[...snip...]
  }
}{noformat}
One unrelated observation here. Part of the error response says: *Could not 
backup all replicas*. While we might call any core a "replica", in the context 
of a collection backup it would perhaps be more precise to say *Could not 
backup all shards*? [~markrmil...@gmail.com] it's your line of code :) 
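For anyone reproducing this, the request behind the error above might look roughly like the following sketch. The host, collection name, backup name and location are hypothetical, chosen to mirror the names visible in the response (`coll2`, `file:///back/myback2`):

```shell
# Hypothetical Collections API backup call (names mirror the error response above).
SOLR_HOST="http://10.5.0.5:8983"
COLLECTION="coll2"
# The location must already exist, and must be a shared filesystem mounted
# at the same path on every node hosting a replica of the collection.
BACKUP_URL="${SOLR_HOST}/solr/admin/collections?action=BACKUP&name=myback2&collection=${COLLECTION}&location=/back"
# Print the command you would run against a live cluster:
echo "curl \"${BACKUP_URL}\""
```

With the patch, pointing `location` at a directory that is missing on any node should now produce the explicit "shared file system" hint shown above instead of a bare "doesn't exist" error.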



[jira] [Comment Edited] (SOLR-12523) Confusing error reporting if backup attempted on non-shared FS

2018-06-28 Thread Hrishikesh Gadre (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527029#comment-16527029
 ] 

Hrishikesh Gadre edited comment on SOLR-12523 at 6/29/18 1:51 AM:
--

{quote}So for me, separating the concerns of creating the snapshot for each 
shard (Solr's job) and moving big files out to cloud storage (Solr needs to do 
much better in this regard or punt) is what I'm looking for.
{quote}
[~thelabdude] this is the exact use case for which we added the snapshots 
mechanism (ref: SOLR-9038). As part of Cloudera Search, we use this 
functionality to provide backup and disaster recovery for Solr:

[https://blog.cloudera.com/blog/2017/05/how-to-backup-and-disaster-recovery-for-apache-solr-part-i/]

When a user creates a snapshot, Solr associates the user-specified snapshot 
name with the latest commit point of each core in the given collection. Once 
the snapshot is created, Solr ensures that the files belonging to that commit 
point are not deleted (e.g. by an optimize operation). It also records the 
snapshot metadata in ZooKeeper and provides access to it via the Collections 
API. You are then free to use any mechanism to copy these index files to a 
remote location (in our case we use DistCp, a tool specifically designed for 
large-scale data copy that also works well with cloud object stores). I agree 
with your point about the slow restore operation. Maybe we can extend the 
snapshot API to restore in-place? E.g. create the index.xxx directory 
automatically and copy the files; once that is done, we can just switch the 
index directory on the fly (just as we do during full replication as part of 
core recovery). 
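Not part of the comment, but for readers following along, the snapshot-then-copy workflow described above maps onto the Collections API roughly like this (host, collection and snapshot names are made up for illustration):

```shell
# Sketch of the snapshot workflow (SOLR-9038); all names here are hypothetical.
SOLR_HOST="http://localhost:8983"
COLLECTION="coll1"
SNAPSHOT="backup-2018-06-29"
# 1. Pin the latest commit point of every core under a named snapshot.
CREATE_URL="${SOLR_HOST}/solr/admin/collections?action=CREATESNAPSHOT&collection=${COLLECTION}&commitName=${SNAPSHOT}"
# 2. Inspect the snapshot metadata that Solr records in ZooKeeper.
LIST_URL="${SOLR_HOST}/solr/admin/collections?action=LISTSNAPSHOTS&collection=${COLLECTION}"
# Print the commands you would run against a live cluster:
echo "curl \"${CREATE_URL}\""
echo "curl \"${LIST_URL}\""
# 3. Copy the pinned index files out with any bulk-copy tool, e.g.:
#      hadoop distcp hdfs://nn/solr/... s3a://bucket/solr-backups/${SNAPSHOT}
```

Because the snapshot only pins files that already exist in each core's index directory, step 3 can use whatever transport suits the deployment; DistCp is simply the tool mentioned in the comment.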


> Confusing error reporting if backup attempted on non-shared FS
> --
>
> Key: SOLR-12523
> URL: https://issues.apache.org/jira/browse/SOLR-12523
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>  Components: Backup/Restore
>Affects Versions: 7.3.1
>Reporter: Timothy Potter
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: master (8.0), 7.5
>
> Attachments: SOLR-12523.patch
>
>
> So I have a large collection with 4 shards across 2 nodes. When I try to back 
> it up with:
> {code}
> curl 
> "http://localhost:8984/solr/admin/collections?action=BACKUP=sigs=foo_signals=5=backups;
> {code}
> I either get:
> {code}
> "5170256188349065":{
>     "responseHeader":{
>       "status":0,
>       "QTime":0},
>     "STATUS":"failed",
>     "Response":"Failed to backup core=foo_signals_shard1_replica_n2 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///vol1/cloud84/backups/sigs"},
>   "5170256187999044":{
>     "responseHeader":{
>       "status":0,
>       "QTime":0},
>     "STATUS":"failed",
>     "Response":"Failed to backup core=foo_signals_shard3_replica_n10 because 
> org.apache.solr.common.SolrException: Directory to contain snapshots doesn't 
> exist: file:///vol1/cloud84/backups/sigs"},
> {code}
> or if I create the directory, then I get:
> {code}
> {
>