[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
[ https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907812#comment-16907812 ]

Ishan Chattopadhyaya commented on SOLR-13695:
---

Seems like there are other factors at play here which caused the data loss. SPLITSHARD was actually issued with splitMethod=rewrite, but failed immediately (the speed with which the SPLITSHARD completed fooled me into believing it was splitMethod=link). However, the status of SPLITSHARD was 0 (success?), and after a subsequent DELETESHARD, there were no documents. So, actually, the SPLITSHARD itself caused the data loss, not the DELETESHARD.

I'll close this issue and open a new one for this problem.

> SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
> ---
>
>                 Key: SOLR-13695
>                 URL: https://issues.apache.org/jira/browse/SOLR-13695
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Ishan Chattopadhyaya
>            Assignee: Ishan Chattopadhyaya
>            Priority: Critical
>
> One of my clients experienced data loss with the following sequence of
> operations:
> 1) SPLITSHARD with method as "link".
> 2) DELETESHARD of the parent (inactive) shard.
> 3) Query for documents in the subshards, seems like both subshards have 0
> documents.
> Proposing a fix (after offline discussion with [~noble.paul]) based on
> running FORCEMERGE after SPLITSHARD (such that segments are rewritten), and
> not letting DELETESHARD delete the data directory until the FORCEMERGE
> operations finish.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
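Since the comment above reports a SPLITSHARD that returned status 0 while leaving both sub-shards empty, one defensive step before DELETESHARD is to compare document counts. The following is only a minimal sketch of that idea, not anything proposed on the issue: the helper names are hypothetical, and it assumes the counts are obtained by sending a non-distributed `q=*:*&rows=0` query to one core of each shard and reading `numFound` from the response.

```python
from typing import Dict, Sequence


def count_query_params() -> Dict[str, str]:
    """Query parameters for a per-core document count.

    distrib=false keeps the query local to the core it is sent to,
    so the result reflects that single shard only.
    """
    return {"q": "*:*", "rows": "0", "distrib": "false", "wt": "json"}


def subshards_cover_parent(parent_count: int,
                           subshard_counts: Sequence[int]) -> bool:
    """Return True only if the sub-shards together hold exactly as many
    documents as the parent shard did (hypothetical pre-delete check)."""
    return len(subshard_counts) > 0 and sum(subshard_counts) == parent_count
```

In the reported sequence both sub-shards returned 0 documents, so a check like `subshards_cover_parent(parent_count, [0, 0])` would have been false for any non-empty parent and the DELETESHARD could have been held back. The comparison is only meaningful if no indexing or deleting happens between the counts.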
[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
[ https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907581#comment-16907581 ]

Yonik Seeley commented on SOLR-13695:
---

Was the SPLITSHARD asynchronous? I'm wondering if maybe the DELETESHARD happened before the SPLITSHARD completed.
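To illustrate the ordering concern raised above: when a Collections API call is submitted with the `async` parameter, its outcome has to be polled with `REQUESTSTATUS` before any follow-up such as DELETESHARD is issued. The sketch below shows one way to do that, under stated assumptions: `fetch_state` is a hypothetical callable supplied by the caller that performs the HTTP request and returns the `status.state` string from the response, and the function names are illustrative only.

```python
import time
from typing import Callable, Dict

# States after which polling stops; "running" and "submitted" mean "keep waiting".
TERMINAL_STATES = {"completed", "failed", "notfound"}


def request_status_params(request_id: str) -> Dict[str, str]:
    """Parameters for a Collections API REQUESTSTATUS call."""
    return {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"}


def wait_until_done(fetch_state: Callable[[str], str],
                    request_id: str,
                    poll_seconds: float = 1.0,
                    max_polls: int = 600) -> str:
    """Poll until the async request reaches a terminal state and return it.

    fetch_state is a hypothetical helper that issues REQUESTSTATUS for the
    given id and returns the reported state string.
    """
    for _ in range(max_polls):
        state = fetch_state(request_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("request %s did not finish in time" % request_id)


def may_delete_parent(state: str) -> bool:
    """Only a completed split should ever be followed by DELETESHARD."""
    return state == "completed"
```

A caller would submit SPLITSHARD with an `async` id, pass that id to `wait_until_done`, and issue DELETESHARD only when `may_delete_parent` returns true.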
[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
[ https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907432#comment-16907432 ]

Ishan Chattopadhyaya commented on SOLR-13695:
---

Happened on 7.7.1. I don't have the logs now, but I can try to reproduce on master or 8.2 and attach logs.
[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
[ https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907376#comment-16907376 ]

Andrzej Bialecki commented on SOLR-13695:
---

Also, what version of Solr is this? Any pertinent logs?
[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
[ https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907370#comment-16907370 ]

Andrzej Bialecki commented on SOLR-13695:
---

Theoretically this should not happen... the index files of the sub-shards are hard-linked to the original shard BUT they are located in a different directory so deleting the parent shard should simply delete those directory entries (decrementing the number of existing links to the FS inodes). I'll try to reproduce this.

The proposed fix is a temporary workaround at best because it defeats the whole point of {{splitMethod=link}}, which is to avoid rewriting segments.
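The hard-link reasoning above can be checked in isolation from Solr. The sketch below is a standalone illustration on a POSIX-style filesystem, not Solr code: the directory and file names are made up, and the function simply hard-links a file from a "parent" directory into a "sub-shard" directory, deletes the parent copy, and reports the link counts and the surviving content.

```python
import os
import tempfile
from typing import Tuple


def hard_link_survives_parent_delete() -> Tuple[int, int, bytes]:
    """Return (link count before delete, link count after delete, content)."""
    with tempfile.TemporaryDirectory() as root:
        parent_dir = os.path.join(root, "parent_shard")  # illustrative names
        child_dir = os.path.join(root, "sub_shard")
        os.mkdir(parent_dir)
        os.mkdir(child_dir)

        parent_file = os.path.join(parent_dir, "segment.dat")
        child_file = os.path.join(child_dir, "segment.dat")
        with open(parent_file, "wb") as f:
            f.write(b"index data")

        # Second directory entry pointing at the same inode.
        os.link(parent_file, child_file)
        links_before = os.stat(child_file).st_nlink

        # Removing the parent entry only decrements the inode's link count.
        os.remove(parent_file)
        os.rmdir(parent_dir)
        links_after = os.stat(child_file).st_nlink

        with open(child_file, "rb") as f:
            content = f.read()
        return links_before, links_after, content
```

On a filesystem that supports hard links, the link count drops from 2 to 1 and the data stays readable through the remaining entry, which is why deleting the parent shard's directory alone should not empty the sub-shards.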