[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss

2019-08-14 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907812#comment-16907812
 ] 

Ishan Chattopadhyaya commented on SOLR-13695:
-

It seems other factors were at play here that caused the data loss.
SPLITSHARD was actually issued with method=rewrite, but it failed immediately (the
speed with which the SPLITSHARD completed fooled me into believing it was
method=link). However, SPLITSHARD returned a status of 0 (success?), and after a
subsequent DELETESHARD there were no documents. So it was actually the SPLITSHARD
itself that caused the data loss, not the DELETESHARD.

I'll close this issue and open another one for this problem.
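Because a SPLITSHARD can report status 0 and still leave the sub-shards empty, a cautious client could check document counts before issuing DELETESHARD on the parent. Below is a minimal sketch, assuming the caller has already fetched numDocs for the parent and for each sub-shard, and assuming no indexing happened between the split and the check (so every parent document should have landed in exactly one sub-shard). The function name `safe_to_delete_parent` and the shard names are hypothetical, not part of Solr.

```python
from typing import Dict, Tuple


def safe_to_delete_parent(parent_docs: int,
                          subshard_docs: Dict[str, int]) -> Tuple[bool, str]:
    """Hypothetical pre-DELETESHARD sanity check.

    Assumes no indexing between the split and this check, so the sub-shard
    document counts must add up exactly to the parent's count.
    """
    if not subshard_docs:
        return False, "no sub-shards found"
    total = sum(subshard_docs.values())
    if total != parent_docs:
        # Mismatch (e.g. both sub-shards empty): keep the parent's data.
        return False, f"sub-shards hold {total} docs, parent holds {parent_docs}"
    return True, "counts match"
```

For the situation described here, `safe_to_delete_parent(1000, {"shard1_0": 0, "shard1_1": 0})` returns `(False, "sub-shards hold 0 docs, parent holds 1000")`, so the parent would be kept.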

> SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss
> ---
>
> Key: SOLR-13695
> URL: https://issues.apache.org/jira/browse/SOLR-13695
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Critical
>
> One of my clients experienced data loss with the following sequence of 
> operations:
> 1) SPLITSHARD with method as "link".
> 2) DELETESHARD of the parent (inactive) shard.
> 3) Querying for documents in the subshards showed that both subshards appeared
> to have 0 documents.
> Proposing a fix (after offline discussion with [~noble.paul]) based on 
> running FORCEMERGE after SPLITSHARD (such that segments are rewritten), and 
> not letting DELETESHARD delete the data directory until the FORCEMERGE 
> operations finish.
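The proposed fix amounts to a gate: DELETESHARD must not remove the parent's data directory while any FORCEMERGE on a sub-shard is still running. A minimal sketch of that bookkeeping follows. `MergeGate` and its methods are hypothetical names used only to model the idea; they are not Solr classes, and real code would hook them into wherever the merges are started and the directory is deleted.

```python
import threading


class MergeGate:
    """Hypothetical model of the proposed fix: track FORCEMERGE operations
    on sub-shards and block deletion of the parent's data directory until
    all of them have finished."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = {}  # parent shard -> set of sub-shards still merging

    def merge_started(self, parent, subshard):
        with self._cond:
            self._pending.setdefault(parent, set()).add(subshard)

    def merge_finished(self, parent, subshard):
        with self._cond:
            subs = self._pending.get(parent)
            if subs is not None:
                subs.discard(subshard)
                if not subs:
                    del self._pending[parent]
            self._cond.notify_all()  # wake any DELETESHARD waiting below

    def wait_until_deletable(self, parent, timeout=None):
        """Return True once no merges are pending for `parent`,
        or False if `timeout` seconds elapse first."""
        with self._cond:
            return self._cond.wait_for(lambda: parent not in self._pending, timeout)
```

A DELETESHARD handler built on this sketch would call `wait_until_deletable(parent, timeout)` and refuse to delete the directory if it returns False.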



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss

2019-08-14 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907581#comment-16907581
 ] 

Yonik Seeley commented on SOLR-13695:
-

Was the SPLITSHARD asynchronous?  I'm wondering if maybe the DELETESHARD 
happened before the SPLITSHARD completed.
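If the SPLITSHARD was submitted with `async=<id>`, the safe ordering is to poll the Collections API's REQUESTSTATUS action until the request reaches a terminal state, and only issue DELETESHARD when that state is "completed". Below is a minimal sketch, assuming the status response is parsed JSON shaped like `{"status": {"state": ...}}` with states such as submitted, running, completed, failed and notfound. The HTTP call is left to a caller-supplied `fetch_status` function (hypothetical), e.g. a GET of `/admin/collections?action=REQUESTSTATUS&requestid=<id>`.

```python
import time
from typing import Callable, Dict

# States after which polling should stop (assumed REQUESTSTATUS values).
TERMINAL_STATES = {"completed", "failed", "notfound"}


def wait_for_async(fetch_status: Callable[[str], Dict], request_id: str,
                   poll_secs: float = 1.0, timeout_secs: float = 600.0) -> str:
    """Poll an async Collections API request until it reaches a terminal
    state and return that state. `fetch_status` is caller-supplied."""
    deadline = time.monotonic() + timeout_secs
    while True:
        state = fetch_status(request_id)["status"]["state"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} still {state}")
        time.sleep(poll_secs)
```

Only a return value of "completed" would justify sending DELETESHARD; "failed" or "notfound" means the parent shard should be left alone.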




[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss

2019-08-14 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907432#comment-16907432
 ] 

Ishan Chattopadhyaya commented on SOLR-13695:
-

This happened on 7.7.1. I don't have the logs now, but I can try to reproduce it on
master or 8.2 and attach logs.




[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss

2019-08-14 Thread Andrzej Bialecki (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907376#comment-16907376
 ] 

Andrzej Bialecki  commented on SOLR-13695:
--

Also, what version of Solr is this? Any pertinent logs?




[jira] [Commented] (SOLR-13695) SPLITSHARD (link), followed by DELETESHARD of parent shard causes data loss

2019-08-14 Thread Andrzej Bialecki (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907370#comment-16907370
 ] 

Andrzej Bialecki  commented on SOLR-13695:
--

In theory this should not happen: the sub-shards' index files are hard-linked
to the original shard's files, BUT they are located in a different directory,
so deleting the parent shard should only remove the parent's directory entries
(decrementing the link count of the underlying filesystem inodes).
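The filesystem behavior described above can be checked in isolation: a file hard-linked into a second directory keeps its data when the first directory is removed, and only its link count drops. Below is a small sketch on a POSIX-style filesystem; the directory names and the `_0.cfs` file name are illustrative stand-ins for a parent and sub-shard index, not Solr's actual layout.

```python
import os
import shutil
import tempfile


def hardlink_survives_parent_delete():
    """Return (links_before, links_after, data) for a file hard-linked from a
    'parent' index dir into a 'sub-shard' dir, after deleting the parent dir."""
    with tempfile.TemporaryDirectory() as root:
        parent_dir = os.path.join(root, "parent_index")    # illustrative name
        child_dir = os.path.join(root, "subshard_index")   # illustrative name
        os.makedirs(parent_dir)
        os.makedirs(child_dir)

        parent_file = os.path.join(parent_dir, "_0.cfs")
        child_file = os.path.join(child_dir, "_0.cfs")
        with open(parent_file, "wb") as f:
            f.write(b"segment data")

        os.link(parent_file, child_file)             # second name, same inode
        links_before = os.stat(child_file).st_nlink  # 2 names point at it

        shutil.rmtree(parent_dir)                    # "DELETESHARD" of the parent
        links_after = os.stat(child_file).st_nlink   # only the sub-shard's name left
        with open(child_file, "rb") as f:
            data = f.read()
        return links_before, links_after, data
```

On Linux this returns `(2, 1, b"segment data")`: the data is still readable through the sub-shard's link, which is why deleting the parent alone should not have emptied the sub-shards.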

I'll try to reproduce this. The proposed fix is a temporary workaround at best 
because it defeats the whole point of {{splitMethod=link}}, which is to avoid 
rewriting segments.
