[jira] [Commented] (SOLR-7408) Race condition can cause a config directory listener to be removed
[ https://issues.apache.org/jira/browse/SOLR-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501152#comment-14501152 ] Shai Erera commented on SOLR-7408: -- There are two ctors which aren't called by Solr code. Is SolrCore considered public API, and so we should keep these ctors, or can I remove them? Race condition can cause a config directory listener to be removed -- Key: SOLR-7408 URL: https://issues.apache.org/jira/browse/SOLR-7408 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Shai Erera Assignee: Shai Erera Attachments: SOLR-7408.patch, SOLR-7408.patch This has been reported here: http://markmail.org/message/ynkm2axkdprppgef, and I was able to reproduce it in a test, although I am only able to reproduce if I put break points and manually simulate the problematic context switches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7408) Race condition can cause a config directory listener to be removed
[ https://issues.apache.org/jira/browse/SOLR-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499217#comment-14499217 ] Shai Erera commented on SOLR-7408: -- bq. Though I think I understand what you're saying here, can you elaborate more on this? If we wanted to change the code such that we put a listener in the map on a SolrCore creation, and remove it from the map on a SolrCore close, I believe we wouldn't be running into such concurrency issues. In a sense, this is what is done when all is *good*: a SolrCore puts a listener in its ctor, and removes it in its close(). But if something goes *wrong*, we may leave dangling listeners, of SolrCore instances that no longer exist. This is what I believe ({{CoreAdminHandler.handleCreateAction}} attempts to do -- if a core creation failed, it attempts to unregister all listeners of a configDir from the map, and lets {{unregister}} decide if the entry itself can be removed or not. This ensures that we won't be left w/ dangling listeners that will never be released - what I referred to as leaking listeners. The code in {{unregister}} relies on the same logic that introduces the bug -- if there is core in SolrCores which references this configDir, remove all listeners. The problem is that a core registers a listener, before it is put in SolrCores, and hence the race condition. I would personally prefer that we stop removing all listeners, and let a core take care of itself, but I don't know how safe is Solr code in that regard. I.e. are all places that create a SolrCore clean up after it in the event of a failure? Clearly {{CoreAdminHandler.handleCreateAction}} doesn't, which got me thinking what other places don't do that as well. But, if we want to change the logic like that, we can certainly look at all the places that do {{new SolrCore(...)}} and make sure they call {{SolrCore.close()}} in the event of any failure. Race condition can cause a config directory listener to be removed -- Key: SOLR-7408 URL: https://issues.apache.org/jira/browse/SOLR-7408 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Shai Erera Assignee: Shai Erera Attachments: SOLR-7408.patch, SOLR-7408.patch This has been reported here: http://markmail.org/message/ynkm2axkdprppgef, and I was able to reproduce it in a test, although I am only able to reproduce if I put break points and manually simulate the problematic context switches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7408) Race condition can cause a config directory listener to be removed
[ https://issues.apache.org/jira/browse/SOLR-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499394#comment-14499394 ] Anshum Gupta commented on SOLR-7408: I like the idea of calling {{SolrCore.close()}} and letting that handle the responsibility of unregistering (already happens). Does it make more sense to have this in a different JIRA or at least change the title/summary of this one to highlight the new/updated intention? Race condition can cause a config directory listener to be removed -- Key: SOLR-7408 URL: https://issues.apache.org/jira/browse/SOLR-7408 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Shai Erera Assignee: Shai Erera Attachments: SOLR-7408.patch, SOLR-7408.patch This has been reported here: http://markmail.org/message/ynkm2axkdprppgef, and I was able to reproduce it in a test, although I am only able to reproduce if I put break points and manually simulate the problematic context switches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7408) Race condition can cause a config directory listener to be removed
[ https://issues.apache.org/jira/browse/SOLR-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499354#comment-14499354 ] Noble Paul commented on SOLR-7408: -- looks good to me Race condition can cause a config directory listener to be removed -- Key: SOLR-7408 URL: https://issues.apache.org/jira/browse/SOLR-7408 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Shai Erera Assignee: Shai Erera Attachments: SOLR-7408.patch, SOLR-7408.patch This has been reported here: http://markmail.org/message/ynkm2axkdprppgef, and I was able to reproduce it in a test, although I am only able to reproduce if I put break points and manually simulate the problematic context switches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7408) Race condition can cause a config directory listener to be removed
[ https://issues.apache.org/jira/browse/SOLR-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499636#comment-14499636 ] Shai Erera commented on SOLR-7408: -- I will update the title of this JIRA and handle it here. I like this better than doing what I consider more of a hack to the code and later change it. SolrCore is initialized in two places, so shouldn't be complicated to ensure it is closed in case of errors. While I'm at it, I'll try to simplify the ctor by breaking it out to some auxiliary methods, instead of having a 250 lines method! Race condition can cause a config directory listener to be removed -- Key: SOLR-7408 URL: https://issues.apache.org/jira/browse/SOLR-7408 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Shai Erera Assignee: Shai Erera Attachments: SOLR-7408.patch, SOLR-7408.patch This has been reported here: http://markmail.org/message/ynkm2axkdprppgef, and I was able to reproduce it in a test, although I am only able to reproduce if I put break points and manually simulate the problematic context switches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7408) Race condition can cause a config directory listener to be removed
[ https://issues.apache.org/jira/browse/SOLR-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499956#comment-14499956 ] Anshum Gupta commented on SOLR-7408: +1 to both of those! Race condition can cause a config directory listener to be removed -- Key: SOLR-7408 URL: https://issues.apache.org/jira/browse/SOLR-7408 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Shai Erera Assignee: Shai Erera Attachments: SOLR-7408.patch, SOLR-7408.patch This has been reported here: http://markmail.org/message/ynkm2axkdprppgef, and I was able to reproduce it in a test, although I am only able to reproduce if I put break points and manually simulate the problematic context switches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7408) Race condition can cause a config directory listener to be removed
[ https://issues.apache.org/jira/browse/SOLR-7408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498933#comment-14498933 ] Anshum Gupta commented on SOLR-7408: Thanks Shai. The patch looks good to me in what it does. Though I think I understand what you're saying here, can you elaborate more on this: bq. However, it seems like we call unregister even in the event of errors (CoreAdminHandler.handleCreateAction), forcing the removal of the configDir listener, and it's not clear if we also call SolrCore.close(). So we may start leaking listeners that will never be removed .. Race condition can cause a config directory listener to be removed -- Key: SOLR-7408 URL: https://issues.apache.org/jira/browse/SOLR-7408 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Shai Erera Assignee: Shai Erera Attachments: SOLR-7408.patch, SOLR-7408.patch This has been reported here: http://markmail.org/message/ynkm2axkdprppgef, and I was able to reproduce it in a test, although I am only able to reproduce if I put break points and manually simulate the problematic context switches. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org