[jira] [Commented] (SOLR-11900) API command to delete oldest collections in a time routed alias
[ https://issues.apache.org/jira/browse/SOLR-11900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344455#comment-16344455 ]

David Smiley commented on SOLR-11900:
-------------------------------------

I was chatting with [~gus_heck] and we concluded that the need for this isn't very compelling (in either form above). Instead, a user who wants to delete old collections explicitly can issue the commands themselves: update the alias, then delete the collections. Collection deletion could even be enhanced to detect that the collection is part of an alias and remove it from the alias automatically. That would make the manual route easier and would eliminate a race condition in which the alias's target collection list gets updated at the same time more collections get added (however unlikely).

And after SOLR-11925, the user could also temporarily adjust whatever metadata setting establishes the automatic collection deletion time span, assuming that new data is coming in to trigger the logic.

So I'll stop this now and re-use most of the code here in SOLR-11925, which needs most of the same stuff.

> API command to delete oldest collections in a time routed alias
> ---------------------------------------------------------------
>
> Key: SOLR-11900
> URL: https://issues.apache.org/jira/browse/SOLR-11900
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public (Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Major
> Fix For: 7.3
>
> Attachments: SOLR-11900.patch
>
>
> For Time Routed Aliases, we'll need an API command to delete the oldest
> collection(s). Perhaps the command action name is
> DELETE_COLLECTION_OF_ROUTED_ALIAS (yes that's long). And input is of course
> the routed alias name, plus a mandatory "before" which is a standard time
> input that Solr accepts that will likely include date math. Thus if you used
> before="NOW/DAY-90DAYS" then you're guaranteed to have the last 90 days' worth
> of data. If a collection overlaps past what "before" is computed to be then
> it needs to stay. The pattern might match any number of collections, perhaps
> none. But in all cases, the most recent collection must be retained -- the
> time routed aliases must at all times refer to at least one collection.
> The underlying steps will be to first update the alias, and then delete the
> collection(s). It ought to return the collections that get deleted.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
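For illustration only, the retention rule in the issue description could be sketched roughly as follows. This is not the attached patch. The function name, the (name, start) pair representation, and the example collection names are all hypothetical; the sketch assumes each collection covers the span from its own start time up to the start of the next collection.

```python
from datetime import datetime, timezone


def collections_to_delete(collections, before):
    """Return the names of collections that lie entirely before `before`.

    `collections` is a list of (name, start) pairs (a hypothetical
    representation). A collection's end is taken to be the start of the
    next collection, so the most recent collection has no end and is
    never selected.
    """
    ordered = sorted(collections, key=lambda c: c[1])
    deletable = []
    # Pair every collection with its successor; the last one has no
    # successor and is therefore always retained.
    for (name, _start), (_next_name, next_start) in zip(ordered, ordered[1:]):
        # A collection that overlaps past `before` must stay.
        if next_start <= before:
            deletable.append(name)
    return deletable


if __name__ == "__main__":
    utc = timezone.utc
    alias_collections = [
        ("logs_2018-01-01", datetime(2018, 1, 1, tzinfo=utc)),
        ("logs_2018-02-01", datetime(2018, 2, 1, tzinfo=utc)),
        ("logs_2018-03-01", datetime(2018, 3, 1, tzinfo=utc)),
    ]
    print(collections_to_delete(alias_collections, datetime(2018, 2, 15, tzinfo=utc)))
```

Following the order given in the description, a caller would first update the alias to drop the returned names and only then delete those collections.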
[jira] [Commented] (SOLR-11900) API command to delete oldest collections in a time routed alias
[ https://issues.apache.org/jira/browse/SOLR-11900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16344109#comment-16344109 ]

David Smiley commented on SOLR-11900:
-------------------------------------

The attached patch uses the idea above and is mostly done. The main thing left is to add an alias metadata flag to control this, defaulting to false. Suggested: "deleteQueryDeletesCollections".

I'm not sure whether to also pass the delete query through as a normal query... There is a timezone discrepancy: a NOW/MONTH in the code I added will use the TZ from the alias metadata, but the delete query against Solr will use the TZ parameter sent in the update request. (P.S. I believe there is another issue about tlog replay not serializing the update request params.) So that's not nice. Maybe I'm stubbornly latching onto this idea and I ought to instead make yet another conventional SolrCloud collections API request. DELETEROUTEDALIASCOLLECTION? Ugh.

It'd be interesting to see what happens if the incoming delete request is flowing into the oldest collection. It will try to delete itself. Does that work? I'm guessing it would, albeit with a timeout error. If it doesn't, is it a big deal? I don't think so, since an incoming request to the alias will always route to the first collection ("soonest"), and this one is not delete-able by this code.
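To make the timezone concern concrete, here is a small sketch (not Solr code) of how rounding "now" down to the start of the month can give different instants depending on which timezone is applied. The function name and the fixed UTC-5 offset are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def start_of_month(now_utc, tz):
    """Round `now_utc` down to the start of its month as seen in `tz`,
    and return the result as a UTC instant."""
    local = now_utc.astimezone(tz)
    floored = local.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return floored.astimezone(timezone.utc)


if __name__ == "__main__":
    now = datetime(2018, 2, 1, 2, 0, tzinfo=timezone.utc)
    utc_minus_5 = timezone(timedelta(hours=-5))  # hypothetical alias TZ

    # Same "now", two timezones, two different month boundaries.
    print(start_of_month(now, timezone.utc))
    print(start_of_month(now, utc_minus_5))
```

At 02:00 UTC on 1 February it is still 31 January in UTC-5, so the two boundaries are a whole month apart. That is the kind of mismatch that could arise if the collection-deleting code and the pass-through delete query were to resolve NOW/MONTH with different TZ settings.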
[jira] [Commented] (SOLR-11900) API command to delete oldest collections in a time routed alias
[ https://issues.apache.org/jira/browse/SOLR-11900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341937#comment-16341937 ]

David Smiley commented on SOLR-11900:
-------------------------------------

Perhaps we don't actually need a new API, and could instead have a delete query that looks like {{timeRoutedField:[* TO NOW/MONTH]}} auto-purge the old collections. We've already got the URP in place to intercept and act. Arguably, if new data creates collections, then telling it to delete old stuff should delete the old collections.

Regardless of how this feature looks, there will be a separate issue to auto-delete. The issue here is about being explicit about it.
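As a rough sketch of the delete-query idea, an intercepting component would first have to recognize a delete query of the open-ended form shown above. The snippet below is a hypothetical illustration, not the URP's actual logic: the function name and regular expression are assumptions, and it handles only the exact `field:[* TO upper]` shape with an inclusive bracket.

```python
import re

# Matches only queries of the form  field:[* TO upper]
_OPEN_RANGE = re.compile(
    r"^\s*(?P<field>\w+):\[\*\s+TO\s+(?P<upper>[^\]\s]+)\]\s*$"
)


def parse_purge_query(query, routed_field):
    """Return the upper-bound expression (e.g. 'NOW/MONTH') if `query` is
    an open-ended range on `routed_field`, otherwise None."""
    m = _OPEN_RANGE.match(query)
    if m is None or m.group("field") != routed_field:
        return None
    return m.group("upper")


if __name__ == "__main__":
    print(parse_purge_query("timeRoutedField:[* TO NOW/MONTH]", "timeRoutedField"))
    print(parse_purge_query("title:[* TO NOW/MONTH]", "timeRoutedField"))
```

The returned expression would still need to be evaluated as date math before deciding which collections fall entirely before it; any query that does not match would be left alone as an ordinary delete.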