[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435662#comment-16435662 ]

Gus Heck commented on SOLR-11487:

{quote}Decomposing aliases.json has pros/cons, but it won't remove the possibility of races between modifying some portion of the aliases state; it just makes it more rare. So we still need to deal with races in code using a zkVersion with a retry and eventual timeout, etc. {quote}

The context of the above comment was a discussion in which I suggested an alternate storage for the alias metadata (properties): keeping the data in zk nodes rather than as part of aliases.json. This idea was deemed too costly in terms of "bookkeeping" and was dropped. Existing behavior was retained, and nowhere in this ticket was there an intent to change the existing behavior with respect to updating aliases. Both before and after this ticket, an update to an alias should detect a prior update to any part of aliases.json (a version change), reload the state, and retry. There is a limit on the retry loop. So it depends on what your user means by "running into a race condition...":

* If writing one alias lost an update to a *different* alias, that would be a bug.
* If two updates to the same alias are racing, that's a "feature", and it is documented in the first paragraph of the ref guide docs for CREATEALIAS.
* If the retries were exceeded and the second update eventually returns an error, that's a feature, and something is causing a LOT of churn in aliases.json. The retry limit is set high to accommodate a fairly aggressive unit test I wrote.

See org.apache.solr.common.cloud.ZkStateReader.AliasesManager#applyModificationAndExportToZk for details. Much cleanup was done in this code, so if there was a bug with one alias interfering with another it may well have been eliminated, but this should be tested since it was not the focus of our work.
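The detect, reload, and retry behavior described above can be sketched roughly as follows. This is only an illustration of version-checked updates with a bounded retry loop, not the code in applyModificationAndExportToZk; the class VersionedStore, its methods, and the in-memory map standing in for aliases.json are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical in-memory stand-in for a versioned ZooKeeper node; not Solr's classes.
class VersionedStore {
    private Map<String, String> data = new HashMap<>();
    private int version = 0;

    // The data together with the version it was read at.
    static final class Snapshot {
        final Map<String, String> data;
        final int version;

        Snapshot(Map<String, String> data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    synchronized Snapshot read() {
        return new Snapshot(new HashMap<>(data), version);
    }

    // Compare-and-set: succeeds only if nobody has written since expectedVersion was read.
    synchronized boolean writeIfVersion(Map<String, String> newData, int expectedVersion) {
        if (version != expectedVersion) {
            return false;
        }
        data = new HashMap<>(newData);
        version++;
        return true;
    }

    // Re-read the state, re-apply the modification, and retry, up to maxTries attempts.
    boolean update(UnaryOperator<Map<String, String>> op, int maxTries) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            Snapshot snap = read();
            Map<String, String> modified = op.apply(snap.data);
            if (writeIfVersion(modified, snap.version)) {
                return true;
            }
        }
        return false; // retries exhausted; the caller would report an error
    }
}
```

Because each attempt starts from a fresh read, a modification to one key is re-applied on top of whatever another writer stored, so an update to a different key is not lost. Two writers setting the same key still race, and the last successful write wins.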
> Collection Alias metadata for time partitioned collections > -- > > Key: SOLR-11487 > URL: https://issues.apache.org/jira/browse/SOLR-11487 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Fix For: 7.2 > > Attachments: SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch, > SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch, > SOLR_11487.patch, SOLR_11487.patch > > > SOLR-11299 outlines an approach to using a collection Alias to refer to a > series of collections of a time series. We'll need to store some metadata > about these time series collections, such as which field of the document > contains the timestamp to route on. > The current {{/aliases.json}} is a Map with a key {{collection}} which is in > turn a Map of alias name strings to a comma delimited list of the collections. > _If we change the comma delimited list to be another Map to hold the existing > list and more stuff, older CloudSolrClient (configured to talk to ZooKeeper) > will break_. Although if it's configured with an HTTP Solr URL then it would > not break. There's also some read/write hassle to worry about -- we may need > to continue to read an aliases.json in the older format. > Alternatively, we could add a new map entry to aliases.json, say, > {{collection_metadata}} keyed by alias name? > Perhaps another very different approach is to attach metadata to the > configset in use? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434467#comment-16434467 ]

Varun Thacker commented on SOLR-11487:

Hi David,

I've added the fix version as 7.2 for this Jira, for reference.

We recently had a user who ran into a race condition when updating aliases on Solr 5.x. Looking at master today, it looks like we're dealing with race conditions, and this comment confirms that we fixed it as part of this Jira:

{quote}Decomposing aliases.json has pros/cons, but it won't remove the possibility of races between modifying some portion of the aliases state; it just makes it more rare. So we still need to deal with races in code using a zkVersion with a retry and eventual timeout, etc. {quote}
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257676#comment-16257676 ]

ASF subversion and git services commented on SOLR-11487:

Commit 7c64847d80e1d6025822d991598711cba5ace123 in lucene-solr's branch refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7c64847 ]

SOLR-11487: Put back sleep(100) in CreateAliasCmd. Update AliasIntegrationTest with some sleeps and use new alias names where possible to avoid eventual consistency challenges.

(cherry picked from commit 51b2dea)
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257672#comment-16257672 ]

ASF subversion and git services commented on SOLR-11487:

Commit 51b2dea68e291141e2bfb98a2e07420a6b5869b2 in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=51b2dea ]

SOLR-11487: Put back sleep(100) in CreateAliasCmd. Update AliasIntegrationTest with some sleeps and use new alias names where possible to avoid eventual consistency challenges.
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255784#comment-16255784 ]

ASF subversion and git services commented on SOLR-11487:

Commit 6d9f6cda1a0fa6a48e36a153f69a8aa2cfcd943f in lucene-solr's branch refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6d9f6cd ]

SOLR-11487: Collection Aliases may now have metadata

(cherry picked from commit fd1820a)
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255782#comment-16255782 ]

ASF subversion and git services commented on SOLR-11487:

Commit fd1820a430c321e6a2b2910004d7d2be60d3db4a in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fd1820a ]

SOLR-11487: Collection Aliases may now have metadata
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255683#comment-16255683 ]

Gus Heck commented on SOLR-11487:

* Bug: heh, I almost wrote a test for that too... I clearly should have. Thx
* Love the elimination of the top map. Definitely cleaner.
* Less Clone... yes, you've done a nice job of actually cashing in on our immutability there. That's a very logical thing to do.

Other stuff good too... Map.replaceAll()... cool! :). All looks good to me. Does look cleaner overall.

One other really nice benefit here is that by eliminating the top map we eliminated almost all the unchecked cast stuff; only 2 methods need it now, and the class level @SuppressWarnings can go away, I think.
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252963#comment-16252963 ]

David Smiley commented on SOLR-11487:

This looks really good; I'll commit this with some very small tweaks tomorrow.

BTW I don't think you get how to use Map.computeIfAbsent (as seen in the Aliases constructor). The idea is to simply return the new value from the lambda -- no need to actually try to map.put(...) it; it's the job of the code behind computeIfAbsent to handle that. The upshot is less code, and it can be more efficient as well if the Map impl natively implements it (most do).

I noticed Aliases.cloneCollectionMap does not actually do a deep clone, despite its caller saying it shares nothing.
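As a small illustration of the computeIfAbsent point, the sketch below lets the lambda return the new value and leaves the storing to the map. The class and method names (ComputeIfAbsentDemo, addCollection) are made up for this example and do not come from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Solr patch.
class ComputeIfAbsentDemo {
    // Adds a collection to an alias, creating the list the first time the alias is seen.
    static void addCollection(Map<String, List<String>> aliases, String alias, String collection) {
        // The lambda only returns the new value. computeIfAbsent stores it in the map
        // and returns the value now associated with the key, so no map.put(...) is needed.
        aliases.computeIfAbsent(alias, k -> new ArrayList<>()).add(collection);
    }

    public static void main(String[] args) {
        Map<String, List<String>> aliases = new HashMap<>();
        addCollection(aliases, "logs", "logs_2017_10");
        addCollection(aliases, "logs", "logs_2017_11");
        System.out.println(aliases);
    }
}
```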
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252802#comment-16252802 ]

Gus Heck commented on SOLR-11487:

All good points, many of them bits that didn't get cleaned up as related code made them obsolete...

* *computeIfAbsent()* - used it in the constructor; elsewhere we don't want to modify the map we are querying.
* *getZNodeVersion* - (/)
* *resolveAliases* - (/) That check was previously needed because of a conversion from a map of comma separated strings to a map of lists, and the method doing the conversion didn't like nulls. Now that that has gone away, this can too.
* *Unmodifiable lists* - This thought had crossed my mind, but I had a vague worry that making them unmodifiable from the start might cause issues and I wanted to get the patch up, so I didn't investigate. I've since found no support for my worries. Lists are now unmodifiable from the start. I very much prefer that they not be modifiable. If we are going to be immutable we should really be immutable so as not to trick someone later, be they internal or external.
* *EMPTY_MAP* - (/) Yup, now we can go back to Collections.emptyMap() :)
* *cloneCollectionMetadataMap* - (/)
* *Array.equals()* - whoops, forgot to do that, thx.
* *sleep(1)* - that was my compromise such that the loop wasn't flaming hot, just very toasty. I took it out, and now stop the loop with a different message if we have tried at least 5 times and failed vs timing out.
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252748#comment-16252748 ]

Gus Heck commented on SOLR-11487:

* compute if absent: used it in the constructor; elsewhere we don't want to modify the map we are querying.
* getZNodeVersion (/)
* resolveAliases - That check was previously needed because of a conversion from a map of comma separated strings to a map of lists, and the method doing the conversion didn't like nulls. Now that that has gone away, this can too. (removed)
* unmodifiable lists: This thought had crossed my mind, but I had a vague worry that making them unmodifiable from the start might cause issues and I wanted to get the patch up, so I didn't investigate. I've since found no support for my worries. Lists are now unmodifiable from the start. I very much prefer that they not be modifiable. If we are going to be immutable we should really be immutable so as not to trick someone later, be they internal or external.
* EMPTY_MAP (/)
* cloneCollectionMetadataMap: (/)
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250254#comment-16250254 ]

David Smiley commented on SOLR-11487:

Looks like we're finally super close!

Aliases:
* good -- all my feedback below is little stuff
* you can use aliasMap.computeIfAbsent to reduce the LOC
* maybe use "getZNodeVersion" naming to be consistent with DocCollection
* resolveAliases: it has a null check here but I think it's not necessary; resolveAliasesGivenAliasMap handles this.
* getCollectionAliasListMap seems to replace its content with unmodifiable Lists each time it's called, which I think is bad. I suggest not wrapping the List collection values in this method; instead we can do that on creation? The outer unmodifiableMap call here is fine, however... again, if we're going to do the unmodifiable list wrapping on creation, might as well do so for the map too? BTW I'm fine with removing some or all of this immutable wrapping because this is internal code. I tend to do this wrapping too, but if it's painful (and it's painful here!) I'm not religious about it.
* I got what you were saying earlier in this issue about EMPTY_MAP, but (thankfully) we're no longer doing instanceof equality with EMPTY_MAP, so we can just not use EMPTY_MAP (directly) anymore; right?
* cloneCollectionMetadataMap: the outer recreation of the HashMap is pointless because you're ultimately overwriting the reference and replacing it. You could keep the check for null but in that event, exit early with a new empty HashMap.

ZkStateReader:
* in IRC we spoke about removing the modification equality check in the loop of applyModificationAndExportToZk; did you change your mind? I think it's fine either way FWIW.

bq. see comment/code in ZkStateReader around L1495.

Yeah, if we can't save in a few tries (ZK BadVersionException each time), it's hard to believe trying again will be successful. Either timeout or fix # retries; I don't care. Why did you add the sleep?
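The suggestion to wrap on creation rather than in the getter could look something like the sketch below. The class AliasListMap is a hypothetical holder written for illustration, not Solr's Aliases class; only the getter name getCollectionAliasListMap is borrowed from the discussion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for illustration; not Solr's Aliases class.
final class AliasListMap {
    private final Map<String, List<String>> aliasToCollections;

    AliasListMap(Map<String, List<String>> source) {
        Map<String, List<String>> copy = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : source.entrySet()) {
            // Copy each list and wrap it once, here, instead of on every getter call.
            copy.put(e.getKey(), Collections.unmodifiableList(new ArrayList<>(e.getValue())));
        }
        this.aliasToCollections = Collections.unmodifiableMap(copy);
    }

    // No copying or wrapping on read: the same immutable view is returned every time.
    Map<String, List<String>> getCollectionAliasListMap() {
        return aliasToCollections;
    }
}
```

With this shape the getter does no work and never changes the object's contents, and later changes to the source map or its lists do not leak into the holder.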
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246858#comment-16246858 ]

David Smiley commented on SOLR-11487:

* The zkVersion int need not be volatile because it is only ever read/written from within a synchronized block. Anyway, if you want to try to put it back in Aliases, that's fine. I found it a bit annoying to have Aliases with zkVersion yet also find a way to set it despite Aliases immutability. Nothing that we can't figure out, but it was that trip-up that led me to the path of zkVersion decoupled from the Aliases class.
* I introduced a bug causing the AliasIntegrationTest.test() failure. ZkStateReader.createClusterStateWatchersAndUpdate should call refreshAliases with the field reference aliasesHolder instead of constructing a new instance. This took a while to figure out; DEBUG logging (with additional log statements and references to "this" to get the object ID) proved indispensable. I think this bug would never have happened if the AliasesManager did not implement Watcher but instead had a newWatcher() method to return an anonymous instance.
* At the end of CreateAliasCmd.call, I sadly think we need to put back the 100ms wait (I added more commentary below):
{code}
// Give other nodes a bit of time to see these changes. Solr is eventually consistent, so we expect other Solr nodes
// and even CloudSolrClient (ZkClientClusterStateProvider) to eventually become aware of the change.
Thread.sleep(100);
{code}
If we remove it with this new change for metadata, we might add more test instability (and it's already on fire) or increase the likelihood that some real code out there won't work. The caller should sleep perhaps, but that's also sad. I've been ruminating on this a bit and may file an issue with more specific ideas.
* In CollectionsHandler, LISTALIASES_OP (~line 480), add this line:
{code}
zkStateReader.aliasesHolder.update(); // just in case there are changes being propagated through ZK
{code}
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245905#comment-16245905 ] Gus Heck commented on SOLR-11487:

* Consolidation of the alias related stuff in ZkStateReader is nice.
* I think the getter would be consistent and play nicer with IDE auto complete, but as you say, it's more of a taste/style issue, not a material issue.
* The use of UnaryOperator rather than Function of course makes good sense.
* I suspect in this patch the int version should also be volatile, but I haven't looked carefully enough to see if we have sufficient monitor locking to make that unnecessary yet...
* I don't like moving the version out of the Aliases object. The version in zk that this instance was derived from is information about the Aliases object and therefore should be a property of the object. I like it much better as an immutable property on Aliases that is set directly upon creation, and can be made accessible from the Aliases object (I don't recall if I provided a getter in my patch, but it should probably be there to support folks who are working with aliases and some other data in zk, so they can know if changes to aliases.json have occurred). With the version held outside, future modifications to the code could more easily get it out of sync by failing to update the field in AliasesManager, whereas requiring it in the constructor enforces and communicates the need to track the version.
* This patch places the burden of coordinating a set of changes on the caller of the API instead of handling it transparently. This is reflected by line 111 in Test, where you wrapped the previously independent clone operations in a single UnaryOperator. That basically redesigns the test so that it passes only because its consecutive invocations happen to be easy to wrap together. The present patch will require that UnaryOperator be used like a transaction wrapper, whereas the previous patch tracked changes internally and then transparently re-applied them in the event of a conflict. This made a series of changes transactional by default without any explicit coordination code on the caller's part, and thus somewhat fool-proofed the usage of the API. If substantial logic is involved in calculating multiple pieces of metadata and/or a collection name, and that logic has to all be applied at the same time to ensure consistent information in zookeeper, then ALL that logic has to be placed inside the UnaryOperator. In the prior patch it was sufficient to perform several clone operations and then exportToZk, with no effect on the organization of the calling code. I feel this patch simplifies the current code by adding complexity to future code using the API.
* AliasIntegrationTest.test() seems to fail?
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242826#comment-16242826 ] Gus Heck commented on SOLR-11487:

Added SOLR-11617 to track the creation of an API
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242637#comment-16242637 ] Gus Heck commented on SOLR-11487:

{quote}Decomposing aliases.json has pros/cons, but it won't remove the possibility of races between modifying some portion of the aliases state; it just makes it more rare. So we still need to deal with races in code using a zkVersion with a retry and eventual timeout, etc.{quote}

If decomposed to the level of a metadata item (or whatever level at which we become ok with "last one wins") we do get rid of this because we are not trying to do optimistic concurrency, only ensure unrelated stuff is not wiped out. If no "unrelated stuff" is part of the update then this problem goes away.
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242236#comment-16242236 ] David Smiley commented on SOLR-11487:

Decomposing aliases.json has pros/cons, but it won't remove the possibility of races between modifying some portion of the aliases state; it just makes it more rare. So we still need to deal with races in code using a zkVersion with a retry and eventual timeout, etc.

Good observation in "Map Conversion"; there is no perfect choice. I suppose "complicated serialization" needn't be too bad? This would mean that a user-provided comma separated list might end up normalized according to the rules of StrUtils.splitSmart and reversed by StrUtils.join, and that's probably okay. So if a user strangely supplies "\f\o\o" they will see "foo" back.

I could take a stab at ZkStateReader loop/retry, and removing the Aliases.priorChange stuff. I'm not arguing that what you have now doesn't work, only that the particular arrangement is unclear.
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242091#comment-16242091 ] Gus Heck commented on SOLR-11487:

*Constructor* - Yeah that can be simplified. Much of the code directly accesses the field, so I try to make it impossible to observe invalid state, but it seems I haven't covered the EMPTY_MAP case. It might be that the null checks are not actually necessary if I have actually provided this guarantee up front.

*Map Conversion* - This is a result of my not caching/duplicating state. At one point I began to have issues (test failures) due to the cached state getting out of sync, and rather than continue to try to maintain that duplicated state I opted to remove the duplication. My dislike for the possibility of repeated splitting of the list was why I originally changed things such that the main map contained a list. As you pointed out, that complicates serialization if we are to maintain the existing comma separated format. So we wind up with one of these three things, none of which I like:
- Duplicated state
- Complicated serialization
- Repeated splitting of the comma separated list.
This sort of conundrum is more or less why I had previously suggested we do the metadata via zk nodes and don't expand the complexity of aliases.json... Now that everything else is working it should be more tractable to push the duplication/caching back in than it was to maintain it while things were evolving, so I can do that if you like, but basically we have to pay somewhere for the fact that we are clumping this into a single json file.

*convertMap* - ah yes good catch thx.

*priorChange* - The task of avoiding competition among unrelated nodes of aliases.json is complicated by the fact that the API allows several consecutive clones to be made before the result is given to zkStateReader.exportAllAliases (again, issues arising from the "one big json" strategy). We could fix that in documentation, and/or set a package private flag that prevents further cloning until ZkStateReader has written the current changes... in that case we could possibly have a few fields that retained the previous change data as string data rather than a function closure. Not sure how fields containing strings and a flag are less hokey though, and the flag would technically break immutability. Think of it this way: the state in aliasMap is "candidate" state, and the chain of Function calls is an immutable change history that can be applied to a new value read from zk if needed.

*API* - Yeah I had attempted to raise this issue above, but confusingly conflated it with the possibility of collection metadata earlier; you responded to the latter in the negative, and I took it to mean negative vs the former. Sorry for the confusing question. This can certainly be added :)

*ZkStateReader* - These loops perform different tasks; there are two steps here:
- ensure the data we are sending includes the latest changes (exportAllAliases)
- ensure (with timeout) that Zookeeper got the data we eventually decided to send.
We do in fact call clone in the first loop via the Function closure, if needed. The one you see in exportAliasToZk is just the initial attempt.

*Field order* - yup, agree.

*Overall* - I am increasingly feeling like there's a lot of complication here that derives from our attempts to provide zookeeper-like guarantees and prevent competition within a single json file. Can you perhaps elaborate on the bookkeeping that worries you and [~noble.paul]? Is it really heavier than what we have here?
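To make the "candidate state plus change history" idea in the *priorChange* paragraph concrete, here is a minimal sketch. It is not the code from the patch: {{AliasState}}, {{cloneWithAlias}} and {{rebaseOnto}} are hypothetical names, and a plain Map stands in for the contents of aliases.json. Each clone keeps the new candidate map along with a function that can replay every change made so far on top of a newer base.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class ChangeHistorySketch {

  /** Hypothetical immutable alias holder; names are illustrative, not Solr's. */
  static final class AliasState {
    final Map<String, String> aliases;                 // "candidate" state
    final UnaryOperator<Map<String, String>> history;  // replays all changes so far

    AliasState(Map<String, String> aliases, UnaryOperator<Map<String, String>> history) {
      this.aliases = Collections.unmodifiableMap(new LinkedHashMap<>(aliases));
      this.history = history;
    }

    /** Starts from data just read from the store, with an empty change history. */
    static AliasState fresh(Map<String, String> fromStore) {
      return new AliasState(fromStore, UnaryOperator.identity());
    }

    /** Returns a new instance with one more alias and one more recorded change. */
    AliasState cloneWithAlias(String alias, String collections) {
      UnaryOperator<Map<String, String>> step = m -> {
        Map<String, String> copy = new LinkedHashMap<>(m);
        copy.put(alias, collections);
        return copy;
      };
      UnaryOperator<Map<String, String>> prior = history;
      return new AliasState(step.apply(aliases), m -> step.apply(prior.apply(m)));
    }

    /** After a conflict, re-applies the recorded changes to a newer base. */
    AliasState rebaseOnto(Map<String, String> newerBase) {
      return new AliasState(history.apply(newerBase), history);
    }
  }

  public static void main(String[] args) {
    AliasState candidate = AliasState.fresh(new LinkedHashMap<>())
        .cloneWithAlias("a", "c1")
        .cloneWithAlias("b", "c2");

    // Someone else wrote alias "x" in the meantime; replay our two changes on top.
    Map<String, String> newer = new LinkedHashMap<>();
    newer.put("x", "c9");
    System.out.println(candidate.rebaseOnto(newer).aliases);
  }
}
```

The trade-off discussed in this thread is visible here: the caller never has to group its changes, but every instance carries a closure in addition to its data.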
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240417#comment-16240417 ] David Smiley commented on SOLR-11487:

Aliases
* The constructor is confusing to me; you save aliasMap to a field and *then* swap with EMPTY_MAP if you can? Why not before? And maybe we can simply not bother with initializing the metadata map here... worry about that when we get/populate it. It'll probably make it easier to see that the get/set code for it in other methods is correct, since that code won't depend on assumptions about it being initialized.
* In general... these changes invoke "convertMapOfCommaDelimitedToMapOfList" way more than before. It used to be only at Alias construction; now it's at every call to resolveAliases (!) and getCollectionAliasListMap. Sorry, but can we avoid that? I think it's not a big deal to lazily split the value for the particular collection being requested, but doing so for all collections seems excessive to me.
* convertMapOfCommaDelimitedToMapOfList: you've converted this to Java 8 streams, which is fine. However, you've wrapped the result in a new LinkedHashMap, which doesn't actually retain the original order of the input since Collectors.toMap is going to use a HashMap in between. toMap has an overload that takes a Supplier; you can call that one, supplying the LinkedHashMap.
* The introduced use of a field priorChange Function seems really hokey; I sorta see what you're doing with it but I think we need to go about this in some other way. It feels like too much of a wart on Aliases. Maybe we can chat about this on IRC.

*From a Solr API perspective, it seems we've forgotten to expose the read/write of metadata; no? (!)* I feel bad that I didn't recognize this earlier; it's obvious in retrospect. When in Solr tests we can work directly with ZooKeeper and Solr's internals, it's easy to forget the need for a public API.

ZkStateReader
* exportAliasToZk computes a new Aliases instance at the first line (Aliases.cloneWithCollectionAlias) before calling exportAllAliases then checkForAlias, both of which have loops to do their jobs with retries / re-checking. It's hard for me to see how this is correct... shouldn't Aliases.cloneWithCollectionAlias be called _within_ a loop?
* nice use of aliasLock with wait & notifyAll
* minor: aliasWatcher & aliasLock fields should probably be adjacent to aliases.
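For reference, the ordering point about Collectors.toMap can be sketched as below. {{toListMap}} is a hypothetical stand-in for convertMapOfCommaDelimitedToMapOfList, and it uses plain String.split rather than StrUtils.splitSmart, so treat it as an illustration of the four-argument toMap overload only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class AliasMapConversion {

  // Hypothetical stand-in for convertMapOfCommaDelimitedToMapOfList.
  // Assumption: a plain split on "," is good enough for this illustration.
  static Map<String, List<String>> toListMap(Map<String, String> commaDelimited) {
    return commaDelimited.entrySet().stream()
        .collect(Collectors.toMap(
            Map.Entry::getKey,
            e -> Collections.unmodifiableList(Arrays.asList(e.getValue().split(","))),
            (a, b) -> { throw new IllegalStateException("duplicate alias"); },
            LinkedHashMap::new)); // the map supplier preserves encounter order
  }

  public static void main(String[] args) {
    Map<String, String> raw = new LinkedHashMap<>();
    raw.put("zeta", "c1,c2");
    raw.put("alpha", "c3");
    raw.put("mid", "c4,c5,c6");
    System.out.println(toListMap(raw).keySet()); // prints [zeta, alpha, mid]
  }
}
```

With the two-argument toMap the collector makes no ordering guarantee, so wrapping its result in a LinkedHashMap afterwards only preserves whatever order the intermediate map happened to have.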
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225618#comment-16225618 ] David Smiley commented on SOLR-11487:

*Aliases.java*
* cloneAliases: you can probably reduce the LOC a lot here using Java 8 streams; see https://stackoverflow.com/a/28288729/92186 for similar code
* AFAICT you've changed the storage of the value side of the map of the "collections" key to be a List instead of a String (comma delimited). But since this map is serialized to JSON and back and stored in ZooKeeper, wouldn't this affect compatibility with aliases.json in existing setups? We definitely don't want to break things.
* It's a shame that getCollectionAliasMap and getCollectionAliasListMap are no longer simple getters (albeit with unmodifiable wrappers). I don't think the extra code & cost is worth true immutability. This is internal code. As a compromise, perhaps you might instead consider creating a view that splits the comma-delimited value on the fly only when its value is asked for? See Guava {{Maps.transformValues()}}.

*ZkStateReader*
* The TODO in "exportAliasToZk" is a real concern; we don't want race conditions between various alias modifications to cause some to be overridden/ignored. I think there are two parts to this. Firstly, use ZK versions when reading/writing aliases.json so that we overwrite only the version we are expecting. This will mean putting the version number in Aliases (see DocCollection.zknodeversion for the same idea) whenever we read the aliases. Secondly, if we do need to retry, that retry loop needs to incorporate fetching the aliases and doing the modification over again, rather than repeatedly trying in vain to save the serialized aliases from when it started.
* In AliasWatcher, the LOG.debug should use the "{}" template to avoid a possibly expensive aliases.toString()
* I very much like that you're removing the hot loop and thinking about creating abstractions/mechanisms to make it nicer. But I admit the change here with Observer & Observable seems very odd to me. IMO it's weird that ZkStateReader is an Observer (Observable seems more intuitive -- it has state that can be observed) and likewise it's weird that a Watcher is Observable (wouldn't it fundamentally be an Observer?). I think you're trying to chain some existing observable stuff, which means the Watchers thus become Observable themselves... but I don't like the result even if it works. Perhaps instead, you could have a Watcher subclass called ChainableWatcher, and thus remove the need for Observable/Observer altogether? I don't know if it's that simple without trying to tackle it myself, but it's at least an idea.
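The two-part suggestion for exportAliasToZk (write against the version that was read, and redo the read and the modification on every retry) can be sketched roughly as follows. This is an illustration under assumptions, not the patch: {{FakeNode}} is a hypothetical in-memory stand-in for the aliases.json node and is not a ZooKeeper API, and {{update}} and {{maxTries}} are made-up names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class VersionedAliasUpdate {

  /** Immutable snapshot of the aliases plus the node version it was read at. */
  static final class Snapshot {
    final Map<String, String> aliases;
    final int version;

    Snapshot(Map<String, String> aliases, int version) {
      this.aliases = Collections.unmodifiableMap(new LinkedHashMap<>(aliases));
      this.version = version;
    }
  }

  /** Hypothetical in-memory stand-in for the aliases.json node. */
  static final class FakeNode {
    private Snapshot current = new Snapshot(Collections.emptyMap(), 0);

    synchronized Snapshot read() {
      return current;
    }

    /** Succeeds only if nobody has written since expectedVersion was read. */
    synchronized boolean writeIfVersion(Map<String, String> data, int expectedVersion) {
      if (current.version != expectedVersion) {
        return false;
      }
      current = new Snapshot(data, expectedVersion + 1);
      return true;
    }
  }

  /** Re-reads and re-applies the change on every attempt, up to maxTries. */
  static Snapshot update(FakeNode node, UnaryOperator<Map<String, String>> change, int maxTries) {
    for (int attempt = 0; attempt < maxTries; attempt++) {
      Snapshot seen = node.read(); // fetch inside the loop, not before it
      Map<String, String> modified = change.apply(new LinkedHashMap<>(seen.aliases));
      if (node.writeIfVersion(modified, seen.version)) {
        return new Snapshot(modified, seen.version + 1);
      }
    }
    throw new IllegalStateException("aliases changed too often; gave up after " + maxTries + " tries");
  }

  public static void main(String[] args) {
    FakeNode node = new FakeNode();
    update(node, m -> { m.put("logs", "logs_2017_10"); return m; }, 5);
    Snapshot result = update(node, m -> { m.put("metrics", "metrics_a,metrics_b"); return m; }, 5);
    System.out.println(result.version + " " + result.aliases);
  }
}
```

Because the modification is passed in as a function, a failed write never re-sends stale serialized data; the function simply runs again against whatever was read most recently.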
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224353#comment-16224353 ] Gus Heck commented on SOLR-11487:

Attaching revised patch. Highlights:
* Immutability was restored.
* After getting irritated with a number of issues caused by mismatches between the string-valued and list-valued Map versions of the aliases that were being kept, including a need to clone, tweak and re-clone to get the list version back in sync, I nixed the duplicated state, and made the method providing the Map view construct that view on the fly. There is now only one canonical map, nothing to keep in sync, and it contains the List values. I could be talked into flip-flopping to keep comma strings and construct lists on the fly, but at the moment it all passes (and still writes comma strings to zookeeper). I don't think it matters much which way we do it.
* I fixed immutability leakage with the list-valued Map, where the Map was immutable but the list elements were still mutable, and thus could be added/removed, producing state (reflected in the original Alias instance) out of sync with the string-valued Map of aliases held in the "collections" key of the main aliasMap field. New code now provides an immutable map with immutable list values.
* TimeOut class removed, replaced with a OneTimeListener class that allows easy, temporary piggybacking on the AliasWatcher I extracted in the prior patch. This is coded in a way that if other watchers simply extend Observer they can use this too. The net effect is that rather than polling in a loop we wait on a countdown latch, which is released after a timeout or if the watcher hears an update that passes a test (written by the implementor of the OneTimeListener). The listener is automatically removed on successful test or timeout. This means we don't evaluate the test condition any more times than aliases are updated. This could increase executions of the test condition if aliases were under heavy updates, but that seems like an oddball condition, so this will probably result in many fewer tests. In any case it will be way less expensive than the hot loop.
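A rough sketch of the latch-instead-of-polling idea described above, for readers who have not opened the patch. It is not the OneTimeListener from the patch: {{Notifier}} is a hypothetical stand-in for the watcher, {{OneShot}} for the listener, and only java.util.concurrent classes are used.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

class LatchWaitSketch {

  /** Hypothetical listener that releases its latch once the test passes. */
  static final class OneShot<T> {
    final CountDownLatch latch = new CountDownLatch(1);
    final Predicate<T> test;

    OneShot(Predicate<T> test) {
      this.test = test;
    }

    boolean offer(T state) {
      if (test.test(state)) {
        latch.countDown();
        return true;
      }
      return false;
    }
  }

  /** Hypothetical stand-in for the watcher; it fans updates out to listeners. */
  static final class Notifier<T> {
    private final List<OneShot<T>> listeners = new CopyOnWriteArrayList<>();

    /** Called once per update, so the test runs at most once per update. */
    void publish(T state) {
      for (OneShot<T> listener : listeners) {
        if (listener.offer(state)) {
          listeners.remove(listener); // drop the listener once satisfied
        }
      }
    }

    /** Blocks until an update passes the test or the timeout expires. */
    boolean await(Predicate<T> test, long timeout, TimeUnit unit) {
      OneShot<T> listener = new OneShot<>(test);
      listeners.add(listener);
      try {
        return listener.latch.await(timeout, unit);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      } finally {
        listeners.remove(listener); // also drop the listener on timeout
      }
    }
  }

  public static void main(String[] args) {
    Notifier<String> aliasUpdates = new Notifier<>();
    new Thread(() -> {
      try {
        Thread.sleep(100); // give the main thread time to register
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      aliasUpdates.publish("unrelated");
      aliasUpdates.publish("myAlias");
    }).start();
    boolean seen = aliasUpdates.await("myAlias"::equals, 2, TimeUnit.SECONDS);
    System.out.println("saw alias: " + seen);
  }
}
```

One caveat of this sketch: it only reacts to updates published after the listener is registered, so a real caller would also check the current state once before waiting.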
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216287#comment-16216287 ] David Smiley commented on SOLR-11487: - Fantastic response to my code review by the way :-) RE {{aliasMap}}: my suggestion was embarassing; of course the value side should now be what you have -- just Map! bq. Based on your earlier response we won't have a collections API call to set metadata What's this in reference to? But you've got me thinking... what if this was a collection-or-alias metadata thing? That sounds pretty useful/cool from a user/conceptual standpoint. From a code details standpoint... maybe this would be no change -- alias metadata goes in one place (aliases.json) whereas collections would theoretically have it in their state.json? Any way I don't want to create extra work for hypothetical features that are not in scope. RE ZkStateReader field: Naturally we need to save the data in ZK but that doesn't require Aliases.java to have the field and it hurts immutability (more on that in a sec). Couldn't ZkStateReader make this happen (Law of Demeter perhaps?)? RE immutability: I believe ZkStateReader is keeping the Aliases instance up to date via a ZK watcher... so if code doesn't hold a durable reference to Aliases (outside of ZkStateReader) then we're good? DocCollection is immutable; I think it's consistent for Aliases to follow the same approach too; no? I don't think we want to break with the trend here. If it were mutable, the caller might not be sure when exactly the ZK interaction happens (hidden behind some innocent method call?). I get this is a trade-off and you've articulated the other side well I think. RE Collections.EMPTY_MAP: Okay. RE Aliases CRUD in ZkStateReader: I like it. RE TimeOut: nice catch on finding the hot loop! I recommend not copying TimeOut; just add some utility method if wanted. Classes add more conceptual complexity than a static method for a case like this IMO. 
> Collection Alias metadata for time partitioned collections > -- > > Key: SOLR-11487 > URL: https://issues.apache.org/jira/browse/SOLR-11487 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: David Smiley > Attachments: SOLR_11487.patch, SOLR_11487.patch > > > SOLR-11299 outlines an approach to using a collection Alias to refer to a > series of collections of a time series. We'll need to store some metadata > about these time series collections, such as which field of the document > contains the timestamp to route on. > The current {{/aliases.json}} is a Map with a key {{collection}} which is in > turn a Map of alias name strings to a comma delimited list of the collections. > _If we change the comma delimited list to be another Map to hold the existing > list and more stuff, older CloudSolrClient (configured to talk to ZooKeeper) > will break_. Although if it's configured with an HTTP Solr URL then it would > not break. There's also some read/write hassle to worry about -- we may need > to continue to read an aliases.json in the older format. > Alternatively, we could add a new map entry to aliases.json, say, > {{collection_metadata}} keyed by alias name? > Perhaps another very different approach is to attach metadata to the > configset in use? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
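To make the layout described in the issue concrete, here is a minimal Java sketch of the second option: a new top-level {{collection_metadata}} entry keyed by alias name, next to the existing {{collection}} entry. The class name, the alias and collection names, and the {{router.field}} key are all hypothetical, and the sketch assumes an older reader looks only at the {{collection}} key and ignores anything else; it is not the code from the attached patch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of the proposed aliases.json layout; not Solr's Aliases class. */
final class AliasesJsonLayoutSketch {

  /** Builds the in-memory equivalent of aliases.json with the extra top-level key. */
  static Map<String, Map<String, ?>> buildLayout() {
    // Existing part: alias name -> comma delimited list of collections.
    Map<String, String> collection = new LinkedHashMap<>();
    collection.put("timeseries", "ts_2017_10,ts_2017_11"); // made-up names

    // Proposed part: alias name -> String-valued properties.
    Map<String, String> props = new LinkedHashMap<>();
    props.put("router.field", "timestamp_dt"); // made-up key and value

    Map<String, Map<String, String>> collectionMetadata = new LinkedHashMap<>();
    collectionMetadata.put("timeseries", props);

    // The two values have different types, so the root map cannot be fully type safe.
    Map<String, Map<String, ?>> root = new LinkedHashMap<>();
    root.put("collection", collection);
    root.put("collection_metadata", collectionMetadata);
    return root;
  }

  /** Assumed behavior of an older reader: use "collection", ignore unknown keys. */
  @SuppressWarnings("unchecked")
  static Map<String, String> readAsOlderClient(Map<String, Map<String, ?>> root) {
    Map<String, String> aliases = (Map<String, String>) root.get("collection");
    return aliases == null ? Collections.emptyMap() : aliases;
  }

  public static void main(String[] args) {
    Map<String, Map<String, ?>> root = buildLayout();
    System.out.println(readAsOlderClient(root).get("timeseries"));
  }
}
```

Under that assumption the alias-to-collections mapping keeps its old shape, which is the point of not turning the comma delimited list into a nested Map.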
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216223#comment-16216223 ] Gus Heck commented on SOLR-11487: - Thx for the review Dave. I'll start in on some of the fix-ups. Here are some of the whys for what I did; I can probably be talked out of anything here, but this is what I was thinking... Let me know what you think. *Fix List* * Will add test for removal of alias case and fix it. * docs: yes of course good point :) * names: sure sounds fine :) * I had initially started supporting lists and then decided to axe that until discussion. I will move back tofrom .. *Warnings*: this gets difficult because we don't really *have* type safety anymore... We have a Map with two keys, "collection" and "collection_metadata", and the values for these two keys don't match. The former is a Map with String values and the latter is a Map with Map values. String and Map are not convertible types, so one use case or the other won't compile... unless you back off from type safety. To achieve type safety we either need to keep two separate maps or we need to be serializing an actual object hierarchy rather than collection classes. *The ZkStateReader field* is necessary because we need to get the metadata back to zookeeper. Based on your earlier response we won't have a collections API call to set metadata, so we need to have a ZkStateReader somehow or nothing gets written to zookeeper. In the case of cloneWithCollectionAlias, yeah that can be eliminated there, good catch. I can add an overload for the signature you suggest as well for the case where both are to be updated at the same time, but WRT removing ZkStateReader entirely, see comments about immutability below... 
*Immutability* is nice of course, and great for things that are immutable, or only held for a short duration, but once you have a long held reference and the underlying data is actually mutable, it gets difficult to be sure nobody is retaining a reference to a stale copy every time a change is made... A little digging reveals that CoreContainer has a ZkContainer which has a ZkController, which has a ZkStateReader, which therefore holds onto an immutable copy of Aliases for a difficult to determine time frame. The existing cloning/immutability scheme therefore worries me. It seems like it would make more sense for Aliases to function as a wrapper around the (fundamentally mutable) json in zookeeper. If we never want to know if there was a change after we get our initial copy, and we never give away a reference to the copy we got, and we never retain that copy after we make an update, we could have immutable copies... hard to make those stick however. It might be that we want a snapshot at the start of a request that doesn't change for the duration of the request (I can imagine that getting funky fast), but the long held versions need to be mutable I think... maybe a mutable super class and an immutable subclass that throws UnsupportedOperation on mutators? The *Collections.EMPTY_MAP* came up because I can't put anything into an empty map, and need to test if that's what we currently have or if we have a regular map that I can add stuff to. Collections.emptyMap() is not required to return the same instance, or any particular implementation class, so in order to test for it and not be subject to breakage in odd VMs or future versions (or past versions?) I have to use the single unique instance in Collections.EMPTY_MAP. I've grown slightly unsure as to whether that if block is still necessary, a possible holdover from early versions of this code, so I might give a shot at eliminating it and go back to Collections.emptyMap(). 
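One way the identity test against Collections.EMPTY_MAP could be avoided altogether is to copy before writing, so it never matters which kind of map is currently held. The following is only a sketch of that idea under the assumption that copying a small properties map is cheap enough; the class and method names are hypothetical and it is not the approach taken in the patch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: add an entry without caring whether the input is the shared empty map. */
final class EmptyMapSketch {

  /** Returns a new unmodifiable map holding the entries of current plus the given one. */
  static Map<String, String> withEntry(Map<String, String> current, String key, String value) {
    // Copying works the same for Collections.emptyMap() and for a regular map,
    // so no instance or class check on 'current' is needed.
    Map<String, String> copy = new LinkedHashMap<>(current);
    copy.put(key, value);
    return Collections.unmodifiableMap(copy);
  }

  public static void main(String[] args) {
    Map<String, String> first = withEntry(Collections.emptyMap(), "a", "1");
    Map<String, String> second = withEntry(first, "b", "2");
    System.out.println(first.size() + " then " + second.size());
  }
}
```

The trade-off is an allocation per update, in exchange for dropping both the unchecked-assignment warning and the dependence on a particular empty-map instance.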
I *moved CRUD stuff* to ZkStateReader so I didn't have to duplicate it in Aliases to get metadata written back to zookeeper. Also it feels reasonable to have a ZK class doing the Zk CRUD rather than having that code live in a command class that grabs ZkClient from ZkStateReader and writes the data directly itself... (law of Demeter etc). This way, the command does command type stuff like identify the data to be written and validation to be sure we really do want to write the data and then hands the data off to the thing that knows how to work with zookeeper data so it can do the actual writing... Controller & service/DAO stuff. The present code seems like the controller/action in a web app firing up a JDBC connection directly... *TimeOut*... initially I was going to copy it over verbatim since it was in core and core is not available in solrj (when I moved the CRUD to ZkStateReader) and then I realized it could be improved so I improved it. I think perhaps this timeout and the one in Core could be reconciled and moved to a commonly available location to facilitate re-use, but that seems like a
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215893#comment-16215893 ] David Smiley commented on SOLR-11487: - BTW FWIW RE TimeOut... IMO it'd be more nice to have a static utility method named something like callWithRetry(long intervalSleep, long timeout, TimeUnit unit, Callable callable) throws Exception
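A static helper with the signature suggested above might look roughly like the sketch below. The comment gives only the signature, so the semantics here are assumptions: the callable is retried whenever it throws, the thread sleeps for intervalSleep between attempts (which avoids a hot loop), and a TimeoutException is raised once the deadline passes. The class name RetrySketch and the generic type parameter are also additions for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical utility; the retry-on-exception semantics are an assumption. */
final class RetrySketch {

  static <T> T callWithRetry(long intervalSleep, long timeout, TimeUnit unit, Callable<T> callable)
      throws Exception {
    final long deadline = System.nanoTime() + unit.toNanos(timeout);
    Exception last;
    while (true) {
      try {
        return callable.call(); // success ends the loop
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and stop retrying
        throw e;
      } catch (Exception e) {
        last = e; // remember the most recent failure for the timeout report
      }
      if (System.nanoTime() - deadline >= 0) {
        TimeoutException te = new TimeoutException("gave up after " + timeout + " " + unit);
        te.initCause(last);
        throw te;
      }
      unit.sleep(intervalSleep); // wait between attempts instead of spinning
    }
  }

  public static void main(String[] args) throws Exception {
    int[] attempts = {0};
    String result = callWithRetry(10, 1000, TimeUnit.MILLISECONDS, () -> {
      if (++attempts[0] < 3) {
        throw new IllegalStateException("not ready yet");
      }
      return "done";
    });
    System.out.println(result + " after " + attempts[0] + " attempts");
  }
}
```

A version for the alias update path would presumably retry only on a ZooKeeper version conflict rather than on every exception; that distinction is left out here.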
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215845#comment-16215845 ] David Smiley commented on SOLR-11487: - Thanks for the patch Gus! * I think just String values is fine; it's like a Properties object. Therefore you could change the type of the aliasMap field to what it was (Map,String,Map>? * Why change Collections.emptyMap() to Collections.EMPTY_MAP ? The latter results in Java unchecked assignment warnings. Speaking of which, can you please address such warnings? * getCollectionMetadataMap needs some docs. Can result be null? * setAliasMetadata: ** would you mind changing the "key" param name to "keyMetadata" and "metadata" param name to "valueMetadata" (or shorter "Meta" instead of "Metadata" if you choose)? That would read clearer to me. ** setAliasMetadata doesn't have "collection" in its name. Likewise the aliasMetadataField should be qualified as well. Or... we stop pretending at the class level that we support other alias types, yet continue to read/write with the "collection" prefix in case we actually do add new types later? ** oh... hey this method makes Aliases not immutable anymore. Maybe change this to be something like cloneWithCollectionAliasMetadata? Or we could make immutable again but I admit Immutability is a nice property here. * cloneWithCollectionAlias seems off; I don't think it should be using zkStateReader. I think this method now needs a metadata map parameter (optional). Furthermore, if we remove an alias, remove the corresponding metadata too. I see you moved some alias CRUD stuff to ZkStateReader. Just curious; what drove that decision? TimeOut: do you envision other uses of this utility; what in particular?
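The clone-instead-of-set idea from the review can be sketched as follows. The method names cloneWithCollectionAlias, cloneWithCollectionAliasMetadata and getCollectionMetadataMap and the keyMetadata/valueMetadata parameter names come from the comment; everything else (the class name, the constructor, treating a null collection list as removal, returning an empty map rather than null) is an assumption made for illustration and is not the Aliases class from the patch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical immutable holder; every "mutation" returns a new instance. */
final class ImmutableAliasesSketch {
  private final Map<String, String> collectionAliases;
  private final Map<String, Map<String, String>> collectionAliasMetadata;

  ImmutableAliasesSketch(Map<String, String> aliases, Map<String, Map<String, String>> metadata) {
    // Defensive copies so later changes by the caller cannot leak in.
    this.collectionAliases = Collections.unmodifiableMap(new LinkedHashMap<>(aliases));
    Map<String, Map<String, String>> copy = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, String>> e : metadata.entrySet()) {
      copy.put(e.getKey(), Collections.unmodifiableMap(new LinkedHashMap<>(e.getValue())));
    }
    this.collectionAliasMetadata = Collections.unmodifiableMap(copy);
  }

  boolean hasCollectionAlias(String alias) {
    return collectionAliases.containsKey(alias);
  }

  /** Never returns null in this sketch; an alias without metadata yields an empty map. */
  Map<String, String> getCollectionMetadataMap(String alias) {
    return collectionAliasMetadata.getOrDefault(alias, Collections.emptyMap());
  }

  /** Returns a copy with one metadata entry set; this instance is left untouched. */
  ImmutableAliasesSketch cloneWithCollectionAliasMetadata(
      String alias, String keyMetadata, String valueMetadata) {
    Map<String, Map<String, String>> meta = new LinkedHashMap<>(collectionAliasMetadata);
    Map<String, String> props = new LinkedHashMap<>(getCollectionMetadataMap(alias));
    props.put(keyMetadata, valueMetadata);
    meta.put(alias, props);
    return new ImmutableAliasesSketch(collectionAliases, meta);
  }

  /** Sets the alias, or removes it together with its metadata when collections is null. */
  ImmutableAliasesSketch cloneWithCollectionAlias(String alias, String collections) {
    Map<String, String> aliases = new LinkedHashMap<>(collectionAliases);
    Map<String, Map<String, String>> meta = new LinkedHashMap<>(collectionAliasMetadata);
    if (collections == null) {
      aliases.remove(alias);
      meta.remove(alias); // removing an alias also drops its metadata
    } else {
      aliases.put(alias, collections);
    }
    return new ImmutableAliasesSketch(aliases, meta);
  }
}
```

Because no method touches ZooKeeper, the caller decides when the new instance is written back, which keeps the ZK interaction out of innocent-looking setters.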
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211098#comment-16211098 ] David Smiley commented on SOLR-11487: - The latter -- alias metadata.
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211026#comment-16211026 ] Gus Heck commented on SOLR-11487: - Is this ticket meant to add a general collection metadata facility (with admin commands for users to add/remove metadata of their own), or just make the metadata available in code at the ZkStateReader.getAliases() level?
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206772#comment-16206772 ] Noble Paul commented on SOLR-11487: --- I prefer the approach of adding a collection_metadata key to the current aliases.json. That means fewer nodes. Every extra node is extra bookkeeping.
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206546#comment-16206546 ] David Smiley commented on SOLR-11487: - Thanks for sharing your idea [~gus_heck]. This is an approach I didn't think of. I'm concerned this is over-using ZooKeeper nodes for what could be a simple map instead. It's not like the metadata on the collection (as associated with the alias) is going to change so often as to benefit from the ability to change some but not all of this metadata. [~noble.paul] what do you think of this?
[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections
[ https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205218#comment-16205218 ] Gus Heck commented on SOLR-11487: - In ZK, nodes can have both values and children, right? So the value of the node called aliases.json can remain the same json text, but it could also have a list of children corresponding to each member of the alias containing that metadata... yes, some duplication there, but this would mean that any older clients reading the value from the node will still get what they expect... newer code could simply ignore the old json...