[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2018-04-12 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435662#comment-16435662
 ] 

Gus Heck commented on SOLR-11487:
-

{quote}Decomposing aliases.json has pros/cons, but it won't remove the 
possibility of races between modifying some portion of the aliases state; it 
just makes it more rare. So we still need to deal with races in code using 
a zkVersion with a retry and eventual timeout, etc.
{quote}
The context of the above comment was a discussion where I was suggesting an 
alternate storage for the alias metadata (properties). I was proposing a 
different style of storing the data (in zk nodes, not as part of aliases.json). 
This idea was deemed too costly in terms of "bookkeeping" and dropped. Existing 
behavior was retained, and nowhere was there an intent in this ticket to change 
the existing behavior with respect to updating aliases. Before and after this 
ticket, an update to an alias should have detected a prior update to any part of 
aliases.json (version change), reloaded state and retried. There is a limit on 
the retry loop.

So it depends on what your user means by "running into a race condition..."
 * If writing one alias lost an update to a *different* alias, that would be a 
bug.
 * If two updates to the same alias are racing, that's a "feature" and is 
documented in the first paragraph of the ref guide docs for CREATEALIAS.
 * If the retries were exceeded and the second update eventually returns an 
error, that's a feature, and something is causing a LOT of churn in 
aliases.json. The retry limit is set high to accommodate a fairly aggressive 
unit test I wrote. See 
org.apache.solr.common.cloud.ZkStateReader.AliasesManager#applyModificationAndExportToZk
 for details.
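
The detect, reload, and retry behavior described above can be sketched with an 
in-memory stand-in for the versioned ZooKeeper node. This is only an 
illustration under stated assumptions: VersionedStore, writeIfUnchanged and 
MAX_RETRIES are hypothetical names and values, not Solr's API. The real logic 
is in applyModificationAndExportToZk.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class OptimisticRetrySketch {

    /** Immutable snapshot of the stored data and the version it was read at. */
    static final class Versioned {
        final String data;
        final int version;

        Versioned(String data, int version) {
            this.data = data;
            this.version = version;
        }
    }

    /** Hypothetical in-memory stand-in for a versioned ZooKeeper node. */
    static final class VersionedStore {
        private final AtomicReference<Versioned> current =
                new AtomicReference<>(new Versioned("", 0));

        Versioned read() {
            return current.get();
        }

        /** Succeeds only if nothing was written since 'expected' was read. */
        boolean writeIfUnchanged(Versioned expected, Versioned next) {
            return current.compareAndSet(expected, next);
        }
    }

    static final int MAX_RETRIES = 5; // assumed value, not Solr's actual limit

    static Versioned applyModification(VersionedStore store, UnaryOperator<String> op) {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            Versioned seen = store.read(); // reload the current state
            Versioned next = new Versioned(op.apply(seen.data), seen.version + 1);
            if (store.writeIfUnchanged(seen, next)) {
                return next; // no concurrent writer got in first
            }
            // Another writer changed the store: loop, reload, and reapply.
        }
        throw new IllegalStateException(
                "Gave up after " + MAX_RETRIES + " conflicting updates");
    }
}
```

Because the modification is reapplied to freshly loaded state on each attempt, 
an update to one alias does not overwrite a concurrent update to a different 
alias; two updates to the same alias still resolve as last writer wins.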

Much cleanup was done in this code, so if there was a bug with one alias 
interfering with another, it may well have been eliminated, but this should be 
tested since it was not the focus of our work. 

> Collection Alias metadata for time partitioned collections
> --
>
> Key: SOLR-11487
> URL: https://issues.apache.org/jira/browse/SOLR-11487
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Fix For: 7.2
>
> Attachments: SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch, 
> SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch, SOLR_11487.patch, 
> SOLR_11487.patch, SOLR_11487.patch
>
>
> SOLR-11299 outlines an approach to using a collection Alias to refer to a 
> series of collections of a time series. We'll need to store some metadata 
> about these time series collections, such as which field of the document 
> contains the timestamp to route on.
> The current {{/aliases.json}} is a Map with a key {{collection}} which is in 
> turn a Map of alias name strings to a comma delimited list of the collections.
> _If we change the comma delimited list to be another Map to hold the existing 
> list and more stuff, older CloudSolrClient (configured to talk to ZooKeeper) 
> will break_.  Although if it's configured with an HTTP Solr URL then it would 
> not break.  There's also some read/write hassle to worry about -- we may need 
> to continue to read an aliases.json in the older format.
> Alternatively, we could add a new map entry to aliases.json, say, 
> {{collection_metadata}} keyed by alias name?
> Perhaps another very different approach is to attach metadata to the 
> configset in use?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2018-04-11 Thread Varun Thacker (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434467#comment-16434467
 ] 

Varun Thacker commented on SOLR-11487:
--

Hi David,

I've added the fix version as 7.2 for this Jira for reference.

We recently had a user who ran into a race condition when updating aliases on 
Solr 5.x. Looking at master today, it looks like we're dealing with race 
conditions, and this comment confirms that we fixed it as part of this Jira:

 
{quote}Decomposing aliases.json has pros/cons, but it won't remove the 
possibility of races between modifying some portion of the aliases state; it 
just makes it more rare. So we still need to deal with races in code using 
a zkVersion with a retry and eventual timeout, etc.
{quote}
 

 

 


[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-17 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257676#comment-16257676
 ] 

ASF subversion and git services commented on SOLR-11487:


Commit 7c64847d80e1d6025822d991598711cba5ace123 in lucene-solr's branch 
refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7c64847 ]

SOLR-11487: Put back sleep(100) in CreateAliasCmd.
Update AliasIntegrationTest with some sleeps and use new alias names where
possible to avoid eventual consistency challenges.

(cherry picked from commit 51b2dea)



[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-17 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257672#comment-16257672
 ] 

ASF subversion and git services commented on SOLR-11487:


Commit 51b2dea68e291141e2bfb98a2e07420a6b5869b2 in lucene-solr's branch 
refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=51b2dea ]

SOLR-11487: Put back sleep(100) in CreateAliasCmd.
Update AliasIntegrationTest with some sleeps and use new alias names where
possible to avoid eventual consistency challenges.



[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255784#comment-16255784
 ] 

ASF subversion and git services commented on SOLR-11487:


Commit 6d9f6cda1a0fa6a48e36a153f69a8aa2cfcd943f in lucene-solr's branch 
refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6d9f6cd ]

SOLR-11487: Collection Aliases may now have metadata

(cherry picked from commit fd1820a)



[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-16 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255782#comment-16255782
 ] 

ASF subversion and git services commented on SOLR-11487:


Commit fd1820a430c321e6a2b2910004d7d2be60d3db4a in lucene-solr's branch 
refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=fd1820a ]

SOLR-11487: Collection Aliases may now have metadata



[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-16 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16255683#comment-16255683
 ] 

Gus Heck commented on SOLR-11487:
-

* Bug: heh, I almost wrote a test for that too... I clearly should have. Thx
* Love the elimination of the top map. Definitely cleaner. 
* Less Clone... yes you've done a nice job of actually cashing in on our 
immutability there. That's a very logical thing to do.

Other stuff good too... Map.replaceAll()... cool! :).

All looks good to me. Does look cleaner overall. One other really nice benefit 
here is by eliminating the top map we eliminated almost all the unchecked cast 
stuff, only 2 methods need it now, and the class level @SuppressWarnings can go 
away I think.


[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-14 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252963#comment-16252963
 ] 

David Smiley commented on SOLR-11487:
-

This looks really good, I'll commit this with some very small tweaks tomorrow.

BTW I don't think you get how to use Map.computeIfAbsent (as seen in Aliases 
constructor).  The idea is to simply return the new value from the lambda -- no 
need to actually try to map.put(...) it; it's the job of the code behind 
computeIfAbsent to handle that.  The upshot is less code and can be more 
efficient as well if the Map impl natively implements it (most do).
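
A minimal sketch of that usage, assuming a map of alias name to a list of 
collection names (the class and method names here are made up for 
illustration and are not taken from the patch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComputeIfAbsentSketch {

    /** Adds a collection to an alias, creating the alias entry on first use. */
    static List<String> addCollection(Map<String, List<String>> aliasMap,
                                      String alias, String collection) {
        // The lambda only returns the new value; computeIfAbsent stores it
        // in the map itself, so no explicit aliasMap.put(...) is needed.
        List<String> collections = aliasMap.computeIfAbsent(alias, k -> new ArrayList<>());
        collections.add(collection);
        return collections;
    }

    public static void main(String[] args) {
        Map<String, List<String>> aliasMap = new HashMap<>();
        addCollection(aliasMap, "logs", "logs_2017");
        addCollection(aliasMap, "logs", "logs_2018");
        System.out.println(aliasMap);
    }
}
```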

I noticed Aliases.cloneCollectionMap does not actually do a deep clone, despite 
its caller saying it shares nothing.
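
To illustrate the difference (this is not the actual Aliases code, and the 
method names are hypothetical): copying only the outer map leaves the inner 
lists shared, while a deep clone copies each list as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeepCloneSketch {

    /** Copies the outer map only; the value lists are still shared. */
    static Map<String, List<String>> shallowCopy(Map<String, List<String>> src) {
        return new HashMap<>(src);
    }

    /** Copies the outer map and every value list, so nothing is shared. */
    static Map<String, List<String>> deepCopy(Map<String, List<String>> src) {
        Map<String, List<String>> copy = new HashMap<>();
        for (Map.Entry<String, List<String>> e : src.entrySet()) {
            copy.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return copy;
    }
}
```

With the shallow copy, adding a collection through the copy also changes the 
original's list; with the deep copy it does not.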


[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-14 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252802#comment-16252802
 ] 

Gus Heck commented on SOLR-11487:
-

All good points, many of them bits that didn't get cleaned up as related code 
made them obsolete...

* *computeIfAbsent()* - used it in the constructor, elsewhere we don't want to 
modify the map we are querying.
* *getZNodeVersion* - (/) 
* *resolveAliases* - (/) That check was previously needed because of a conversion 
to a map of lists from a map of comma-separated strings, and the method doing the 
conversion didn't like nulls. Now that that has gone away, this can too. 
* *Unmodifiable lists* - This thought had crossed my mind but I had a vague 
worry that making them unmodifiable from the start might cause issues and 
wanted to get the patch up so I didn't investigate, but I've found no support 
for my worries. Lists are now unmodifiable from the start. I very much prefer 
that they not be modifiable. If we are going to be immutable we should really 
be immutable so as not to trick someone later, be they internal or external.
* *EMPTY_MAP* - (/) Yup, now we can go back to Collections.emptyMap() :)
* *cloneCollectionMetadataMap* - (/)
* *Array.equals()* - whoops, forgot to do that, thx.
* *sleep(1)* - that was my compromise such that the loop wasn't flaming hot, 
just very toasty. I took it out and now stop the loop with a different message 
if we have tried at least 5 times and failed vs timing out. 
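
The "unmodifiable from the start" point can be sketched roughly as follows. 
This is an assumption-laden illustration rather than the patch itself: the 
class name is invented, and it assumes the source map contains no null lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImmutableAliasesSketch {

    private final Map<String, List<String>> collectionAliases;

    ImmutableAliasesSketch(Map<String, List<String>> source) {
        Map<String, List<String>> copy = new HashMap<>(source);
        // Copy and wrap each list exactly once, at construction time.
        copy.replaceAll((alias, colls) ->
                Collections.unmodifiableList(new ArrayList<>(colls)));
        this.collectionAliases = Collections.unmodifiableMap(copy);
    }

    /** Returns the same already-wrapped view on every call. */
    Map<String, List<String>> getCollectionAliasListMap() {
        return collectionAliases;
    }
}
```

Wrapping at creation means the getter does no work per call, and later 
changes to the caller's source lists cannot leak into the instance.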


[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-14 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16252748#comment-16252748
 ] 

Gus Heck commented on SOLR-11487:
-

* compute if absent: used it in the constructor, elsewhere we don't want to 
modify the map we are querying.
* getZNodeVersion (/) 
* resolveAliases - That check was previously needed because of a conversion to a 
map of lists from a map of comma-separated strings, and the method doing the 
conversion didn't like nulls. Now that that has gone away, this can too. (removed)
* unmodifiable lists: This thought had crossed my mind but I had a vague worry 
that making them unmodifiable from the start might cause issues and wanted to 
get the patch up so I didn't investigate, but I've found no support for my 
worries. Lists are now unmodifiable from the start. I very much prefer that 
they not be modifiable. If we are going to be immutable we should really be 
immutable so as not to trick someone later, be they internal or external.
* EMPTY_MAP (/)
* cloneCollectionMetadataMap: (/)


[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-13 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250254#comment-16250254
 ] 

David Smiley commented on SOLR-11487:
-

Looks like we're finally super close!

Aliases:
* good -- all my feedback below is little stuff
* you can use aliasMap.computeIfAbsent to reduce the LOC
* maybe use "getZNodeVersion" naming to be consistent with DocCollection
* resolveAliases: it has a null check here but I think it's not necessary; 
resolveAliasesGivenAliasMap handles this.
* getCollectionAliasListMap seems to replace its content with unmodifiable 
Lists each time it's called, which I think is bad.  I suggest not wrapping the 
List collection values in this method; instead we can do that on 
creation?  The outer unmodifiableMap call here is fine, however... again if 
we're going to do the unmodifiable list wrapping on creation, might as well do 
so for the map too?  BTW I'm fine with removing some or all of this immutable 
wrapping because this is internal code.  I tend to do this wrapping too but if 
it's painful (and it's painful here!) I'm not religious about it.
* I got what you were saying earlier in this issue about EMPTY_MAP but 
(thankfully) we're no longer doing instanceof equality with EMPTY_MAP so we can 
just not use EMPTY_MAP (directly) anymore; right?
* cloneCollectionMetadataMap: the outer recreation of the HashMap is pointless 
because you're ultimately overwriting the reference and replacing it.  You 
could keep the check for null but in that event, exit early with a new empty 
HashMap.

ZkStateReader:
* in IRC we spoke about removing the modification equality check in the loop of 
applyModificationAndExportToZk; did you change your mind?  I think it's fine 
either way FWIW.

bq. see comment/code in ZkStateReader around L1495.

Yeah, if we can't save in a few tries (ZK BadVersionException each time), it's 
hard to believe trying again will be successful.  Either a timeout or a fixed 
number of retries; I don't care. Why did you add the sleep?


[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-09 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246858#comment-16246858
 ] 

David Smiley commented on SOLR-11487:
-

* The zkVersion int need not be volatile because it is only ever read/written 
from within a synchronized block.  Anyway, if you want to try to put it back in 
Aliases, that's fine.  I found it a bit annoying to have Aliases with zkVersion 
yet also find a way to set it despite Aliases immutability.  Nothing that we 
can't figure out but it was that trip-up that led me to the path of zkVersion 
decoupled from the Aliases class.
* I introduced a bug causing the AliasIntegrationTest.test() failure.  
ZkStateReader.createClusterStateWatchersAndUpdate should call refreshAliases 
with the field reference aliasesHolder instead of constructing a new instance.  
This took a while to figure out; DEBUG logging (with additional log statements 
and references to "this" to get the object ID) proved indispensable. I think 
this bug would never have happened if the AliasesManager did not implement 
Watcher but instead had a newWatcher() method to return an anonymous instance.
* At the end of CreateAliasCmd.call, I sadly think we need to put back the 
100ms wait (I added more commentary below):
{code}
// Give other nodes a bit of time to see these changes. Solr is eventually consistent, so we expect other Solr nodes
// and even CloudSolrClient (ZkClientClusterStateProvider) to eventually become aware of the change.
Thread.sleep(100);
{code}
If we remove it with this new change for metadata, we might add more test 
instability (and it's already on fire) or increase the likelihood that some 
real code out there won't work. The caller should sleep perhaps but that's also 
sad.  I've been ruminating on this a bit and may file an issue with more 
specific ideas.
* in CollectionsHandler, LISTALIASES_OP (~line 480) add this line:
{code}
zkStateReader.aliasesHolder.update(); // just in case there are changes being propagated through ZK
{code}
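
The newWatcher() idea from the second bullet above could look roughly like the following. This is only a sketch under assumptions: {{Watcher}} here is a simplified stand-in for ZooKeeper's interface, and {{AliasesHolderSketch}}, {{update()}} and {{updateCount()}} are hypothetical names, not code from the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for org.apache.zookeeper.Watcher (assumption for illustration).
interface Watcher {
  void process(String event);
}

// Hypothetical holder. It does not implement Watcher itself, so a freshly
// constructed holder cannot be registered as a watcher by mistake.
class AliasesHolderSketch {
  private final AtomicInteger updates = new AtomicInteger();

  // Every watcher handed out is bound to *this* holder instance.
  Watcher newWatcher() {
    return new Watcher() {
      @Override
      public void process(String event) {
        update();
      }
    };
  }

  // Placeholder for re-reading aliases.json; here it only counts invocations.
  void update() {
    updates.incrementAndGet();
  }

  int updateCount() {
    return updates.get();
  }
}
```

With this shape the only way to obtain a watcher is through an existing holder, which is the property the bullet is after.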

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-09 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245905#comment-16245905
 ] 

Gus Heck commented on SOLR-11487:
-

* Consolidation of the alias related stuff in ZkStateReader is nice.
* I think the getter would be consistent and play nicer with IDE auto complete, 
but as you say, it's more of a taste/style issue, not a material issue.
* The use of UnaryOperator<Aliases> rather than Function<Aliases,Aliases> of 
course makes good sense
* I suspect in this patch the int version should also be volatile, but I 
haven't looked carefully enough to see if we have sufficient monitor locking to 
make that unnecessary yet...
* I don't like moving the version out of the Aliases object. The version in zk 
that this instance was derived from is information about the Aliases object and 
therefore should be a property of the object. I like it much better as an 
immutable property on Aliases that is set directly upon creation, and can be 
made accessible from the Aliases object (don't recall if I provided a getter in 
my patch but it should probably be there to support folks who are working with 
aliases and some other data in zk so they can know if changes to aliases.json 
have occurred). Future modifications to the code could more easily get the 
version out of sync this way by failing to update the field in AliasesManager, 
whereas having it as required in the constructor enforces and communicates the 
need to track the version. 
* This patch places the burden of coordinating a set of changes on the caller 
of the API instead of handling it transparently. This is reflected by line 111 
in Test where you wrapped the previously independent clone operations in a 
single UnaryOperator, which basically redesigns the test such that it passes 
due to the special case in the test of consecutive invocations that are easily 
wrapped together. The present patch will require that UnaryOperator be used 
like a transaction wrapper, whereas the previous patch tracked changes 
internally and then transparently re-applied them in the event of a conflict. 
This made a series of changes transactional by default without any explicit 
coordination code on the caller's part, and thus somewhat foolproofed the 
usage of the API. If substantial logic is involved in calculating multiple 
pieces of metadata and/or a collection name, and that logic all has to be 
applied at the same time to ensure consistent information in ZooKeeper, then 
ALL of that logic has to be placed inside the UnaryOperator. In the prior patch 
it was sufficient to perform several clone operations and then exportToZk, with 
no effect on the organization of the calling code. I feel this patch simplifies 
the current code by adding complexity to future code using the API.
* AliasIntegrationTest.test() seems to fail?
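
To make the "transaction wrapper" point concrete, here is a minimal sketch of what the caller has to do under the UnaryOperator style. It assumes a much reduced, hypothetical {{AliasesSketch}} (alias name to comma-delimited collections only); the real Aliases class and its manager are not reproduced here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical immutable stand-in for Aliases: alias name -> collection list.
final class AliasesSketch {
  final Map<String, String> collectionAliases;

  AliasesSketch(Map<String, String> m) {
    this.collectionAliases = Collections.unmodifiableMap(new LinkedHashMap<>(m));
  }

  // Returns a modified copy; the receiver is never mutated.
  AliasesSketch cloneWithCollectionAlias(String alias, String collections) {
    Map<String, String> copy = new LinkedHashMap<>(collectionAliases);
    copy.put(alias, collections);
    return new AliasesSketch(copy);
  }
}

class TransactionWrapperSketch {
  // Both changes live in ONE operator, so a retry against fresher state
  // re-applies them together. Any logic that computes the values would
  // have to live inside this lambda as well.
  static final UnaryOperator<AliasesSketch> BOTH_CHANGES =
      a -> a.cloneWithCollectionAlias("logs", "logs_2017_11")
            .cloneWithCollectionAlias("metrics", "metrics_2017_11");

  public static void main(String[] args) {
    AliasesSketch stale = new AliasesSketch(Collections.emptyMap());
    // Simulates a concurrent writer changing an unrelated alias.
    AliasesSketch fresh = stale.cloneWithCollectionAlias("other", "c1");
    // On a version conflict, the manager would re-run the operator on fresh state.
    AliasesSketch result = BOTH_CHANGES.apply(fresh);
    System.out.println(result.collectionAliases);
  }
}
```

If the two clone calls were submitted as separate operators, another writer could slip in between them, which is the coordination burden described above.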

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-07 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242826#comment-16242826
 ] 

Gus Heck commented on SOLR-11487:
-

Added SOLR-11617 to track the creation of an API

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-07 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242637#comment-16242637
 ] 

Gus Heck commented on SOLR-11487:
-

{quote}Decomposing aliases.json has pros/cons, but it won't remove the 
possibility of races between modifying some portion of the aliases state; it 
just makes it more rare. So we still need to deal with races in code using 
a zkVersion with a retry and eventual timeout, etc.{quote}

If decomposed to the level of a metadata item (or whatever level at which we 
become ok with "last one wins") we do get rid of this because we are not trying 
to do optimistic concurrency, only ensure unrelated stuff is not wiped out. If 
no "unrelated stuff" is part of the update then this problem goes away.

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-07 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242236#comment-16242236
 ] 

David Smiley commented on SOLR-11487:
-

Decomposing aliases.json has pros/cons, but it won't remove the possibility of 
races between modifying some portion of the aliases state; it just makes it 
more rare.  So we still need to deal with races in code using a zkVersion 
with a retry and eventual timeout, etc.

Good observation in "Map Conversion"; there is no perfect choice.  I suppose 
"complicated serialization" needn't be too bad? This would mean that a 
user-provided comma separated list might end up normalized according to the 
rules of StrUtils.splitSmart and reversed as StrUtils.join and that's probably 
okay.  So if a user strangely supplies "\f\o\o" it will see "foo" back.
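
The normalization effect can be illustrated with a small round trip. The code below is a simplified stand-in written for this note, not Solr's StrUtils; it only assumes that a backslash escapes the following character on split and that join escapes no more than it must.

```java
import java.util.ArrayList;
import java.util.List;

class CommaListSketch {
  // Split on commas; a backslash escapes the next character.
  static List<String> split(String s) {
    List<String> out = new ArrayList<>();
    StringBuilder cur = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      if (ch == '\\' && i + 1 < s.length()) {
        cur.append(s.charAt(++i)); // keep the escaped char, drop the backslash
      } else if (ch == ',') {
        out.add(cur.toString());
        cur.setLength(0);
      } else {
        cur.append(ch);
      }
    }
    out.add(cur.toString());
    return out;
  }

  // Join with commas, escaping only commas and backslashes.
  static String join(List<String> items) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < items.size(); i++) {
      if (i > 0) sb.append(',');
      for (char ch : items.get(i).toCharArray()) {
        if (ch == ',' || ch == '\\') sb.append('\\');
        sb.append(ch);
      }
    }
    return sb.toString();
  }
}
```

Under these assumptions, needless escapes disappear on the way through while necessary ones (an escaped comma inside a name) survive the round trip.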

I could take a stab at ZkStateReader loop/retry, and removing the 
Aliases.priorChange stuff.  I don't argue what you have now doesn't work, only 
that the particular arrangement is unclear.

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-07 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242091#comment-16242091
 ] 

Gus Heck commented on SOLR-11487:
-

*Constructor* - Yeah that can be simplified. Much of the code directly accesses 
the field, so I try to make it impossible to observe invalid state, but I 
haven't covered the EMPTY_MAP case it seems. It might be that the null checks 
are not actually necessary if I have actually provided this guarantee up front. 

*Map Conversion* - This is a result of my not caching/duplicating state. At one 
point I began to have issues (test failures) due to the cached state getting 
out of sync, and rather than continue to try to maintain that duplicated state 
I opted to remove the duplication. My dislike for the possibility of repeated 
splitting of the list was why I originally changed things such that the main 
map contained a list. As you pointed out that complicates serialization if we 
are to maintain the existing comma separated format. So we wind up with one of 
these three things, none of which I like:

- Duplicated state
- Complicated serialization
- Repeated splitting of the comma separated list.

This sort of conundrum is more or less why I had previously suggested we do the 
metadata via zk nodes and don't expand the complexity of aliases.json... Now 
that everything else is working it should be more tractable to push the 
duplication/caching back in than it was to maintain it while things were 
evolving so I can do that if you like, but basically we have to pay for the 
fact that we are clumping this into a single json file somewhere.

*convertMap*  - ah yes good catch thx.
 
*priorChange* - The task of avoiding competition among unrelated nodes of 
aliases.json is complicated by the fact that the API allows several consecutive 
clones to be made before the result is given to zkStateReader.exportAllAliases 
(again, issues arising from the "one big json" strategy). We could fix that 
in documentation, and/or set a package private flag that prevents further 
cloning until ZkStateReader has written the current changes... in that case we 
could possibly have a few fields that retained the previous change data as 
string data rather than a function closure. Not sure how fields containing 
strings and a flag are less hokey though, and the flag would technically break 
immutability.

Think of it this way: The state in aliasMap is "candidate" state, and the chain 
of Function calls is an immutable change history that can be applied to a new 
value read from zk if needed. 
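
A rough sketch of that idea, with the caveat that it is not the patch's code: {{AliasesHistorySketch}} and {{replayOn}} are hypothetical names, and the state is reduced to a single alias-to-collections map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for Aliases, reduced to alias -> comma-delimited collections.
final class AliasesHistorySketch {
  final Map<String, String> aliasMap; // "candidate" state
  final Function<AliasesHistorySketch, AliasesHistorySketch> priorChange; // change history

  AliasesHistorySketch(Map<String, String> aliasMap) {
    this(aliasMap, Function.identity());
  }

  private AliasesHistorySketch(Map<String, String> aliasMap,
      Function<AliasesHistorySketch, AliasesHistorySketch> priorChange) {
    this.aliasMap = Collections.unmodifiableMap(new LinkedHashMap<>(aliasMap));
    this.priorChange = priorChange;
  }

  AliasesHistorySketch cloneWithCollectionAlias(String alias, String collections) {
    Map<String, String> copy = new LinkedHashMap<>(aliasMap);
    copy.put(alias, collections);
    // Remember how to redo this change on top of everything recorded before it.
    return new AliasesHistorySketch(copy,
        priorChange.andThen(a -> a.cloneWithCollectionAlias(alias, collections)));
  }

  // Re-apply every recorded change to a value freshly read from ZooKeeper.
  AliasesHistorySketch replayOn(AliasesHistorySketch fresh) {
    return priorChange.apply(fresh);
  }
}
```

The caller only ever chains clone calls; the object itself carries enough history to rebuild the same changes on a newer base when the write hits a version conflict.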

*API* - Yeah I had attempted to raise this issue above, but confusingly 
conflated it with the possibility of collection metadata earlier, you responded 
to the latter in the negative, and I took it to mean negative vs the former. 
Sorry for the confusing question. This can certainly be added :)

*ZkStateReader* - These loops perform different tasks, there are two steps 
here. 
 - ensure the data we are sending includes the latest changes (exportAllAliases)
 - ensure (with timeout) that Zookeeper got the data we eventually decided to 
send. 

We do in fact call clone in the first loop via the Function closure, if needed. 
The one you see in exportAliasToZk is just the initial attempt.

*Field order* - yup, agree.

*Overall*
I am increasingly feeling like there's a lot of complication here that derives 
from our attempts to provide zookeeper like guarantees and prevent competition 
within a single json file. Can you perhaps elaborate on the bookkeeping that 
worries you and [~noble.paul]? Is it really heavier than what we have here?

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-11-06 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240417#comment-16240417
 ] 

David Smiley commented on SOLR-11487:
-

Aliases
* constructor is confusing to me; you save aliasMap to a field and *then* swap 
with EMPTY_MAP if you can? Why not before? And maybe we can simply not bother 
with initializing the metadata map here... worry about that when we 
get/populate it. It'll probably simplify seeing that the get/set code for it in 
other methods will be correct since it won't require assumptions about being 
initialized.
* in general... these changes invoke "convertMapOfCommaDelimitedToMapOfList" 
way more than before. It used to be only at Alias construction, now it's at 
every call to resolveAliases (!) and getCollectionAliasListMap.  Sorry but can 
we avoid that?  I think it's not a big deal to lazy split the value for the 
particular collection being requested, but doing so for all collections seems 
excessive to me.
* convertMapOfCommaDelimitedToMapOfList: you've converted this to Java 8 
streams which is fine.  However you've wrapped the result in new LinkedHashMap 
which actually doesn't retain the original order of the input since 
Collectors.toMap is going to use a HashMap in between.  toMap is overloaded with 
a Supplier; you can call that one supplying the LinkedHashMap.
* The introduced use of a field priorChange Function seems really hokey; I 
sorta see what you're doing with it but I think we need to go about this in 
some other way.  It feels like too much of a wart on Aliases. Maybe we can chat 
about this on IRC.
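
For the convertMapOfCommaDelimitedToMapOfList point, the four-argument Collectors.toMap overload is the one that takes a map Supplier. A minimal sketch, assuming plain String.split as a stand-in for StrUtils.splitSmart and a hypothetical class name:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ConvertMapSketch {
  // Hypothetical take on convertMapOfCommaDelimitedToMapOfList.
  static Map<String, List<String>> convert(Map<String, String> in) {
    return in.entrySet().stream().collect(Collectors.toMap(
        Map.Entry::getKey,
        e -> Arrays.asList(e.getValue().split(",")),
        (a, b) -> { throw new IllegalStateException("duplicate alias"); },
        LinkedHashMap::new)); // supplier: result keeps the input's encounter order
  }
}
```

Because the collector accumulates directly into the LinkedHashMap, there is no intermediate HashMap to scramble the order and no need to wrap the result afterwards.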

*From a Solr API perspective, it seems we've forgotten to expose the read/write 
of metadata; no?  (!)*  I feel badly I didn't recognize this earlier; it's 
obvious in retrospect.  When in Solr tests we can work directly with ZooKeeper 
and Solr's internals, it's easy to forget the need for a public API.

ZkStateReader
* exportAliasToZk computes a new Aliases instance at the first line 
(Aliases.cloneWithCollectionAlias)  before calling exportAllAlias then 
checkForAlias, both of which have loops to do their jobs with retries / 
re-checking.  It's hard for me to see how this is correct... shouldn't 
Aliases.cloneWithCollectionAlias be called _within_ a loop?
* nice use of aliasLock with wait & notifyAll
* minor: aliasWatcher & aliasLock fields should probably be adjacent to aliases.

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-30 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225618#comment-16225618
 ] 

David Smiley commented on SOLR-11487:
-

*Aliases.java*
* cloneAliases: you can probably reduce the LOC a lot here using Java 8 
streams; see https://stackoverflow.com/a/28288729/92186 for similar code
* AFAICT you've changed the storage of the value side of the map of the 
"collections" key to be a List<String> instead of a String (comma delimited).  
But since this map is serialized to JSON and back and stored in ZooKeeper, 
wouldn't this affect compatibility with aliases.json in existing setups?  We 
definitely don't want to break things.  
* It's a shame that getCollectionAliasMap and getCollectionAliasListMap are no 
longer simple getters (albeit with unmodifiable wrappers). I don't think the 
extra code & cost is worth true immutability.  This is internal code.  As a 
compromise, perhaps instead you might consider creating a view that splits the 
comma on the fly only when its value is asked for?  See Guava 
{{Maps.transformValues()}}.

*ZkStateReader*
* The TODO in "exportAliasToZk" is a real concern; we don't want race 
conditions between various alias modifications to cause some to be 
overridden/ignored.  I think there are two parts to this: firstly use ZK 
versions when reading/writing aliases.json so that we overwrite the version we 
are expecting. This will mean putting the version number in Aliases (see 
DocCollection.zknodeversion for the same idea) whenever we read the aliases.  
Also, if we do need to retry, that retry loop needs to incorporate fetching the 
aliases and doing the modification over again, rather than repeatedly trying in 
vain to save the serialized aliases from when it started.
* In AliasWatcher, the LOG.debug should use "{}" template to avoid possible 
expensive aliases.toString()
* I very much like that you're removing the hot loop and thinking about 
creating abstractions/mechanisms to make it nicer.  But I admit the change here 
with Observer & Observable seem very odd to me.  IMO weird that ZkStateReader 
is an Observer (Observable seems more intuitive -- it has state that can be 
observed) and likewise it's weird that a Watcher is Observable (wouldn't it 
fundamentally be an Observer?).  I think you're trying to chain some existing 
observable stuff which means the Watchers thus become Observable themselves... 
but I don't like the result even if it works.  Perhaps instead, you could have 
a Watcher subclass called ChainableWatcher, and thus remove the need for 
Observable/Observer altogether?  I don't know if it's that simple without 
trying to tackle it myself, but it's at least an idea.
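
The versioned write with a re-reading retry loop described in the first ZkStateReader bullet can be sketched without ZooKeeper at all. Everything below is an assumption for illustration: {{VersionedNodeSketch}} is an in-memory stand-in for the aliases.json znode and {{RetryLoopSketch.modify}} is a hypothetical helper, not an existing Solr method.

```java
import java.util.function.UnaryOperator;

// In-memory stand-in for a ZooKeeper node that carries a data version.
class VersionedNodeSketch {
  private String data = "";
  private int version = 0;

  synchronized int version() { return version; }
  synchronized String data() { return data; }

  // Conditional write: succeeds only if nobody wrote since expectedVersion.
  synchronized boolean setData(String newData, int expectedVersion) {
    if (expectedVersion != version) return false;
    data = newData;
    version++;
    return true;
  }
}

class RetryLoopSketch {
  static boolean modify(VersionedNodeSketch node, UnaryOperator<String> change, int maxTries) {
    for (int i = 0; i < maxTries; i++) {
      int seen;
      String current;
      synchronized (node) { // read data and version as one consistent pair
        seen = node.version();
        current = node.data();
      }
      // Recompute from what was just read, never from a stale serialized copy.
      if (node.setData(change.apply(current), seen)) return true;
    }
    return false; // give up after a bounded number of attempts
  }
}
```

The key detail is that the modification is re-applied inside the loop, so a lost race costs one extra read rather than silently discarding someone else's change.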


[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-29 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224353#comment-16224353
 ] 

Gus Heck commented on SOLR-11487:
-

Attaching revised patch. 

Highlights:
* Immutability was restored
* After getting irritated with a number of issues caused by mismatches 
between the Map<String,String> version of the aliases and the 
Map<String,List<String>> notions of aliases that were being kept, including a 
need to clone, tweak and re-clone to get the list version back in sync, I nixed 
the duplicated state, and made the method providing the Map<String,String> view 
construct that view on the fly. There is now only one canonical map, nothing to 
keep in sync, and it contains the List<String> values. I could be talked into 
flip flopping to keep comma strings and construct lists on the fly, but at the 
moment it all passes (and still writes comma strings to zookeeper). I don't 
think it matters much which way we do it.
* I fixed immutability leakage with the Map<String,List<String>> version where 
the Map was immutable, but the list elements were still mutable, and thus could 
be added/removed producing state (reflected in the original Alias instance) out 
of sync with the Map<String,String> version of aliases held in the 
"collections" key of the main aliasMap field. New code now provides Immutable 
map with immutable list values.
* TimeOut class removed, replaced with a OneTimeListener class that allows 
easy, temporary piggybacking on the AliasWatcher I extracted in the prior 
patch. This is coded in a way that if other watchers simply extend Observer 
they can also use this too. The net effect is rather than polling on a loop we 
wait on a countdown latch which is released after a timeout or if the watcher 
hears an update that passes a test (written by the implementor of the 
OneTimeListener). The listener is automatically removed on successful test or 
timeout. This means we don't evaluate the test condition any more times than 
aliases are updated. This could increase executions of the test condition if 
aliases were under heavy updates, but that seems like an oddball condition so 
this will probably result in many fewer tests. In any case it will be way less 
expensive than the hot loop.
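
The latch-based waiting described in the last bullet might be sketched like this. The classes are hypothetical stand-ins (the patch's OneTimeListener and AliasWatcher are not reproduced), and aliases are represented as a plain JSON string for brevity.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical stand-in for the watcher that hears alias updates.
class AliasWatcherSketch {
  private final List<OneTimeListenerSketch> listeners = new CopyOnWriteArrayList<>();

  void add(OneTimeListenerSketch l) { listeners.add(l); }
  void remove(OneTimeListenerSketch l) { listeners.remove(l); }

  // Called once per observed update; listener conditions are evaluated only here.
  void fireUpdate(String aliasesJson) {
    for (OneTimeListenerSketch l : listeners) l.onUpdate(aliasesJson);
  }
}

class OneTimeListenerSketch {
  private final CountDownLatch latch = new CountDownLatch(1);
  private final Predicate<String> test;

  OneTimeListenerSketch(Predicate<String> test) { this.test = test; }

  void onUpdate(String aliasesJson) {
    if (test.test(aliasesJson)) latch.countDown();
  }

  // Blocks until an update passes the test or the timeout elapses.
  // The listener is removed from the watcher in either case.
  boolean await(AliasWatcherSketch watcher, long timeout, TimeUnit unit)
      throws InterruptedException {
    watcher.add(this);
    try {
      return latch.await(timeout, unit);
    } finally {
      watcher.remove(this);
    }
  }
}
```

The waiting thread sleeps on the latch instead of polling, so the condition runs once per update rather than once per loop iteration.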

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-23 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216287#comment-16216287
 ] 

David Smiley commented on SOLR-11487:
-

Fantastic response to my code review by the way :-)

RE {{aliasMap}}: my suggestion was embarrassing; of course the value side should 
now be what you have -- just Map!

bq.  Based on your earlier response we won't have a collections API call to set 
metadata

What's this in reference to?  But you've got me thinking... what if this was a 
collection-or-alias metadata thing?  That sounds pretty useful/cool from a 
user/conceptual standpoint.  From a code details standpoint... maybe this would 
be no change -- alias metadata goes in one place (aliases.json) whereas 
collections would theoretically have it in their state.json?  Anyway, I don't 
want to create extra work for hypothetical features that are not in scope.

RE ZkStateReader field: Naturally we need to save the data in ZK but that 
doesn't require Aliases.java to have the field and it hurts immutability (more 
on that in a sec).  Couldn't ZkStateReader make this happen (Law of Demeter 
perhaps?)?

RE immutability: I believe ZkStateReader is keeping the Aliases instance up to 
date via a ZK watcher... so if code doesn't hold a durable reference to Aliases 
(outside of ZkStateReader) then we're good?  DocCollection is immutable; I 
think it's consistent for Aliases to follow the same approach too; no?  I don't 
think we want to break with the trend here.  If it were mutable, the caller 
might not be sure when exactly the ZK interaction happens (hidden behind some 
innocent method call?).  I get this is a trade-off and you've articulated the 
other side well I think.

RE Collections.EMPTY_MAP:  Okay.

RE Aliases CRUD in ZkStateReader: I like it.

RE TimeOut: nice catch on finding the hot loop!  I recommend not copying 
TimeOut; just add some utility method if wanted.  Classes add more conceptual 
complexity than a static method for a case like this IMO.

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-23 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216223#comment-16216223
 ] 

Gus Heck commented on SOLR-11487:
-

Thx for the review Dave. 

I'll start in on some of the fix-ups. Here are some of the whys for what I did. I 
can probably be talked out of anything here, but this is what I was thinking... 
Let me know what you think.

*Fix List*
* Will add test for removal of alias case and fix it. 
* docs: yes of course good point :) 
* names: sure sounds fine :)
* I had initially started supporting lists and then decided to axe that until 
discussion. I will move back to  from .. 



*Warnings*: this gets difficult because we don't really *have* type safety 
anymore... 

We have a Map with two keys, "collection" and "collection_metadata", and 
the values for these two keys don't match. The former is a Map<String,String> and 
the latter is a Map<String,Map<String,String>>. String and Map<String,String> are 
not convertible types, so one use case or the other won't compile... unless you 
back off from type safety. To achieve type safety we either need to keep two 
separate maps or we need to be serializing an actual object hierarchy rather 
than collection classes.
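To make the type-safety point concrete, here is a small sketch (not the patch's code) of a single root map holding both shapes of value. The class name AliasesJsonShape, the alias name, and the "router.field" metadata key are made up for illustration. Note that each read needs an unchecked cast, which is the warning being discussed.

```java
import java.util.HashMap;
import java.util.Map;

class AliasesJsonShape {

  /** Builds the parsed form of a hypothetical aliases.json with both top-level keys. */
  static Map<String, Map<String, ?>> parsed() {
    // "collection": alias name -> comma delimited list of collections
    Map<String, String> collection = new HashMap<>();
    collection.put("timeseries", "ts_2017_10,ts_2017_11");

    // "collection_metadata": alias name -> (metadata key -> metadata value)
    Map<String, String> tsMeta = new HashMap<>();
    tsMeta.put("router.field", "timestamp_dt"); // hypothetical key and value
    Map<String, Map<String, String>> metadata = new HashMap<>();
    metadata.put("timeseries", tsMeta);

    // The two values have different types, so the root can only promise Map<String, ?>.
    Map<String, Map<String, ?>> root = new HashMap<>();
    root.put("collection", collection);
    root.put("collection_metadata", metadata);
    return root;
  }

  /** Reading the alias list requires an unchecked cast back to Map<String, String>. */
  @SuppressWarnings("unchecked")
  static String collectionsFor(Map<String, Map<String, ?>> root, String alias) {
    Map<String, String> collection = (Map<String, String>) root.get("collection");
    return collection.get(alias);
  }

  /** Reading metadata requires a different unchecked cast on the same root map. */
  @SuppressWarnings("unchecked")
  static Map<String, String> metadataFor(Map<String, Map<String, ?>> root, String alias) {
    Map<String, Map<String, String>> metadata =
        (Map<String, Map<String, String>>) root.get("collection_metadata");
    return metadata.get(alias);
  }
}
```

Keeping two separately typed fields (one per top-level key) avoids both casts, at the cost of assembling the root map only at serialization time.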



*The ZkStateReader field* is necessary because we need to get the metadata back 
to zookeeper. Based on your earlier response we won't have a collections API 
call to set metadata, so we need to have a ZkStateReader somehow or nothing 
gets written to zookeeper. In the case of cloneWithCollectionAlias, yeah that 
can be eliminated there, good catch. I can add an overload for the signature 
you suggest as well for the case where both are to be updated at the same time, 
but WRT removing ZkStateReader entirely, see comments about immutability 
below... 



*Immutability* is nice of course, and great for things that are immutable, or 
only held for a short duration, but once you have a long held reference and the 
underlying data is actually mutable, it gets difficult to be sure nobody is 
retaining a reference to a stale copy every time a change is made... A little 
digging reveals that CoreContainer has a ZkContainer which has a ZkController, 
which has a ZkStateReader, which therefore holds onto an immutable copy of 
Aliases for a difficult to determine time frame.

The existing cloning/immutability scheme therefore worries me. It seems like it 
would make more sense for Aliases to function as a wrapper around the 
(fundamentally mutable) json in zookeeper. If we never want to know if there 
was a change after we get our initial copy, and we never give away a reference 
to the copy we got, and we never retain that copy after we make an update we 
could have immutable copies... hard to make those stick however. It might be 
that we want a snapshot at the start of a request that doesn't change for the 
duration of the request (I can imagine that getting funky fast), but the long 
held versions need to be mutable I think... maybe a mutable super class and an 
immutable subclass that throws UnsupportedOperation on mutators?
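As a rough illustration of the "mutable super class and immutable subclass" idea floated above, assuming String-valued metadata keyed by alias name. The class names are hypothetical and this is only a sketch of the shape, not a proposal for the actual Aliases API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical long-held, mutable view of alias metadata. */
class MutableAliasMetadata {
  protected final Map<String, Map<String, String>> metadata = new HashMap<>();

  String get(String alias, String key) {
    Map<String, String> forAlias = metadata.get(alias);
    return forAlias == null ? null : forAlias.get(key);
  }

  void set(String alias, String key, String value) {
    metadata.computeIfAbsent(alias, a -> new HashMap<>()).put(key, value);
  }

  /** Returns a read-only copy, e.g. for use during a single request. */
  ImmutableAliasMetadata snapshot() {
    return new ImmutableAliasMetadata(this);
  }
}

/** Hypothetical snapshot: same read API, but mutators throw. */
class ImmutableAliasMetadata extends MutableAliasMetadata {

  ImmutableAliasMetadata(MutableAliasMetadata source) {
    // Deep copy so later changes to the source are not visible here.
    for (Map.Entry<String, Map<String, String>> e : source.metadata.entrySet()) {
      metadata.put(e.getKey(), new HashMap<>(e.getValue()));
    }
  }

  @Override
  void set(String alias, String key, String value) {
    throw new UnsupportedOperationException("snapshot is read-only");
  }
}
```

One drawback worth noting: a caller holding the superclass type cannot tell at compile time whether set() will throw.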



The *Collections.EMPTY_MAP* came up because I can't put anything into an empty 
map, and need to test if that's what we currently have or if we have a regular 
map that I can add stuff to. Collections.emptyMap() is not required to return 
the same instance, or any particular implementation class, so in order to test 
for it and not be subject to breakage in odd VM's or future versions (or past 
versions?) I have to use the single unique instance in Collections.EMPTY_MAP. 
I've grown slightly unsure as to whether that if block is still necessary (a 
possible holdover from early versions of this code), so I might take a shot at 
eliminating it and going back to Collections.emptyMap().
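For what it's worth, one way to sidestep the identity test entirely is to always copy before adding. This is an alternative sketch, not what the patch does. EmptyMapSketch and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class EmptyMapSketch {
  private EmptyMapSketch() {}

  /** Returns a new read-only map containing everything in current plus one entry. */
  static Map<String, String> withEntry(Map<String, String> current, String key, String value) {
    // Copying works whether current is the shared empty map or a regular map,
    // so no check against Collections.EMPTY_MAP is needed.
    Map<String, String> copy = new HashMap<>(current);
    copy.put(key, value);
    return Collections.unmodifiableMap(copy);
  }

  /** Shows the underlying problem: the map from Collections.emptyMap() rejects put(). */
  static boolean emptyMapRejectsPut() {
    try {
      Collections.<String, String>emptyMap().put("k", "v");
      return false;
    } catch (UnsupportedOperationException expected) {
      return true;
    }
  }
}
```

The copy costs an allocation per update, which seems acceptable if alias metadata changes rarely.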



I *moved CRUD stuff* to ZkStateReader so I didn't have to duplicate it in 
Aliases to get metadata written back to zookeeper. Also it feels reasonable to 
have a ZK class doing the Zk CRUD rather than having that code live in a 
command class that grabs ZkClient from ZkStateReader and writes the data 
directly itself... (law of Demeter etc). This way, the command does command 
type stuff like identify the data to be written and validation to be sure we 
really do want to write the data and then hands the data off to the thing that 
knows how to work with zookeeper data so it can do the actual writing... 
Controller & service/DAO stuff. The present code seems like the 
controller/action in a web app firing up a JDBC connection directly...



*TimeOut*... initially I was going to copy it over verbatim since it was in 
core and core is not available in solrj (when I moved the CRUD to 
ZkStateReader) and then I realized it could be improved so I improved it. I 
think perhaps this timeout and the one in Core could be reconciled and moved to 
a commonly available location to facilitate re-use, but that seems like a

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-23 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215893#comment-16215893
 ] 

David Smiley commented on SOLR-11487:
-

BTW FWIW RE TimeOut... IMO it'd be nicer to have a static utility method 
named something like callWithRetry(long intervalSleep, long timeout, TimeUnit 
unit, Callable callable) throws Exception
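A possible shape for such a utility, as a sketch only. The signature follows the suggestion above, but the semantics are assumptions: the callable is retried until it returns a non-null value, intervalSleep is expressed in the same unit as timeout, and a TimeoutException is thrown when time runs out. RetryUtil is a hypothetical holder class.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RetryUtil {
  private RetryUtil() {}

  /**
   * Calls the callable until it returns a non-null value (assumed success signal),
   * sleeping intervalSleep between attempts. Throws TimeoutException once the
   * timeout has elapsed without success. Exceptions from the callable propagate.
   */
  static <T> T callWithRetry(long intervalSleep, long timeout, TimeUnit unit,
                             Callable<T> callable) throws Exception {
    final long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (true) {
      T result = callable.call();
      if (result != null) {
        return result;
      }
      long remaining = deadline - System.nanoTime();
      if (remaining <= 0) {
        throw new TimeoutException("no result within " + timeout + " " + unit);
      }
      // Sleep for the interval, but never past the deadline.
      TimeUnit.NANOSECONDS.sleep(Math.min(unit.toNanos(intervalSleep), remaining));
    }
  }
}
```

Usage would look like `RetryUtil.callWithRetry(100, 5000, TimeUnit.MILLISECONDS, () -> lookupOrNull())`, where lookupOrNull is whatever check the caller is waiting on.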

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-23 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215845#comment-16215845
 ] 

David Smiley commented on SOLR-11487:
-

Thanks for the patch Gus!
* I think just String values is fine; it's like a Properties object.  Therefore 
you could change the type of the aliasMap field back to what it was 
(Map<String,Map<String,String>>)?
* Why change Collections.emptyMap() to Collections.EMPTY_MAP ?  The latter 
results in Java unchecked assignment warnings.  Speaking of which, can you 
please address such warnings?
* getCollectionMetadataMap needs some docs.  Can result be null?
* setAliasMetadata:
** would you mind changing the "key" param name to "keyMetadata" and "metadata" 
param name to "valueMetadata" (or shorter "Meta" instead of "Metadata" if you 
choose)?  That would read clearer to me.
** setAliasMetadata doesn't have "collection" in its name.  Likewise the 
aliasMetadataField should be qualified as well.  Or... we stop pretending at 
the class level that we support other alias types, yet continue to read/write 
with the "collection" prefix in case we actually do add new types later?
** oh... hey this method makes Aliases not immutable anymore.  Maybe change 
this to be something like cloneWithCollectionAliasMetadata?  Or we could make 
immutable again but I admit Immutability is a nice property here.
* cloneWithCollectionAlias seems off; I don't think it should be using 
zkStateReader.  I think this method now needs a metadata map parameter 
(optional).  Furthermore, if we remove an alias, remove the corresponding 
metadata too.
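Roughly what I have in mind for the last two bullets, as a sketch rather than a drop-in for Aliases.java. The class name, the null conventions (null collections means "remove the alias"; null metadata means "leave metadata as is"), and the accessor names are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class ImmutableAliasesSketch {
  static final ImmutableAliasesSketch EMPTY =
      new ImmutableAliasesSketch(new HashMap<>(), new HashMap<>());

  private final Map<String, String> collectionAliases;
  private final Map<String, Map<String, String>> collectionMetadata;

  private ImmutableAliasesSketch(Map<String, String> aliases,
                                 Map<String, Map<String, String>> metadata) {
    this.collectionAliases = Collections.unmodifiableMap(aliases);
    this.collectionMetadata = Collections.unmodifiableMap(metadata);
  }

  /**
   * Returns a modified copy; this instance is never changed and no ZooKeeper
   * access happens here. Passing null collections removes the alias and its
   * metadata. Passing null aliasMetadata keeps any existing metadata.
   */
  ImmutableAliasesSketch cloneWithCollectionAlias(String alias, String collections,
                                                  Map<String, String> aliasMetadata) {
    Map<String, String> newAliases = new HashMap<>(collectionAliases);
    Map<String, Map<String, String>> newMetadata = new HashMap<>(collectionMetadata);
    if (collections == null) {
      newAliases.remove(alias);
      newMetadata.remove(alias); // removing an alias also drops its metadata
    } else {
      newAliases.put(alias, collections);
      if (aliasMetadata != null) {
        newMetadata.put(alias, Collections.unmodifiableMap(new HashMap<>(aliasMetadata)));
      }
    }
    return new ImmutableAliasesSketch(newAliases, newMetadata);
  }

  String getCollectionAlias(String alias) {
    return collectionAliases.get(alias);
  }

  /** Never returns null; an alias without metadata yields an empty map. */
  Map<String, String> getCollectionMetadata(String alias) {
    Map<String, String> forAlias = collectionMetadata.get(alias);
    return forAlias == null ? Collections.<String, String>emptyMap() : forAlias;
  }
}
```

Whoever owns the ZK interaction would then write the returned instance back, so the caller can see exactly where the write happens.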

I see you moved some alias CRUD stuff to ZkStateReader.  Just curious; what 
drove that decision?

TimeOut: do you envision other uses of this utility; what in particular?

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-19 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211098#comment-16211098
 ] 

David Smiley commented on SOLR-11487:
-

The latter -- alias metadata.

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-19 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211026#comment-16211026
 ] 

Gus Heck commented on SOLR-11487:
-

Is this ticket meant to add a general collection metadata facility (with admin 
commands for users to add/remove metadata of their own), or just make the 
metadata available in code at the ZkStateReader.getAliases() level? 

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-16 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206772#comment-16206772
 ] 

Noble Paul commented on SOLR-11487:
---

I prefer the approach of adding a collection_metadata key to the current 
aliases.json. That means fewer nodes. Every extra node is extra 
bookkeeping.

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-16 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206546#comment-16206546
 ] 

David Smiley commented on SOLR-11487:
-

Thanks for sharing your idea [~gus_heck].  This is an approach I didn't think 
of.  I'm concerned this is over-using ZooKeeper nodes for what could be a 
simple map instead.  It's not like the metadata on the collection (as 
associated with the alias) is going to change so often as to benefit from the 
ability to change some but not all of this metadata.  [~noble.paul] what do you 
think of this?

[jira] [Commented] (SOLR-11487) Collection Alias metadata for time partitioned collections

2017-10-15 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205218#comment-16205218
 ] 

Gus Heck commented on SOLR-11487:
-

In ZK, nodes can have both values and children, right? So the value of the node 
called aliases.json can remain the same JSON text, but it could also have a 
list of children corresponding to each member of the alias containing that 
metadata... Yes, some duplication there, but this would mean that any older 
clients reading the value from the node will still get what they expect... 
newer code could simply ignore the old JSON...
