[jira] [Comment Edited] (SOLR-13439) Make collection properties easier and safer to use in code

2019-06-11 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861095#comment-16861095
 ] 

Gus Heck edited comment on SOLR-13439 at 6/11/19 2:29 PM:
--

Given that this has been well discussed and the existence of the api has 
basically been agreed upon, I'm moving this forward so I can build on it, and 
if we need to tweak something about the details of the implementation we can 
still do that. Will be pushing to 8x shortly.


was (Author: gus_heck):
Given that this has been well discussed and the existence of the api has 
basically been agreed upon, I'm moving this forward so I can build on it, and 
if we need to tweak something about the details of the implementation we can 
still do that.

> Make collection properties easier and safer to use in code
> --
>
> Key: SOLR-13439
> URL: https://issues.apache.org/jira/browse/SOLR-13439
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: master (9.0)
>Reporter: Gus Heck
>Assignee: Gus Heck
>Priority: Major
> Attachments: SOLR-13439.patch, SOLR-13439.patch, SOLR-13439.patch
>
>
> (breaking this out from SOLR-13420, please read there for further background)
> Before this patch, the API is quite confusing (IMHO):
>  # Any code that wanted to know the properties for a collection could call 
> zkStateReader.getCollectionProperties(collection), but this was a dangerous 
> and trappy API because it queried zookeeper every time. If a naive user 
> auto-completed that in their IDE without investigating, heavy use of 
> zookeeper would ensue.
>  # To "do it right", any code that might get called on a per-doc or 
> per-request basis had to trigger caching by registering a watcher, at which 
> point getCollectionProperties(collection) magically became safe to use. But 
> the watcher pattern probably looks familiar and induces a user who hasn't 
> read the Solr code closely to create their own cache and update it when their 
> watcher is notified. If the caching side effect of watches isn't understood, 
> this will lead to many in-memory copies of collection properties maintained 
> in user code.
>  # This also creates a task to be scheduled on a thread (PropsNotification) 
> and induces an extra thread-scheduling lag before the changes can be observed 
> by user code.
>  # The code that cares about collection properties needs to have a lifecycle 
> tied to either a collection or some other object with an even more ephemeral 
> lifecycle, such as a URP. The user also has to remember to ensure the watch 
> is unregistered, or there is a leak.
> After this patch:
>  # Calls to getCollectionProperties(collection) are always safe to use in any 
> code anywhere. Caching and cleanup are automatic.
>  # Code that actually wants to know when a collection property changes so it 
> can wake up and do something (autoscaling?) still has the option of 
> registering a watcher that will asynchronously send it a notification.
>  # Updates can be observed sooner via getCollectionProperties(), with no need 
> to wait for a thread to run (vs. a cache held in user code).
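A minimal usage sketch of the pattern described above (illustrative only, not taken from the patch; it assumes the ZkStateReader method names referenced in this issue, getCollectionProperties and registerCollectionPropsWatcher, and the class and property names are invented):
{code:java}
import java.util.Map;

import org.apache.solr.common.cloud.ZkStateReader;

public class CollectionPropsUsageSketch {

  // After the patch: safe to call on any code path, even per-request; caching is automatic.
  static String readProp(ZkStateReader zkStateReader, String collection, String propName) {
    Map<String, String> props = zkStateReader.getCollectionProperties(collection);
    return props.getOrDefault(propName, "default-value");
  }

  // Only code that genuinely needs to react to changes registers a watcher and is
  // notified asynchronously (registration method name assumed from the discussion).
  static void watchProps(ZkStateReader zkStateReader, String collection) {
    zkStateReader.registerCollectionPropsWatcher(collection, newProps -> {
      System.out.println("collection properties changed for " + collection + ": " + newProps);
      return false; // keep watching; returning true would remove the watcher
    });
  }
}
{code}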



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Comment Edited] (SOLR-13439) Make collection properties easier and safer to use in code

2019-05-09 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836424#comment-16836424
 ] 

Gus Heck edited comment on SOLR-13439 at 5/9/19 2:39 PM:
-

{quote}Case 1 (Frequent access):
 1. T+0: SolrCore starts (i.e. A replica is added, a collection is created, you 
name it)
 2. T+0: On initialization, a component that relies on collection properties 
registers a listener. Solr reads from ZooKeeper once.
 3. T+0 to T+end-of-life-of-the-core: The watch remains.
 4. T+end-of-life-of-the-core: no more reads are expected, the watch is removed.
{quote}
This is identical to my Case 1 if the core lives for 30 minutes?
{quote}Times read with current approach: 1 + Number of modifications of 
properties.
 Times with the cache approach: unknown, at least the same as with the 
listener, but depends on when the properties are accessed.
{quote}
Not true. One read (or possibly zero reads, if we luck into my case 3) on step 1 
(instead of step 2 as you list it)... Properties remain cached (with updates 
when they change in zk, of course) until the core is removed. This is the whole 
point of why I wrote ConditionalExpiringCache. While the watch is active, it 
will exist in collectionPropsWatches, and the predicate the cache is created 
with will evaluate to true, so the timeout will be refreshed without removing 
the entry:
{code:java}
if (condition.test(peek)) {
  // we passed our stay-alive condition: refresh our time-out and put back in the queue.
  peek.expireAt = System.nanoTime() + retainForNanos;
  timeSortedEntries.add(peek);
} else {
  // we are past our time limit and have failed our condition to stay alive, so remove from cache
  cache.remove(peek.key);
}
{code}
I expect that the typo you caught (and I corrected in the latest version) may 
have kept you from realizing that the cache never expires while a watch is set. 
In that respect my patch is backwards compatible.
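For context, here is a self-contained sketch of the condition-based expiry idea (illustrative only; the patch's actual ConditionalExpiringCache uses a time-sorted queue as shown above, and all names below are invented):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Illustrative sketch only: an entry whose timeout has passed is retained (and its
// timeout refreshed) as long as a stay-alive condition holds, e.g. "a props watch
// is registered for this collection".
class ConditionalExpiringCacheSketch<K, V> {

  private static final class Entry<V> {
    volatile long expireAt;
    final V value;
    Entry(V value, long expireAt) { this.value = value; this.expireAt = expireAt; }
  }

  private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
  private final long retainForNanos;
  private final Predicate<K> stayAlive; // e.g. key -> collectionPropsWatches.containsKey(key)

  ConditionalExpiringCacheSketch(long retainForNanos, Predicate<K> stayAlive) {
    this.retainForNanos = retainForNanos;
    this.stayAlive = stayAlive;
  }

  V get(K key) {
    Entry<V> e = cache.get(key);
    return e == null ? null : e.value;
  }

  void put(K key, V value) {
    cache.put(key, new Entry<>(value, System.nanoTime() + retainForNanos));
  }

  // Called periodically by a reaper task.
  void expireStale() {
    long now = System.nanoTime();
    cache.forEach((key, e) -> {
      if (now - e.expireAt < 0) {
        return; // not yet past its timeout
      }
      if (stayAlive.test(key)) {
        e.expireAt = now + retainForNanos; // condition holds: refresh the timeout
      } else {
        cache.remove(key, e); // past the timeout and the condition failed: evict
      }
    });
  }
}
{code}
The key point is the refresh branch: while the stay-alive predicate holds (a watch exists), the entry's timeout keeps being pushed forward, so cached properties never expire out from under a registered watcher.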
{quote}I can give you two cases:
 Case 1: Collection properties are accessed infrequently (like in my “case 2 
above”), but collection properties change frequently (i.e. every second)
 1. T + 0: call to getCollectionProperties(), Zk watch is set and element is on 
cache
 2. T + 1 to T + 9: Collection properties change, firing watches to Solr. Solr 
receives the watch and reads from ZooKeeper
 3. T + 10: cache expires
 With cache, we read from ZooKeeper 10 times, and ZooKeeper fires 10 watches. 
Without cache, we read once and ZooKeeper doesn't fire any watch. Keep in mind 
that some clusters may have many collections (hundreds/thousands?); this may 
add a lot of load to ZooKeeper for things that aren’t going to be needed.
{quote}
Ah yes, I was evaluating in terms of reads caused by user action. I have 
assumed (I state it in one of the comments in the code, I think) that collection 
properties are not frequently updated. What you say is true, but it's no worse 
than having a watch set. I am amenable to making the expiration time tunable 
via configuration if you think frequent updates to collection properties for an 
individual collection are a realistic use case. My assumption is that they are 
likely to be set at initialization, and perhaps changed when an admin or 
autoscaling wants to adjust something. If something is thrashing collection 
properties, I think that's the problem and this would be a symptom. So to 
summarize this case:

*TL;DR:* This patch does pose a risk for systems with large numbers of 
collections that also frequently update collection properties on many of those 
collections, if that system was +not already using collectionPropertiesWatches+ 
(or was only using them on a few collections) prior to this patch (i.e. most 
updates to properties are ignored).

That seems like a pretty narrow use case, since it implies that the majority of 
updates are going unread unless callers are making frequent calls to 
getCollectionProperties(), and perhaps they should have been setting a watch in 
the first place... This case (which I thought of but discarded as very unlikely) 
could be mitigated by shortening the cache expiration time and the cache 
eviction frequency interval here, or by making them configurable:
{code:java}
  // ten minutes -- possibly make this configurable
  private static final long CACHE_COLLECTION_PROPS_FOR_NANOS = 10L * 60 * 1000 * 1000 * 1000;

  // one minute -- possibly make this configurable
  public static final int CACHE_COLLECTION_PROPS_REAPER_INTERVAL = 6;
{code}
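If we did make them configurable, one hypothetical approach (the system property names here are invented for illustration and are not part of the patch) would be:
{code:java}
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: read the expiry and reaper interval from system properties,
// falling back to the current hard-coded defaults.
class PropsCacheConfigSketch {
  static final long CACHE_COLLECTION_PROPS_FOR_NANOS =
      TimeUnit.SECONDS.toNanos(Long.getLong("solr.collectionPropsCacheSeconds", 600)); // 10 minutes

  static final int CACHE_COLLECTION_PROPS_REAPER_INTERVAL =
      Integer.getInteger("solr.collectionPropsReaperInterval", 6);
}
{code}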
Before we do that, however, are we really meaning to support use cases where 
people thrash values we store in zookeeper? Or is this an anti-pattern, like 
individual requests for updates rather than batching? My gut says the latter, 
but maybe I'm out in left field on that?
{quote}Case 2: A component 

[jira] [Comment Edited] (SOLR-13439) Make collection properties easier and safer to use in code

2019-05-08 Thread Gus Heck (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835757#comment-16835757
 ] 

Gus Heck edited comment on SOLR-13439 at 5/8/19 10:23 PM:
--

Need was probably the wrong word... want :) would have been more correct. I 
prefer that since then it can be a single line and we don't have to do the 
cache lookup for an object that we are already handling.

As for consistency/frequency of access let's consider some cases (time in 
minutes):
 # Case 1:
 ## T+0 A watch is set, cache is populated from zk
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed exactly once. This is UNCHANGED vs current
 # Case 2:
 ## T+0 A watch is set, cache is populated from zk
 ## T+15 properties are accessed with getCollectionProperties() 
 ## T+30 the watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed exactly once. This is UNCHANGED vs current
 # Case 3:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+5 a watch is set, cache is already populated, zk is not accessed
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case the data access frequency for zk is once instead of twice. 
This is LESS than current.
 # Case 4:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+10 the cache expires
 ## T+20 a watch is set, cache is populated from zk
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed twice. This is UNCHANGED vs current. 
 # Case 5:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+1 a call to getCollectionProperties() is made, cache is already 
populated, zk is not accessed
 ## T+2 a call to getCollectionProperties() is made, cache is already 
populated, zk is not accessed
 ## T+12 the cache expires
 ## T+30 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+40 the cache expires
 ## In this case zookeeper is accessed twice instead of four times. This is LESS 
than current.

I will grant you that it's hard to predict when the load on zookeeper will be 
less, but in no case will it be more, so unless you mean to stress test 
zookeeper this is not much of a problem. If I've missed a case, let me know.
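As a toy model of the accounting in Case 5 (this is not Solr code; the cache and the zookeeper read are simulated with a plain map and a counter):
{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy model of Case 5: repeated reads inside the cache window cost one zk read;
// a read after expiry costs a second one -- two reads instead of four.
public class Case5Sketch {
  static int zkReads = 0;

  // Stand-in for the real zookeeper fetch of collection properties.
  static Map<String, String> readFromZk() {
    zkReads++;
    return new HashMap<>();
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> cache = new HashMap<>();

    cache.computeIfAbsent("myCollection", c -> readFromZk()); // T+0:  miss, first zk read
    cache.computeIfAbsent("myCollection", c -> readFromZk()); // T+1:  hit, no zk read
    cache.computeIfAbsent("myCollection", c -> readFromZk()); // T+2:  hit, no zk read
    cache.clear();                                            // T+12: cache expires
    cache.computeIfAbsent("myCollection", c -> readFromZk()); // T+30: miss, second zk read

    System.out.println("zk reads = " + zkReads); // prints: zk reads = 2
  }
}
{code}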

The existing code (without this patch) isn't really consistent either. Code 
that calls getCollectionProperties() is either fast or slow depending on 
whether or not someone else has set a watch on that collection. I think the 
current inconsistency is worse, because a getCollectionProperties() call that 
was fast due to the action of unrelated code can become surprisingly slow, 
whereas after my patch, setting the watch may become surprisingly fast due to 
the effect of unrelated code.

Edit: re-reading that last sentence, I realize now that one could flip either 
one around, but I guess what I meant to express is that the delta is potentially 
much larger in the present code, since it would be almost inconceivable to 
create a watch on a per-doc basis, but one could easily write code checking 
collection properties on a per-doc or per-request basis.


was (Author: gus_heck):
Need was probably the wrong word... want :) would have been more correct. I 
prefer that since then it can be a single line and we don't have to do the 
cache lookup for an object that we are already handling.

As for consistency/frequency of access let's consider some cases (time in 
minutes):
 # Case 1:
 ## T+0 A watch is set, cache is populated from zk
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed exactly once. This is UNCHANGED vs current
 # Case 2:
 ## T+0 A watch is set, cache is populated from zk
 ## T+15 properties are accessed with getCollectionProperties() 
 ## T+30 the watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed exactly once. This is UNCHANGED vs current
 # Case 3:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+5 a watch is set, cache is already populated, zk is not accessed
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case the data access frequency for zk is once instead of twice. 
This is LESS than current.
 # Case 4:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+10 the cache expires
 ## T+20 a watch is set, cache is populated from zk
 ## T+30 watch is unregistered
 ## T+40 cache expires
 ## In this case zookeeper is accessed twice. This is UNCHANGED vs current. 
 # Case 5:
 ## T+0 a call to getCollectionProperties() is made, cache is populated from zk
 ## T+1 a call to getCollectionProperties() is made, cache is already 
populated, zk is not accessed
 ## T+2 a call to getCollectionProperties() is made, cache is already 
populated, zk is not accessed
 ##