[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903131#comment-16903131 ] Andrzej Bialecki commented on SOLR-13579: -- Thanks for the review: # Yes, it was a bug - there was a missing conditional that checked whether the other pool of the same type already has this component. # Definitely, but the API is still in flux - I'll add it once the API is somewhat stabilized. # not yet - again, it requires declaring commands and parameters in a separate JSON file, which at this point I think is premature when the implementation keeps changing. > Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: New Feature >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch, > SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch, > SOLR-13579.patch > > > Resource management framework API supporting the goals outlined in SOLR-13578. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901705#comment-16901705 ] Shalin Shekhar Mangar commented on SOLR-13579: -- Thanks [~ab]. This is looking good. I've done a first pass through the design and code. It took some time to wrap my head around it, and your Jira comments describing the use case and how it works really helped. I have some initial comments: # I think DefaultResourceManager has a bug. A pool can be created by createPool; it is scheduled immediately and added to the resourcePools map, keyed by the name of the resource pool. So presumably we can create multiple pools of the same type, which is as per the design. But the #registerComponent() method gets the pool for the given name and checks that there are no other pools with the same type? AIUI, there are no checks to see if the given managed component is actually registered in the other pools of the same type? This can be easily demonstrated by changing the TestDefaultResourceManagerPool.testBasic method and adding another pool with the same type. # The package-info.java for the managed package could benefit from some of the design documentation you have added in this Jira. # There is no v2 API for /admin/resources? I'm going to do another pass and try it out and get back to you.
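The check discussed in the first point can be illustrated with a minimal Java sketch. The names below (SketchPool, SketchResourceManager, isRegistered) are hypothetical stand-ins and not the classes from the patch. The only behaviour taken from the discussion is that several pools of one type may coexist, while a component may be registered in at most one of them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified pool: a name, a type and the ids of its components. */
class SketchPool {
  final String name;
  final String type;
  final Set<String> componentIds = new HashSet<>();

  SketchPool(String name, String type) {
    this.name = name;
    this.type = type;
  }
}

/** Hypothetical manager; not the DefaultResourceManager from the patch. */
class SketchResourceManager {
  private final Map<String, SketchPool> pools = new HashMap<>();

  void createPool(String name, String type) {
    if (pools.containsKey(name)) {
      throw new IllegalArgumentException("Pool already exists: " + name);
    }
    pools.put(name, new SketchPool(name, type));
  }

  void registerComponent(String poolName, String componentId) {
    SketchPool target = pools.get(poolName);
    if (target == null) {
      throw new IllegalArgumentException("No such pool: " + poolName);
    }
    for (SketchPool other : pools.values()) {
      // Reject only when another pool of the same type actually holds this
      // component; the mere existence of such a pool is not an error.
      if (other != target
          && other.type.equals(target.type)
          && other.componentIds.contains(componentId)) {
        throw new IllegalArgumentException(
            componentId + " is already registered in pool " + other.name);
      }
    }
    target.componentIds.add(componentId);
  }

  boolean isRegistered(String poolName, String componentId) {
    SketchPool pool = pools.get(poolName);
    return pool != null && pool.componentIds.contains(componentId);
  }
}
```

With two pools of type "cache", registering different components in each succeeds, while registering the same component in both is rejected.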
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898311#comment-16898311 ] Andrzej Bialecki commented on SOLR-13579: -- Updated patch refactored to use type-safe interfaces for getting / setting the limits and retrieving the monitored values. Also, added a simple unit test.
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897369#comment-16897369 ] Andrzej Bialecki commented on SOLR-13579: -- Ah, this makes perfect sense. I'll try refactoring the API along these lines.
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897347#comment-16897347 ] Hoss Man commented on SOLR-13579: - bq. We could perhaps call a type-safe and name-safe component API from a generic management API by following a similar convention as the one used in SolrPluginUtils.invokeSetters? Or use marker interfaces that also provide validation / conversion. I'll look into this. Unless there's something i'm missing (and that's incredibly likely) I don't even think you'd need a SolrPluginUtils.invokeSetters type hack for any of this -- except maybe mapping REST commands in the ResourceManagerHandler to methods in the ResourceManagerPlugins? what i was imagining was a more straightforward subclass/subinterface relationship and using generics to tightly couple the ManagedComponent impls to the corresponding ResourceManagerPlugins -- so the plugins could have completely statically typed APIs for calling methods on the Components. ala...
{code}
public interface ManagedComponent {
  ManagedComponentId getManagedComponentId();
  ...
}

public abstract class ResourceManagerPlugin<T extends ManagedComponent> {
  /** if needed by ResourceManagerHandler or metrics */
  public abstract void setResourceLimits(ManagedComponentId component, Map limits);
  /** if needed by ResourceManagerHandler or metrics */
  public abstract Map getResourceLimits(ManagedComponentId component);
  ...
  // other general API methods needed for linking/registering type "T" components
  // (or Pool) and for "managing" all of them...
  ...
}

public interface ManagedCacheComponent extends ManagedComponent {
  // actual caches implement this, and only have to worry about type specific methods
  // for managing their resource related settings -- nothing about the REST API...
  public void setMaxSize(long size);
  public void setMaxRamMB(int maxRamMB);
  public long getMaxSize();
  public int getMaxRamMB();
}

public class CacheManagerPlugin extends ResourceManagerPlugin<ManagedCacheComponent> {
  // concrete impls like this can use the statically typed get/set methods of the concrete
  // ManagedComponent impls in their getResourceLimits/setResourceLimits & manage methods
  ...
}
{code}
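To make the generics-based coupling concrete, here is a small compilable Java sketch under stated assumptions. It reuses the interface names from the comment above, but simplifies them: the component id is a plain String rather than a ManagedComponentId, the plugin methods take the component itself, and SketchCache is a hypothetical in-memory implementation. The map keys "maxSize" and "maxRamMB" follow the parameter names used elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

interface ManagedComponent {
  String getManagedComponentId(); // simplified: String instead of ManagedComponentId
}

interface ManagedCacheComponent extends ManagedComponent {
  void setMaxSize(long size);
  void setMaxRamMB(int maxRamMB);
  long getMaxSize();
  int getMaxRamMB();
}

abstract class ResourceManagerPlugin<T extends ManagedComponent> {
  // Generic, map-based entry points (e.g. for a REST handler); in this sketch
  // they take the typed component directly, which is an assumption.
  abstract void setResourceLimits(T component, Map<String, Object> limits);

  abstract Map<String, Object> getResourceLimits(T component);
}

class CacheManagerPlugin extends ResourceManagerPlugin<ManagedCacheComponent> {
  @Override
  void setResourceLimits(ManagedCacheComponent cache, Map<String, Object> limits) {
    // Translate loosely typed values into statically typed setter calls.
    Object maxSize = limits.get("maxSize");
    if (maxSize != null) {
      cache.setMaxSize(((Number) maxSize).longValue());
    }
    Object maxRamMB = limits.get("maxRamMB");
    if (maxRamMB != null) {
      cache.setMaxRamMB(((Number) maxRamMB).intValue());
    }
  }

  @Override
  Map<String, Object> getResourceLimits(ManagedCacheComponent cache) {
    Map<String, Object> limits = new HashMap<>();
    limits.put("maxSize", cache.getMaxSize());
    limits.put("maxRamMB", cache.getMaxRamMB());
    return limits;
  }
}

/** Hypothetical cache used only to exercise the plugin. */
class SketchCache implements ManagedCacheComponent {
  private final String id;
  private long maxSize;
  private int maxRamMB;

  SketchCache(String id, long maxSize, int maxRamMB) {
    this.id = id;
    this.maxSize = maxSize;
    this.maxRamMB = maxRamMB;
  }

  @Override public String getManagedComponentId() { return id; }
  @Override public void setMaxSize(long size) { this.maxSize = size; }
  @Override public void setMaxRamMB(int maxRamMB) { this.maxRamMB = maxRamMB; }
  @Override public long getMaxSize() { return maxSize; }
  @Override public int getMaxRamMB() { return maxRamMB; }
}
```

The string keys only appear at the boundary of the plugin, so a misspelled or unsupported parameter is handled in one place instead of in every cache implementation.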
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896204#comment-16896204 ] Andrzej Bialecki commented on SOLR-13579: -- On the use cases: bq. CacheManagerPlugin would only ever reduce the maxRamMB setting of some caches at run time Again, the current implementation of {{CacheManagerPlugin}} is a simplistic draft. Ultimately the controlled value of {{maxRamMB}} would be tied proportionally to two main factors: * the {{hitratio}} metric (i.e. caches with low hit rate don't need as much RAM so their {{maxRamMB}} would be trimmed down). This is an optimization of resource usage. * and the total {{ramBytesUsed}} across all cores would be used as a hard limit, proportionally applied to all caches' {{maxRamMB}}, overriding the above optimization if necessary. This is a hard control limit, which indeed is related to the current number of cores. The initial value of {{maxRamMB}} would still come from the config, as it does today, but then during runtime it would be modified both up and down from that value depending on the situation. bq. users who want to use these pools need to change the individual cache's configured maxRamMB to be much higher than they are today. (potentially to the same value as the maxRamMB of the pool?) I think it would work the other way around - users can specify whatever they want, but if the admin sets a total {{maxRamMB}} to a lower value than the aggregate value that users requested, their requests will be proportionally scaled down (see also above for a finer-grained optimization adjustment, not just the hard limit). So in reality the amount of RAM each core and each cache would get would be determined as follows: * the initial value would be set from the config's {{maxRamMB}}, unless it already hits the global limit * this value would be quickly trimmed down based on the {{hitratio}}, and eventually scaled up as the {{hitratio}} increases. 
Some other metric could be used here, too, to make this scale down/up process more efficient. * if a bunch of other cores were suddenly allocated to the same node it's likely that the aggregate {{ramBytesUsed}} would hit the global ceiling and the plugin would start trimming down {{maxRamMB}} of each cache in each core (possibly using some weighted scheme instead of purely proportional). * if the number of cores were to decrease so that their aggregate {{ramBytesUsed}} fell below a percentage of the hard limit, say 80%, the plugin could proportionally increase the {{maxRamMB}} values so that they equal eg. 80% of the hard limit. bq. how/when can/should a CacheManagerPlugin assume/recognize that the memory pressure has decreased? Using the {{ramBytesUsed}} metric for the hard limit, and the {{hitratio}} metric for optimization. If {{hitratio}} is high then we need as much RAM as possible to expand the cache, until we either hit the core's limit, or the global limit, OR the {{hitratio}} falls below a threshold. If {{hitratio}} falls below a threshold then we know the cache contains mostly useless items and we can trim down its {{maxRamMB}}, which will lead to evictions, which in turn will lead to an increased {{hitratio}}.
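The hard-limit part of this scheme can be sketched in Java as follows. This is only an illustration under assumptions, not the algorithm in the patch: the class and method names are hypothetical, the hit-ratio optimization is left out, "trimming" is read as scaling every limit by hardLimit / used, and "increase to 80%" is read as scaling the limits so that their sum reaches the low watermark.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; scales per-cache maxRamMB values proportionally. */
class ProportionalCacheLimiter {

  /**
   * @param maxRamMB     current per-cache limits, keyed by cache name
   * @param usedMB       aggregate RAM currently used by all caches
   * @param hardLimitMB  global ceiling for the pool
   * @param lowWatermark fraction of the ceiling below which limits may grow, e.g. 0.8
   * @return new limits; the input map is not modified
   */
  static Map<String, Double> rebalance(
      Map<String, Double> maxRamMB, double usedMB, double hardLimitMB, double lowWatermark) {
    double sum = 0;
    for (double v : maxRamMB.values()) {
      sum += v;
    }
    double floorMB = lowWatermark * hardLimitMB;
    double factor = 1.0; // inside the band between floor and ceiling: no change
    if (usedMB > hardLimitMB) {
      // Control: usage exceeds the ceiling, trim every limit by the same ratio.
      factor = hardLimitMB / usedMB;
    } else if (sum > 0 && usedMB < floorMB && sum < floorMB) {
      // Pressure is low: grow the limits until they add up to the floor.
      factor = floorMB / sum;
    }
    Map<String, Double> result = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : maxRamMB.entrySet()) {
      result.put(e.getKey(), e.getValue() * factor);
    }
    return result;
  }
}
```

Leaving the limits alone while usage sits between the low watermark and the ceiling keeps the adjustments from oscillating on every run.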
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896177#comment-16896177 ] Andrzej Bialecki commented on SOLR-13579: -- On the API itself: bq. ...but the code in ResourceManagerPlugin is also independent of any specific type of resource(s) that a pool can manage {{ResourceManagerPlugin}} is an interface so it has no code of its own. Subclasses implement the actual logic of what to monitor and how to control it, so it made sense to make it a separate interface from a pool, which is responsible for collecting and aggregating the data from components. As I mentioned, I can easily foresee a future 1:N mapping between pool and plugins, in order to manage different types of resource limits of a component in one pool. Concrete example of a component that consumes different types of resources that we may want to manage is SolrIndexSearcher - here we have caches, merge IO, update threads and query threads. We may want to manage all of these aspects by registering SolrIndexSearcher in a single pool that supports these types of mgmt plugins, instead of registering it in several pools, each managing one aspect of the component. bq. "loose coupling" that currently exists in the patch between the ManagedComponent API and ResourceManagerPlugin I agree, this is an important concern - please remember that this is just an initial attempt to cover all bases, and I thought that using a very generic API could protect us from the combinatoric explosion of the API between the management framework and the different types of components. As you noted, the unfortunate downside of this approach is the complexity of validating and applying the modifications in the components... We could perhaps call a type-safe and name-safe component API from a generic management API by following a similar convention as the one used in {{SolrPluginUtils.invokeSetters}}? Or use marker interfaces that also provide validation / conversion. 
I'll look into this.
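The setter-naming idea mentioned above could look roughly like the following Java sketch. It does not call SolrPluginUtils.invokeSetters; it only mimics the general convention of mapping a parameter name such as "maxRamMB" to a setMaxRamMB method via reflection. SetterInvoker and SampleManagedCache are hypothetical names introduced for illustration.

```java
import java.lang.reflect.Method;
import java.util.Map;

/** Hypothetical helper: applies generic limits through typed setters. */
class SetterInvoker {

  static void applyLimits(Object component, Map<String, Object> limits) {
    for (Map.Entry<String, Object> e : limits.entrySet()) {
      String key = e.getKey();
      String setterName = "set" + Character.toUpperCase(key.charAt(0)) + key.substring(1);
      Method setter = findSetter(component.getClass(), setterName);
      if (setter == null) {
        // Name safety: unknown parameters are rejected instead of ignored.
        throw new IllegalArgumentException("Unsupported limit: " + key);
      }
      try {
        setter.invoke(component, convert(e.getValue(), setter.getParameterTypes()[0]));
      } catch (ReflectiveOperationException ex) {
        throw new IllegalStateException("Could not apply limit: " + key, ex);
      }
    }
  }

  private static Method findSetter(Class<?> type, String name) {
    for (Method m : type.getMethods()) {
      if (m.getName().equals(name) && m.getParameterCount() == 1) {
        return m;
      }
    }
    return null;
  }

  /** Type safety: numeric values are converted to the setter's parameter type. */
  private static Object convert(Object value, Class<?> target) {
    if (value instanceof Number) {
      Number n = (Number) value;
      if (target == long.class || target == Long.class) {
        return n.longValue();
      }
      if (target == int.class || target == Integer.class) {
        return n.intValue();
      }
    }
    return value;
  }
}

/** Hypothetical component with typed setters. */
class SampleManagedCache {
  private long maxSize;
  private int maxRamMB;

  public void setMaxSize(long maxSize) { this.maxSize = maxSize; }
  public void setMaxRamMB(int maxRamMB) { this.maxRamMB = maxRamMB; }
  public long getMaxSize() { return maxSize; }
  public int getMaxRamMB() { return maxRamMB; }
}
```

The trade-off is that mistakes surface at run time rather than at compile time, which is the main argument for the statically typed alternative.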
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16895371#comment-16895371 ] Andrzej Bialecki commented on SOLR-13579: -- bq. Hmmm, Did you mean to upload a diff patch? the latest i see (#12975831) still contains lots of new class names referring to "Resource" instead of "Component" ... I meant the use of "component" where it refers to Solr components - previous versions of the patch confusingly referred to these components as "resources", hence eg. ManagedResource -> ManagedComponent. Other names are related to the management of actual hardware resources (RAM, IO, etc.), so I felt the remaining class names with ..resource.. are still appropriate here.
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894184#comment-16894184 ] Hoss Man commented on SOLR-13579: - Honestly, i'm still very lost. Part of my struggle is i'm trying to wade into the patch, and review the APIs and functionality it contains, while knowing – as you mentioned – that not all the details are here, and it's not fully fleshed out w/everything you intend as far as configuration and customization and having more concrete implementations beyond just the {{CacheManagerPlugin}}. I know that in your mind there is more that can/should be done, and that some of this code is just "placeholder" for later, but i don't have enough familiarity with the "long term" plan to really understand what in the current patch is placeholder or stub APIs, vs what is "real" and exists because of long term visions for how all of these pieces can be used together in a more generalized system – ie: what classes might have surface APIs that look more complex than needed given what's currently implemented in the patch, because of how you envision those classes being used in the future? Just to pick one example: my question about the "ResourceManagerPool" vs "ResourceManagerPlugin" – in your reply you said... {quote}The code in ResourceManagerPool is independent of the type of resource(s) that a pool can manage. ... {quote} ...but the code in {{ResourceManagerPlugin}} is _also_ independent of any specific type of resource(s) that a pool can manage – those specifics only exist in the concrete subclasses. Hence the crux of my question is why these two very generalized pieces of abstract functionality/data collection couldn't just be a single abstract base class for all (concrete) ResourceManagerPlugin subclasses to extend? Your followup gives a clue... 
{quote}...perhaps at some point we could allow a single pool to manage several aspects of a component, in which case a pool could have several plugins. {quote} but w/o some "concrete hypothetical" examples of what that might look like, it's hard to evaluate if the current APIs are the "best" approach, or if maybe there is something better/simpler. {quote}Also, there can be different pools of the same type, each used for a different group of components that support the same management aspect. For example, for searcher caches we may want to eventually create separate pools for filterCache, queryResultCache and fieldValueCache. All of these pools would use the same plugin implementation CacheManagerPlugin but configured with different params and limits. {quote} But even in this situation, there could be multiple *instances* of a {{CacheManagerPlugin}}, one for each pool, each with different params and limits, w/o needing distinction between the {{ResourceManagerPlugin}} concept/instances and the {{ResourceManagerPool}} concept/instances. (To be clear, i'm not trying to harp on the specific design/separation/linkage of {{ResourceManagerPlugin}} vs {{ResourceManagerPool}} – these are just some of the first classes i looked at and had questions about. I'm just using them as examples of where/how it's hard to ask questions or form opinions about the current API/code w/o having a better grasp of some "concrete specifics" (or even "hypothetical specifics") of when/how/where/why each of these APIs are expected to be used and interact w/each other.) 
Another example of where i got lost as to the specific motivation behind some of these APIs in the long term view is in the "loose coupling" that currently exists in the patch between the {{ManagedComponent}} API and {{ResourceManagerPlugin}}: As i understand it: * An object in Solr supports being managed by a particular subclass of {{ResourceManagerPlugin}} if and only if it extends {{ManagedComponent}} and implements {{ManagedComponent.getManagedResourceTypes()}} such that the resulting {{Collection}} contains a String matching the return value of a {{ResourceManagerPlugin.getType()}} for that particular {{ResourceManagerPlugin}} ** ie: {{SolrCache}} extends the {{ManagedComponent}} interface, and all classes implementing {{SolrCache}} should/must implement {{getManagedResourceTypes()}} by returning a java {{Collection}} containing {{CacheManagerPlugin.TYPE}} * once some {{ManagedComponent}} instances are "registered in a pool" and managed by a specific {{ResourceManagerPlugin}} instance then that plugin expects to be able to call {{ManagedComponent.setResourceLimits(Map limits)}} and {{ManagedComponent.getResourceLimits()}} on all of those {{ManagedComponent}} instances, and that both Maps should contain/support a set of {{String}} keys specific to that {{ResourceManagerPlugin}} subclass according to {{ResourceManagerPlugin.getControlledParams()}} ** ie: {{CacheManagerPlugin.getControlledParams()}} returns a java {{Collection}} containing
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892869#comment-16892869 ] Andrzej Bialecki commented on SOLR-13579: -- Updated patch: * renamed ManagedResource to ManagedComponent * consistently use the name "component" instead of the confusing "resource"
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892682#comment-16892682 ] Andrzej Bialecki commented on SOLR-13579: -- bq. why is the "ResourceManagerPool" class different from the "ResourceManagerPlugin" class? The code in {{ResourceManagerPool}} is independent of the type of resource(s) that a pool can manage. I decided to leave them separate for now - perhaps at some point we could allow a single pool to manage several aspects of a component, in which case a pool could have several plugins. Also, there can be different pools of the same type, each used for a different group of components that support the same management aspect. For example, for searcher caches we may want to eventually create separate pools for filterCache, queryResultCache and fieldValueCache. All of these pools would use the same plugin implementation {{CacheManagerPlugin}} but configured with different params and limits. bq. What happens if a single ManagedResource is part of two different "pools" with two different ResourceManagerPlugins that give conflicting/overlapping instructions? Currently this is not allowed ({{DefaultResourceManager.addResource:183}}). In theory, I could imagine a component to belong to more than 1 pool of the same type - eg. one being a global per-node pool for coarse-grained control and the other being a local per-core pool for fine-grained optimization. However, at this point my head explodes thinking about all possible bad interactions, so the code expressly forbids it. :) bq. does that imply that once SolrCache(s) are part of a "pool" they no longer have their own max size(s)? They still do - but it's used as the starting point for proportional adjustments. As I mentioned above, the exact details of how the adjustments are distributed among all caches are still unclear - in the current patch they are applied proportionally to each cache's maxSize / maxRamMB. 
It should be easy to add more complex priorities or weights - I wanted to start with something simple to illustrate the concept. bq. how/where would someone specify a "preference" for ensuring that if a "pool" is "full" that certain resources should be managed more aggressively than others In the current API that would probably need to be defined somewhere between {{SolrCache.getResourceLimits()}} and {{CacheManagerPlugin}}, ie. the cache would report its "priority" as one of the limits and the plugin would know what to do about it. bq. Also, FYI: with this patch, we now have 2 "ManagedResource" classes in solr/core that have absolutely nothing to do with each other... Yeah, I'll rename this one to something else.
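One way the "priority as one of the limits" idea could work is sketched below in Java. Everything here is an assumption made for illustration: the CacheRequest and PriorityWeightedBudget names, the numeric priority, and the weighting formula (configured maxRamMB multiplied by priority) are not part of the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-cache input: the configured limit and a reported priority. */
class CacheRequest {
  final double maxRamMB;
  final double priority;

  CacheRequest(double maxRamMB, double priority) {
    this.maxRamMB = maxRamMB;
    this.priority = priority;
  }
}

/** Hypothetical helper that splits a pool budget by priority-weighted requests. */
class PriorityWeightedBudget {

  static Map<String, Double> distribute(Map<String, CacheRequest> requests, double budgetMB) {
    double requested = 0;
    double weightSum = 0;
    for (CacheRequest r : requests.values()) {
      requested += r.maxRamMB;
      weightSum += r.maxRamMB * r.priority;
    }
    Map<String, Double> result = new LinkedHashMap<>();
    for (Map.Entry<String, CacheRequest> e : requests.entrySet()) {
      CacheRequest r = e.getValue();
      double share;
      if (requested <= budgetMB || weightSum <= 0) {
        // The pool is not full: every cache keeps its configured limit.
        share = r.maxRamMB;
      } else {
        // The pool is full: split the budget by weight. A share is not capped
        // at the configured value, so high-priority caches may end up above it.
        share = budgetMB * (r.maxRamMB * r.priority) / weightSum;
      }
      result.put(e.getKey(), share);
    }
    return result;
  }
}
```

For two caches that each request 100 MB with priorities 3 and 1, a 100 MB budget is split 75 MB to 25 MB, whereas the purely proportional scheme would give each 50 MB.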
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892670#comment-16892670 ] Andrzej Bialecki commented on SOLR-13579: -- The main scenario that prompted this development was a need to control the aggregated cache sizes across all cores in a CoreContainer in a multi-tenant (uncooperative) situation. However, it seemed like a similar approach would be applicable for controlling other runtime usage of resources in a Solr node - hence the attempt to come up with a generic framework. A particular component may support resource management of several of its aspects. Eg. a {{SolrIndexSearcher}} can have a "cache" RAM usage aspect, "mergeIO" throttling aspect, "mergeThreadCount" aspect, "queryThreadCount" aspect, etc. Each of these aspects can be managed by a different global pool that defines total resource limits of a given type. Currently a component can be registered only in a single pool of a given type, in order to avoid conflicting instructions. In the current patch the component registration and pool creation parts are primitive - the default pools are created statically and components are forced to register in a dedicated pool. In the future this could be configurable - eg. components from cores belonging to different collections may belong to different pools with different limits / priorities. In the following stories, there are always two aspects of resource management - control and optimization. The control aspect ensures that the specified hard limits are observed, while the optimization aspect ensures that each component uses resources in an optimal way. The focus of this JIRA issue is mainly on the control aspect, with optimization to follow later. h2. Story 1: controlling global cache RAM usage in a Solr node {{SolrIndexSearcher}} caches are currently configured statically, using either item count limits or {{maxRamMB}} limits. 
We can only specify the limit per-cache and then we can limit the number of cores in a node to arrive at a hard total upper limit. However, this is not enough because it leads to keeping the heap at the upper limit when the actual consumption by caches might be far lower. It'd be nice for a more active core to be able to use more heap for caches than another core with less traffic while ensuring that total heap usage never exceeds a given threshold (the optimization aspect). It is also required that total heap usage of caches doesn't exceed the max threshold to ensure proper behavior of a Solr node (the control aspect). In order to do this we need a control mechanism that is able to adjust individual cache sizes per core, based on the total hard limit and the actual current "need" of a core, defined as a combination of hit ratio, QPS, and other arbitrary quality factors / SLA. This control mechanism also needs to be able to forcibly reduce excessive usage (evenly? prioritized by collection's SLA?) when the aggregated heap usage exceeds the threshold. In terms of the proposed API this scenario would work as follows: * a global resource pool "searcherCachesPool" is created with a single hard limit on eg. total {{maxRamMB}}. * this pool knows how to manage components of a "cache" type - what parameters to monitor and what parameters to use in order to control their resource usage. This logic is encapsulated in {{CacheManagerPlugin}}. * all searcher caches from all cores register themselves in this pool for the purpose of managing their "cache" aspect. * the plugin is executed periodically to check the current resource usage of all registered caches, using eg. the aggregated value of {{ramBytesUsed}}. 
* if this aggregated value exceeds the total {{maxRamMB}} limit configured for the pool then the plugin adjusts the {{maxRamMB}} setting of each cache in order to reduce the total RAM consumption - currently this uses a simple proportional formula without any history (the P part of PID), with a dead-band in order to avoid thrashing. Also, for now, this addresses only the control aspect (exceeding a hard threshold) and not the optimization, i.e. it doesn't proactively reduce / increase {{maxRamMB}} based on hit rate. * as a result of this action some of the cache content will be evicted sooner and more aggressively than initially configured, thus freeing more RAM. * when the memory pressure decreases the {{CacheManagerPlugin}} re-adjusts the {{maxRamMB}} settings of each cache to the initially configured values. Again, the current implementation of this algorithm is very simple but can be easily improved because it's cleanly separated from implementation details of each cache. h2. Story 2: controlling global IO usage in a Solr node. Similarly to the scenario above, currently we can only statically configure merge throttling (RateLimiter) per core but we can't monitor and control the total IO
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16892241#comment-16892241 ] Hoss Man commented on SOLR-13579: - I spent some time briefly skimming the patch, and TBH got lost very quickly. I think it would be helpful (probably to more folks than just myself) if we could discuss, in "story" form, some (existing or hypothetical) examples of scenarios that could come up; how this new system would be helpful & behave in those scenarios, and what classes/objects (either in this patch, or yet to be written) would be responsible for each bit of action/reaction in those stories. ie: I'm a solr cluster admin and I have some existing collections using the (existing) default cache configurations. When/why might i want to setup some pools? what types of steps would i take to do so? how would my configuration(s) change? After i have some pools in place, what's an example of something that might happen during runtime that would cause the ResourceManager to "do something" with my pools/caches? what would that "do something" look like in terms of method call stacks? what would the effective end result be from my perspective as an external observer? Some specific bits that confuse me as i try to wrap my head around the current patch... * If each named "pool" has exactly one ResourceManagerPlugin that contains the (type specific) actual logic for managing "the pool" (and the resources using that pool) then why is the "ResourceManagerPool" class different from the "ResourceManagerPlugin" class? ** as opposed to combining that logic into a single common base class? ** is there a one-to-many/many-to-one relationship between them that i'm not understanding? * can you elaborate on this comment with some concrete examples: {quote}Each managed resource can be managed by multiple types of plugins and it may appear in multiple pools (of different types). 
This reflects the fact that a single component may have multiple aspects of resource management - eg. cache mgmt, cpu, threads, etc. {quote} ** ie: if "CacheManagerPlugin.TYPE" is one "type" of pool that a SolrCache (implements ManagedResource) might be managed by, what would another hypothetical "type" of plugin/pool be that SolrCache might also be a part of? *** or if you can't think of a good example of two diff types that a SolrCache would be managed by, any example of an concept/object in solr that might becom a "ManagedResource" that could be managed by two differnt types of polugins as part of 2 diff pools would be helpful ** What happens if a single ManagedResource is part of two different "pools" with two different ResourceManagerPlugins that give conflicting/overlapping instructions? * regarding this comment... {quote}Each pool also has plugin-specific parameters, most notably the limits - eg. max total cache size, which the CacheManagerPlugin knows how to use in order to adjust cache sizes. {quote} ** does that imply that once SolrCache(s) are part of a "pool" they no longer have their own max size(s) ? or is the configured max size of an individual cache(s) still a hard upper bound on the "managed size" that might be set at runtime as the plugins fire? ** how/where would someone specify a "preference" for ensuring that if a "pool" is "full" that certain resources should be managed more agressively then others – ex: imagine a cluster admin wants all collections to have SolrCaches that are "as big as possible" given the resources of the machines, but wants to give priority to a certain subset of the "important" collections if resources get constrained; what/where would that be done? Also, FYI: with this patch, we now have 2 "ManagedResource" classes in solr/core that have absolutely nothing to do with each other... 
{noformat} $ find -name ManagedResource.java ./solr/core/src/java/org/apache/solr/rest/ManagedResource.java ./solr/core/src/java/org/apache/solr/managed/ManagedResource.java {noformat} ...thats a little weird. > Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch, > SOLR-13579.patch, SOLR-13579.patch > > > Resource management framework API supporting the goals outlined in SOLR-13578. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891285#comment-16891285 ] Andrzej Bialecki commented on SOLR-13579: --

This updated patch adds the following:

* integration of SolrIndexSearcher caches into the framework
* initialization of resource manager and pool limit configurations from /clusterprops.json
* other refactorings

> Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch, > SOLR-13579.patch > > > Resource management framework API supporting the goals outlined in SOLR-13578.
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888270#comment-16888270 ] Andrzej Bialecki commented on SOLR-13579: --

Example requests and responses - this uses SOLR-13558 + a little glue to register SolrIndexSearcher caches:

{code}
http://localhost:8983/solr/admin/resources?poolAction=list

{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "result": {
    "searcherCache": {
      "type": "cache",
      "size": 10,
      "limits": {
        "maxRamMB": 500
      },
      "args": {},
      "resources": [
        "filterCache@7e351be2",
        "perSegFilter@4d1a23c9",
        "documentCache@225da02a",
        "fieldValueCache@90a2ca",
        "queryResultCache@5ff5ad0e",
        "queryResultCache@15c33adb",
        "fieldValueCache@6f100717",
        "perSegFilter@4d5cc184",
        "filterCache@13f35898",
        "documentCache@48dfaca7"
      ]
    }
  }
}

http://localhost:8983/solr/admin/resources?resAction=list=searcherCache

{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "result": {
    "filterCache@7e351be2": {
      "class": "org.apache.solr.search.FastLRUCache",
      "types": [ "cache" ],
      "managedLimits": {
        "cleanupThread": false,
        "size": 0,
        "showItems": 0,
        "minSize": 460,
        "maxRamMB": -1,
        "acceptableSize": 486
      }
    },
    "perSegFilter@4d1a23c9": {
      "class": "org.apache.solr.search.LRUCache",
      "types": [ "cache" ],
      "managedLimits": {
        "size": 10,
        "maxRamMB": -1
      }
    },
    "documentCache@225da02a": {
      "class": "org.apache.solr.search.LRUCache",
      "types": [ "cache" ],
      "managedLimits": {
        "size": 512,
        "maxRamMB": -1
      }
    },
    "fieldValueCache@90a2ca": {
      "class": "org.apache.solr.search.FastLRUCache",
      "types": [ "cache" ],
      "managedLimits": {
        "cleanupThread": false,
        "size": 0,
        "showItems": -1,
        "minSize": 9000,
        "maxRamMB": -1,
        "acceptableSize": 9500
      }
    },
    ...
{code}

> Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch > > > Resource management framework API supporting the goals outlined in SOLR-13578.
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887963#comment-16887963 ] Andrzej Bialecki commented on SOLR-13579: --

Updated patch:

* more refactoring :)
* added ResourceManagerHandler under /admin/resources to expose the resource management. This handler supports managing the pools (create / delete / status / modify limits) as well as resources (similarly).

> Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch, SOLR-13579.patch, SOLR-13579.patch > > > Resource management framework API supporting the goals outlined in SOLR-13578.
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886014#comment-16886014 ] Andrzej Bialecki commented on SOLR-13579: -- Updated patch, with one significant change (based on the work in SOLR-13558): allow arbitrary limit types, ie. Object instead of Float. This way the API can support controllable parameters that are expressed as eg. booleans, enums, etc. > Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch, SOLR-13579.patch
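As a rough illustration of what moving from Float to Object limits allows, here is a minimal sketch of a limits holder that stores values of any type and reads them back with a type check. The class {{SketchLimits}} and its methods are hypothetical and are not taken from the patch; the limit names {{maxRamMB}} and {{cleanupThread}} are borrowed from the example responses elsewhere in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for limits of arbitrary type; NOT a class from the patch.
class SketchLimits {
    private final Map<String, Object> values = new LinkedHashMap<>();

    // Accepts any value: numbers, booleans, enums, ...
    void set(String name, Object value) {
        values.put(name, value);
    }

    // Typed read: returns null if the limit is unset and fails loudly if the
    // stored value is not of the expected type.
    <T> T get(String name, Class<T> type) {
        Object value = values.get(name);
        if (value == null) {
            return null;
        }
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException(
                name + " is " + value.getClass().getSimpleName()
                    + ", expected " + type.getSimpleName());
        }
        return type.cast(value);
    }
}
```

With a Float-only API a flag such as {{cleanupThread}} would have to be encoded as a number; with Object values the caller states the type it expects and gets an error on a mismatch.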
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881572#comment-16881572 ] David Smiley commented on SOLR-13579: - Oh I see this is a sub-task, and the parent task has a fine description. > Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881571#comment-16881571 ] David Smiley commented on SOLR-13579: - Could you please add an issue description? The title is not so self explanatory so as to excuse you from writing one. It's a little unclear to me what the objective is. Node-wide cache management seems to be just one example or is that the whole point? What might other purposes be? I could use my imagination but I'd rather the proposal spell it out for us. Thanks. > Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch
[jira] [Commented] (SOLR-13579) Create resource management API
[ https://issues.apache.org/jira/browse/SOLR-13579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878603#comment-16878603 ] Andrzej Bialecki commented on SOLR-13579: --

This patch contains a draft of the API, with some details of the implementation fleshed out, up for discussion - there are no tests and no integration with any existing component yet. :)

A high-level design overview:

* A {{ResourceManager}} manages multiple named pools of resources (flat hierarchy for now). A default instance of {{ResourceManager}} would be created at a {{CoreContainer}} level so that it can manage global limits for a Solr node.
* Each pool knows how to perform a single specific type of management. This handling is actually performed by a {{ResourceManagerPlugin}}, which knows what monitored values to retrieve from resources, and knows how to adjust the controlled parameters of managed resources.
* There can be multiple pools of the same type (under different names) - they will likely differ in their parameters. Eg. document cache size may be checked every 1 min and have one limit, but the query / filter cache size may use different parameters, even though the set of monitored parameters and controlled parameters are the same (hence the same type).
* {{ResourceManager}} is responsible for periodically executing the {{ResourceManagerPlugin}} of each of the pools, so that it can verify and adjust the resources it manages in the pool.
* Each pool has its own parameters - currently the only global parameter is scheduleDelaySeconds, which determines how often the pool will run the management plugin to verify and adjust the resource usage.
* Each pool also has plugin-specific parameters, most notably the limits - eg. max total cache size, which the CacheManagerPlugin knows how to use in order to adjust cache sizes.
* Each managed resource can be managed by multiple types of plugins and it may appear in multiple pools (of different types). This reflects the fact that a single component may have multiple aspects of resource management - eg. cache mgmt, cpu, threads, etc.

The patch also contains an example implementation of a management plugin - {{CacheManagerPlugin}}. This plugin uses the API to enforce global limits on the cache size. It knows how to retrieve and calculate the current resource usage, as reported by the monitored values, and then it adjusts the controlled limits of each resource to bring the usage back to the total values that fit within the limits defined by the pool. In this case the pool can define global limits on the cache {{size}} and {{maxRamMB}} (and these are also the parameters to control for each cache), and the plugin uses {{size}} and {{ramBytesUsed}} for monitoring the actual resource consumption.

Obviously {{SolrCache}} doesn't implement this API yet, but it's relatively easy to add.

I'd appreciate review, comments and suggestions.

> Create resource management API > -- > > Key: SOLR-13579 > URL: https://issues.apache.org/jira/browse/SOLR-13579 > Project: Solr > Issue Type: Sub-task > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Attachments: SOLR-13579.patch
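The pool-level limit enforcement described in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code in the patch: the classes {{SketchCache}} and {{SketchCachePool}} are hypothetical, only the {{size}} limit is modelled, and proportional scaling is one possible adjustment policy that the comment does not confirm. The {{schedule}} method shows how a pool-wide {{scheduleDelaySeconds}} setting might drive the periodic run.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a managed cache; NOT the interface from the patch.
class SketchCache {
    final String name;
    long size;       // monitored value: current number of entries
    long sizeLimit;  // controlled parameter: maximum number of entries

    SketchCache(String name, long size, long sizeLimit) {
        this.name = name;
        this.size = size;
        this.sizeLimit = sizeLimit;
    }
}

// Hypothetical pool of type "cache" with one pool-wide limit on total size.
class SketchCachePool {
    private final Map<String, SketchCache> caches = new LinkedHashMap<>();
    private final long maxTotalSize;

    SketchCachePool(long maxTotalSize) {
        this.maxTotalSize = maxTotalSize;
    }

    void register(SketchCache cache) {
        caches.put(cache.name, cache);
    }

    // Sum of the monitored values across all registered caches.
    long totalSize() {
        long total = 0;
        for (SketchCache c : caches.values()) {
            total += c.size;
        }
        return total;
    }

    // One management pass. ASSUMPTION: proportional scaling; the patch may
    // use a different policy. Integer division rounds down, so the new
    // limits never add up to more than maxTotalSize. Only the controlled
    // limit is changed here; evicting entries is left to the cache itself.
    void manage() {
        long total = totalSize();
        if (total <= maxTotalSize) {
            return; // usage fits within the pool limit, nothing to adjust
        }
        for (SketchCache c : caches.values()) {
            c.sizeLimit = c.size * maxTotalSize / total;
        }
    }

    // Run manage() periodically, as the ResourceManager would do per pool.
    ScheduledFuture<?> schedule(ScheduledExecutorService executor, long scheduleDelaySeconds) {
        return executor.scheduleWithFixedDelay(
            this::manage, scheduleDelaySeconds, scheduleDelaySeconds, TimeUnit.SECONDS);
    }
}
```

For example, two caches holding 512 entries each in a pool limited to 500 entries in total would each end up with a limit of 250, while a pool whose total usage is already under its limit is left untouched.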