[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032610#comment-17032610 ]

Ilan Ginzburg commented on SOLR-5146:
-------------------------------------

Thanks [~erickerickson] for the wider context overview.

If we solve the leader issue, ensuring index is up to date (and making it so if it's not) is likely a lot easier with SHARED collections and replicas, i.e. index files written to a Blob storage that becomes the "source of truth" (https://github.com/apache/lucene-solr/tree/jira/SOLR-13101).

My understanding [~dsmiley] is that a replica being unloaded totally, i.e. files are on disk but nothing in memory, would require changes to the current strategy of always having replica specific Zookeeper connections/state for the leader election process.

> Figure out what it would take for lazily-loaded cores to play nice with SolrCloud
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-5146
>                 URL: https://issues.apache.org/jira/browse/SOLR-5146
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.5, 6.0
>            Reporter: Erick Erickson
>            Assignee: David Smiley
>            Priority: Major
>
> The whole lazy-load core thing was implemented with non-SolrCloud use-cases
> in mind. There are several user-list threads that ask about using lazy cores
> with SolrCloud, especially in multi-tenant use-cases.
> This is a marker JIRA to investigate what it would take to make lazy-load
> cores play nice with SolrCloud. It's especially interesting how this all
> works with shards, replicas, leader election, recovery, etc.
> NOTE: This is pretty much totally unexplored territory. It may be that a few
> trivial modifications are all that's needed. OTOH, It may be that we'd have
> to rip apart SolrCloud to handle this case. Until someone dives into the
> code, we don't know.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031050#comment-17031050 ]

David Smiley commented on SOLR-5146:
------------------------------------

{quote}...but then it's not immediately possible for it to get the shard leader election done since other nodes are not currently participating for that slice.
{quote}
I don't get what you are saying here. The "trick" with transient cores in SolrCloud will be that SolrCloud needn't know about the loaded status. Maybe there will be an exception, but it'll be a secret inside the node (other nodes/replicas won't know). The core is _present_, and thus its leader status is whatever it is to SolrCloud. It might be awoken to participate in leadership elections (I hope not), but if so I'll look to fix that so an unloaded core can stay that way during this. If there is data to sync, then the core will be awoken to do so (/update and /replication and all request handlers require that the core be loaded).
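A minimal sketch of the "awoken on demand" idea in David's comment above, assuming a per-core handle that answers for the core whether or not it is in memory. LazyCoreHandle and LoadedCore are hypothetical names for illustration only; they are not Solr classes, and real core loading involves far more than a Supplier.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for an in-memory core; this is not Solr's SolrCore. */
final class LoadedCore {
    final String name;

    LoadedCore(String name) {
        this.name = name;
    }
}

/**
 * Hypothetical handle that is always "present" to callers, while the
 * loaded/unloaded distinction stays a private detail of the node.
 */
final class LazyCoreHandle {
    private final String name;
    private final Supplier<LoadedCore> loader; // assumed loading hook
    private LoadedCore loaded;                 // null while the core is unloaded
    private int loadCount;                     // how many times the core was awoken

    LazyCoreHandle(String name, Supplier<LoadedCore> loader) {
        this.name = name;
        this.loader = loader;
    }

    /** Answerable without loading anything. */
    String name() {
        return name;
    }

    synchronized boolean isLoaded() {
        return loaded != null;
    }

    /** Anything that needs the real core (e.g. a request handler) wakes it here. */
    synchronized LoadedCore acquire() {
        if (loaded == null) {
            loaded = loader.get();
            loadCount++;
        }
        return loaded;
    }

    /** Drop the in-memory core; the handle itself stays registered. */
    synchronized void unload() {
        loaded = null;
    }

    synchronized int loadCount() {
        return loadCount;
    }
}
```

In this sketch, metadata-only questions such as name() never trigger a load, while acquire() loads at most once until the next unload().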
[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030952#comment-17030952 ]

Erick Erickson commented on SOLR-5146:
--------------------------------------

[~murblanc] That's certainly one issue. Even if efficiently getting a leader for a completely unloaded shard is solved, the question of how to keep the core in sync is a sticky wicket. Say even one replica of a shard is unloaded and then gets loaded. How is the core synced before doing anything? If replicas are coming and going all the time, do we wind up doing full synchronizations (assuming the leader problem is solved)? In the case of, say, 200G indexes for a given replica, that's very expensive.

Core loading from a cold start is a very heavyweight operation. It may be that we need some intermediate state where we can free up lots of resources but keep the core kind of loaded, mostly so it could be woken up nearly instantly, say the equivalent of opening a new searcher.

Leader election is really all about ensuring that the index is up to date. So I've wondered about a state for a replica that's "index only" rather than unloaded. The idea is that it's always up to date and can (almost) instantly assume leadership, but doesn't consume the heavier-weight resources. Then it could be brought online without having to sync from the leader. And then "somehow" combine it with autoscaling-like functionality: when the query rate exceeds X, bring another replica from index-only to serving searches.

That'd take untangling what's necessary for indexing and what's necessary for searching so they were relatively independent. But I'll leave that for David to struggle with...
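Erick's "index only" state could be modeled roughly as follows. This is a sketch under stated assumptions: ReplicaMode, ReplicaModePolicy, and the query-rate threshold are hypothetical illustrations of the idea in the comment above, not existing SolrCloud replica states or autoscaling APIs.

```java
/** Hypothetical replica modes; SolrCloud does not define these. */
enum ReplicaMode {
    UNLOADED,   // files on disk, nothing in memory
    INDEX_ONLY, // keeps receiving updates, serves no searches
    SERVING     // fully loaded, indexing and searching
}

/** Hypothetical policy deciding when an index-only replica starts serving. */
final class ReplicaModePolicy {
    private final double promoteAboveQps; // the "X" in the comment; assumed unit: queries/sec

    ReplicaModePolicy(double promoteAboveQps) {
        this.promoteAboveQps = promoteAboveQps;
    }

    /** Only a fully unloaded replica may be stale and need a sync from the leader. */
    static boolean needsSyncBeforeServing(ReplicaMode from) {
        return from == ReplicaMode.UNLOADED;
    }

    /** An up-to-date index is what makes a replica a cheap leader candidate. */
    static boolean canAssumeLeadershipQuickly(ReplicaMode mode) {
        return mode != ReplicaMode.UNLOADED;
    }

    /** Promote index-only replicas once the observed query rate exceeds the threshold. */
    ReplicaMode next(ReplicaMode current, double observedQps) {
        if (current == ReplicaMode.INDEX_ONLY && observedQps > promoteAboveQps) {
            return ReplicaMode.SERVING;
        }
        return current;
    }
}
```

The sketch deliberately leaves UNLOADED replicas alone in next(): waking them is the expensive path that the intermediate state is meant to avoid.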
[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030919#comment-17030919 ]

Ilan Ginzburg commented on SOLR-5146:
-------------------------------------

Isn't there a fundamental difference between SolrCloud and standalone Solr here? Assume a given slice (shard) is not loaded anywhere and a node receives a request for it. The node can load/open its local copy of that core just fine (let's assume so, since it works in standalone), but it's not immediately possible for it to get the shard leader election done, since other nodes are not currently participating for that slice.
[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023591#comment-17023591 ]

Erick Erickson commented on SOLR-5146:
--------------------------------------

[~dsmiley] AFAIC, if you're going to be diving into this, feel totally free to make any changes to the JIRA you want ;). I created it mostly to have a marker so people would know that transient cores weren't supported in SolrCloud. In a similar vein, you can change "TestLazyCores.java" to "TestTransientCores" ;)

As far as the design of the pluggable interface goes, I totally agree that it's awkward. When I was originally working on it, the sponsor of the effort had forked Solr and heavily customized the prior implementation (CoreContainer etc). They wanted a less-painful upgrade path next time. So I pulled the intertwined bits out of CoreContainer and made them accessible. Making it pluggable really amounted to identifying all the bits in places like CoreContainer that were needed and getting them into the interface. I always thought of it as an intermediate step that could help inform a thoughtful redesign. There's a use-case for each of the touch points; what would be good is to identify whether all of those use-cases are necessary and/or could be combined into something simpler, perhaps modifying Solr's core operations along the way.

At root, there's a two-way set of communications that needs to happen between CoreContainer and the transient core code. As a core moves through various phases (loading, pendingOps, queued up for the closer thread etc.) it may or may not be available for the plugin to do what it wants, say move it to another location. So there have to be ways for the transient code to know what state the core is in as it moves through its lifecycle. Conversely, when Solr is trying to do whatever it needs to, say close the core, it needs the transient plugin not to do something conflicting. The plugin can do anything it wants to the core.
It can move the core physically, temporarily suspend it, or even move it to another machine if the infrastructure is such that the external app knows how to resolve where the core is.

Some of the complexity is due to trying to operate on cores in parallel, and the core operations have accumulated cruft over time. Whether we could reduce the complexity of core manipulation, and thus reduce the complexity of the transient interface as well as the complexity of CoreContainer, is a fair question.

If you are puzzled by how trivial some of the touch points are in the default implementation, we can talk about why they're there; some of them were required for the alternate implementation. If I can remember why 2 years later...

Anyway, if you want to totally redesign it, please feel free ;)
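The two-way coordination Erick describes (the plugin needs to know the core's phase, and Solr needs the plugin not to act at the wrong time) can be pictured with a small phase tracker. This is an illustrative sketch only: CorePhase and CorePhaseTracker are hypothetical names, the phase list is loosely based on the phases mentioned in the comment, and none of this mirrors CoreContainer's actual bookkeeping.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lifecycle phases, loosely following the comment above. */
enum CorePhase { LOADING, LOADED, PENDING_OP, QUEUED_FOR_CLOSE, CLOSED }

/** Hypothetical shared view of each core's phase, consulted by both sides. */
final class CorePhaseTracker {
    private final Map<String, CorePhase> phases = new HashMap<>();

    /** Container side: record where a core is in its lifecycle. */
    synchronized void set(String core, CorePhase phase) {
        phases.put(core, phase);
    }

    /** Unknown cores are treated as CLOSED in this sketch. */
    synchronized CorePhase get(String core) {
        return phases.getOrDefault(core, CorePhase.CLOSED);
    }

    /**
     * Plugin side: claim a core for an operation such as moving it.
     * The claim succeeds only when the core is settled (LOADED or CLOSED),
     * so the plugin cannot collide with a load or a pending close.
     */
    synchronized boolean tryClaimForPlugin(String core) {
        CorePhase current = get(core);
        if (current == CorePhase.LOADED || current == CorePhase.CLOSED) {
            phases.put(core, CorePhase.PENDING_OP);
            return true;
        }
        return false;
    }

    /** Plugin side: hand the core back in the given settled phase. */
    synchronized void release(String core, CorePhase settled) {
        phases.put(core, settled);
    }
}
```

A single lock around every transition is the simplest way to make the check-then-claim step atomic; it trades away the parallelism that the comment identifies as a source of the real interface's complexity.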
[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023078#comment-17023078 ]

David Smiley commented on SOLR-5146:
------------------------------------

I'm going to start work on this as it's a critical feature to support thousands of cores per node.

Shawn rightfully points out it's necessary to also load cores super-fast, though I think separate issues should exist. SOLR-14040 for schema sharing is one, and there are smaller ones that have been contributed. In a SolrCloud world I think we'll identify the need to do more. Also out of scope on this specific Jira issue is directly addressing SolrCloud scale of # of collections/shards or whatever.

The title says "lazy cores", which suggests only loadOnStartup=false and is a rather easy case, but I think the scope is actually full "transient" core compatibility, thus on-demand load/*unload*, which implies a cache. I want to edit the title to not have the word "lazy". I expect I'll use the new "Resource Management API" SOLR-13579 for the node-level cache, though I haven't yet dug into the details of that.

I tried out the transient core cache in SolrCloud for the heck of it (not expecting it to work) and sure enough, it led to an error. Debugging that further might be a good early step to get a sense of the challenges ahead. Like Erick, I'm hopeful that there's not much technical barrier preventing this feature from "just working" in spite of SolrCloud. SolrCloud will think all the cores on the node are live; the trick is that some are asleep and can be awoken easily, but that distinction needs to be invisible to SolrCloud.

I sympathize with Scott's theory that maybe we don't do transient cores and instead flush caches and perhaps other things. I've thought of that a great deal. It's promising, but you won't save quite as much memory as actually closing the core. If I get stuck with SolrCloud transient core difficulties, I may look at such an alternative.
And ultimately we can do both; they aren't mutually exclusive!

I'm rather unsatisfied with the implementation of the existing transient core cache. It's weird to me that it's pluggable and has a rather large API surface area for something conceptually straightforward. I suppose the details are complicated, and I'll have to dig to appreciate those complexities.
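For readers unfamiliar with why on-demand load/unload "implies a cache": a bounded, least-recently-used map is the conceptually simple version of what David calls the transient core cache. The sketch below uses java.util.LinkedHashMap in access order; TransientCoreLru, maxLoaded, and the onEvict callback are assumptions for illustration and do not describe Solr's pluggable transient cache interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical bounded cache of loaded cores. When more than maxLoaded cores
 * are present, the least recently used one is handed to onEvict (e.g. to be
 * closed) and dropped from the map.
 */
final class TransientCoreLru<C> {
    private final LinkedHashMap<String, C> cores;

    TransientCoreLru(int maxLoaded, Consumer<C> onEvict) {
        // accessOrder=true makes iteration order least-recently-used first.
        this.cores = new LinkedHashMap<String, C>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, C> eldest) {
                if (size() > maxLoaded) {
                    onEvict.accept(eldest.getValue()); // assumed "unload" hook
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the loaded core, or null if it is not currently in memory. */
    synchronized C get(String name) {
        return cores.get(name);
    }

    /** Registers a freshly loaded core, possibly evicting the LRU entry. */
    synchronized void put(String name, C core) {
        cores.put(name, core);
    }

    synchronized int size() {
        return cores.size();
    }
}
```

A caller that gets null from get() would load the core and put() it back. The hard parts discussed in this thread (pending operations, leader election, staying in sync) are exactly what such a simple cache does not address.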