[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud

2020-02-07 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032610#comment-17032610
 ] 

Ilan Ginzburg commented on SOLR-5146:

Thanks [~erickerickson] for the wider context overview. If we solve the leader 
issue, ensuring the index is up to date (and bringing it up to date if it's 
not) is likely a lot easier with SHARED collections and replicas, i.e. index 
files written to Blob storage that becomes the "source of truth" 
(https://github.com/apache/lucene-solr/tree/jira/SOLR-13101).
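A minimal sketch of why a shared "source of truth" simplifies the freshness check: a replica that wakes up compares its local index version with the authoritative one and pulls files only if it is stale, without asking a leader. `BlobStore`, `LocalReplica` and `InMemoryBlobStore` are hypothetical names for illustration, not the SOLR-13101 branch's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction over shared Blob storage; not a Solr API.
interface BlobStore {
    long latestVersion(String shard);           // version of the authoritative index
    void pullIndex(String shard, long version); // copy that version's files locally
}

// Hypothetical replica that remembers which index version it has on local disk.
class LocalReplica {
    final String shard;
    long localVersion;

    LocalReplica(String shard, long localVersion) {
        this.shard = shard;
        this.localVersion = localVersion;
    }

    // Called when a lazily loaded core wakes up. Since the blob store is the
    // source of truth, no leader is consulted: compare versions, pull if stale.
    // Returns true if files had to be fetched.
    boolean ensureUpToDate(BlobStore blob) {
        long authoritative = blob.latestVersion(shard);
        if (localVersion >= authoritative) {
            return false; // already current
        }
        blob.pullIndex(shard, authoritative);
        localVersion = authoritative;
        return true;
    }
}

// In-memory stand-in for Blob storage, for illustration only.
class InMemoryBlobStore implements BlobStore {
    final Map<String, Long> versions = new HashMap<>();
    int pulls = 0;

    @Override
    public long latestVersion(String shard) {
        return versions.getOrDefault(shard, 0L);
    }

    @Override
    public void pullIndex(String shard, long version) {
        pulls++; // a real store would download the files here
    }
}
```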

[~dsmiley], my understanding is that totally unloading a replica, i.e. keeping 
its files on disk but nothing in memory, would require changes to the current 
strategy of always having replica-specific ZooKeeper connections/state for the 
leader election process.

> Figure out what it would take for lazily-loaded cores to play nice with 
> SolrCloud
> -
>
> Key: SOLR-5146
> URL: https://issues.apache.org/jira/browse/SOLR-5146
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.5, 6.0
>Reporter: Erick Erickson
>Assignee: David Smiley
>Priority: Major
>
> The whole lazy-load core thing was implemented with non-SolrCloud use-cases 
> in mind. There are several user-list threads that ask about using lazy cores 
> with SolrCloud, especially in multi-tenant use-cases.
> This is a marker JIRA to investigate what it would take to make lazy-load 
> cores play nice with SolrCloud. It's especially interesting how this all 
> works with shards, replicas, leader election, recovery, etc.
> NOTE: This is pretty much totally unexplored territory. It may be that a few 
> trivial modifications are all that's needed. OTOH, it may be that we'd have 
> to rip apart SolrCloud to handle this case. Until someone dives into the 
> code, we don't know.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud

2020-02-05 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17031050#comment-17031050
 ] 

David Smiley commented on SOLR-5146:


{quote}...but then it's not immediately possible for it to get the shard leader 
election done since other nodes are not currently participating for that slice.
{quote}
I don't get what you are saying here.  The "trick" with transient cores in 
SolrCloud will be that SolrCloud needn't know about the loaded status.  Maybe 
there will be an exception, but it'll be a secret inside the node (other 
nodes/replicas won't know).  The core is _present_, and thus its leader status 
is whatever it is to SolrCloud.  It might be awoken to participate in 
leadership elections (I hope not), but if so I'll look to fix that so an 
unloaded core can stay unloaded during the election.  If there is data to sync 
then the core will be awoken to do so (/update, /replication, and all other 
request handlers require the core to be loaded).




[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud

2020-02-05 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030952#comment-17030952
 ] 

Erick Erickson commented on SOLR-5146:

[~murblanc] That's certainly one issue. Even if efficiently getting a leader 
for a completely unloaded shard is solved, the question of how to keep the core 
in sync is a sticky wicket. Say even one replica of a shard is unloaded and 
then gets loaded again. How is the core synced before doing anything? If 
replicas are coming and going all the time, do we wind up doing full 
synchronizations (assuming the leader problem is solved)? In the case of, say, 
200G indexes for a given replica, that's very expensive.
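The cost question above can be made concrete with a hypothetical catch-up planner: a reloaded replica that missed only a few updates replays them, otherwise it must copy the index. This sketch assumes index versions behave like simple update counters and that the leader retains a fixed number of recent updates; both are simplifying assumptions, not Solr's actual recovery bookkeeping.

```java
// Hypothetical outcomes for a replica that has just been reloaded.
enum CatchUp { NONE, REPLAY_UPDATES, FULL_COPY }

// Hypothetical planner; versions are treated as plain update counters.
class CatchUpPlanner {
    private final long retainedUpdates; // how many recent updates the leader can replay (assumed)

    CatchUpPlanner(long retainedUpdates) { this.retainedUpdates = retainedUpdates; }

    CatchUp plan(long leaderVersion, long localVersion) {
        long missed = leaderVersion - localVersion;
        if (missed <= 0) {
            return CatchUp.NONE;           // already in sync
        }
        if (missed <= retainedUpdates) {
            return CatchUp.REPLAY_UPDATES; // cheap: replay only what was missed
        }
        return CatchUp.FULL_COPY;          // expensive: copy index files, possibly the whole index
    }
}
```

If replicas churn constantly, more of them fall outside the replay window, which is exactly when the expensive path dominates.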

Core loading from a cold start is a very heavyweight operation. It may be that 
we need some intermediate state where we can free up lots of resources but keep 
the core partially loaded, mostly so it could be woken up nearly instantly, at 
roughly the cost of opening a new searcher.

Leader election is really all about ensuring that the index is up to date. So 
I've wondered about an "index only" state for a replica rather than 
unloaded. The idea is that it would always be up to date and could (almost) 
instantly assume leadership, but wouldn't consume the heavier-weight 
resources. Then it could be brought online without having to sync from the 
leader. And then "somehow" combine it with autoscaling-like functionality: when 
the query rate exceeds X, bring another replica from index-only to serving 
searchers. That'd take untangling what's necessary for indexing and what's 
necessary for searching so they were relatively independent.
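The "index only" idea can be sketched as a hypothetical replica mode plus an autoscaling-like rule that promotes one index-only replica to serving when the query rate exceeds a threshold (the "X" above). `ReplicaMode`, `Replica` and `SearchPromoter` are illustrative names; Solr has no such mode today.

```java
import java.util.List;

// Hypothetical replica modes; Solr has no "index only" mode today.
enum ReplicaMode {
    UNLOADED,   // files on disk only; must sync before doing anything
    INDEX_ONLY, // applies updates so it stays current, but opens no searchers
    SERVING     // fully loaded: indexing and searching
}

class Replica {
    final String name;
    ReplicaMode mode;

    Replica(String name, ReplicaMode mode) {
        this.name = name;
        this.mode = mode;
    }

    // Index-only replicas stay current, so they could take leadership without a sync.
    boolean canBecomeLeaderWithoutSync() {
        return mode != ReplicaMode.UNLOADED;
    }
}

// Hypothetical rule: promote one INDEX_ONLY replica when the query rate exceeds X.
class SearchPromoter {
    private final double maxQueriesPerSecond; // the "X" from the discussion

    SearchPromoter(double maxQueriesPerSecond) { this.maxQueriesPerSecond = maxQueriesPerSecond; }

    // Returns the promoted replica, or null if none was needed or available.
    Replica maybePromote(List<Replica> replicas, double currentQps) {
        if (currentQps <= maxQueriesPerSecond) {
            return null;
        }
        for (Replica r : replicas) {
            if (r.mode == ReplicaMode.INDEX_ONLY) {
                r.mode = ReplicaMode.SERVING; // open searchers; no sync from the leader needed
                return r;
            }
        }
        return null;
    }
}
```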

But I'll leave that for David to struggle with...




[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud

2020-02-05 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030919#comment-17030919
 ] 

Ilan Ginzburg commented on SOLR-5146:

Isn't a fundamental difference between SolrCloud and standalone Solr this: if we 
assume a given slice (shard) is not loaded anywhere and a request is received 
by a node for it, the node can load/open its local copy of that core just fine 
(let's assume that, since it works in standalone), but then it's not 
immediately possible for it to get the shard leader election done, since other 
nodes are not currently participating for that slice?




[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud

2020-01-25 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023591#comment-17023591
 ] 

Erick Erickson commented on SOLR-5146:

[~dsmiley] AFAIC, if you're going to be diving into this feel totally free to 
make any changes to the JIRA you want ;).  I created it mostly to have a marker 
so people would know that transient cores weren't supported in SolrCloud. In a 
similar vein, you can change the "TestLazyCores.java" to "TestTransientCores" ;)

As far as the design of the pluggable interface goes, I totally agree that it's 
awkward. When I was originally working on it, the sponsor of the effort had 
forked Solr and heavily customized the prior implementation (CoreContainer 
etc.). They wanted a less painful upgrade path next time. So I pulled the 
intertwined bits out of CoreContainer and made them accessible.

Making it pluggable really amounted to identifying all the bits in places like 
CoreContainer that were needed and getting them into the interface. I always 
thought of it as an intermediate step that could help inform a thoughtful 
redesign. There's a use-case for each of the touch points; what would be good 
is to identify whether all of those use-cases are necessary and/or could be 
combined into something simpler, perhaps modifying Solr's core operations along 
the way.

At root, there's a two-way set of communications that needs to happen between 
CoreContainer and the transient core code. As a core moves through various 
phases (loading, pendingOps, queued up for the closer thread, etc.), it may or 
may not be available for the plugin to do what it wants, say move it to another 
location. So there have to be ways for the transient code to know what state 
the core is in as it moves through its lifecycle.

Conversely, when Solr is trying to do whatever it needs to, say close the core, 
it needs the transient plugin not to do something conflicting.
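The two-way coordination described above can be sketched as a small hypothetical coordinator: the container reports phase changes, and the plugin must ask before acting on a core, which also blocks the container from, say, closing a core the plugin is moving. The phase names only loosely follow the discussion, and which phases count as "safe" for the plugin is an assumption.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lifecycle phases, loosely named after the ones in the discussion.
enum CorePhase { LOADING, ACTIVE, PENDING_OP, QUEUED_FOR_CLOSE, CLOSED }

// Hypothetical two-way coordination between a core container and a transient plugin.
// Container -> plugin: phase changes go through transition().
// Plugin -> container: the plugin asks tryBeginPluginOp() before acting on a core.
class CoreLifecycleCoordinator {
    // Assumption: the plugin may only act on cores that are idle and open, or fully closed.
    private static final Set<CorePhase> PLUGIN_SAFE =
            EnumSet.of(CorePhase.ACTIVE, CorePhase.CLOSED);

    private final Map<String, CorePhase> phases = new HashMap<>();
    private final Set<String> pluginBusy = new HashSet<>();

    // Container side: refuses while the plugin holds the core, so the caller
    // can retry instead of doing something conflicting (e.g. closing it mid-move).
    synchronized boolean transition(String core, CorePhase next) {
        if (pluginBusy.contains(core)) {
            return false;
        }
        phases.put(core, next);
        return true;
    }

    // Plugin side: may act (e.g. move the core's files) only in a safe phase.
    synchronized boolean tryBeginPluginOp(String core) {
        CorePhase phase = phases.get(core);
        if (phase == null || !PLUGIN_SAFE.contains(phase) || pluginBusy.contains(core)) {
            return false;
        }
        pluginBusy.add(core);
        return true;
    }

    synchronized void endPluginOp(String core) {
        pluginBusy.remove(core);
    }

    synchronized CorePhase phaseOf(String core) {
        return phases.get(core);
    }
}
```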

The plugin can do anything it wants to the core. Move it physically, 
temporarily suspend it, even move it to another machine if the infrastructure 
is such that the external app knows how to resolve where the core is.

Some of the complexity is due to trying to operate on cores in parallel, and 
the core operations have accumulated cruft over time. Whether we could reduce 
the complexity of core manipulation, and thus reduce the complexity of both the 
transient interface and CoreContainer, is a fair question.

If you are puzzled by how trivial some of the touch points are in the default 
implementation, we can talk about why they're there; some of them were required 
for the alternate implementation. If I can remember why 2 years later...

Anyway, if you want to totally redesign it, please feel free ;)




[jira] [Commented] (SOLR-5146) Figure out what it would take for lazily-loaded cores to play nice with SolrCloud

2020-01-24 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-5146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023078#comment-17023078
 ] 

David Smiley commented on SOLR-5146:


I'm going to start work on this as it's a critical feature to support thousands 
of cores per node.  Shawn rightfully points out it's also necessary to load 
cores super-fast, though I think separate issues should exist for that.  
SOLR-14040 for schema sharing is one, and there are smaller ones that have been 
contributed.  In a SolrCloud world I think we'll identify the need to do more.  
Also out of scope for this specific Jira issue is directly addressing SolrCloud 
scaling in the number of collections/shards or whatever.

The title says "lazy cores", which suggests only load-on-startup=false, a 
rather easy case, but I think the scope is actually full "transient" core 
compatibility, i.e. on-demand load/*unload*, which implies a cache.  I want to 
edit the title to not have the word "lazy".  I expect I'll use the new 
"Resource Management API" SOLR-13579 for the node-level cache, though I haven't 
yet dug into the details of that.
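On-demand load/unload with a bound on open cores is essentially an LRU cache. A minimal sketch using a plain access-ordered `LinkedHashMap` rather than the SOLR-13579 Resource Management API mentioned above; `TransientCoreCache`, `maxLoaded` and the eviction log are hypothetical, and a real cache would close the evicted core (likely asynchronously) instead of just recording its name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of a node-level transient core cache: load on demand and
// unload the least recently used core once more than maxLoaded are open.
// Hypothetical names; this is not Solr's existing transient cache API.
class TransientCoreCache<C> {
    private final Function<String, C> loader;
    private final List<String> evicted = new ArrayList<>(); // eviction log, for illustration
    private final LinkedHashMap<String, C> loaded;

    TransientCoreCache(int maxLoaded, Function<String, C> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.loaded = new LinkedHashMap<String, C>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, C> eldest) {
                if (size() > maxLoaded) {
                    evicted.add(eldest.getKey()); // a real cache would close the core here
                    return true;
                }
                return false;
            }
        };
    }

    synchronized C get(String name) {
        C core = loaded.get(name); // in access-order mode, get() also marks it recently used
        if (core == null) {
            core = loader.apply(name);
            loaded.put(name, core); // may evict the least recently used core
        }
        return core;
    }

    synchronized boolean isLoaded(String name) {
        return loaded.containsKey(name); // does not affect recency
    }

    synchronized List<String> evictedSoFar() {
        return new ArrayList<>(evicted);
    }
}
```

For example, with `maxLoaded = 2`, accessing cores a, b, a, c evicts b, since a was touched more recently.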

I tried out the transient core cache in SolrCloud for the heck of it (not 
expecting it to work) and sure enough, it led to an error.  Debugging that 
further might be a good early step to get a sense of the challenges ahead.  
Like Erick, I'm hopeful that there's not much of a technical barrier preventing 
this feature from "just working" in spite of SolrCloud.  SolrCloud will think 
all the cores on the node are live; the trick is that some are asleep and can 
be awoken easily, but that distinction needs to be invisible to SolrCloud.  I 
sympathize with Scott's theory that maybe we don't do transient cores and 
instead flush caches and perhaps other things.  I've thought about that a great 
deal.  It's promising, but you won't quite save as much memory as actually 
closing the core.  If I get stuck with SolrCloud transient core difficulties, I 
may look at such an alternative.  And ultimately we can do both; they aren't 
mutually exclusive!

I'm rather unsatisfied with the implementation of the existing transient core 
cache.  It's weird to me that it's pluggable and has a rather large API surface 
area for something conceptually straightforward.  I suppose the details are 
complicated, and I'll have to dig to appreciate those complexities.