[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know

2016-03-22 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206745#comment-15206745
 ] 

ASF subversion and git services commented on SOLR-8862:
---

Commit b6be74f2182c46a10f861556ea81d3ed1a79a308 in lucene-solr's branch 
refs/heads/jira/SOLR-445 from [~hossman_luc...@fucit.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b6be74f ]

SOLR-8862 work around.  Maybe something like this should be promoted into 
MiniSolrCloudCluster's start() method? or SolrCloudTestCase's configureCluster?


> /live_nodes is populated too early to be very useful for clients -- 
> CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other 
> ephemeral zk node to know which servers are "ready"
> --
>
> Key: SOLR-8862
> URL: https://issues.apache.org/jira/browse/SOLR-8862
> Project: Solr
> Issue Type: Bug
> Reporter: Hoss Man
>
> {{/live_nodes}} is populated surprisingly early (and multiple times) in the 
> life cycle of a Solr node startup, and as a result probably shouldn't be used 
> by {{CloudSolrClient}} (or other "smart" clients) for deciding what servers 
> are fair game for requests.
> We should either fix {{/live_nodes}} to be created later in the lifecycle, or 
> add some new ZK node for this purpose.
> {panel:title=original bug report}
> I haven't been able to make sense of this yet, but what i'm seeing in a new 
> SolrCloudTestCase subclass i'm writing is that the code below, which 
> (reasonably) attempts to create a collection immediately after configuring 
> the MiniSolrCloudCluster gets a "SolrServerException: No live SolrServers 
> available to handle this request" -- in spite of the fact that (as far as i 
> can tell at first glance) MiniSolrCloudCluster's constructor is supposed to 
> block until all the servers are live.
> {code}
> configureCluster(numServers)
>   .addConfig(configName, configDir.toPath())
>   .configure();
> Map collectionProperties = ...;
> assertNotNull(cluster.createCollection(COLLECTION_NAME, numShards, repFactor,
>                                        configName, null, null, collectionProperties));
> {code}
> {panel}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know

2016-03-21 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205251#comment-15205251
 ] 

Alan Woodward commented on SOLR-8862:
-

I've tried to dig a bit and see when everything here is run within the Jetty 
lifecycle, and it turns out that... it's complicated!
* In a normal Solr setup, running using the Jetty start.jar, the 
SolrDispatchFilter is instantiated during startup (Jetty instantiates its 
Filters, and then its Servlets), and it won't serve any requests until all 
filters and servlets are fully constructed and have finished initialising.  So 
there could be a significant gap between registering the live_nodes znode and 
requests actually being served, particularly if there are other servlets within 
the container that take their time in starting up.
* In JettySolrRunner, the SDF is instantiated within a jetty LifecycleListener 
(of which more below), which is called *after* Jetty has started listening on 
its port.  Requests won't be served via the filter until it has finished 
instantiating, but the gap here is smaller.

In both cases we have a race.  Ideally, we want to instantiate the filters, and 
only register ourselves with the cluster once we know we're serving requests, 
so we need a way to be notified that everything is ready to go:
* The standard servlet API exposes ServletContextListeners, but these only get 
called *before* startup and shutdown, so these aren't any use.  We need to be 
notified *after* startup.
* Jetty allows you to register LifecycleListeners that get called before and 
after startup and shutdown, which is exactly what we want.  Hurrah!

So what we really need to do here is to separate out CoreContainer 
construction, loading of cores, and creation of the live_nodes znode.  The 
container should be constructed and load up during server startup, and then 
register itself in a LifecycleListener.
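
The ordering proposed above can be sketched with a toy model. This is only an illustration under stated assumptions: {{StartupListener}}, {{ToyServer}} and {{DeferredRegistration}} are hypothetical names, not Jetty or Solr classes, and the event strings merely stand in for core loading, request serving and live_nodes registration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an "after startup" container callback; not the Jetty API. */
interface StartupListener {
    void started();
}

/** Toy server that notifies its listeners only after it has begun serving requests. */
class ToyServer {
    private final List<StartupListener> listeners = new ArrayList<>();
    private final List<String> events;

    ToyServer(List<String> events) {
        this.events = events;
    }

    void addListener(StartupListener listener) {
        listeners.add(listener);
    }

    void start() {
        events.add("load-cores");        // container constructed, cores loaded
        events.add("serving-requests");  // filters and servlets finished initialising
        for (StartupListener listener : listeners) {
            listener.started();          // fired only once requests can be served
        }
    }
}

class DeferredRegistration {
    /** Runs one start-up and returns the recorded order of events. */
    static List<String> run() {
        List<String> events = new ArrayList<>();
        ToyServer server = new ToyServer(events);
        // Registration with the cluster is deferred to the listener (hypothetical step name).
        server.addListener(() -> events.add("register-live-node"));
        server.start();
        return events;
    }
}
```

In this model the registration step can never precede request serving, because it only runs from the callback that the server fires at the end of start().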

It's not ideal that we have two different code paths here, one for 'proper' 
solr running using start.jar and xml configuration, and one programmatically, 
but I guess we can live with that for a while.

On a separate note, SOLR-8323 should help with waiting for collections to be 
searchable.




[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know

2016-03-20 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203595#comment-15203595
 ] 

ASF subversion and git services commented on SOLR-8862:
---

Commit aeda8dc4ae881c4ec405d70dcbf1d0b2c30871b7 in lucene-solr's branch 
refs/heads/jira/SOLR-445 from [~hossman_luc...@fucit.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=aeda8dc ]

SOLR-445: fix test bugs, and put in a stupid work around for SOLR-8862





[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know

2016-03-19 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203069#comment-15203069
 ] 

David Smiley commented on SOLR-8862:


I hope this can get improved/resolved.  I didn't chase it down as far but I too 
had frustrations developing a test using MiniSolrCloudCluster that simply 
wanted the collection to be searchable (in SOLR-5750).




[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know

2016-03-19 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200233#comment-15200233
 ] 

Noble Paul commented on SOLR-8862:
--

bq. ZkController.checkOverseerDesignate() is called (no idea what that does)

I probably should add a comment there. If an overseer designate is down and 
comes back up, it should be pushed ahead of non-designates. So it sends a 
message to the overseer to put it at the front of the overseer election queue.




[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know

2016-03-19 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200439#comment-15200439
 ] 

Scott Blum commented on SOLR-8862:
--

I think I can comment on this just a bit.  The first call to 
createEphemeralLiveNode() is not actually called from the constructor; it's 
called from the OnReconnect handler much later, if you lose your ZK session and 
have to create a new one.  At least, that's the theory.  Are you seeing it 
actually get called early?

More generally, the important race being resolved here is the call to 
registerAllCoresAsDown() happening before createEphemeralLiveNode().  Any 
client is supposed to join the cluster state (ie, is a core marked ACTIVE) with 
the live_nodes list.  So the idea is, mark everything as DOWN, then put in the 
live_nodes child, then go mark things ACTIVE as they actually come up.  This 
works reasonably well for things like routing search requests.  I can see how 
it might fall over if you're depending on live_nodes for doing cluster level 
operations.
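
The "join" described above can be sketched as follows. This is a minimal illustration, not CloudSolrClient's actual code: {{ReplicaFilter}} and {{usableNodes}} are hypothetical names, and the cluster state is reduced to a plain map from node name to replica state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReplicaFilter {
    /**
     * Returns the nodes whose replica is marked ACTIVE in the (simplified)
     * cluster state AND which also appear in the live_nodes set.
     * Both inputs are assumptions standing in for the real ZK data.
     */
    static List<String> usableNodes(Map<String, String> replicaStateByNode,
                                    Set<String> liveNodes) {
        List<String> usable = new ArrayList<>();
        for (Map.Entry<String, String> entry : replicaStateByNode.entrySet()) {
            boolean active = "ACTIVE".equals(entry.getValue());
            boolean live = liveNodes.contains(entry.getKey());
            if (active && live) {
                usable.add(entry.getKey());
            }
        }
        return usable;
    }
}
```

A node that is in live_nodes but whose replica is still DOWN is skipped, and so is a replica marked ACTIVE on a node that has dropped out of live_nodes. A cluster-level operation such as a collection API call has no replica state to join against, which is where the early live_nodes entry hurts.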




[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know

2016-03-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200470#comment-15200470
 ] 

Hoss Man commented on SOLR-8862:


bq. The first call to createEphemeralLiveNode() is not actually called from the 
constructor; it's called from the OnReconnect handler much later, if you lose 
your ZK session and have to create a new one. At least, that's the theory. Are 
you seeing it actually get called early?

Ah ... ok ... i'm probably wrong then -- i thought the "OnReconnect" handler 
was also used on the _initial_ connect as well.

I'll edit my other comment to reduce confusion

bq. This works reasonably well for things like routing search requests. I can 
see how it might fall over if you're depending on live_nodes for doing cluster 
level operations.

that's my concern -- CloudSolrClient consults {{/live_nodes}} (via 
{{ClusterState.getLiveNodes()}}) to decide which nodes are up for any requests 
that aren't explicitly routable updates -- in my particular case i'm getting 
burned by collection API calls...

I guess I see your point though ... for any request involving specific 
collection(s) clients can use the replica state to see if they are ACTIVE (or 
if they are a LEADER for update situations) .. and CloudSolrClient does that 
even for searches.  So I guess the "practical" impacts of this aren't as 
severe as i initially thought ... 

but I still feel like we need something per-node in ZK that isn't set to  
"true" until that node is actually listening on its port. 
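
Until something like that exists, a client or test could probe the port itself. The following is a rough sketch under stated assumptions: {{PortProbe}} and {{waitForPort}} are hypothetical helpers, not part of Solr or SolrJ, and a successful TCP connect only shows that the port accepts connections, not that the node is ready to handle requests.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    /**
     * Polls host:port until a TCP connection succeeds or the timeout elapses.
     * Returns true if a connection was established, false on timeout.
     */
    static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            try (Socket socket = new Socket()) {
                // Short connect timeout so one attempt cannot use up the whole budget.
                socket.connect(new InetSocketAddress(host, port), 200);
                return true;
            } catch (IOException e) {
                Thread.sleep(50);  // not accepting connections yet; retry shortly
            }
        }
        return false;
    }
}
```

A test might call waitForPort for each server before issuing the first collection API request, and fail fast if it returns false.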
