[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know
[ https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206745#comment-15206745 ]

ASF subversion and git services commented on SOLR-8862:
-------------------------------------------------------

Commit b6be74f2182c46a10f861556ea81d3ed1a79a308 in lucene-solr's branch refs/heads/jira/SOLR-445 from [~hossman_luc...@fucit.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b6be74f ]

SOLR-8862 work around. Maybe something like this should be promoted into MiniSolrCloudCluster's start() method? or SolrCloudTestCase's configureCluster?

> /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know which servers are "ready"
> --------------------------------------------------------------------
>
>                 Key: SOLR-8862
>                 URL: https://issues.apache.org/jira/browse/SOLR-8862
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>
> {{/live_nodes}} is populated surprisingly early (and multiple times) in the
> life cycle of a Solr node startup, and as a result probably shouldn't be used
> by {{CloudSolrClient}} (or other "smart" clients) for deciding what servers
> are fair game for requests.
> we should either fix {{/live_nodes}} to be created later in the lifecycle, or
> add some new ZK node for this purpose.
> {panel:title=original bug report}
> I haven't been able to make sense of this yet, but what i'm seeing in a new
> SolrCloudTestCase subclass i'm writing is that the code below, which
> (reasonably) attempts to create a collection immediately after configuring
> the MiniSolrCloudCluster, gets a "SolrServerException: No live SolrServers
> available to handle this request" -- in spite of the fact that (as far as i
> can tell at first glance) MiniSolrCloudCluster's constructor is supposed to
> block until all the servers are live.
> {code}
> configureCluster(numServers)
>   .addConfig(configName, configDir.toPath())
>   .configure();
> Map<String, String> collectionProperties = ...;
> assertNotNull(cluster.createCollection(COLLECTION_NAME, numShards, repFactor,
>                                        configName, null, null, collectionProperties));
> {code}
> {panel}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
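The commit's actual workaround is not shown in this message. Purely as an illustration of the kind of guard a test could put around the failing createCollection call from the bug report, here is a minimal retry sketch. The class name RetryUntilReady and its retry method are hypothetical and not part of Solr or its test framework; the lambda in main only simulates a call that fails twice before succeeding.

```java
import java.util.concurrent.Callable;

/** Hypothetical test helper; not part of Solr or its test framework. */
final class RetryUntilReady {

  /**
   * Calls the action until it returns without throwing, pausing between
   * attempts. If every attempt fails, the last exception is rethrown.
   */
  static <T> T retry(Callable<T> action, int maxAttempts, long pauseMillis) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(pauseMillis); // give the nodes time to start serving
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated "create collection" call: fails on the first two attempts.
    String result = retry(() -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("No live SolrServers available to handle this request");
      }
      return "created";
    }, 5, 10L);
    System.out.println(result + " after " + calls[0] + " attempts"); // created after 3 attempts
  }
}
```

A retry loop like this only hides the race in a test; it does not address the underlying question of when a node should advertise itself.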
[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know
[ https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15205251#comment-15205251 ]

Alan Woodward commented on SOLR-8862:
-------------------------------------

I've tried to dig a bit and see when everything here is run within the Jetty lifecycle, and it turns out that... it's complicated!

* In a normal Solr setup, running using the Jetty start.jar, the SolrDispatchFilter (SDF) is instantiated during startup (Jetty instantiates its Filters, and then its Servlets), and it won't serve any requests until all filters and servlets are fully constructed and have finished initialising. So there could be a significant gap between registering the live_nodes znode and requests actually being served, particularly if there are other servlets within the container that take their time in starting up.
* In JettySolrRunner, the SDF is instantiated within a Jetty LifecycleListener (of which more below), which is called *after* Jetty has started listening on its port. Requests won't be served via the filter until it has finished instantiating, but the gap here is smaller.

In both cases we have a race. Ideally, we want to instantiate the filters, and only register ourselves with the cluster once we know we're serving requests, so we need a way to be notified that everything is ready to go:

* The standard servlet API exposes ServletContextListeners, but these only get called *before* startup and shutdown, so these aren't any use. We need to be notified *after* startup.
* Jetty allows you to register LifecycleListeners that get called before and after startup and shutdown, which is exactly what we want. Hurrah!

So what we really need to do here is to separate out CoreContainer construction, loading of cores, and creation of the live_nodes znode. The container should be constructed and should load its cores during server startup, and then register itself in a LifecycleListener.
It's not ideal that we have two different code paths here, one for 'proper' Solr running using start.jar and XML configuration, and one for programmatic startup, but I guess we can live with that for a while.

On a separate note, SOLR-8323 should help with waiting for collections to be searchable.
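The ordering proposed in the comment above (construct the container and load cores during startup, then register the live node only from an "after startup" callback) can be modelled in a few lines. This is a standalone sketch: ToyServer and StartedListener are hypothetical stand-ins, not Jetty or Solr classes, and the event strings only record the order in which things happen.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone model of the proposed startup ordering; no Jetty or Solr types are used. */
final class StartupOrderSketch {

  /** Hypothetical stand-in for an "after startup" lifecycle callback. */
  interface StartedListener {
    void started();
  }

  /** Toy server that notifies its listeners only once it is serving requests. */
  static final class ToyServer {
    private final List<StartedListener> listeners = new ArrayList<>();
    private final List<String> events;

    ToyServer(List<String> events) {
      this.events = events;
    }

    void addStartedListener(StartedListener listener) {
      listeners.add(listener);
    }

    void start(Runnable initFilters) {
      initFilters.run();               // filters and servlets initialise first
      events.add("serving requests");  // only now can requests be handled
      for (StartedListener listener : listeners) {
        listener.started();            // callbacks fire after startup
      }
    }
  }

  /** Runs the model and returns the recorded order of events. */
  static List<String> run() {
    List<String> events = new ArrayList<>();
    ToyServer server = new ToyServer(events);
    // Registration with the cluster is deferred to the "after startup" callback.
    server.addStartedListener(() -> events.add("register live node"));
    server.start(() -> {
      events.add("construct container");
      events.add("load cores");
    });
    return events;
  }

  public static void main(String[] args) {
    // [construct container, load cores, serving requests, register live node]
    System.out.println(run());
  }
}
```

The point of the model is only that the live-node registration is the last event, so a client that sees the node as live can already send it requests.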
[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know
[ https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203595#comment-15203595 ]

ASF subversion and git services commented on SOLR-8862:
-------------------------------------------------------

Commit aeda8dc4ae881c4ec405d70dcbf1d0b2c30871b7 in lucene-solr's branch refs/heads/jira/SOLR-445 from [~hossman_luc...@fucit.org]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=aeda8dc ]

SOLR-445: fix test bugs, and put in a stupid work around for SOLR-8862
[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know
[ https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203069#comment-15203069 ]

David Smiley commented on SOLR-8862:
------------------------------------

I hope this can get improved/resolved. I didn't chase it down as far but I too had frustrations developing a test using MiniSolrCloudCluster that simply wanted the collection to be searchable (in SOLR-5750).
[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know
[ https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200233#comment-15200233 ]

Noble Paul commented on SOLR-8862:
----------------------------------

bq. ZkController.checkOverseerDesignate() is called (no idea what that does)

I probably should add a comment there. If an overseer designate is down and comes back up, it should be pushed ahead of non-designates. So it sends a message to the overseer to put it at the front of the overseer election queue.
[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know
[ https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200439#comment-15200439 ]

Scott Blum commented on SOLR-8862:
----------------------------------

I think I can comment on this just a bit.

The first call to createEphemeralLiveNode() is not actually called from the constructor; it's called from the OnReconnect handler much later, if you lose your ZK session and have to create a new one. At least, that's the theory. Are you seeing it actually get called early?

More generally, the important race being resolved here is the call to registerAllCoresAsDown() happening before createEphemeralLiveNode(). Any client is supposed to join the cluster state (i.e., whether a core is marked ACTIVE) with the live_nodes list. So the idea is: mark everything as DOWN, then put in the live_nodes child, then go mark things ACTIVE as they actually come up.

This works reasonably well for things like routing search requests. I can see how it might fall over if you're depending on live_nodes for doing cluster-level operations.
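The "join" described in the comment above, where a client treats a core as usable only if its replica state is ACTIVE and its node appears in live_nodes, can be sketched as follows. This is a simplified model, not Solr's client code: the Replica class, the three-value State enum, and the node and core names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified model of joining replica state with the live_nodes list; not Solr code. */
final class UsableReplicas {

  /** Reduced, hypothetical set of replica states. */
  enum State { DOWN, RECOVERING, ACTIVE }

  /** Hypothetical replica record: which node hosts it and its published state. */
  static final class Replica {
    final String nodeName;
    final State state;

    Replica(String nodeName, State state) {
      this.nodeName = nodeName;
      this.state = state;
    }
  }

  /** A core is usable only if it is ACTIVE and its node is currently live. */
  static Set<String> usable(Map<String, Replica> replicasByCore, Set<String> liveNodes) {
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Replica> entry : replicasByCore.entrySet()) {
      Replica replica = entry.getValue();
      if (replica.state == State.ACTIVE && liveNodes.contains(replica.nodeName)) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  /** Small fixed scenario covering the three interesting cases. */
  static Set<String> demo() {
    Map<String, Replica> replicas = new LinkedHashMap<>();
    replicas.put("core1", new Replica("nodeA", State.ACTIVE)); // live and active
    replicas.put("core2", new Replica("nodeB", State.DOWN));   // node just started: live, still DOWN
    replicas.put("core3", new Replica("nodeC", State.ACTIVE)); // stale ACTIVE state, node not live

    Set<String> liveNodes = new TreeSet<>();
    liveNodes.add("nodeA");
    liveNodes.add("nodeB");
    return usable(replicas, liveNodes);
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [core1]
  }
}
```

In this model an early live_nodes entry is harmless for core2, because its DOWN state filters it out. A cluster-level request has no replica state to consult, which is the case the comment says may fall over.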
[jira] [Commented] (SOLR-8862) /live_nodes is populated too early to be very useful for clients -- CloudSolrClient (and MiniSolrCloudCluster.createCollection) need some other ephemeral zk node to know
[ https://issues.apache.org/jira/browse/SOLR-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200470#comment-15200470 ]

Hoss Man commented on SOLR-8862:
--------------------------------

bq. The first call to createEphemeralLiveNode() is not actually called from the constructor; it's called from the OnReconnect handler much later, if you lose your ZK session and have to create a new one. At least, that's the theory. Are you seeing it actually get called early?

Ah ... ok ... i'm probably wrong then -- i thought the "OnReconnect" handler was also used on the _initial_ connect. I'll edit my other comment to reduce confusion.

bq. This works reasonably well for things like routing search requests. I can see how it might fall over if you're depending on live_nodes for doing cluster level operations.

that's my concern -- CloudSolrClient consults {{/live_nodes}} (via {{ClusterState.getLiveNodes()}}) to decide which nodes are up for any requests that aren't explicitly routable updates -- in my particular case i'm getting burned by collection API calls...

I guess I see your point though ... for any request involving specific collection(s), clients can use the replica state to see if they are ACTIVE (or if they are a LEADER for update situations) .. and CloudSolrClient does that even for searches.

So I guess the "practical" impacts of this aren't as severe as i initially thought ... but I still feel like we need something per-node in ZK that isn't set to "true" until that node is actually listening on its port.
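The comment above asks for a per-node signal that only becomes true once the node is listening on its port. No such ZK node is defined in this thread, so as a rough illustration of the condition itself, here is a sketch of probing whether a port accepts TCP connections. PortProbe and isListening are hypothetical names, and the ServerSocket in main merely stands in for a started node.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical helper: checks whether something accepts TCP connections on host:port. */
final class PortProbe {

  /** Returns true if a TCP connection can be opened within the timeout, false otherwise. */
  static boolean isListening(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return true;
    } catch (IOException e) {
      return false; // refused, unreachable, or timed out
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for a started node: bind an ephemeral port on the loopback address.
    try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"))) {
      System.out.println(isListening("127.0.0.1", server.getLocalPort(), 1000));
    }
  }
}
```

An open port is a necessary condition only: as noted earlier in the thread, in JettySolrRunner the port is already listening before the dispatch filter has finished instantiating, so a probe like this cannot prove that requests will be served.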