from: "Ben DeMott (JIRA)"

[jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2018-02-21 Thread Ben DeMott (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben DeMott updated SOLR-10284:
--
Affects Version/s: 7.0, 7.1, 7.2

> Solr connection to Standalone node in Ensemble causes cluster failure
> -
>
> Key: SOLR-10284
> URL: https://issues.apache.org/jira/browse/SOLR-10284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.3, 6.4, 7.0, 7.1, 7.2
> Environment: Solrcloud, with Zookeeper 
>Reporter: Ben DeMott
>Priority: Major
>
> I posted this issue on the Dev mailing list and was encouraged to create a 
> Jira ticket.  This isn't a bug per se.
> Solr connects / reconnects to "Standalone" Zookeeper nodes, within an 
> ensemble cluster, which causes absolute havoc. 
> I work for Dice.com, as one of the core search developers.
> I'm happy to write a patch, as we'll probably do that internally anyways.  I 
> just want to get consensus from the community about how to provide the best 
> solution.
> My original email describing the issue: 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2
> Proposed Solution:
> My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
> default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
> connection or reconnection of the Zookeeper Client, it would ask the server 
> "are you standalone", and disconnect if it is and ZK_STANDALONE=false, and 
> try the next host.  If all hosts are in standalone, an error would be shown - 
> "No zookeeper hosts available, that aren't in standalone operation - The 
> setting ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"
> In order to urge users to use the setting, I would possibly also have a 
> warning shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
> connection string, and ZK_STANDALONE is not false.
> I can't think of any implicit way to internalize a setting, other than this: 
> if the ZK_HOSTS connection string has multiple hosts, there should be no 
> scenario in which any node is standalone, so you could assume there should be 
> no standalone servers.  But maybe an explicit setting is preferable.
> This solution should be:
> 1.) backwards compatible
> 2.) have very little performance impact (1 extra call upon connection to ZK)
> 3.) isolated to one part of the code.
> *Update 6/26/2017:*
> I started working on this, and it occurred to me the same issue exists for 
> *SolrJ* clients.  So SolrJ might be the place to make this change. I'm not 
> sure yet.
> A SolrJ client that has a multi-zk-node connection string that connects (even 
> temporarily) to a zk host that is standalone will believe there are no Solr 
> hosts that can answer the query, and you'll get the following error.  
> {{CloudSolrClient - Request to collection efc-profiles-match-col failed due 
> to (510) org.apache.solr.common.SolrException: Could not find a healthy node 
> to handle the request.}}
> I am not as familiar with the SolrJ codebase ... so I'll have to do some 
> digging.
> Instead of moving on to a different ZooKeeper host, the SolrJ client will 
> think everything is fully working, just with no Solr hosts or collections.
>  
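
The "are you standalone" check in the proposal above could be sketched as follows. This is only an illustration in Python, not Solr or SolrJ code. It assumes ZooKeeper's `srvr` four-letter command is enabled on the servers and relies on the `Mode:` line of its reply. The function names (`parse_zk_mode`, `query_zk_mode`, `pick_zk_host`) and the `allow_standalone` parameter are hypothetical; `allow_standalone` stands in for the proposed ZK_STANDALONE setting.

```python
import socket


def parse_zk_mode(srvr_output):
    """Return the value of the 'Mode:' line in a ZooKeeper `srvr` reply, or None."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def query_zk_mode(address, timeout=5.0):
    """Send `srvr` to a 'host:port' address and return the reported mode."""
    host, _, port = address.partition(":")
    with socket.create_connection((host, int(port or 2181)), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return parse_zk_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def pick_zk_host(addresses, allow_standalone, mode_of=query_zk_mode):
    """Return the first address whose mode is acceptable; raise if none is."""
    for address in addresses:
        try:
            mode = mode_of(address)
        except OSError:
            continue  # unreachable host: try the next one
        if mode is None:
            continue  # no recognizable reply: try the next one
        if mode == "standalone" and not allow_standalone:
            continue  # hypothetical ZK_STANDALONE=false behaviour: skip this host
        return address
    raise RuntimeError("No acceptable ZooKeeper host found")
```

Passing `mode_of` as a parameter keeps the host-selection logic testable without a live ensemble.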



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (SOLR-12013) collections API CLUSTERSTATUS command fails when collections have errors

2018-02-21 Thread Ben DeMott (JIRA)

Ben DeMott created SOLR-12013:
-

 Summary: collections API CLUSTERSTATUS command fails when 
collections have errors
 Key: SOLR-12013
 URL: https://issues.apache.org/jira/browse/SOLR-12013
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 7.2, 7.1, 7.0, 6.0
Reporter: Ben DeMott


The CLUSTERSTATUS command can be issued independently of any particular collection.

http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS

I would expect that you can still inspect the status of a cluster even if a 
single collection has failed, or is missing its configuration.

*Expected behavior*: the status of all healthy collections is returned; 
unhealthy collections are either reported with a stack trace in the response, 
reported in a failure state, or omitted from the response.

For example, CLUSTERSTATUS fails when a collection's config set is missing from 
ZooKeeper, with:

{{*org.apache.solr.common.cloud.ZooKeeperException: Specified config does not exist in ZooKeeper: config-set-name*}}
{{ *at org.apache.solr.common.cloud.ZkStateReader.readConfigName(ZkStateReader.java:189)*}}
{{ at org.apache.solr.handler.admin.ClusterStatus.getClusterStatus(ClusterStatus.java:141)}}
{{ at org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.lambda$static$19(CollectionsHandler.java:649)}}
{{ at org.apache.solr.handler.admin.CollectionsHandler$CollectionOperation.execute(CollectionsHandler.java:888)}}
{{ at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:226)}}
{{ at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:213)}}
{{ at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)}}
{{ at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:748)}}
{{ at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:729)}}
{{ at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:510)}}
{{ at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)}}
{{ at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)}}
{{ at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)}}
{{ at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)}}
{{ at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)}}
{{ at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)}}
{{ at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)}}
{{ at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)}}
{{ at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)}}
{{ at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)}}
{{ at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)}}
{{ at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)}}
{{ at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)}}
{{ at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)}}
{{ at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)}}
{{ at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)}}
{{ at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)}}
{{ at org.eclipse.jetty.server.Server.handle(Server.java:534)}}
{{ at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)}}
{{ at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)}}
{{ at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)}}
{{ at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)}}
{{ at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)}}
{{ at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)}}
{{ at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)}}
{{ at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)}}
{{ at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)}}
{{ at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)}}
{{ at java.lang.Thread.run(Thread.java:745)}}
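
The expected behavior described above amounts to isolating each collection's failure instead of letting one exception abort the whole response. A minimal sketch of that idea in Python, not Solr's actual ClusterStatus code: `collect_cluster_status`, the `read_config_name` callable, and the `health` / `error` keys are all hypothetical names chosen for illustration.

```python
def collect_cluster_status(collections, read_config_name):
    """Build a status map, confining each failure to the collection that caused it."""
    status = {}
    for name in collections:
        try:
            # read_config_name may raise, e.g. when the config set is missing
            status[name] = {"health": "ok", "configName": read_config_name(name)}
        except Exception as exc:
            # Report the failure for this collection and keep going
            status[name] = {"health": "error", "error": f"{type(exc).__name__}: {exc}"}
    return status
```

With this shape, a collection whose config set is missing shows up in a failure state while the healthy collections are still returned.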

 




[jira] [Commented] (SOLR-11300) LatLonPointSpatialField does not implement getValueSource()

2017-09-19 Thread Ben DeMott (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-11300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172374#comment-16172374
 ] 

Ben DeMott commented on SOLR-11300:
---

Hi David, forgive my ignorance - I was alerted to this problem at work for a 
collection we have.

It seems you might be right - exists always returns true.  

On Solr 6.5, a query using exists() returns successfully with a LatLonType 
field; with a LatLonPointSpatialField, the query throws an error.  Maybe 
throwing an error is an improvement and is the correct behavior; if so, this 
'bug' can be closed.



> LatLonPointSpatialField does not implement getValueSource()
> ---
>
> Key: SOLR-11300
> URL: https://issues.apache.org/jira/browse/SOLR-11300
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spatial
>Affects Versions: 6.5, 6.6
>Reporter: Ben DeMott
>
> LatLonPointSpatialField replaces LatLonType, as documented in SOLR-10039.
>  
> LatLonPointSpatialField doesn't implement getValueSource(), which causes any 
> function query (*exists*, *default*, etc.) to raise:
> {{"A ValueSource isn't directly available from this field. Instead try a 
> query using the distance as the score."}}
> This message is defined in the abstract class here:
> 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/schema/AbstractSpatialFieldType.java#L330
> 
> Note that function queries like this worked with LatLonType.




[jira] [Created] (SOLR-11300) LatLonPointSpatialField does not implement getValueSource()

2017-08-30 Thread Ben DeMott (JIRA)

Ben DeMott created SOLR-11300:
-

 Summary: LatLonPointSpatialField does not implement 
getValueSource()
 Key: SOLR-11300
 URL: https://issues.apache.org/jira/browse/SOLR-11300
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: spatial
Affects Versions: 6.6, 6.5
Reporter: Ben DeMott


LatLonPointSpatialField replaces LatLonType, as documented in SOLR-10039.
 
LatLonPointSpatialField doesn't implement getValueSource(), which causes any 
function query (*exists*, *default*, etc.) to raise:
{{"A ValueSource isn't directly available from this field. 
Instead try a query using the distance as the score."}}
This message is defined in the abstract class here:

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/schema/AbstractSpatialFieldType.java#L330

Note that function queries like this worked with LatLonType.
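
For anyone trying to reproduce this, a request of the following shape exercises exists() on a field. This is a rough sketch in Python that only builds the URL. The base URL, the collection name, the field name, and the `has_location` alias are placeholders rather than values from this issue, and `exists_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def exists_query_url(base_url, collection, field):
    """Build a /select URL that evaluates exists(<field>) as a pseudo-field."""
    params = {
        "q": "*:*",
        # Function query returned as an aliased pseudo-field in the field list
        "fl": f"id,has_location:exists({field})",
        "rows": "1",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


url = exists_query_url("http://localhost:8983/solr", "places", "location")
```

Per the report above, when `location` is a LatLonPointSpatialField a request like this would be expected to fail with the ValueSource message.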




[jira] [Commented] (SOLR-6707) Recovery/election for invalid core results in rapid-fire re-attempts until /overseer/queue is clogged

2017-08-23 Thread Ben DeMott (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16139275#comment-16139275
 ] 

Ben DeMott commented on SOLR-6707:
--

We have experienced this multiple times.  We host inside AWS, and our ZooKeeper 
ensemble is spread across different availability zones.
This means that the connection between ZooKeeper nodes occasionally has high 
latency, which ZooKeeper doesn't seem to like.  I wonder if anyone else is in 
this situation.
We've never had as many ZooKeeper issues as we have since moving our 
infrastructure inside AWS.

What triggered a backed up overseer queue for us was a hung ephemeral node in 
Zookeeper which I discuss here:
https://stackoverflow.com/questions/23743424/solr-issue-clusterstate-says-we-are-the-leader-but-locally-we-dont-think-so/42210844#42210844

As OP said, once this goes on for long enough Solr runs out of 
file-descriptors, and eventually brings down the whole cluster.

This ZooKeeper bug appears to be the cause of the hung ephemeral node:
https://issues.apache.org/jira/browse/ZOOKEEPER-2355

> Recovery/election for invalid core results in rapid-fire re-attempts until 
> /overseer/queue is clogged
> -
>
> Key: SOLR-6707
> URL: https://issues.apache.org/jira/browse/SOLR-6707
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.10
>Reporter: James Hardwick
> Fix For: 5.2, 6.0
>
>
> We experienced an issue the other day that brought a production solr server 
> down, and this is what we found after investigating:
> - Running Solr instance with two separate cores, one of which is perpetually 
> down because its configs are not yet completely updated for SolrCloud. This 
> was thought to be harmless since it's not currently in use. 
> - Solr experienced an "internal server error" supposedly because of "No space 
> left on device" even though we appeared to have ~10GB free. 
> - Solr immediately went into recovery, and subsequent leader election for 
> each shard of each core. 
> - Our primary core recovered immediately. Our additional core which was never 
> active in the first place, attempted to recover but of course couldn't due to 
> the improper configs. 
> - Solr then began rapid-fire reattempting recovery of said node, trying maybe 
> 20-30 times per second.
> - This in turn bombarded ZooKeeper's /overseer/queue into oblivion
> - At some point /overseer/queue becomes so backed up that normal cluster 
> coordination can no longer play out, and Solr topples over. 
> I know this is a bit of an unusual circumstance due to us keeping the dead 
> core around, and our quick solution has been to remove said core. However I 
> can see other potential scenarios that might cause the same issue to arise. 




[jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-06-26 Thread Ben DeMott (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ben DeMott updated SOLR-10284:
--
Description:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble
cluster, which causes absolute havoc.

I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyways. I
just want to get consensus from the community about how to provide the best
solution.

My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would
default to TRUE for the solr.in.sh file found next to bin/solr). Upon
connection or reconnection of the Zookeeper Client, it would ask the server
"are you standalone", and disconnect if it is and ZK_STANDALONE=false, and try
the next host. If all hosts are in standalone, an error would be shown - "No
zookeeper hosts available, that aren't in standalone operation - The setting
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"

In order to urge users to use the setting, I would possibly also have a warning
shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the
connection string, and ZK_STANDALONE is not false.

I can't think of any implicit way to internalize a setting, other than this:
if the ZK_HOSTS connection string has multiple hosts, there should be no
scenario in which any node is standalone, so you could assume there should be
no standalone servers. But maybe an explicit setting is preferable.

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

*Update 6/26/2017:*

I started working on this, and it occurred to me the same issue exists for
*SolrJ* clients. So SolrJ might be the place to make this change. I'm not sure
yet.
A SolrJ client that has a multi-zk-node connection string that connects (even
temporarily) to a zk host that is standalone will believe there are no Solr
hosts that can answer the query, and you'll get the following error.

{{CloudSolrClient - Request to collection efc-profiles-match-col failed due to
(510) org.apache.solr.common.SolrException: Could not find a healthy node to
handle the request.}}

I am not as familiar with the SolrJ codebase ... so I'll have to do some
digging.

Instead of moving on to a different ZooKeeper host, the SolrJ client will think
everything is fully working, just with no Solr hosts or collections.

was:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble
cluster, which causes absolute havoc.

My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

*Update 6/26/2017:*

I started working on this, and it occurred to me the same issue exists for
*SolrJ* clients. So SolrJ might be the place to make this change. I'm not sure
yet.
A SolrJ client that has

[jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-06-26 Thread Ben DeMott (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ben DeMott updated SOLR-10284:
--
Description:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble
cluster, which causes absolute havoc.

My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

*Update 6/26/2017:*

{{CloudSolrClient - Request to collection efc-profiles-match-col failed due to
(510) org.apache.solr.common.SolrException: Could not find a healthy node to
handle the request.}}

I am not as familiar with the SolrJ codebase ... so I'll have to do some
digging.

Instead of moving onto a different Zookeeper host, the SolrJ client will think
everything is fully working, just no collections.

was:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble
cluster, which causes absolute havoc.

My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

*Update 6/26/2017:*

I started working on this, and it occurred to me the same issue exists for
*SolrJ * clients. So SolrJ might be the place to make this change. I'm not
sure yet.
A SolrJ client that has a

[jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-06-26 Thread Ben DeMott (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ben DeMott updated SOLR-10284:
--
Description:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble
cluster, which causes absolute havoc.

My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

*Update 6/26/2017:*

I started working on this, and it occurred to me the same issue exists for
*SolrJ * clients. So SolrJ might be the place to make this change. I'm not
sure yet.
A SolrJ client that has a multi-zk-node connection string that connects (even
temporarily) to a zk host that is standalone will believe there are no Solr
hosts that can answer the query, and you'll get the following error.

{{CloudSolrClient - Request to collection efc-profiles-match-col failed due to
(510) org.apache.solr.common.SolrException: Could not find a healthy node to
handle the request.}}

I am not as familiar with the SolrJ codebase ... so I'll have to do some
digging.

Instead of moving onto a different Zookeeper host, the SolrJ client will think
everything is fully working, just no collections.

was:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble
cluster, which causes absolute havoc.

My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

*Update 6/26/2017:*

I started working on this, and it occurred to me the same issue exists for
SolrJ clients. So SolrJ might be the place to make this change. I'm not sure
yet.
A SolrJ client that has a

[jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-06-26 Thread Ben DeMott (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ben DeMott updated SOLR-10284:
--
Description:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble
cluster, which causes absolute havoc.

My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

*Update 6/26/2017:*

I started working on this, and it occurred to me the same issue exists for
SolrJ clients. So SolrJ might be the place to make this change. I'm not sure
yet.
A SolrJ client that has a multi-zk-node connection string that connects (even
temporarily) to a zk host that is standalone will think there are no solr hosts
available to satisfy the request, or it will believe there are no solr hosts
that can answer the query, and you'll get the following error.

``CloudSolrClient - Request to collection efc-profiles-match-col failed due to
(510) org.apache.solr.common.SolrException: Could not find a healthy node to
handle the request.``

I am not as familiar with the SolrJ codebase ... so I'll have to do some
digging.

Instead of moving onto a different Zookeeper host, the SolrJ client will think
everything is fully working, just no collections.

was:
I posted this issue on the Dev mailing list and was encouraged to create a Jira
ticket. This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble
cluster, which causes absolute havoc.

My original email describing the issue:
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

> Solr connection to Standalone node in Ensemble causes cluster failure
> -

[jira] [Commented] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-03-23 Thread Ben DeMott (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938890#comment-15938890
 ] 

Ben DeMott commented on SOLR-10284:
---

bq. Keep it simple? boolean allowStandaloneZk = numZkHosts == 1;

Totally agree. I just wasn't sure whether there were any corner cases where 
this would give the wrong answer. Sounds like a plan.

I'll update here with my progress on the patch and any further notes/questions. 
 Thanks for the input.
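
The heuristic quoted above might look like this when applied to a ZooKeeper connection string. A minimal sketch in Python rather than the eventual patch: the function names are hypothetical, and it assumes the usual `host:port,host:port/chroot` form, with the optional chroot after the last host.

```python
def count_zk_hosts(zk_host):
    """Count servers in a connection string such as 'a:2181,b:2181/solr'."""
    # Everything after the first '/' is the chroot, not part of the host list
    servers, _, _chroot = zk_host.partition("/")
    return len([s for s in servers.split(",") if s.strip()])


def allow_standalone_zk(zk_host):
    """Mirror of the quoted rule: standalone is acceptable only with one host."""
    return count_zk_hosts(zk_host) == 1
```

A single-host string such as `localhost:2181` would keep today's behaviour, while any multi-host string would imply that standalone servers should be rejected.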

> Solr connection to Standalone node in Ensemble causes cluster failure
> -
>
> Key: SOLR-10284
> URL: https://issues.apache.org/jira/browse/SOLR-10284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.3, 6.4
> Environment: Solrcloud, with Zookeeper 
>Reporter: Ben DeMott
>
> I posted this issue on the Dev mailing list and was encouraged to create a 
> Jira ticket.  This isn't a bug per-se.
> Solr connects / reconnects to "Standalone" Zookeeper nodes, within an 
> ensemble cluster, which causes absolute havoc. 
> I work for Dice.com, as one of the core search developers.
> I'm happy to write a patch, as we'll probably do that internally anyways.  I 
> just want to get consensus from the community about how to provide the best 
> solution.
> My original email describing the issue: 
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2
> Proposed Solution:
> My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
> default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
> connection or reconnection of the Zookeeper Client, it would ask the server 
> "are you standalone", and disconnect if it is and ZK_STANDALONE=false, and 
> try the next host.  If all hosts are in standalone, an error would be shown - 
> "No zookeeper hosts available, that aren't in standalone operation - The 
> setting ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"
> In order to urge users to use the setting, I would possibly also have a 
> warning shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
> connection string, and ZK_STANDALONE is not false.
> I can't think of any implicit way to internalize a setting, other than this: 
> if the ZK_HOSTS connection string has multiple hosts, there should be no 
> scenario in which any node is standalone, so you could assume there should be 
> no standalone servers.  But maybe an explicit setting is preferable.
> This solution should be:
> 1.) backwards compatible
> 2.) have very little performance impact (1 extra call upon connection to ZK)
> 3.) isolated to one part of the code.




[jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-03-14 Thread Ben DeMott (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben DeMott updated SOLR-10284:
--
Description: 
I posted this issue on the Dev mailing list and was encouraged to create a Jira 
ticket.  This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble 
cluster, which causes absolute havoc. 

I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyways.  I 
just want to get consensus from the community about how to provide the best 
solution.

My original email describing the issue: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
connection or reconnection of the Zookeeper Client, it would ask the server 
"are you standalone", and disconnect if it is and ZK_STANDALONE=false, and try 
the next host.  If all hosts are in standalone, an error would be shown - "No 
zookeeper hosts available, that aren't in standalone operation - The setting 
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"

In order to urge users to use the setting, I would possibly also have a warning 
shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
connection string, and ZK_STANDALONE is not false.

I can't think of any implicit way to internalize a setting, other than this: 
if the ZK_HOSTS connection string has multiple hosts, there should be no 
scenario in which any node is standalone, so you could assume there should be 
no standalone servers.  But maybe an explicit setting is preferable.

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) isolated to one part of the code.

  was:
I posted this issue on the Dev mailing list and was encouraged to create a Jira 
ticket.  This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble 
cluster, which causes absolute havoc. 

I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyways.  I 
just want to get consensus from the community about how to provide the best 
solution.

My original email describing the issue: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
connection or reconnection of the Zookeeper Client, it would ask the server 
"are you standalone", and disconnect if it is and ZK_STANDALONE=false, and try 
the next host.  If all hosts are in standalone, an error would be shown - "No 
zookeeper hosts available, that aren't in standalone operation - The setting 
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"

In order to urge users to use the setting, I would possibly also have a warning 
shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
connection string, and ZK_STANDALONE is not false.

I can't think of any implicit way to infer the setting, other than this: if the 
ZK_HOSTS connection string has multiple hosts, there should be no scenario in 
which any node is standalone, so you could assume there should be no 
standalone servers.  But maybe an explicit setting is preferable.

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) be isolated to one part of the code.


> Solr connection to Standalone node in Ensemble causes cluster failure
> -
>
> Key: SOLR-10284
> URL: https://issues.apache.org/jira/browse/SOLR-10284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.3, 6.4
> Environment: Solrcloud, with Zookeeper 
>Reporter: Ben DeMott
>
> I posted this issue on the Dev mailing list and was encouraged to create a 
> Jira ticket.  This isn't a bug per-se.
> Solr connects / reconnects to "Standalone" Zookeeper nodes, within an 
> ensemble cluster, which causes absolute havoc. 
> I work for Dice.com, as one of the core search developers.
> I'm happy to write a patch, as we'll probably do that internally anyways.  I 
> just want to get consensus from the community about how to provide the best 
> solution.
> My original email

[jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-03-14 Thread Ben DeMott (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben DeMott updated SOLR-10284:
--
Description: 
I posted this issue on the Dev mailing list and was encouraged to create a Jira 
ticket.  This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble 
cluster, which causes absolute havoc. 

I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyways.  I 
just want to get consensus from the community about how to provide the best 
solution.

My original email describing the issue: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
connection or reconnection of the Zookeeper Client, it would ask the server 
"are you standalone", and disconnect if it is and ZK_STANDALONE=false, and try 
the next host.  If all hosts are in standalone, an error would be shown - "No 
zookeeper hosts available, that aren't in standalone operation - The setting 
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"

In order to urge users to use the setting, I would possibly also have a warning 
shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
connection string, and ZK_STANDALONE is not false.

I can't think of any implicit way to infer the setting, other than this: if the 
ZK_HOSTS connection string has multiple hosts, there should be no scenario in 
which any node is standalone, so you could assume there should be no 
standalone servers.  But maybe an explicit setting is preferable.

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) be isolated to one part of the code.

  was:
I posted this issue on the Dev mailing list and was encouraged to create a Jira 
ticket.  This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble 
cluster, which causes absolute havoc. 

I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyways.  I 
just want to get consensus from the community about how to provide the best 
solution.

My original email describing the issue: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

Hi Jan,

My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
connection or reconnection of the Zookeeper Client, it would ask the server 
"are you standalone", and disconnect if it is and ZK_STANDALONE=false, and try 
the next host.  If all hosts are in standalone, an error would be shown - "No 
zookeeper hosts available, that aren't in standalone operation - The setting 
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"

In order to urge users to use the setting, I would possibly also have a warning 
shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
connection string, and ZK_STANDALONE is not false.

I can't think of any implicit way to infer the setting, other than this: if the 
ZK_HOSTS connection string has multiple hosts, there should be no scenario in 
which any node is standalone, so you could assume there should be no 
standalone servers.  But maybe an explicit setting is preferable.

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) be isolated to one part of the code.


> Solr connection to Standalone node in Ensemble causes cluster failure
> -
>
> Key: SOLR-10284
> URL: https://issues.apache.org/jira/browse/SOLR-10284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.3, 6.4
> Environment: Solrcloud, with Zookeeper 
>Reporter: Ben DeMott
>
> I posted this issue on the Dev mailing list and was encouraged to create a 
> Jira ticket.  This isn't a bug per-se.
> Solr connects / reconnects to "Standalone" Zookeeper nodes, within an 
> ensemble cluster, which causes absolute havoc. 
> I work for Dice.com, as one of the core search developers.
> I'm happy to write a patch, as we'll probably do that internally anyways.  I 
> just want to get consensus from the community about how to provide the best 
> solution.
> My

[jira] [Updated] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-03-14 Thread Ben DeMott (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-10284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben DeMott updated SOLR-10284:
--
Description: 
I posted this issue on the Dev mailing list and was encouraged to create a Jira 
ticket.  This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble 
cluster, which causes absolute havoc. 

I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyways.  I 
just want to get consensus from the community about how to provide the best 
solution.

My original email describing the issue: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/raw/%3CCACbtCQ2cSPA8NbnqCbXZE9nZdT40xFHjpUhAOqUnd%3DqZaRMEsA%40mail.gmail.com%3E/2

Proposed Solution:

Hi Jan,

My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
connection or reconnection of the Zookeeper Client, it would ask the server 
"are you standalone", and disconnect if it is and ZK_STANDALONE=false, and try 
the next host.  If all hosts are in standalone, an error would be shown - "No 
zookeeper hosts available, that aren't in standalone operation - The setting 
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"

In order to urge users to use the setting, I would possibly also have a warning 
shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
connection string, and ZK_STANDALONE is not false.

I can't think of any implicit way to infer the setting, other than this: if the 
ZK_HOSTS connection string has multiple hosts, there should be no scenario in 
which any node is standalone, so you could assume there should be no 
standalone servers.  But maybe an explicit setting is preferable.

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) be isolated to one part of the code.

  was:
I posted this issue on the Dev mailing list and was encouraged to create a Jira 
ticket.  This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble 
cluster, which causes absolute havoc. 

I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyways.  I 
just want to get consensus from the community about how to provide the best 
solution.

My original email describing the issue: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/browser

Proposed Solution:

Hi Jan,

My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
connection or reconnection of the Zookeeper Client, it would ask the server 
"are you standalone", and disconnect if it is and ZK_STANDALONE=false, and try 
the next host.  If all hosts are in standalone, an error would be shown - "No 
zookeeper hosts available, that aren't in standalone operation - The setting 
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"

In order to urge users to use the setting, I would possibly also have a warning 
shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
connection string, and ZK_STANDALONE is not false.

I can't think of any implicit way to infer the setting, other than this: if the 
ZK_HOSTS connection string has multiple hosts, there should be no scenario in 
which any node is standalone, so you could assume there should be no 
standalone servers.  But maybe an explicit setting is preferable.

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) be isolated to one part of the code.


> Solr connection to Standalone node in Ensemble causes cluster failure
> -
>
> Key: SOLR-10284
> URL: https://issues.apache.org/jira/browse/SOLR-10284
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 6.3, 6.4
> Environment: Solrcloud, with Zookeeper 
>Reporter: Ben DeMott
>
> I posted this issue on the Dev mailing list and was encouraged to create a 
> Jira ticket.  This isn't a bug per-se.
> Solr connects / reconnects to "Standalone" Zookeeper nodes, within an 
> ensemble cluster, which causes absolute havoc. 
> I work for Dice.com, as one of the core search developers.
> I'm happy to write a patch, as we'll probably do that internally anyways.  I 
> just want to get consensus from the community about how to provide the best 
> solution.
> My original email describing the issue: 
>

[jira] [Created] (SOLR-10284) Solr connection to Standalone node in Ensemble causes cluster failure

2017-03-14 Thread Ben DeMott (JIRA)

Ben DeMott created SOLR-10284:
-

 Summary: Solr connection to Standalone node in Ensemble causes 
cluster failure
 Key: SOLR-10284
 URL: https://issues.apache.org/jira/browse/SOLR-10284
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 6.3, 6.4
 Environment: Solrcloud, with Zookeeper 
Reporter: Ben DeMott


I posted this issue on the Dev mailing list and was encouraged to create a Jira 
ticket.  This isn't a bug per-se.

Solr connects / reconnects to "Standalone" Zookeeper nodes, within an ensemble 
cluster, which causes absolute havoc. 

I work for Dice.com, as one of the core search developers.
I'm happy to write a patch, as we'll probably do that internally anyways.  I 
just want to get consensus from the community about how to provide the best 
solution.

My original email describing the issue: 
http://mail-archives.apache.org/mod_mbox/lucene-dev/201703.mbox/browser

Proposed Solution:

Hi Jan,

My thought was an explicit setting in solr.in.sh "ZK_STANDALONE" (which would 
default to TRUE for the solr.in.sh file found next to bin/solr).  Upon 
connection or reconnection of the Zookeeper Client, it would ask the server 
"are you standalone", and disconnect if it is and ZK_STANDALONE=false, and try 
the next host.  If all hosts are in standalone, an error would be shown - "No 
zookeeper hosts available, that aren't in standalone operation - The setting 
ZK_STANDALONE=false prevents connecting to a standalone Zookeeper"

In order to urge users to use the setting, I would possibly also have a warning 
shown in the logs, if your ZK_HOSTS is set, has multiple hosts in the 
connection string, and ZK_STANDALONE is not false.

I can't think of any implicit way to infer the setting, other than this: if the 
ZK_HOSTS connection string has multiple hosts, there should be no scenario in 
which any node is standalone, so you could assume there should be no 
standalone servers.  But maybe an explicit setting is preferable.

This solution should be:
1.) backwards compatible
2.) have very little performance impact (1 extra call upon connection to ZK)
3.) be isolated to one part of the code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
