Re: SolrCloud installation troubles...
SELinux? Open file limits? Process limits?

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
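One quick way to look at the two limits mentioned above is to read them from inside a process. The following is only a minimal sketch: it reports the limits of whatever user runs the script, so it would need to be run as the same account that starts Solr and ZooKeeper, and it assumes a Linux system where Python's `resource.RLIMIT_NPROC` is available. The function names are made up for illustration.

```python
import resource


def describe_limit(name, limit_id):
    """Return a one-line summary of the soft and hard value of one rlimit."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        # RLIM_INFINITY means no limit is enforced for this resource.
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


def limits_report():
    """Summarize the open-file and process limits of the current process."""
    return [
        describe_limit("open files", resource.RLIMIT_NOFILE),
        # RLIMIT_NPROC is assumed to exist here (true on Linux).
        describe_limit("processes", resource.RLIMIT_NPROC),
    ]


if __name__ == "__main__":
    for line in limits_report():
        print(line)
```

Comparing the output on the working Ubuntu VM with the output on the Red Hat VM would show whether the limits differ between the two systems.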
Re: SolrCloud installation troubles...
On 1/29/18 1:31 PM, Shawn Heisey wrote:
> On 1/29/2018 2:02 PM, Scott Prentice wrote:
>> Thanks, Shawn. I was wondering if there was something going on with IP redirection that was causing confusion. Any thoughts on how to debug? And, what do you mean by "extreme garbage collection pauses"? Is that Solr garbage collection or the OS itself? There's really nothing happening on this machine, it's purely for testing so there shouldn't be any extra load from other processes.
>
> Garbage collection is one of the primary features of Java's memory management. It's not Solr or the OS. If the java heap is really enormous, you can end up with long pauses, but I wouldn't expect them to be frequent unless the index is also really huge.
>
> A very common issue that can cause even worse pause issues than a large heap is a heap that's too small, but not quite small enough to cause Java to completely run out of heap memory. The default max heap size in recent Solr versions is 512MB, which is very small. A Java program (which Solr is) can never use more heap memory than the maximum it is configured with, even if the machine has more memory available.
>
> This paragraph is included because you mentioned IP redirection:
>
> Extreme care must be used when setting up SolrCloud on virtual machines where accessing the VM has to go through any kind of IP translation. SolrCloud keeps track of how to reach each server in the cloud and if it stores an untranslated address when you need the translated address (or vice-versa), things are not going to work. Generally speaking translated addresses are going to be problematic for SolrCloud, and should not be used.
>
> Thanks,
> Shawn

Thanks for the clarification. Yes, we're just using the default heap size for Solr, but there's no index (yet) and nothing really going on, so I'd hope that garbage collection isn't the problem.

I'm putting my money on some IP translation issues (this is on a tightly controlled corporate network) or the fact that the 2888 and 2890 ports appear to not be open. I'll dig down the network issue path for now and see where that gets me.

Thanks,
...scott
Re: SolrCloud installation troubles...
On 1/29/2018 2:02 PM, Scott Prentice wrote:
> Thanks, Shawn. I was wondering if there was something going on with IP redirection that was causing confusion. Any thoughts on how to debug? And, what do you mean by "extreme garbage collection pauses"? Is that Solr garbage collection or the OS itself? There's really nothing happening on this machine, it's purely for testing so there shouldn't be any extra load from other processes.

Garbage collection is one of the primary features of Java's memory management. It's not Solr or the OS. If the java heap is really enormous, you can end up with long pauses, but I wouldn't expect them to be frequent unless the index is also really huge.

A very common issue that can cause even worse pause issues than a large heap is a heap that's too small, but not quite small enough to cause Java to completely run out of heap memory. The default max heap size in recent Solr versions is 512MB, which is very small. A Java program (which Solr is) can never use more heap memory than the maximum it is configured with, even if the machine has more memory available.

This paragraph is included because you mentioned IP redirection:

Extreme care must be used when setting up SolrCloud on virtual machines where accessing the VM has to go through any kind of IP translation. SolrCloud keeps track of how to reach each server in the cloud and if it stores an untranslated address when you need the translated address (or vice-versa), things are not going to work. Generally speaking translated addresses are going to be problematic for SolrCloud, and should not be used.

Thanks,
Shawn
Re: SolrCloud installation troubles...
Looks like 2888 and 2890 are not open. At least they are not reported with a netstat -plunt .. could be the problem.

Thanks, all!
...scott

On 1/29/18 1:10 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> Trying 127.0.0.1 could help. We kind of tend to think localhost is always 127.0.0.1, but I've seen localhost start to resolve to ::1, the IPv6 equivalent of 127.0.0.1.
>
> I guess some environments can be strict enough to restrict communication on localhost; seems hard to imagine, but it does happen.
>
> -----Original Message-----
> From: Scott Prentice [mailto:s...@leximation.com]
> Sent: Monday, January 29, 2018 4:02 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud installation troubles...
>
> On 1/29/18 12:44 PM, Shawn Heisey wrote:
>> On 1/29/2018 1:13 PM, Scott Prentice wrote:
>>> But when I do the same thing on the Red Hat system it fails. Through the UI, it'll first time out with this message ..
>>>
>>>     Connection to Solr lost
>>>
>>> Then after a refresh, the collection appears to have been partially created, but it's in the "Gone" state, and after some time, is deleted by an apparent cleanup process. If I try to create one through the command line ..
>>>
>>>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>>>
>>> I get this response ..
>>>
>>>     ERROR: Failed to create collection 'test99' due to:
>>>     {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8984/solr,
>>>     10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8985/solr,
>>>     10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8983/solr}
>>
>> This sounds like either network connectivity problems or possibly issues caused by extreme garbage collection pauses that result in timeouts.
>>
>> Thanks,
>> Shawn
>
> Thanks, Shawn. I was wondering if there was something going on with IP redirection that was causing confusion. Any thoughts on how to debug? And, what do you mean by "extreme garbage collection pauses"? Is that Solr garbage collection or the OS itself? There's really nothing happening on this machine, it's purely for testing so there shouldn't be any extra load from other processes.
>
> Thanks!
> ...scott
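As a complement to netstat, a connection attempt shows whether a port is reachable from where the client actually sits, not just whether something is listening. The following is a rough sketch only. The host and port numbers are the ones mentioned in this thread (Solr on 8983 to 8985, plus the 2888 and 2890 ports reported missing); the helper names are hypothetical.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to True (accepting connections) or False."""
    return {port: port_is_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Host and ports taken from the messages in this thread; adjust as needed.
    for host in ("127.0.0.1", "10.6.208.31"):
        for port, ok in check_ports(host, [8983, 8984, 8985, 2888, 2890]).items():
            print(f"{host}:{port} {'open' if ok else 'closed or filtered'}")
```

Running it once against 127.0.0.1 and once against the address Solr reports in its error message can help separate a firewall problem from an address translation problem.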
RE: SolrCloud installation troubles...
Trying 127.0.0.1 could help. We kind of tend to think localhost is always 127.0.0.1, but I've seen localhost start to resolve to ::1, the IPv6 equivalent of 127.0.0.1.

I guess some environments can be strict enough to restrict communication on localhost; seems hard to imagine, but it does happen.

-----Original Message-----
From: Scott Prentice [mailto:s...@leximation.com]
Sent: Monday, January 29, 2018 4:02 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud installation troubles...

On 1/29/18 12:44 PM, Shawn Heisey wrote:
> On 1/29/2018 1:13 PM, Scott Prentice wrote:
>> But when I do the same thing on the Red Hat system it fails. Through the UI, it'll first time out with this message ..
>>
>>     Connection to Solr lost
>>
>> Then after a refresh, the collection appears to have been partially created, but it's in the "Gone" state, and after some time, is deleted by an apparent cleanup process. If I try to create one through the command line ..
>>
>>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>>
>> I get this response ..
>>
>>     ERROR: Failed to create collection 'test99' due to:
>>     {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8984/solr,
>>     10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8985/solr,
>>     10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8983/solr}
>
> This sounds like either network connectivity problems or possibly issues caused by extreme garbage collection pauses that result in timeouts.
>
> Thanks,
> Shawn

Thanks, Shawn. I was wondering if there was something going on with IP redirection that was causing confusion. Any thoughts on how to debug? And, what do you mean by "extreme garbage collection pauses"? Is that Solr garbage collection or the OS itself? There's really nothing happening on this machine, it's purely for testing so there shouldn't be any extra load from other processes.

Thanks!
...scott
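To see what "localhost" really resolves to on a given machine, the system resolver can be asked directly. This is a small sketch, not a definitive diagnostic: it uses the same lookup a Python program would get, which may not match exactly what the JVM running Solr or ZooKeeper sees. The function names are invented for this example.

```python
import socket


def resolve(hostname):
    """Return the distinct addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]  # first element is the IP for both IPv4 and IPv6
        if address not in addresses:
            addresses.append(address)
    return addresses


def has_ipv6(addresses):
    """True if any address in the list is an IPv6 literal such as ::1."""
    return any(":" in address for address in addresses)


if __name__ == "__main__":
    found = resolve("localhost")
    print("localhost resolves to:", ", ".join(found))
    if has_ipv6(found):
        print("note: localhost includes an IPv6 address; 127.0.0.1 may behave differently")
```

If ::1 shows up in the list, switching the configuration to 127.0.0.1 as suggested above is a cheap thing to try.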
Re: SolrCloud installation troubles...
Interesting. I am using "localhost" in the config files (using the IP caused things to break even worse). But perhaps I should check with IT to make sure the ports are all open.

Thanks,
...scott

On 1/29/18 12:57 PM, Davis, Daniel (NIH/NLM) [C] wrote:
> To expand on that answer, you have to wonder what ports are open in the server system's port-based firewall. I have to ask my systems team to open ports for everything I'm using, especially when I move from localhost to outside. You should be able to "fake it out" if you set up your zookeeper configuration to use localhost ports.
>
> -----Original Message-----
> From: Scott Prentice [mailto:s...@leximation.com]
> Sent: Monday, January 29, 2018 3:13 PM
> To: solr-user@lucene.apache.org
> Subject: SolrCloud installation troubles...
>
> Using Solr 7.2.0 and Zookeeper 3.4.11
>
> In an effort to move to a more robust Solr environment, I'm setting up a prototype system of 3 Solr servers and 3 Zookeeper servers. For now, this is all on one machine, but will eventually be 3 machines. This works fine on an Ubuntu 5.4.0-6 VM on my local system, but when I do the same setup on the company's network machine (a Red Hat 4.8.5-16 VM), I'm unable to create a collection.
>
> To keep things simple, I'm not using our custom schema yet, but just creating a collection through the Solr Admin UI using Collections > Add Collection, using the "_default" config set.
>
> On the Ubuntu system, I can create various collections .. 1 shard w/ 1 replication .. 2 shards w/ 3 replications .. 3 shards w/ 4 replications .. all seem alive and well. But when I do the same thing on the Red Hat system it fails. Through the UI, it'll first time out with this message ..
>
>     Connection to Solr lost
>
> Then after a refresh, the collection appears to have been partially created, but it's in the "Gone" state, and after some time, is deleted by an apparent cleanup process. If I try to create one through the command line ..
>
>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>
> I get this response ..
>
>     ERROR: Failed to create collection 'test99' due to:
>     {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8984/solr,
>     10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8985/solr,
>     10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8983/solr}
>
> I've seen other reports of errors like this but no solutions that seem to apply to my situation. Any thoughts?
>
> Thanks!
> ...scott
Re: SolrCloud installation troubles...
On 1/29/18 12:44 PM, Shawn Heisey wrote:
> On 1/29/2018 1:13 PM, Scott Prentice wrote:
>> But when I do the same thing on the Red Hat system it fails. Through the UI, it'll first time out with this message ..
>>
>>     Connection to Solr lost
>>
>> Then after a refresh, the collection appears to have been partially created, but it's in the "Gone" state, and after some time, is deleted by an apparent cleanup process. If I try to create one through the command line ..
>>
>>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>>
>> I get this response ..
>>
>>     ERROR: Failed to create collection 'test99' due to:
>>     {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8984/solr,
>>     10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8985/solr,
>>     10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8983/solr}
>
> This sounds like either network connectivity problems or possibly issues caused by extreme garbage collection pauses that result in timeouts.
>
> Thanks,
> Shawn

Thanks, Shawn. I was wondering if there was something going on with IP redirection that was causing confusion. Any thoughts on how to debug? And, what do you mean by "extreme garbage collection pauses"? Is that Solr garbage collection or the OS itself? There's really nothing happening on this machine, it's purely for testing so there shouldn't be any extra load from other processes.

Thanks!
...scott
RE: SolrCloud installation troubles...
To expand on that answer, you have to wonder what ports are open in the server system's port-based firewall. I have to ask my systems team to open ports for everything I'm using, especially when I move from localhost to outside. You should be able to "fake it out" if you set up your zookeeper configuration to use localhost ports.

-----Original Message-----
From: Scott Prentice [mailto:s...@leximation.com]
Sent: Monday, January 29, 2018 3:13 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud installation troubles...

Using Solr 7.2.0 and Zookeeper 3.4.11

In an effort to move to a more robust Solr environment, I'm setting up a prototype system of 3 Solr servers and 3 Zookeeper servers. For now, this is all on one machine, but will eventually be 3 machines. This works fine on an Ubuntu 5.4.0-6 VM on my local system, but when I do the same setup on the company's network machine (a Red Hat 4.8.5-16 VM), I'm unable to create a collection.

To keep things simple, I'm not using our custom schema yet, but just creating a collection through the Solr Admin UI using Collections > Add Collection, using the "_default" config set.

On the Ubuntu system, I can create various collections .. 1 shard w/ 1 replication .. 2 shards w/ 3 replications .. 3 shards w/ 4 replications .. all seem alive and well. But when I do the same thing on the Red Hat system it fails. Through the UI, it'll first time out with this message ..

    Connection to Solr lost

Then after a refresh, the collection appears to have been partially created, but it's in the "Gone" state, and after some time, is deleted by an apparent cleanup process. If I try to create one through the command line ..

    ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

    ERROR: Failed to create collection 'test99' due to:
    {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8984/solr,
    10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8985/solr,
    10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8983/solr}

I've seen other reports of errors like this but no solutions that seem to apply to my situation. Any thoughts?

Thanks!
...scott
Re: SolrCloud installation troubles...
On 1/29/2018 1:13 PM, Scott Prentice wrote:
> But when I do the same thing on the Red Hat system it fails. Through the UI, it'll first time out with this message ..
>
>     Connection to Solr lost
>
> Then after a refresh, the collection appears to have been partially created, but it's in the "Gone" state, and after some time, is deleted by an apparent cleanup process. If I try to create one through the command line ..
>
>     ./bin/solr create -c test99 -n _default -s 2 -rf 2
>
> I get this response ..
>
>     ERROR: Failed to create collection 'test99' due to:
>     {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8984/solr,
>     10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8985/solr,
>     10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8983/solr}

This sounds like either network connectivity problems or possibly issues caused by extreme garbage collection pauses that result in timeouts.

Thanks,
Shawn
SolrCloud installation troubles...
Using Solr 7.2.0 and Zookeeper 3.4.11

In an effort to move to a more robust Solr environment, I'm setting up a prototype system of 3 Solr servers and 3 Zookeeper servers. For now, this is all on one machine, but will eventually be 3 machines. This works fine on an Ubuntu 5.4.0-6 VM on my local system, but when I do the same setup on the company's network machine (a Red Hat 4.8.5-16 VM), I'm unable to create a collection.

To keep things simple, I'm not using our custom schema yet, but just creating a collection through the Solr Admin UI using Collections > Add Collection, using the "_default" config set.

On the Ubuntu system, I can create various collections .. 1 shard w/ 1 replication .. 2 shards w/ 3 replications .. 3 shards w/ 4 replications .. all seem alive and well. But when I do the same thing on the Red Hat system it fails. Through the UI, it'll first time out with this message ..

    Connection to Solr lost

Then after a refresh, the collection appears to have been partially created, but it's in the "Gone" state, and after some time, is deleted by an apparent cleanup process. If I try to create one through the command line ..

    ./bin/solr create -c test99 -n _default -s 2 -rf 2

I get this response ..

    ERROR: Failed to create collection 'test99' due to:
    {10.6.208.31:8984_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8984/solr,
    10.6.208.31:8985_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8985/solr,
    10.6.208.31:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException occured when talking to server at: http://10.6.208.31:8983/solr}

I've seen other reports of errors like this but no solutions that seem to apply to my situation. Any thoughts?

Thanks!
...scott