This could be because the ZooKeeper ensemble is not properly configured. I got a very similar setup running: a ZK cluster of three hosts plus one SolrCloud node, all in containers. Each ZK host has the ZOO_MY_ID and ZOO_SERVERS environment variables set before ZK starts. ZOO_MY_ID is 1, 2, or 3, unique to each host, and ZOO_SERVERS is "server.1=z1:2888:3888;2181 server.2=z2:2888:3888;2181 server.3=z3:2888:3888;2181", identical on all hosts (the double quotes may be needed for proper parsing). This ZOO_SERVERS syntax is for ZK version 3.5; the 3.4 syntax is slightly different.
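To make the per-host values concrete, here is a minimal Python sketch that generates them. The helper names `zoo_servers` and `zk_env` are hypothetical, made up for illustration; the host names z1 to z3 and the ports 2888, 3888, and 2181 are taken from the example above, and the output follows the ZK 3.5 syntax only.

```python
def zoo_servers(hosts, peer_port=2888, election_port=3888, client_port=2181):
    """Build a ZOO_SERVERS value in the ZooKeeper 3.5 syntax described above."""
    return " ".join(
        f"server.{i}={host}:{peer_port}:{election_port};{client_port}"
        for i, host in enumerate(hosts, start=1)
    )


def zk_env(hosts):
    """Return the environment for each ZK host.

    ZOO_MY_ID is unique per host (1..N); ZOO_SERVERS is the same everywhere.
    """
    servers = zoo_servers(hosts)
    return {
        host: {"ZOO_MY_ID": str(i), "ZOO_SERVERS": servers}
        for i, host in enumerate(hosts, start=1)
    }
```

Calling `zk_env(["z1", "z2", "z3"])` gives one dictionary per host that could be passed to each container as environment variables.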
http://aadel.io

On Fri, Oct 18, 2019 at 5:28 PM Drew Kidder <dre...@gmail.com> wrote:

> Thank you all for your suggestions! I appreciate the fast turnaround.
>
> My setup is using Amazon ECS for our solr cloud installation. Each ZK is in
> its own container, using Route53 Service Discovery to provide the DNS name.
> The ZK nodes can all talk to each other, and I can communicate to each one
> of those nodes from my local machine and from within the solr container.
> Solr is one node per container, as Martijn correctly assumed. I am not
> using a zkRoot at present because my intention is to use ZK solely for Solr
> Cloud and nothing else.
>
> I have tried removing the "-z" option from the Dockerfile CMD and using the
> ZK_HOST environment variable (see below). I have also modified
> solr.in.sh and set the ZK_HOST variable there, all to no avail. I have
> tried the Dockerfile command route, and I have also logged into the solr
> container and run the CMD manually to see if there was a problem
> with the way I was using the CMD entry. All of those methods give me the
> same result, captured in the gist below.
>
> The gist for my solr.log output is here:
> https://gist.github.com/dkidder/2db9a6d393dedb97a39ed32e2be0c087
>
> My Dockerfile for the solr container looks like this:
>
>
> FROM solr:8.2
>
> EXPOSE 8983 8999 2181
>
> VOLUME /app/logs
> VOLUME /app/data
> VOLUME /app/conf
>
> ## add our jetty configuration (increased request size!)
> COPY jetty.xml /opt/solr/server/etc/
>
> ## SolrCloud configuration
> ENV ZK_HOST zk1:2181,zk2:2181,zk3:2181
> ENV ZK_CLIENT_TIMEOUT 30000
>
> USER root
> RUN apt-get update
> RUN apt-get install -y netcat net-tools vim procps
> USER solr
>
> # Copy over custom solr plugins
> COPY myplugins/src/resources/* /opt/solr/server/solr/my-resources/
> COPY lib/*.jar /opt/solr/my-lib/
>
> # Copy over my configs
> COPY conf/ /app/conf
>
> #Start solr in cloud mode, connecting to zookeeper
> CMD ["solr","start","-f","-c"]
>
> The docker command I use to execute this Dockerfile is
> `docker run -p 8983:8983 -p 2181:2181 --name $(APP_NAME) $(APP_NAME):latest`
>
> Output of `ps -eflww` from within the solr container (as root):
>
> root@fe0ad5b40b42:/opt/solr-8.2.0# ps -eflww
> F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
> 4 S solr 1 0 9 80 0 - 1043842 - 14:36 ? 00:00:07
> /usr/local/openjdk-11/bin/java -server -Xms512m -Xmx512m -XX:+UseG1GC
> -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
> -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch
> -Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983
> -Dcom.sun.management.jmxremote.rmi.port=18983 -DzkClientTimeout=30000
> -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.log.dir=/var/solr/logs
> -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
> -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data
> -Dsolr.data.home= -Dsolr.install.dir=/opt/solr
> -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
> -Dlog4j.configurationFile=file:/var/solr/log4j2.xml -Xss256k
> -Dsolr.jetty.https.port=8983 -jar start.jar --module=http
> 4 S root 90 0 0 80 0 - 4988 - 14:37 pts/0 00:00:00 /bin/bash
> 0 R root 95 90 0 80 0 - 9595 - 14:37 pts/0 00:00:00 ps -eflww
>
> Output of netstat from within the solr container (as root):
>
> root@fe0ad5b40b42:/opt/solr-8.2.0# netstat
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address       Foreign Address      State
> tcp   0      0      fe0ad5b40b42:43678  172.20.28.179:2181   TIME_WAIT
> tcp   0      0      fe0ad5b40b42:60164  172.20.155.241:2181  TIME_WAIT
> tcp   0      0      fe0ad5b40b42:60500  172.20.60.138:2181   TIME_WAIT
> Active UNIX domain sockets (w/o servers)
> Proto RefCnt Flags  Type    State      I-Node  Path
> unix  2      [ ]    STREAM  CONNECTED  129252
> unix  2      [ ]    STREAM  CONNECTED  129270
>
> I'm beginning to think that ZK is not set up correctly. I haven't uploaded
> any configuration files to ZK yet; my understanding was that I could start
> up a solr cloud node with no collections and upload the configuration from
> there. I was under the impression that it would try to connect to ZK and if
> it couldn't get config files from there it would use local config files. Do
> I need to upload the solr cloud configuration files to ZK before starting
> up the cluster? The netstat output makes it look like the solr container
> is indeed connected to the ZK containers, but there's no indication that I
> can see as to why it cannot connect to Zookeeper.
>
> --
> Drew(i...@gmail.com)
> http://wyntermute.dyndns.org/blog/
>
> -- I Drive Way Too Fast To Worry About Cholesterol.
>
>
> On Fri, Oct 18, 2019 at 3:11 AM Martijn Koster <mak-luc...@greenhills.co.uk>
> wrote:
>
> >
> > > On 18 Oct 2019, at 00:25, Drew Kidder <dre...@gmail.com> wrote:
> >
> > > * I'm using the following command line to start a basic solr cloud instance
> > > as per the documentation: `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181`
> >
> > I assume you’re just looking to run a single Solr node in a single
> > container, right?
> >
> > Just set the ZK_HOST environment variable, and remove the command-line
> > arguments.
> > And you don’t need to specify the port number unless you deviate from the
> > default.
> > Have a look at this example
> > https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml
> > <https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml#L61with>
> >
> > The “start” command starts Solr in the background, which is typically not
> > what you want when running Solr under docker.
> >
> >
> > Why your command isn’t working as is, is not clear. When you say you’re
> > using that command-line, how do you actually do that? In a full docker
> > command line, or a compose file, or from a “docker exec”, or from some
> > orchestrator?
> > Share the exact thing you’re doing; perhaps there is a mistake there.
> > Also, run `ps -eflww` in the container to see what command-line arguments
> > the JVM actually got started with.
> > And share the full startup log somewhere (in a GitHub gist perhaps); there
> > might be something of interest earlier on.
> >
> > >> (running `echo ruok | nc zk1 2181` returns the expected "imok" response
> > >> from ZK within the docker container where Solr is located)
> > >> * The netcat command mentioned above shows up in the ZK logs, but the Solr
> > >> attempts to connect do not (it's like the request isn't even getting to ZK)
> >
> > Then it doesn’t sound like an environmental firewall/security-group/routing
> > issue.
> > Next step to debug then could be to check if you actually see Solr make
> > tcp connections to port 2181, in the Solr container, using
> > tcpdump/sysdig/netstat or some such.
> > If that gives a negative result, then you know it’s an issue in your Solr
> > invocation config, or name resolution.
> > If that gives a positive result, then it’s environmental after all; and
> > you can dig further.
> >
> >
> > But try the ZK_HOST thing first; it may just fix it.
> >
> > — Martijn
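
The `echo ruok | nc zk1 2181` check quoted above can be repeated against every ensemble member from inside the Solr container. The following is a rough Python sketch of that idea, not something taken from the thread: the function names `zk_ruok` and `check_ensemble` are hypothetical, and it assumes the ZK servers accept the `ruok` four-letter command (on newer ZooKeeper versions this may need to be allowed in the server configuration).

```python
import socket


def zk_ruok(host, port=2181, timeout=3.0):
    """Send ZooKeeper's 'ruok' command; return True only if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # The server answers and then closes the connection.
            while chunk := sock.recv(64):
                reply += chunk
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    return reply.strip() == b"imok"


def check_ensemble(hosts, port=2181):
    """Map each host name to the result of its 'ruok' check."""
    return {host: zk_ruok(host, port) for host in hosts}
```

For the setup in this thread the call would be `check_ensemble(["zk1", "zk2", "zk3"])`. A `False` for any host points at name resolution, networking, or the four-letter command being disabled on that server; it does not by itself say anything about the Solr configuration.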