This could be because the ZooKeeper ensemble is not properly configured. I
got a very similar setup running: a ZK cluster of three hosts and one
SolrCloud node, all in containers. Each ZK host has the ZOO_MY_ID and
ZOO_SERVERS environment variables set before ZK starts. In this case,
ZOO_MY_ID is a unique value from 1 to 3 on each host, and ZOO_SERVERS is
"server.1=z1:2888:3888;2181 server.2=z2:2888:3888;2181
server.3=z3:2888:3888;2181", the same on all hosts (the double quotes may
be needed for proper parsing). This ZOO_SERVERS syntax is for ZK version
3.5; 3.4 is slightly different.
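
To make the pattern concrete, here is a minimal sketch that builds the two
values for each host. The function names (zoo_servers, zk_env) are made up
for illustration, the host names z1..z3 are the ones from my example above,
and the ports are assumed to be the ones shown there (2888, 3888, 2181).
It only produces the 3.5-style syntax.

```python
def zoo_servers(hosts, client_port=2181, peer_port=2888, election_port=3888):
    """Build a ZK 3.5-style ZOO_SERVERS value: one 'server.N=...' entry per host."""
    entries = [
        f"server.{i}={host}:{peer_port}:{election_port};{client_port}"
        for i, host in enumerate(hosts, start=1)
    ]
    # Entries are space-separated, so the whole value needs quoting in a shell.
    return " ".join(entries)


def zk_env(hosts):
    """Map each host to its environment: a unique ZOO_MY_ID, a shared ZOO_SERVERS."""
    servers = zoo_servers(hosts)
    return {
        host: {"ZOO_MY_ID": str(i), "ZOO_SERVERS": servers}
        for i, host in enumerate(hosts, start=1)
    }


if __name__ == "__main__":
    for host, env in zk_env(["z1", "z2", "z3"]).items():
        print(host, env)
```

The point to check in your own setup is that ZOO_MY_ID differs on every ZK
container while ZOO_SERVERS is identical on all of them.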

http://aadel.io

On Fri, Oct 18, 2019 at 5:28 PM Drew Kidder <dre...@gmail.com> wrote:

> Thank you all for your suggestions! I appreciate the fast turnaround.
>
> My setup is using Amazon ECS for our solr cloud installation. Each ZK is in
> its own container, using Route53 Service Discovery to provide the DNS name.
> The ZK nodes can all talk to each other, and I can communicate to each one
> of those nodes from my local machine and from within the solr container.
> Solr is one node per container, as Martijn correctly assumed. I am not
> using a zkRoot at present because my intention is to use ZK solely for Solr
> Cloud and nothing else.
>
> I have tried removing the "-z" option from the Dockerfile CMD and using the
> ZK_HOST environment variable (see below). I have also modified solr.in.sh
> and set the ZK_HOST variable there, all to no avail. I have tried the
> Dockerfile CMD route, and I have also logged into the solr container and run
> the command manually to see if there was a problem with the way I was using
> the CMD entry. All of those methods give me the same result, captured in
> the gist below.
>
> The gist for my solr.log output is here:
> https://gist.github.com/dkidder/2db9a6d393dedb97a39ed32e2be0c087
>
> My Dockerfile for the solr container looks like this:
>
>
> FROM    solr:8.2
>
> EXPOSE    8983 8999 2181
>
> VOLUME    /app/logs
> VOLUME    /app/data
> VOLUME    /app/conf
>
> ## add our jetty configuration (increased request size!)
> COPY   jetty.xml /opt/solr/server/etc/
>
> ## SolrCloud configuration
> ENV     ZK_HOST zk1:2181,zk2:2181,zk3:2181
> ENV     ZK_CLIENT_TIMEOUT 30000
>
> USER   root
> RUN    apt-get update
> RUN    apt-get install -y netcat net-tools vim procps
> USER   solr
>
> # Copy over custom solr plugins
> COPY    myplugins/src/resources/* /opt/solr/server/solr/my-resources/
> COPY    lib/*.jar /opt/solr/my-lib/
>
> # Copy over my configs
> COPY    conf/ /app/conf
>
> #Start solr in cloud mode, connecting to zookeeper
> CMD       ["solr","start","-f","-c"]
>
> The docker command I use to execute this Dockerfile is `docker run -p
> 8983:8983 -p 2181:2181 --name $(APP_NAME) $(APP_NAME):latest`
>
> Output of `ps -eflww` from within the solr container (as root):
>
> root@fe0ad5b40b42:/opt/solr-8.2.0# ps -eflww
> F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME
> CMD
> 4 S solr         1     0  9  80   0 - 1043842 -    14:36 ?        00:00:07
> /usr/local/openjdk-11/bin/java -server -Xms512m -Xmx512m -XX:+UseG1GC
> -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
> -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch
>
> -Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983
> -Dcom.sun.management.jmxremote.rmi.port=18983 -DzkClientTimeout=30000
> -DzkHost=zk1:2181,zk2:2181,zk3:2181 -Dsolr.log.dir=/var/solr/logs
> -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
> -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data
> -Dsolr.data.home= -Dsolr.install.dir=/opt/solr
> -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
> -Dlog4j.configurationFile=file:/var/solr/log4j2.xml -Xss256k
> -Dsolr.jetty.https.port=8983 -jar start.jar --module=http
> 4 S root        90     0  0  80   0 -  4988 -      14:37 pts/0    00:00:00
> /bin/bash
> 0 R root        95    90  0  80   0 -  9595 -      14:37 pts/0    00:00:00
> ps -eflww
>
> Output of netstat from within the solr container (as root):
>
> root@fe0ad5b40b42:/opt/solr-8.2.0# netstat
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp        0      0 fe0ad5b40b42:43678      172.20.28.179:2181      TIME_WAIT
> tcp        0      0 fe0ad5b40b42:60164      172.20.155.241:2181     TIME_WAIT
> tcp        0      0 fe0ad5b40b42:60500      172.20.60.138:2181      TIME_WAIT
> Active UNIX domain sockets (w/o servers)
> Proto RefCnt Flags       Type       State         I-Node   Path
> unix  2      [ ]         STREAM     CONNECTED     129252
> unix  2      [ ]         STREAM     CONNECTED     129270
>
> I'm beginning to think that ZK is not set up correctly. I haven't uploaded
> any configuration files to ZK yet; my understanding was that I could start
> up a solr cloud node with no collections and upload the configuration from
> there. I was under the impression that it would try to connect to ZK and if
> it couldn't get config files from there it would use local config files. Do
> I need to upload the solr cloud configuration files to ZK before starting
> up the cluster?  The netstat output makes it look like the solr container
> is indeed connected to the ZK containers, but there's no indication as to
> why it cannot connect to Zookeeper that I can see.
>
> --
> Drew(i...@gmail.com)
> http://wyntermute.dyndns.org/blog/
>
> -- I Drive Way Too Fast To Worry About Cholesterol.
>
>
> On Fri, Oct 18, 2019 at 3:11 AM Martijn Koster <mak-luc...@greenhills.co.uk> wrote:
>
> >
> >
> > > On 18 Oct 2019, at 00:25, Drew Kidder <dre...@gmail.com> wrote:
> >
> > > * I'm using the following command line to start a basic solr cloud instance
> > > as per the documentation: `bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181`
> >
> > I assume you’re just looking to run a single Solr node in a single
> > container, right?
> >
> > Just set the ZK_HOST environment variable, and remove the command-line
> > arguments.
> > And you don’t need to specify the port number unless you deviate from the
> > default.
> > Have a look at this example:
> > https://github.com/docker-solr/docker-solr-examples/blob/master/swarm/docker-compose.yml#L61
> >
> > The “start” command starts Solr in the background, which is typically not
> > what you want
> > when running Solr under docker.
> >
> >
> > Why your command isn’t working as is, is not clear. When you say you’re
> > using that command line, how do you actually do that? In a full docker
> > command line, a compose file, a “docker exec”, or some orchestrator?
> > Share the exact thing you’re doing; perhaps there is a mistake there.
> > Also, run `ps -eflww` in the container to see what command-line arguments
> > the JVM actually got started with.
> > And share the full startup log somewhere (in a GitHub gist perhaps); there
> > might be something of interest earlier on.
> >
> > >> (running `echo ruok | nc zk1 2181` returns the expected "imok" response
> > >> from ZK within the docker container where Solr is located)
> > >> * The netcat command mentioned above shows up in the ZK logs, but the Solr
> > >> attempts to connect do not (it's like the request isn't even getting to ZK)
> >
> > Then it doesn’t sound like an environmental firewall/security-group/routing
> > issue.
> > Next step to debug then could be to check if you actually see Solr make
> > tcp connections
> > to port 2181, in the Solr container, using tcpdump/sysdig/netstat or some
> > such.
> > If that gives a negative result, then you know it’s an issue in your Solr
> > invocation config, or name resolution.
> > If that gives a positive result, then it’s environmental after all; and
> > you can dig further.
> >
> >
> > But try the ZK_HOST thing first; it may just fix it.
> >
> > — Martijn
>
