Re: Clustered nifi issues
I see what's happening. The container sets up a /root/.nifi-cli.config file that has the required security parameters so that the user doesn't have to supply them on the command line.

From: Bryan Bende
Sent: Wednesday, October 14, 2020 10:45 AM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

The CLI does not use nifi.properties, there are several ways of passing in config...
https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html#property-argument-handling
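The .nifi-cli.config file mentioned above is the session properties file the Toolkit CLI picks up automatically from the calling user's home directory. A minimal sketch of what such a file might contain, using the property names from the toolkit guide; the URL, paths and passwords here are placeholders, not values from this deployment:

    baseUrl=https://nifi-0:8080
    keystore=/opt/certs/admin/keystore.jks
    keystoreType=JKS
    keystorePasswd=changeit
    keyPasswd=changeit
    truststore=/opt/certs/admin/truststore.jks
    truststoreType=JKS
    truststorePasswd=changeit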
Re: Clustered nifi issues
The CLI does not use nifi.properties, there are several ways of passing in config...
https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html#property-argument-handling

On Wed, Oct 14, 2020 at 10:01 AM Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

> That makes sense. It must be reading the keystore/truststore specified in
> the nifi.properties file then?
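To tie this back to the readiness script earlier in the thread: the linked property-argument-handling section describes passing a properties file explicitly with -p, or letting the CLI fall back to ~/.nifi-cli.config. A quick illustration (the file path is a placeholder):

    # explicit properties file
    $NIFI_TOOLKIT_HOME/bin/cli.sh nifi get-nodes -ot json -p /opt/nifi/cli.properties

    # or rely on the default session config in the calling user's home directory
    cp /opt/nifi/cli.properties ~/.nifi-cli.config
    $NIFI_TOOLKIT_HOME/bin/cli.sh nifi get-nodes -ot json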
Re: Clustered nifi issues
That makes sense. It must be reading the keystore/truststore specified in the nifi.properties file then?

From: Bryan Bende
Sent: Wednesday, October 14, 2020 9:59 AM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

The get-nodes command calls the REST resource /controller/cluster which authorizes against READ on /controller [1], so there is no way you can call this in a secure environment without authenticating somehow, which from the CLI means specifying a keystore/truststore.

[1] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/api/ControllerResource.java#L857
Re: Clustered nifi issues
The get-nodes command calls the REST resource /controller/cluster which authorizes against READ on /controller [1], so there is no way you can call this in a secure environment without authenticating somehow, which from the CLI means specifying a keystore/truststore.

[1] https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/api/ControllerResource.java#L857

On Wed, Oct 14, 2020 at 9:26 AM Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

> Yes, this is for a secured cluster deployed as a Kubernetes stateful set.
> The certificate parameters are apparently not needed to just get the status
> of the nodes using the command below.
Re: Clustered nifi issues
Yes, this is for a secured cluster deployed as a Kubernetes stateful set. The certificate parameters are apparently not needed just to get the status of the nodes using the command below.

From: Sushil Kumar
Sent: Tuesday, October 13, 2020 4:01 PM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

Did you say that the same line of code works fine for secured clusters too? I asked because nifi-toolkit has a separate set of parameters asking for certificates and everything else related to secure clusters.
Re: Clustered nifi issues
Did you say that the same line of code works fine for secured clusters too? I asked because nifi-toolkit has a separate set of parameters asking for certificates and everything else related to secure clusters.

On Tue, Oct 13, 2020 at 12:14 PM Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

> I found that instead of dealing with nifi client certificate hell, the
> nifi-toolkit cli.sh will work just fine for testing the readiness of the
> cluster. Here is my readiness script which seems to work just fine within
> kubernetes with the apache/nifi docker container version 1.12.1
Re: Clustered nifi issues
I found that instead of dealing with nifi client certificate hell, the nifi-toolkit cli.sh will work just fine for testing the readiness of the cluster. Here is my readiness script, which seems to work just fine within kubernetes with the apache/nifi docker container version 1.12.1:

#!/bin/bash

# Ask the cluster for its node list via the Toolkit CLI; fail the probe if the call itself fails
$NIFI_TOOLKIT_HOME/bin/cli.sh nifi get-nodes -ot json > /tmp/cluster.state
if [ $? -ne 0 ]; then
    cat /tmp/cluster.state
    exit 1
fi

# Extract this node's status from the JSON and require it to be CONNECTED
STATUS=$(jq -r ".cluster.nodes[] | select((.address==\"$(hostname -f)\") or .address==\"localhost\") | .status" /tmp/cluster.state)

if [[ ! $STATUS = "CONNECTED" ]]; then
    echo "Node not found with CONNECTED state. Full cluster state:"
    jq . /tmp/cluster.state
    exit 1
fi

From: Chris Sampson
Sent: Thursday, October 1, 2020 9:03 AM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

For info, the probes we currently use for our StatefulSet Pods are:

* livenessProbe - tcpSocket to ping the NiFi instance port (e.g. 8080)
* readinessProbe - exec command to curl the nifi-api/controller/cluster endpoint to check the node's cluster connection status
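One way the script above might be wired into the StatefulSet as an exec readiness probe; the script path, delays and thresholds below are assumptions for illustration, not values taken from the thread:

    readinessProbe:
      exec:
        command:
          - bash
          - -c
          - /opt/nifi/scripts/readiness-check.sh   # hypothetical path for the script above (baked into the image or mounted from a ConfigMap)
      initialDelaySeconds: 60
      periodSeconds: 30
      failureThreshold: 3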
Re: Clustered nifi issues
Thanks, I'll try it out. I had the liveness probe already, but wasn't quite sure what to check for in the readinessProbe.

From: Chris Sampson
Sent: Thursday, October 1, 2020 9:03 AM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

For info, the probes we currently use for our StatefulSet Pods are:

* livenessProbe - tcpSocket to ping the NiFi instance port (e.g. 8080)
* readinessProbe - exec command to curl the nifi-api/controller/cluster endpoint to check the node's cluster connection status
Re: Clustered nifi issues
For info, the probes we currently use for our StatefulSet Pods are:

* livenessProbe - tcpSocket to ping the NiFi instance port (e.g. 8080)
* readinessProbe - exec command to curl the nifi-api/controller/cluster endpoint to check the node's cluster connection status, e.g.:

readinessProbe:
  exec:
    command:
      - bash
      - -c
      - |
        if [ "${SECURE}" = "true" ]; then
          INITIAL_ADMIN_SLUG=$(echo "${INITIAL_ADMIN}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')

          # authenticate with the initial admin's client certificate
          curl -v \
            --cacert ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/nifi-cert.pem \
            --cert ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/${INITIAL_ADMIN_SLUG}-cert.pem \
            --key ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/${INITIAL_ADMIN_SLUG}-key.pem \
            https://$(hostname -f):8080/nifi-api/controller/cluster > /tmp/cluster.state
        else
          curl -kv http://$(hostname -f):8080/nifi-api/controller/cluster > /tmp/cluster.state
        fi

        # require this node to report CONNECTED in the cluster view
        STATUS=$(jq -r ".cluster.nodes[] | select((.address==\"$(hostname -f)\") or .address==\"localhost\") | .status" /tmp/cluster.state)

        if [[ ! $STATUS = "CONNECTED" ]]; then
          echo "Node not found with CONNECTED state. Full cluster state:"
          jq . /tmp/cluster.state
          exit 1
        fi

Note that INITIAL_ADMIN is the CN of a user with appropriate permissions to call the endpoint and for whom our pod contains a set of certificate files in the indicated locations (generated from NiFi Toolkit in an init-container before the Pod starts); the jq utility was added into our customised version of the apache/nifi Docker Image.

---
Chris Sampson
IT Consultant
chris.samp...@naimuri.com
https://www.naimuri.com/

On Wed, 30 Sep 2020 at 16:43, Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

> Thanks for following up and filing the issue. Unfortunately, I don't have
> any of the logs from the original issue since I have since restarted and
> rebooted my containers many times.
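The livenessProbe mentioned above is only described in words; a sketch of what a tcpSocket probe against the NiFi web port might look like (the port matches the one used elsewhere in this thread, the timings are assumptions):

    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 30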
Re: Clustered nifi issues
Thanks for following up and filing the issue. Unfortunately, I don't have any of the logs from the original issue since I have since restarted and rebooted my containers many times.

From: Mark Payne
Sent: Wednesday, September 30, 2020 11:21 AM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

Thanks Wyll,

I created a Jira [1] to address this. The NullPointerException that you show in the stack trace will prevent the node from reconnecting to the cluster. Unfortunately, it's a bug that needs to be addressed. It's possible that you may find a way to work around the issue, but I can't tell you off the top of my head what that would be.

Can you check the logs for anything else from the StandardFlowService class? That may help to understand why the null value is getting returned, causing the NullPointerException that you're seeing.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7866
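A quick way to pull out the StandardFlowService entries Mark asked about, assuming the default log location under the NiFi install directory (adjust the path for your image):

    # search current and rolled-over app logs, with a few lines of context after each hit
    grep -n -A 5 "StandardFlowService" "$NIFI_HOME"/logs/nifi-app.log*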
Re: Clustered nifi issues
1.11.4

From: Mark Payne
Sent: Wednesday, September 30, 2020 11:02 AM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

Wyll,

What version of nifi are you running?

Thanks
-Mark
Re: Clustered nifi issues
Wyll,

What version of nifi are you running?

Thanks
-Mark

On Sep 30, 2020, at 10:33 AM, Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

* Yes - the host-specific parameters on the different instances are configured correctly (nifi-0, nifi-1, nifi-2)
* Yes - we have a separate certificate for each node and the keystores are configured correctly.
* Yes - we have a headless service in front of the STS cluster
* No - I don't think there is an explicit liveness or readiness probe defined for the STS, perhaps I need to add one. Do you have an example?

-Wyllys
Re: Clustered nifi issues
* Yes - the host-specific parameters on the different instances are configured correctly (nifi-0, nifi-1, nifi-2)
* Yes - we have a separate certificate for each node and the keystores are configured correctly.
* Yes - we have a headless service in front of the STS cluster
* No - I don't think there is an explicit liveness or readiness probe defined for the STS, perhaps I need to add one. Do you have an example?

-Wyllys

From: Chris Sampson
Sent: Tuesday, September 29, 2020 3:21 PM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

We started to have more stability when we switched to bitnami/zookeeper:3.5.7, but I suspect that's a red herring here.

Your properties have nifi-0 in several places, so just to double check that the relevant properties are changed for each of the instances within your statefulset?

For example:
* nifi.remote.input.host
* nifi.cluster.node.address
* nifi.web.https.host

Yes

And are you using a separate (non-wildcard) certificate for each node?

Do you have liveness/readiness probes set on your nifi sts?

And are you using a headless service[1] to manage the cluster during startup?

[1] https://kubernetes.io/docs/concepts/services-networking/service/#headless-services

Cheers,

Chris Sampson

On Tue, 29 Sep 2020, 18:48 Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

Zookeeper is from the docker hub zookeeper:3.5.7 image. Below is our nifi.properties (with secrets and hostnames modified). Thanks!

- Wyllys

nifi.flow.configuration.file=/opt/nifi/nifi-current/latest_flow/nifi-0/flow.xml.gz
nifi.flow.configuration.archive.enabled=true
nifi.flow.configuration.archive.dir=/opt/nifi/nifi-current/archives
nifi.flow.configuration.archive.max.time=30 days
nifi.flow.configuration.archive.max.storage=500 MB
nifi.flow.configuration.archive.max.count=
nifi.flowcontroller.autoResumeState=false
nifi.flowcontroller.graceful.shutdown.period=10 sec
nifi.flowservice.writedelay.interval=500 ms
nifi.administrative.yield.duration=30 sec
nifi.bored.yield.duration=10 millis
nifi.queue.backpressure.count=1
nifi.queue.backpressure.size=1 GB
nifi.authorizer.configuration.file=./conf/authorizers.xml
nifi.login.identity.provider.configuration.file=./conf/login-identity-providers.xml
nifi.templates.directory=/opt/nifi/nifi-current/templates
nifi.ui.banner.text=KI Nifi Cluster
nifi.ui.autorefresh.interval=30 sec
nifi.nar.library.directory=./lib
nifi.nar.library.autoload.directory=./extensions
nifi.nar.working.directory=./work/nar/
nifi.documentation.working.directory=./work/docs/components
nifi.state.management.configuration.file=./conf/state-management.xml
nifi.state.management.provider.local=local-provider
nifi.state.management.provider.cluster=zk-provider
nifi.state.management.embedded.zookeeper.start=false
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
nifi.database.directory=./database_repository
nifi.h2.url.append=;LOCK_TIMEOUT=25000;WRITE_DELAY=0;AUTO_SERVER=FALSE
nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
nifi.flowfile.repository.directory=./flowfile_repository
nifi.flowfile.repository.partitions=256
nifi.flowfile.repository.checkpoint.interval=2 mins
nifi.flowfile.repository.always.sync=false
nifi.flowfile.repository.encryption.key.provider.implementation=
nifi.flowfile.repository.encryption.key.provider.location=
nifi.flowfile.repository.encryption.key.id=
nifi.flowfile.repository.encryption.key=
nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
nifi.queue.swap.threshold=2
nifi.swap.in.period=5 sec
nifi.swap.in.threads=1
nifi.swap.out.period=5 sec
nifi.swap.out.threads=4
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=../nifi-content-viewer/
nifi.content.repository.encryption.key.provider.implementation=
nifi.content.repository.encryption.key.provider.location=
nifi.content.repository.encryption.key.id=
nifi.content.repository.encryption.key=
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
nifi.provenance.repository.debug.frequency=1_000_000
nifi.provenance.repository.encryption.key.provi
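Since the headless service comes up repeatedly in this thread, here is a sketch of what such a Service might look like for this StatefulSet; the service name, labels and cluster-protocol port are hypothetical (only the "ki" namespace and web port 8080 appear elsewhere in the thread):

    apiVersion: v1
    kind: Service
    metadata:
      name: nifi-hs            # hypothetical; pods would then resolve as nifi-0.nifi-hs.ki.svc.cluster.local, etc.
      namespace: ki
    spec:
      clusterIP: None          # headless: per-pod DNS records instead of a load-balanced VIP
      selector:
        app: nifi              # hypothetical pod label
      ports:
        - name: web
          port: 8080
        - name: cluster-protocol
          port: 11443          # hypothetical value of nifi.cluster.node.protocol.port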
nifi.web.http.network.interface.default=
nifi.web.https.host=nifi-0.nifi.ki.svc.cluster.local
nifi.web.https.port=8080
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=/nifi-api,/nifi
nifi.web.proxy.host=ingress.ourdomain.com
nifi.sensitive.props.key=
nifi.sensitive.props.key.protected=
nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL
nifi.sensitive.props.provider=BC
nifi.sensitive.props.additional.keys=
nifi.security.keystore=/opt/nifi/nifi-current/security/nifi-0.keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=
nifi.security.keyPasswd=X
nifi.security.truststore=/opt/nifi/nifi-current/security/nifi-0.truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=XXX
nifi.security.user.authorizer=managed-authorizer
nifi.security.user.login.identity.provider=
nifi.security.ocsp.responder.url=
nifi.security.ocsp.responder.certificate=
nifi.security.user.oidc.discovery.url=https://keycloak-server-address/auth/realms/Test/.well-known/openid-configuration
nifi.security.user.oidc.connect.timeout=15 secs
nifi.security.user.oidc.read.timeout=15 secs
nifi.security.user.oidc.client.id=nifi
nifi.security.user.oidc.client.secret=X
nifi.security.user.oidc.preferred.jwsalgorithm=RS512
nifi.security.user.oidc.additional.scopes=
nifi.security.user.oidc.claim.identifying.user=
nifi.security.user.knox.url=
nifi.security.user.knox.publicKey=
nifi.security.user.knox.cookieName=hadoop-jwt
nifi.security.user.knox.audiences=
nifi.cluster.protocol.heartbeat.interval=30 secs
nifi.cluster.protocol.is.secure=true
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-0.nifi.ki.svc.cluster.local
nifi.cluster.node.protocol.port=2882
nifi.cluster.node.protocol.threads=40
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=120 secs
nifi.cluster.node.read.timeout=120 secs
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
nifi.cluster.load.balance.host=nifi-0.nifi.ki.svc.cluster.local
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
nifi.zookeeper.connect.string=zk-0.zk-hs.ki.svc.cluster.local:2181,zk-1.zk-hs.ki.svc.cluster.local:2181,zk-2.zk-hs.ki.svc.cluster.local:2181
nifi.zookeeper.connect.timeout=30 secs
nifi.zookeeper.session.timeout=30 secs
nifi.zookeeper.root.node=/nifi
nifi.zookeeper.auth.type=
nifi.zookeeper.kerberos.removeHostFromPrincipal=
nifi.zookeeper.kerberos.removeRealmFromPrincipal=
nifi.kerberos.krb5.file=
nifi.kerberos.service.principal=
nifi.kerberos.service.keytab.location=
nifi.kerberos.spnego.principal=
nifi.kerberos.spnego.keytab.location=
nifi.kerberos.spnego.authentication.expiration=12 hours
nifi.variable.registry.properties=
nifi.analytics.predict.enabled=false
nifi.analytics.predict.interval=3 mins
nifi.analytics.query.interval=5 mins
nifi.analytics.connection.model.implementation=org.apache.nifi.controller.status.analytics.models.OrdinaryLeastSquares
nifi.analytics.connection.model.score.name=rSquared
nifi.analytics.connection.model.score.threshold=.90

From: Chris Sampson
Sent: Tuesday, September 29, 2020 12:41 PM
To: users@nifi.apache.org
Subject: Re: Clustered nifi issues

Also, which version of zookeeper and what image (I've found different versions and images provided better stability)?

Cheers,

Chris Sampson

On Tue, 29 Sep 2020, 17:34 Sushil Kumar <skm@gmail.com> wrote:

Hello Wyll

It may be helpful if you can send nifi.properties.

Thanks
Sushil Kumar

On Tue, Sep 29, 2020 at 7:58 AM Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

I have a 3-node Nifi (1.11.4) cluster in a Kubernetes environment (as a StatefulSet) using external zookeeper (3 nodes also) to manage state.

Whenever even one node (pod/container) goes down or is restarted, it can throw the whole cluster into a bad state that forces me to restart ALL of the pods in order to recover. This seems wrong. The problem seems to be that when the primary node goes away, the remaining 2 nodes don't ever try to take over. Instead, I have to restart all of them individually until one of them becomes the primary, then the other 2 eventually join and sync up.

When one of the nodes is refusing to sync up, I often see these errors in the log and the only way to get it back into the cluster is to restart it. The node showing the errors below never seems to be able to rejoin or resync with the other 2 nodes.

2020-09-29 10:18:53,324 ERROR [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Handling reconnection request failed due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
        at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1035)
        at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:668)
        at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:109)
        at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:415)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
        at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:989)
        ... 4 common frames omitted
2020-09-29 10:18:53,326 INFO [Reconnect to Cluster] o.a.c.f.imps.CuratorFrameworkImpl Starting
2020-09-29 10:18:53,327 INFO [Reconnect to Cluster] org.apache.zookeeper.ClientCnxnSocket jute.maxbuffer value is 4194304 Bytes
2020-09-29 10:18:53,328 INFO [Reconnect to Cluster] o.a.c.f.imps.CuratorFrameworkImpl Default schema
2020-09-29 10:18:53,807 INFO [Reconnect to Cluster-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2020-09-29 10:18:53,809 INFO [Reconnect to Cluster-EventThread] o.a.c.framework.imps.EnsembleTracker New config event received: {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, version=0, server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181}
2020-09-29 10:18:53,810 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2020-09-29 10:18:53,813 INFO [Reconnect to Cluster-EventThread] o.a.c.framework.imps.EnsembleTracker New config event received: {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, version=0, server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181}
2020-09-29 10:18:54,323 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election Role 'Primary Node' becuase that role is not registered
2020-09-29 10:18:54,324 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election Role 'Cluster Coordinator' becuase that role is not registered
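Since the log above shows Curator re-establishing its ZooKeeper session during the failed reconnect, one quick sanity check is to probe the ensemble from inside a NiFi pod using ZooKeeper's four-letter-word commands. A minimal sketch (assumes nc is available in the image; on ZooKeeper 3.5+ these commands may need to be enabled via 4lw.commands.whitelist):

#!/bin/bash
# Sketch only: probe each ZooKeeper server named in nifi.zookeeper.connect.string.
# Assumes nc (netcat) exists in the container; on ZooKeeper 3.5+ the four-letter
# commands may need to be whitelisted via 4lw.commands.whitelist.
for zk in zk-0.zk-hs.ki.svc.cluster.local \
          zk-1.zk-hs.ki.svc.cluster.local \
          zk-2.zk-hs.ki.svc.cluster.local; do
  echo "--- ${zk} ---"
  echo ruok | nc -w 2 "${zk}" 2181 && echo    # a healthy server answers "imok"
  echo srvr | nc -w 2 "${zk}" 2181            # reports Mode: leader or follower
done

A healthy server replies to ruok with imok, and srvr shows whether it is currently the leader or a follower, which confirms the ensemble in nifi.zookeeper.connect.string is reachable from the pods.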
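The "Cannot unregister Leader Election Role" messages come from NiFi's Curator-based leader election in ZooKeeper. To check whether any node currently holds the Cluster Coordinator or Primary Node role, the election znodes under NiFi's root node can be listed from a ZooKeeper pod; a sketch, assuming the default root node /nifi from nifi.zookeeper.root.node and a leaders subtree (the exact znode layout is an assumption, not confirmed in this thread):

#!/bin/bash
# Sketch only: list the leader-election znodes NiFi creates under its root node.
# The "/nifi/leaders/<role name>" layout is assumed for illustration; run from a
# ZooKeeper container where zkCli.sh is on the PATH.
ZK=zk-0.zk-hs.ki.svc.cluster.local:2181
zkCli.sh -server "${ZK}" ls /nifi
zkCli.sh -server "${ZK}" ls /nifi/leaders

Each registered role should appear as a child of the leaders node with ephemeral participant entries beneath it; an empty subtree while the cluster is running would be consistent with no node holding the role.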