unsubscribe

2022-10-13 Thread Wyll Ingersoll



Re: NiFi on Kubernetes - 2

2020-11-16 Thread Wyll Ingersoll
We deployed a 3-node cluster of nifi containers into Kubernetes and ran into many of these same issues. We ended up creating our own Docker image based on the upstream apache/nifi image, but with our own entrypoint script that sets the nifi.properties values we wished to customize before launching the main nifi start script.

We also put the various repository directories on persistent storage, but leave the "conf" directory as-is and use a ConfigMap to hold the logback.xml definition. We do not put nifi.properties in the ConfigMap, since that is set up by the entrypoint script in the container.

-Wyllys Ingersoll



From: Sushil Kumar 
Sent: Monday, November 16, 2020 2:54 PM
To: users@nifi.apache.org 
Subject: Re: NiFi on Kubernetes - 2

Hello muhyid72

Thanks for checking out the chart.
Yes, it uses Helm charts for Kubernetes.

Regarding the issue you are facing: if you mount a volume at the conf directory, it replaces the conf directory that ships in the container, so the volume must already contain the expected files. If you mount a ConfigMap as a file, that can only act as a source file; you would need to add your own logic to generate the target conf files from it.
I am also interested in understanding why you need a mount point for the conf directory. Since it sounds like you want to use the container's existing scripts, what do you hope to achieve by mounting a volume over the conf directory?

I hope you can find the solution.

Thanks
Sushil Kumar


On Mon, Nov 16, 2020 at 11:01 AM muhyid72 <muhyi...@outlook.com> wrote:
Hi Sushil,
Thanks for your interest and for sharing your knowledge. Sorry for the delayed response.
I reviewed your chart. As far as I understand, you are using Helm for the nifi deployment and relying on Helm's capabilities. I have not used Helm until now.
I would like to use the original apache nifi image with minimal changes, for the sake of future updates and easier manageability, but I will take a deeper look at Helm as an alternative.

P.S.: As I said to Chris, I couldn't achieve my goal with a ConfigMap due to its read-only behavior.

Thanks for your support



--
Sent from: http://apache-nifi-users-list.2361937.n4.nabble.com/


Re: Clustered nifi issues

2020-10-14 Thread Wyll Ingersoll
I see what's happening.  The container sets up a /root/.nifi-cli.config file 
that has the required security parameters so that the user doesn't have to 
supply them on the command line.
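
For anyone who does need to pass those parameters explicitly, the CLI also accepts a properties file via its -p/--properties option; a rough sketch of such a file and call (property names per the toolkit guide, values here are placeholders) would be:

cat > /tmp/cli.properties <<'EOF'
baseUrl=https://nifi-0.nifi.ki.svc.cluster.local:8080
keystore=/opt/nifi/nifi-current/security/nifi-0.keystore.jks
keystoreType=JKS
keystorePasswd=changeme
keyPasswd=changeme
truststore=/opt/nifi/nifi-current/security/nifi-0.truststore.jks
truststoreType=JKS
truststorePasswd=changeme
EOF

$NIFI_TOOLKIT_HOME/bin/cli.sh nifi get-nodes -ot json -p /tmp/cli.properties

The same keys can also live in ~/.nifi-cli.config (which is what the container pre-populates) so they don't have to be supplied on every invocation.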

From: Bryan Bende 
Sent: Wednesday, October 14, 2020 10:45 AM
To: users@nifi.apache.org 
Subject: Re: Clustered nifi issues

The CLI does not use nifi.properties; there are several ways of passing in config...

https://nifi.apache.org/docs/nifi-docs/html/toolkit-guide.html#property-argument-handling

On Wed, Oct 14, 2020 at 10:01 AM Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:
That makes sense.  It must be reading the keystore/truststore specified in the 
nifi.properties file then?

From: Bryan Bende <bbe...@gmail.com>
Sent: Wednesday, October 14, 2020 9:59 AM
To: users@nifi.apache.org <users@nifi.apache.org>
Subject: Re: Clustered nifi issues

The get-nodes command calls the REST resource /controller/cluster which 
authorizes against READ on /controller [1], so there is no way you can call 
this in a secure environment without authenticating somehow, which from the CLI 
means specifying a keystore/truststore.

[1] 
https://github.com/apache/nifi/blob/main/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-web/nifi-web-api/src/main/java/org/apache/nifi/web/api/ControllerResource.java#L857

On Wed, Oct 14, 2020 at 9:26 AM Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:
Yes, this is for a secured cluster deployed as a Kubernetes stateful set.  The 
certificate parameters are apparently not needed to just get the status of the 
nodes using the command below.




From: Sushil Kumar <skm@gmail.com>
Sent: Tuesday, October 13, 2020 4:01 PM
To: users@nifi.apache.org <users@nifi.apache.org>
Subject: Re: Clustered nifi issues

Did you say that the same line of code works fine for secured clusters too?
I asked because nifi-toolkit has a separate set of parameters asking for 
certificates and everything else related to secure clusters.


On Tue, Oct 13, 2020 at 12:14 PM Wyll Ingersoll <wyllys.ingers...@keepertech.com> wrote:

I found that instead of dealing with nifi client certificate hell, the nifi-toolkit cli.sh works just fine for testing the readiness of the cluster. Here is my readiness script, which seems to work well in kubernetes with the apache/nifi docker container version 1.12.1:



#!/bin/bash

$NIFI_TOOLKIT_HOME/bin/cli.sh nifi get-nodes -ot json > /tmp/cluster.state
if [ $? -ne 0 ]; then
  cat /tmp/cluster.state
  exit 1
fi

STATUS=$(jq -r ".cluster.nodes[] | select((.address==\"$(hostname -f)\") or .address==\"localhost\") | .status" /tmp/cluster.state)

if [[ ! $STATUS = "CONNECTED" ]]; then
  echo "Node not found with CONNECTED state. Full cluster state:"
  jq . /tmp/cluster.state
  exit 1
fi



From: Chris Sampson <chris.samp...@naimuri.com>
Sent: Thursday, October 1, 2020 9:03 AM
To: users@nifi.apache.org <users@nifi.apache.org>
Subject: Re: Clustered nifi issues

For info, the probes we currently use for our StatefulSet Pods are:

  *   livenessProbe - tcpSocket to ping the NiFi instance port (e.g. 8080)
  *   readinessProbe - exec command to curl the nifi-api/controller/cluster 
endpoint to check the node's cluster connection status, e.g.:

readinessProbe:
  exec:
    command:
      - bash
      - -c
      - |
        if [ "${SECURE}" = "true" ]; then
          INITIAL_ADMIN_SLUG=$(echo "${INITIAL_ADMIN}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')

          curl -v \
            --cacert ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/nifi-cert.pem \
            --cert ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/${INITIAL_ADMIN_SLUG}-cert.pem \
            --key ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/${INITIAL_ADMIN_SLUG}-key.pem \
            https://$(hostname -f):8080/nifi-api/controller/cluster > /tmp/cluster.state
        else
          curl -kv http://$(hostname -f):8080/nifi-api/controller/cluster > /tmp/cluster.state
        fi

        STATUS=$(jq -r ".cluster.nodes[] | select((.address==\"$(hostname -f)\") or .address==\"localhost\") | .status" /tmp/cluster.state)

        if [[ ! $STATUS = "CONNECTED" ]]; then
          echo "Node not found with CONNECTED state. Full cluster state:"
          jq . /tmp/cluster.state
          exit 1
        fi

Note that INITIAL_ADMIN is the CN of a user with appropriate permissions to 
call the endpoint and for whom our pod contains a set of certificate files in 
the indicated locations (generated from NiFi Toolkit in an init-container 
before the Pod starts); jq utility was added into our customised version of the 
apache/nifi Docker Image.
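
(The init-container essentially runs the toolkit's TLS command for the node and the INITIAL_ADMIN user, roughly along the lines of the sketch below; the exact mode, DN and output paths will differ per deployment, so treat this as illustrative only.)

# Illustrative init-container step (standalone mode shown; a CA client/server
# setup would work as well):
$NIFI_TOOLKIT_HOME/bin/tls-toolkit.sh standalone \
  -n "$(hostname -f)" \
  -C "CN=${INITIAL_ADMIN}" \
  -o ${NIFI_HOME}/data/conf/certs
# The client cert is emitted as a PKCS12 bundle and is converted to the PEM
# files referenced above (e.g. via openssl pkcs12) before the probe uses them.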


---
Chris Sampson
IT Consultant
chris.samp...@naimuri.com
https://www.naimuri.com/



Re: Clustered nifi issues

2020-10-13 Thread Wyll Ingersoll

I found that instead of dealing with nifi client certificate hell, the nifi-toolkit cli.sh works just fine for testing the readiness of the cluster. Here is my readiness script, which seems to work well in kubernetes with the apache/nifi docker container version 1.12.1:



#!/bin/bash

$NIFI_TOOLKIT_HOME/bin/cli.sh nifi get-nodes -ot json > /tmp/cluster.state
if [ $? -ne 0 ]; then
  cat /tmp/cluster.state
  exit 1
fi

STATUS=$(jq -r ".cluster.nodes[] | select((.address==\"$(hostname -f)\") or .address==\"localhost\") | .status" /tmp/cluster.state)

if [[ ! $STATUS = "CONNECTED" ]]; then
  echo "Node not found with CONNECTED state. Full cluster state:"
  jq . /tmp/cluster.state
  exit 1
fi



From: Chris Sampson 
Sent: Thursday, October 1, 2020 9:03 AM
To: users@nifi.apache.org 
Subject: Re: Clustered nifi issues

For info, the probes we currently use for our StatefulSet Pods are:

  *   livenessProbe - tcpSocket to ping the NiFi instance port (e.g. 8080)
  *   readinessProbe - exec command to curl the nifi-api/controller/cluster 
endpoint to check the node's cluster connection status, e.g.:

readinessProbe:
  exec:
    command:
      - bash
      - -c
      - |
        if [ "${SECURE}" = "true" ]; then
          INITIAL_ADMIN_SLUG=$(echo "${INITIAL_ADMIN}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')

          curl -v \
            --cacert ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/nifi-cert.pem \
            --cert ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/${INITIAL_ADMIN_SLUG}-cert.pem \
            --key ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/${INITIAL_ADMIN_SLUG}-key.pem \
            https://$(hostname -f):8080/nifi-api/controller/cluster > /tmp/cluster.state
        else
          curl -kv http://$(hostname -f):8080/nifi-api/controller/cluster > /tmp/cluster.state
        fi

        STATUS=$(jq -r ".cluster.nodes[] | select((.address==\"$(hostname -f)\") or .address==\"localhost\") | .status" /tmp/cluster.state)

        if [[ ! $STATUS = "CONNECTED" ]]; then
          echo "Node not found with CONNECTED state. Full cluster state:"
          jq . /tmp/cluster.state
          exit 1
        fi

Note that INITIAL_ADMIN is the CN of a user with appropriate permissions to 
call the endpoint and for whom our pod contains a set of certificate files in 
the indicated locations (generated from NiFi Toolkit in an init-container 
before the Pod starts); jq utility was added into our customised version of the 
apache/nifi Docker Image.


---
Chris Sampson
IT Consultant
chris.samp...@naimuri.com
https://www.naimuri.com/


On Wed, 30 Sep 2020 at 16:43, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:
Thanks for following up and filing the issue. Unfortunately, I don't have any of the logs from the original issue; I have restarted and rebooted my containers many times since then.

From: Mark Payne mailto:marka...@hotmail.com>>
Sent: Wednesday, September 30, 2020 11:21 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Clustered nifi issues

Thanks Wyll,

I created a Jira [1] to address this. The NullPointer that you show in the 
stack trace will prevent the node from reconnecting to the cluster. 
Unfortunately, it’s a bug that needs to be addressed. It’s possible that you 
may find a way to work around the issue, but I can’t tell you off the top of my 
head what that would be.

Can you check the logs for anything else from the StandardFlowService class? 
That may help to understand why the null value is getting returned, causing the 
NullPointerException that you’re seeing.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7866

On Sep 30, 2020, at 11:03 AM, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:

1.11.4

From: Mark Payne mailto:marka...@hotmail.com>>
Sent: Wednesday, September 30, 2020 11:02 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Clustered nifi issues

Wyll,

What version of nifi are you running?

Thanks
-Mark


On Sep 30, 2020, at 10:33 AM, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:


  *   Yes - the host specific parameters on the different instances are 
configured correctly (nifi-0, nifi-1, nifi-2)
  *   Yes - we have separate certificate for each node and the keystores are 
configured correctly.
  *   Yes - we have a headless service in front of the STS cluster
  *   No - I don't think there is an explicit liveness or readiness probe 
defined for the STS, perhaps I need to add one. Do you have an example?

-Wyllys


_

Re: Clustered nifi issues

2020-10-01 Thread Wyll Ingersoll
Thanks, I'll try it out.  I had the liveness probe already, but wasn't quite 
sure what to check for in the readinessProbe.

From: Chris Sampson 
Sent: Thursday, October 1, 2020 9:03 AM
To: users@nifi.apache.org 
Subject: Re: Clustered nifi issues

For info, the probes we currently use for our StatefulSet Pods are:

  *   livenessProbe - tcpSocket to ping the NiFi instance port (e.g. 8080)
  *   readinessProbe - exec command to curl the nifi-api/controller/cluster 
endpoint to check the node's cluster connection status, e.g.:

readinessProbe:
  exec:
    command:
      - bash
      - -c
      - |
        if [ "${SECURE}" = "true" ]; then
          INITIAL_ADMIN_SLUG=$(echo "${INITIAL_ADMIN}" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')

          curl -v \
            --cacert ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/nifi-cert.pem \
            --cert ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/${INITIAL_ADMIN_SLUG}-cert.pem \
            --key ${NIFI_HOME}/data/conf/certs/${INITIAL_ADMIN_SLUG}/${INITIAL_ADMIN_SLUG}-key.pem \
            https://$(hostname -f):8080/nifi-api/controller/cluster > /tmp/cluster.state
        else
          curl -kv http://$(hostname -f):8080/nifi-api/controller/cluster > /tmp/cluster.state
        fi

        STATUS=$(jq -r ".cluster.nodes[] | select((.address==\"$(hostname -f)\") or .address==\"localhost\") | .status" /tmp/cluster.state)

        if [[ ! $STATUS = "CONNECTED" ]]; then
          echo "Node not found with CONNECTED state. Full cluster state:"
          jq . /tmp/cluster.state
          exit 1
        fi

Note that INITIAL_ADMIN is the CN of a user with appropriate permissions to 
call the endpoint and for whom our pod contains a set of certificate files in 
the indicated locations (generated from NiFi Toolkit in an init-container 
before the Pod starts); jq utility was added into our customised version of the 
apache/nifi Docker Image.


---
Chris Sampson
IT Consultant
chris.samp...@naimuri.com
https://www.naimuri.com/


On Wed, 30 Sep 2020 at 16:43, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:
Thanks for following up and filing the issue. Unfortunately, I don't have any of the logs from the original issue; I have restarted and rebooted my containers many times since then.

From: Mark Payne mailto:marka...@hotmail.com>>
Sent: Wednesday, September 30, 2020 11:21 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Clustered nifi issues

Thanks Wyll,

I created a Jira [1] to address this. The NullPointer that you show in the 
stack trace will prevent the node from reconnecting to the cluster. 
Unfortunately, it’s a bug that needs to be addressed. It’s possible that you 
may find a way to work around the issue, but I can’t tell you off the top of my 
head what that would be.

Can you check the logs for anything else from the StandardFlowService class? 
That may help to understand why the null value is getting returned, causing the 
NullPointerException that you’re seeing.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7866

On Sep 30, 2020, at 11:03 AM, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:

1.11.4

From: Mark Payne mailto:marka...@hotmail.com>>
Sent: Wednesday, September 30, 2020 11:02 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Clustered nifi issues

Wyll,

What version of nifi are you running?

Thanks
-Mark


On Sep 30, 2020, at 10:33 AM, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:


  *   Yes - the host specific parameters on the different instances are 
configured correctly (nifi-0, nifi-1, nifi-2)
  *   Yes - we have separate certificate for each node and the keystores are 
configured correctly.
  *   Yes - we have a headless service in front of the STS cluster
  *   No - I don't think there is an explicit liveness or readiness probe 
defined for the STS, perhaps I need to add one. Do you have an example?

-Wyllys



From: Chris Sampson 
mailto:chris.samp...@naimuri.com>>
Sent: Tuesday, September 29, 2020 3:21 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Clustered nifi issues

We started to have more stability when we switched to bitnami/zookeeper:3.5.7, 
but I suspect that's a red herring here.

Your properties have nifi-0 in several places, so just to double check that the 
relevant properties are changed for each of the instances within your 
statefulset?

For example:
* nifi.remote.input.host
* nifi.cluster.node.address
* nifi.web.https.host


Yes

And are you using a separate (non-wildcard) certificate

Re: Clustered nifi issues

2020-09-30 Thread Wyll Ingersoll
Thanks for following up and filing the issue. Unfortunately, I don't have any of the logs from the original issue; I have restarted and rebooted my containers many times since then.

From: Mark Payne 
Sent: Wednesday, September 30, 2020 11:21 AM
To: users@nifi.apache.org 
Subject: Re: Clustered nifi issues

Thanks Wyll,

I created a Jira [1] to address this. The NullPointer that you show in the 
stack trace will prevent the node from reconnecting to the cluster. 
Unfortunately, it’s a bug that needs to be addressed. It’s possible that you 
may find a way to work around the issue, but I can’t tell you off the top of my 
head what that would be.

Can you check the logs for anything else from the StandardFlowService class? 
That may help to understand why the null value is getting returned, causing the 
NullPointerException that you’re seeing.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7866

On Sep 30, 2020, at 11:03 AM, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:

1.11.4

From: Mark Payne mailto:marka...@hotmail.com>>
Sent: Wednesday, September 30, 2020 11:02 AM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Clustered nifi issues

Wyll,

What version of nifi are you running?

Thanks
-Mark


On Sep 30, 2020, at 10:33 AM, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:


  *   Yes - the host specific parameters on the different instances are 
configured correctly (nifi-0, nifi-1, nifi-2)
  *   Yes - we have separate certificate for each node and the keystores are 
configured correctly.
  *   Yes - we have a headless service in front of the STS cluster
  *   No - I don't think there is an explicit liveness or readiness probe 
defined for the STS, perhaps I need to add one. Do you have an example?

-Wyllys



From: Chris Sampson 
mailto:chris.samp...@naimuri.com>>
Sent: Tuesday, September 29, 2020 3:21 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Clustered nifi issues

We started to have more stability when we switched to bitnami/zookeeper:3.5.7, 
but I suspect that's a red herring here.

Your properties have nifi-0 in several places, so just to double check that the 
relevant properties are changed for each of the instances within your 
statefulset?

For example:
* nifi.remote.input.host
* nifi.cluster.node.address
* nifi.web.https.host


Yes

And are you using a separate (non-wildcard) certificate for each node?


Do you have liveness/readiness probes set on your nifi sts?


And are you using a headless service[1] to manage the cluster during startup?


[1] 
https://kubernetes.io/docs/concepts/services-networking/service/#headless-services


Cheers,

Chris Sampson

On Tue, 29 Sep 2020, 18:48 Wyll Ingersoll, 
mailto:wyllys.ingers...@keepertech.com>> wrote:
Zookeeper is from the docker hub zookeeper:3.5.7 image.

Below is our nifi.properties (with secrets and hostnames modified).

thanks!
 - Wyllys


nifi.flow.configuration.file=/opt/nifi/nifi-current/latest_flow/nifi-0/flow.xml.gz
nifi.flow.configuration.archive.enabled=true
nifi.flow.configuration.archive.dir=/opt/nifi/nifi-current/archives
nifi.flow.configuration.archive.max.time=30 days
nifi.flow.configuration.archive.max.storage=500 MB
nifi.flow.configuration.archive.max.count=
nifi.flowcontroller.autoResumeState=false
nifi.flowcontroller.graceful.shutdown.period=10 sec
nifi.flowservice.writedelay.interval=500 ms
nifi.administrative.yield.duration=30 sec

nifi.bored.yield.duration=10 millis
nifi.queue.backpressure.count=1
nifi.queue.backpressure.size=1 GB

nifi.authorizer.configuration.file=./conf/authorizers.xml
nifi.login.identity.provider.configuration.file=./conf/login-identity-providers.xml
nifi.templates.directory=/opt/nifi/nifi-current/templates
nifi.ui.banner.text=KI Nifi Cluster
nifi.ui.autorefresh.interval=30 sec
nifi.nar.library.directory=./lib
nifi.nar.library.autoload.directory=./extensions
nifi.nar.working.directory=./work/nar/
nifi.documentation.working.directory=./work/docs/components

nifi.state.management.configuration.file=./conf/state-management.xml
nifi.state.management.provider.local=local-provider
nifi.state.management.provider.cluster=zk-provider
nifi.state.management.embedded.zookeeper.start=false
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

nifi.database.directory=./database_repository
nifi.h2.url.append=;LOCK_TIMEOUT=25000;WRITE_DELAY=0;AUTO_SERVER=FALSE

nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
nifi.flowfile.repository.directory=./flowfile_repository
nifi.flowfile.repository.partitions=2

Re: Clustered nifi issues

2020-09-30 Thread Wyll Ingersoll
1.11.4

From: Mark Payne 
Sent: Wednesday, September 30, 2020 11:02 AM
To: users@nifi.apache.org 
Subject: Re: Clustered nifi issues

Wyll,

What version of nifi are you running?

Thanks
-Mark


On Sep 30, 2020, at 10:33 AM, Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:


  *   Yes - the host specific parameters on the different instances are 
configured correctly (nifi-0, nifi-1, nifi-2)
  *   Yes - we have separate certificate for each node and the keystores are 
configured correctly.
  *   Yes - we have a headless service in front of the STS cluster
  *   No - I don't think there is an explicit liveness or readiness probe 
defined for the STS, perhaps I need to add one. Do you have an example?

-Wyllys



From: Chris Sampson 
mailto:chris.samp...@naimuri.com>>
Sent: Tuesday, September 29, 2020 3:21 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Clustered nifi issues

We started to have more stability when we switched to bitnami/zookeeper:3.5.7, 
but I suspect that's a red herring here.

Your properties have nifi-0 in several places, so just to double check that the 
relevant properties are changed for each of the instances within your 
statefulset?

For example:
* nifi.remote.input.host
* nifi.cluster.node.address
* nifi.web.https.host


Yes

And are you using a separate (non-wildcard) certificate for each node?


Do you have liveness/readiness probes set on your nifi sts?


And are you using a headless service[1] to manage the cluster during startup?


[1] 
https://kubernetes.io/docs/concepts/services-networking/service/#headless-services


Cheers,

Chris Sampson

On Tue, 29 Sep 2020, 18:48 Wyll Ingersoll, 
mailto:wyllys.ingers...@keepertech.com>> wrote:
Zookeeper is from the docker hub zookeeper:3.5.7 image.

Below is our nifi.properties (with secrets and hostnames modified).

thanks!
 - Wyllys


nifi.flow.configuration.file=/opt/nifi/nifi-current/latest_flow/nifi-0/flow.xml.gz
nifi.flow.configuration.archive.enabled=true
nifi.flow.configuration.archive.dir=/opt/nifi/nifi-current/archives
nifi.flow.configuration.archive.max.time=30 days
nifi.flow.configuration.archive.max.storage=500 MB
nifi.flow.configuration.archive.max.count=
nifi.flowcontroller.autoResumeState=false
nifi.flowcontroller.graceful.shutdown.period=10 sec
nifi.flowservice.writedelay.interval=500 ms
nifi.administrative.yield.duration=30 sec

nifi.bored.yield.duration=10 millis
nifi.queue.backpressure.count=1
nifi.queue.backpressure.size=1 GB

nifi.authorizer.configuration.file=./conf/authorizers.xml
nifi.login.identity.provider.configuration.file=./conf/login-identity-providers.xml
nifi.templates.directory=/opt/nifi/nifi-current/templates
nifi.ui.banner.text=KI Nifi Cluster
nifi.ui.autorefresh.interval=30 sec
nifi.nar.library.directory=./lib
nifi.nar.library.autoload.directory=./extensions
nifi.nar.working.directory=./work/nar/
nifi.documentation.working.directory=./work/docs/components

nifi.state.management.configuration.file=./conf/state-management.xml
nifi.state.management.provider.local=local-provider
nifi.state.management.provider.cluster=zk-provider
nifi.state.management.embedded.zookeeper.start=false
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

nifi.database.directory=./database_repository
nifi.h2.url.append=;LOCK_TIMEOUT=25000;WRITE_DELAY=0;AUTO_SERVER=FALSE

nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
nifi.flowfile.repository.directory=./flowfile_repository
nifi.flowfile.repository.partitions=256
nifi.flowfile.repository.checkpoint.interval=2 mins
nifi.flowfile.repository.always.sync=false
nifi.flowfile.repository.encryption.key.provider.implementation=
nifi.flowfile.repository.encryption.key.provider.location=
nifi.flowfile.repository.encryption.key.id=
nifi.flowfile.repository.encryption.key=

nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager
nifi.queue.swap.threshold=2
nifi.swap.in.period=5 sec
nifi.swap.in.threads=1
nifi.swap.out.period=5 sec
nifi.swap.out.threads=4

nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=../nifi-content-viewer/
nifi.content.repository.encryption.key.provider.implementation=
nifi.content.reposit

Re: Clustered nifi issues

2020-09-30 Thread Wyll Ingersoll
  *   Yes - the host specific parameters on the different instances are 
configured correctly (nifi-0, nifi-1, nifi-2)
  *   Yes - we have separate certificate for each node and the keystores are 
configured correctly.
  *   Yes - we have a headless service in front of the STS cluster
  *   No - I don't think there is an explicit liveness or readiness probe 
defined for the STS, perhaps I need to add one. Do you have an example?

-Wyllys



From: Chris Sampson 
Sent: Tuesday, September 29, 2020 3:21 PM
To: users@nifi.apache.org 
Subject: Re: Clustered nifi issues

We started to have more stability when we switched to bitnami/zookeeper:3.5.7, 
but I suspect that's a red herring here.

Your properties have nifi-0 in several places, so just to double check that the 
relevant properties are changed for each of the instances within your 
statefulset?

For example:
* nifi.remote.input.host
* nifi.cluster.node.address
* nifi.web.https.host


Yes

And are you using a separate (non-wildcard) certificate for each node?


Do you have liveness/readiness probes set on your nifi sts?


And are you using a headless service[1] to manage the cluster during startup?


[1] 
https://kubernetes.io/docs/concepts/services-networking/service/#headless-services


Cheers,

Chris Sampson

On Tue, 29 Sep 2020, 18:48 Wyll Ingersoll, 
mailto:wyllys.ingers...@keepertech.com>> wrote:
Zookeeper is from the docker hub zookeeper:3.5.7 image.

Below is our nifi.properties (with secrets and hostnames modified).

thanks!
 - Wyllys



nifi.flow.configuration.file=/opt/nifi/nifi-current/latest_flow/nifi-0/flow.xml.gz

nifi.flow.configuration.archive.enabled=true

nifi.flow.configuration.archive.dir=/opt/nifi/nifi-current/archives

nifi.flow.configuration.archive.max.time=30 days

nifi.flow.configuration.archive.max.storage=500 MB

nifi.flow.configuration.archive.max.count=

nifi.flowcontroller.autoResumeState=false

nifi.flowcontroller.graceful.shutdown.period=10 sec

nifi.flowservice.writedelay.interval=500 ms

nifi.administrative.yield.duration=30 sec


nifi.bored.yield.duration=10 millis

nifi.queue.backpressure.count=1

nifi.queue.backpressure.size=1 GB


nifi.authorizer.configuration.file=./conf/authorizers.xml

nifi.login.identity.provider.configuration.file=./conf/login-identity-providers.xml

nifi.templates.directory=/opt/nifi/nifi-current/templates

nifi.ui.banner.text=KI Nifi Cluster

nifi.ui.autorefresh.interval=30 sec

nifi.nar.library.directory=./lib

nifi.nar.library.autoload.directory=./extensions

nifi.nar.working.directory=./work/nar/

nifi.documentation.working.directory=./work/docs/components


nifi.state.management.configuration.file=./conf/state-management.xml

nifi.state.management.provider.local=local-provider

nifi.state.management.provider.cluster=zk-provider

nifi.state.management.embedded.zookeeper.start=false

nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties


nifi.database.directory=./database_repository

nifi.h2.url.append=;LOCK_TIMEOUT=25000;WRITE_DELAY=0;AUTO_SERVER=FALSE


nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository

nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog

nifi.flowfile.repository.directory=./flowfile_repository

nifi.flowfile.repository.partitions=256

nifi.flowfile.repository.checkpoint.interval=2 mins

nifi.flowfile.repository.always.sync=false

nifi.flowfile.repository.encryption.key.provider.implementation=

nifi.flowfile.repository.encryption.key.provider.location=

nifi.flowfile.repository.encryption.key.id=

nifi.flowfile.repository.encryption.key=


nifi.swap.manager.implementation=org.apache.nifi.controller.FileSystemSwapManager

nifi.queue.swap.threshold=2

nifi.swap.in.period=5 sec

nifi.swap.in.threads=1

nifi.swap.out.period=5 sec

nifi.swap.out.threads=4


nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository

nifi.content.claim.max.appendable.size=1 MB

nifi.content.claim.max.flow.files=100

nifi.content.repository.directory.default=./content_repository

nifi.content.repository.archive.max.retention.period=12 hours

nifi.content.repository.archive.max.usage.percentage=50%

nifi.content.repository.archive.enabled=true

nifi.content.repository.always.sync=false

nifi.content.viewer.url=../nifi-content-viewer/

nifi.content.repository.encryption.key.provider.implementation=

nifi.content.repository.encryption.key.provider.location=

nifi.content.repository.encryption.key.id=

nifi.content.repository.encryption.key=


nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository

nifi.provenance.repository.debug.frequency=1_000_000

nifi.provenance.repository.encryption.key.provi

Re: Clustered nifi issues

2020-09-29 Thread Wyll Ingersoll
=

nifi.web.http.network.interface.default=

nifi.web.https.host=nifi-0.nifi.ki.svc.cluster.local

nifi.web.https.port=8080

nifi.web.https.network.interface.default=

nifi.web.jetty.working.directory=./work/jetty

nifi.web.jetty.threads=200

nifi.web.max.header.size=16 KB

nifi.web.proxy.context.path=/nifi-api,/nifi

nifi.web.proxy.host=ingress.ourdomain.com


nifi.sensitive.props.key=

nifi.sensitive.props.key.protected=

nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL

nifi.sensitive.props.provider=BC

nifi.sensitive.props.additional.keys=


nifi.security.keystore=/opt/nifi/nifi-current/security/nifi-0.keystore.jks

nifi.security.keystoreType=jks

nifi.security.keystorePasswd=

nifi.security.keyPasswd=X

nifi.security.truststore=/opt/nifi/nifi-current/security/nifi-0.truststore.jks

nifi.security.truststoreType=jks

nifi.security.truststorePasswd=XXX

nifi.security.user.authorizer=managed-authorizer

nifi.security.user.login.identity.provider=

nifi.security.ocsp.responder.url=

nifi.security.ocsp.responder.certificate=


nifi.security.user.oidc.discovery.url=https://keycloak-server-address/auth/realms/Test/.well-known/openid-configuration

nifi.security.user.oidc.connect.timeout=15 secs

nifi.security.user.oidc.read.timeout=15 secs

nifi.security.user.oidc.client.id=nifi

nifi.security.user.oidc.client.secret=X

nifi.security.user.oidc.preferred.jwsalgorithm=RS512

nifi.security.user.oidc.additional.scopes=

nifi.security.user.oidc.claim.identifying.user=


nifi.security.user.knox.url=

nifi.security.user.knox.publicKey=

nifi.security.user.knox.cookieName=hadoop-jwt

nifi.security.user.knox.audiences=


nifi.cluster.protocol.heartbeat.interval=30 secs

nifi.cluster.protocol.is.secure=true


nifi.cluster.is.node=true

nifi.cluster.node.address=nifi-0.nifi.ki.svc.cluster.local

nifi.cluster.node.protocol.port=2882

nifi.cluster.node.protocol.threads=40

nifi.cluster.node.protocol.max.threads=50

nifi.cluster.node.event.history.size=25

nifi.cluster.node.connection.timeout=120 secs

nifi.cluster.node.read.timeout=120 secs

nifi.cluster.node.max.concurrent.requests=100

nifi.cluster.firewall.file=

nifi.cluster.flow.election.max.wait.time=5 mins

nifi.cluster.flow.election.max.candidates=


nifi.cluster.load.balance.host=nifi-0.nifi.ki.svc.cluster.local

nifi.cluster.load.balance.port=6342

nifi.cluster.load.balance.connections.per.node=4

nifi.cluster.load.balance.max.thread.count=8

nifi.cluster.load.balance.comms.timeout=30 sec


nifi.zookeeper.connect.string=zk-0.zk-hs.ki.svc.cluster.local:2181,zk-1.zk-hs.ki.svc.cluster.local:2181,zk-2.zk-hs.ki.svc.cluster.local:2181

nifi.zookeeper.connect.timeout=30 secs

nifi.zookeeper.session.timeout=30 secs

nifi.zookeeper.root.node=/nifi

nifi.zookeeper.auth.type=

nifi.zookeeper.kerberos.removeHostFromPrincipal=

nifi.zookeeper.kerberos.removeRealmFromPrincipal=


nifi.kerberos.krb5.file=


nifi.kerberos.service.principal=

nifi.kerberos.service.keytab.location=


nifi.kerberos.spnego.principal=

nifi.kerberos.spnego.keytab.location=

nifi.kerberos.spnego.authentication.expiration=12 hours


nifi.variable.registry.properties=


nifi.analytics.predict.enabled=false

nifi.analytics.predict.interval=3 mins

nifi.analytics.query.interval=5 mins

nifi.analytics.connection.model.implementation=org.apache.nifi.controller.status.analytics.models.OrdinaryLeastSquares

nifi.analytics.connection.model.score.name=rSquared

nifi.analytics.connection.model.score.threshold=.90


From: Chris Sampson 
Sent: Tuesday, September 29, 2020 12:41 PM
To: users@nifi.apache.org 
Subject: Re: Clustered nifi issues

Also, which version of zookeeper and what image (I've found different versions 
and images provided better stability)?


Cheers,

Chris Sampson

On Tue, 29 Sep 2020, 17:34 Sushil Kumar, 
mailto:skm@gmail.com>> wrote:
Hello Wyll

It may be helpful if you can send nifi.properties.

Thanks
Sushil Kumar

On Tue, Sep 29, 2020 at 7:58 AM Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>> wrote:

I have a 3-node Nifi (1.11.4) cluster in a kubernetes environment (as a StatefulSet) using external zookeeper (3 nodes also) to manage state.

Whenever even 1 node (pod/container) goes down or is restarted, it can throw the whole cluster into a bad state that forces me to restart ALL of the pods in order to recover. This seems wrong. The problem seems to be that when the primary node goes away, the remaining 2 nodes don't ever try to take over. Instead, I have to restart all of them individually until one of them becomes the primary; then the other 2 eventually join and sync up.

When one of the nodes is refusing to sync up, I often see these errors in the 
log and the only way to get it back into the cluster is to restart it.  The 
node showing the errors below never seems to be able to rejoin or resync with 
the other 2 nodes.



2020-09-29 10:18:

Clustered nifi issues

2020-09-29 Thread Wyll Ingersoll

I have a 3-node Nifi (1.11.4) cluster in a kubernetes environment (as a StatefulSet) using external zookeeper (3 nodes also) to manage state.

Whenever even 1 node (pod/container) goes down or is restarted, it can throw the whole cluster into a bad state that forces me to restart ALL of the pods in order to recover. This seems wrong. The problem seems to be that when the primary node goes away, the remaining 2 nodes don't ever try to take over. Instead, I have to restart all of them individually until one of them becomes the primary; then the other 2 eventually join and sync up.

When one of the nodes is refusing to sync up, I often see these errors in the 
log and the only way to get it back into the cluster is to restart it.  The 
node showing the errors below never seems to be able to rejoin or resync with 
the other 2 nodes.



2020-09-29 10:18:53,324 ERROR [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Handling reconnection request failed due to: org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
org.apache.nifi.cluster.ConnectionException: Failed to connect node to cluster due to: java.lang.NullPointerException
    at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:1035)
    at org.apache.nifi.controller.StandardFlowService.handleReconnectionRequest(StandardFlowService.java:668)
    at org.apache.nifi.controller.StandardFlowService.access$200(StandardFlowService.java:109)
    at org.apache.nifi.controller.StandardFlowService$1.run(StandardFlowService.java:415)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
    at org.apache.nifi.controller.StandardFlowService.loadFromConnectionResponse(StandardFlowService.java:989)
    ... 4 common frames omitted

2020-09-29 10:18:53,326 INFO [Reconnect to Cluster] o.a.c.f.imps.CuratorFrameworkImpl Starting
2020-09-29 10:18:53,327 INFO [Reconnect to Cluster] org.apache.zookeeper.ClientCnxnSocket jute.maxbuffer value is 4194304 Bytes
2020-09-29 10:18:53,328 INFO [Reconnect to Cluster] o.a.c.f.imps.CuratorFrameworkImpl Default schema
2020-09-29 10:18:53,807 INFO [Reconnect to Cluster-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2020-09-29 10:18:53,809 INFO [Reconnect to Cluster-EventThread] o.a.c.framework.imps.EnsembleTracker New config event received: {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, version=0, server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181}
2020-09-29 10:18:53,810 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2020-09-29 10:18:53,813 INFO [Reconnect to Cluster-EventThread] o.a.c.framework.imps.EnsembleTracker New config event received: {server.1=zk-0.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, version=0, server.3=zk-2.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181, server.2=zk-1.zk-hs.ki.svc.cluster.local:2888:3888:participant;0.0.0.0:2181}
2020-09-29 10:18:54,323 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election Role 'Primary Node' becuase that role is not registered
2020-09-29 10:18:54,324 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager Cannot unregister Leader Election Role 'Cluster Coordinator' becuase that role is not registered
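
For what it's worth, the cluster's view in ZooKeeper can also be inspected directly with something like the following (the znode layout under our /nifi root node is an assumption on my part):

ZK=zk-0.zk-hs.ki.svc.cluster.local:2181

# List NiFi's root node and the leader election entries beneath it
zkCli.sh -server "$ZK" ls /nifi
zkCli.sh -server "$ZK" ls /nifi/leaders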



Re: Data Provenance Stops Working

2020-08-10 Thread Wyll Ingersoll
Ah! That fixed my problem!

I am running a secure/authenticated configuration.  My user had the correct 
policy permissions when viewed from the hamburger menu, but not on the 
access-policies for the individual processors/canvas.

Very confusing!  One would think if the admin user had the permissions, they 
would apply everywhere, but apparently not.

Anyway, now I can see provenance data.  Thanks for the tip!

From: Shawn Weeks 
Sent: Monday, August 10, 2020 5:19 PM
To: users@nifi.apache.org 
Subject: Re: Data Provenance Stops Working


That sounds like a permission issue. Are you running in secure mode? If so, right-click on the main canvas, go to access policies, and make sure your current user is allowed provenance access. You’ll also have to go to the policies section in the system menu in the upper right-hand corner and do the same.



Thanks

Shawn



From: Darren Govoni 
Reply-To: "users@nifi.apache.org" 
Date: Monday, August 10, 2020 at 3:52 PM
To: "users@nifi.apache.org" 
Subject: Re: Data Provenance Stops Working



I also use 1.11.4 and out of the box there IS NO provenance data whatsoever. It 
just doesn't work if you install and run nifi as is.






From: Shawn Weeks 
Sent: Monday, August 10, 2020 2:23:19 PM
To: users@nifi.apache.org 
Subject: Re: Data Provenance Stops Working



It sounds like if I expand the retention time a lot, say to 30 days, the issue should be less severe?



Thanks

Shawn



From: Mark Payne 
Reply-To: "users@nifi.apache.org" 
Date: Monday, August 10, 2020 at 12:37 PM
To: "users@nifi.apache.org" 
Subject: Re: Data Provenance Stops Working



Shawn / Wyll,



I think you’re probably running into NIFI-7346 [1], which basically says there’s a case in which NiFi may “age off” old data even when it’s still the file that’s being actively written to. In Linux/OSX this results in simply deleting the file, and then anything else written to it disappears into the ether. Of course, now the file never exceeds the max size, since it’ll be 0 bytes forever, so it never rolls over again. So when this happens, no more provenance data gets created until NiFi is restarted.



It’s also possible that you’re hitting NIFI-7375 [2]. This Jira only affects 
you if you get to provenance by right-clicking on a Processor and clicking View 
Provenance (i.e., not if you go to the Hamburger Menu in the top-right corner 
and go to Provenance from there and search that way). If this is the problem, 
once you right-click and go to View Provenance, you can actually click the 
Search icon (magnifying glass) in that empty Provenance Results panel and click 
Search and then it will actually bring back the results. So that’s obnoxious 
but it’s a workaround that may help.



The good news is that both of these have been addressed for 1.12.0, which 
sounds like it should be coming out very soon!



Thanks

-Mark



[1] https://issues.apache.org/jira/browse/NIFI-7346

[2] https://issues.apache.org/jira/browse/NIFI-7375





On Aug 10, 2020, at 1:26 PM, Joe Witt 
mailto:joe.w...@gmail.com>> wrote:



shawn - i believe it is related to our default settings and have phoned a 
friend to jump in here when able. but default retention and default sharding i 
*think* can result in this.  You can generate a thread dump before and after 
the locked state to see what it is stuck/sitting on.  That will help here



Thanks



On Mon, Aug 10, 2020 at 10:24 AM Shawn Weeks 
mailto:swe...@weeksconsulting.us>> wrote:

Out of the box even the initial admin user has to be granted permission, I think. Mine worked fine for several months since 1.11.1 was released and just started having issues a couple of weeks ago. I’ve increased the retention time a bit to see if that improves the situation.



Thanks

Shawn Weeks



From: Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
mailto:users@nifi.apache.org>>
Date: Monday, August 10, 2020 at 12:22 PM
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
mailto:users@nifi.apache.org>>
Subject: Re: Data Provenance Stops Working



I run 1.11.4 in a cluster on AWS also and have a similar issue with the provenance data: I can't ever view it. It's probably somehow misconfigured, but I haven't been able to figure it out.



From: Andy LoPresto mailto:alopre...@apache.org>>
Sent: Monday, August 10, 2020 1:11 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Data Provenance Stops Working



Shawn,



I don’t know if this is specifically related, but there were a number of 
critical issues discovered in the 1.11.x release line tha

Re: Data Provenance Stops Working

2020-08-10 Thread Wyll Ingersoll
Great, thanks! Looking forward to trying it out on 1.12

I can't see any provenance data with either method (right-click on processor or 
hamburger menu).

When I log in to the server, I do see data being written to the provenance_repository directory in a sub-dir that looks like "lucene-8-index-1560864167766" and which appears to have current data; I just can't view anything from the UI.



From: Mark Payne 
Sent: Monday, August 10, 2020 1:37 PM
To: users@nifi.apache.org 
Subject: Re: Data Provenance Stops Working

Shawn / Wyll,

I think you’re probably running into NIFI-7346 [1], which basically says there’s a case in which NiFi may “age off” old data even when it’s still the file that’s being actively written to. In Linux/OSX this results in simply deleting the file, and then anything else written to it disappears into the ether. Of course, now the file never exceeds the max size, since it’ll be 0 bytes forever, so it never rolls over again. So when this happens, no more provenance data gets created until NiFi is restarted.

It’s also possible that you’re hitting NIFI-7375 [2]. This Jira only affects 
you if you get to provenance by right-clicking on a Processor and clicking View 
Provenance (i.e., not if you go to the Hamburger Menu in the top-right corner 
and go to Provenance from there and search that way). If this is the problem, 
once you right-click and go to View Provenance, you can actually click the 
Search icon (magnifying glass) in that empty Provenance Results panel and click 
Search and then it will actually bring back the results. So that’s obnoxious 
but it’s a workaround that may help.

The good news is that both of these have been addressed for 1.12.0, which 
sounds like it should be coming out very soon!

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7346
[2] https://issues.apache.org/jira/browse/NIFI-7375


On Aug 10, 2020, at 1:26 PM, Joe Witt 
mailto:joe.w...@gmail.com>> wrote:

shawn - i believe it is related to our default settings and have phoned a 
friend to jump in here when able. but default retention and default sharding i 
*think* can result in this.  You can generate a thread dump before and after 
the locked state to see what it is stuck/sitting on.  That will help here

Thanks

On Mon, Aug 10, 2020 at 10:24 AM Shawn Weeks 
mailto:swe...@weeksconsulting.us>> wrote:

Out of the box even the initial admin user has to be granted permission, I think. Mine worked fine for several months since 1.11.1 was released and just started having issues a couple of weeks ago. I’ve increased the retention time a bit to see if that improves the situation.



Thanks

Shawn Weeks



From: Wyll Ingersoll 
mailto:wyllys.ingers...@keepertech.com>>
Reply-To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
mailto:users@nifi.apache.org>>
Date: Monday, August 10, 2020 at 12:22 PM
To: "users@nifi.apache.org<mailto:users@nifi.apache.org>" 
mailto:users@nifi.apache.org>>
Subject: Re: Data Provenance Stops Working



I run 1.11.4 in a cluster on AWS also and have a similar issue with the provenance data: I can't ever view it. It's probably somehow misconfigured, but I haven't been able to figure it out.



From: Andy LoPresto mailto:alopre...@apache.org>>
Sent: Monday, August 10, 2020 1:11 PM
To: users@nifi.apache.org<mailto:users@nifi.apache.org> 
mailto:users@nifi.apache.org>>
Subject: Re: Data Provenance Stops Working



Shawn,



I don’t know if this is specifically related, but there were a number of 
critical issues discovered in the 1.11.x release line that have been fixed in 
1.11.4. I would not recommend running any prior version on that release line.



1.12.0 should be coming imminently, so if you are going to upgrade anyway, you 
may want to wait a week or so and get the newest bits with hundreds of new 
features, but for stability alone, I would strongly recommend 1.11.4.



https://cwiki.apache.org/confluence/display/NIFI/Release+Notes





Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him

PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69



On Aug 10, 2020, at 10:00 AM, Shawn Weeks 
mailto:swe...@weeksconsulting.us>> wrote:



I’m running a three node NiFi Cluster on AWS EC2s using integrated Zookeeper 
and SSL Enabled. Version is 1.11.1, OS is RedHat 7.7, Java is 1.8.0_242. For 
some reason after a period of time Data Provenance goes blank, old records are 
no longer queryable and new data provenance doesn’t appear to get written. No 
Lucene or other exceptions are logged and restarting the node causes data 
provenance to go back to being written however old data provenance does not 
re-appear. No exceptions appear when querying data provenance. All tests have 
been run as the initial admin user, and roles and permissions appear to be correct.

Re: Data Provenance Stops Working

2020-08-10 Thread Wyll Ingersoll
I run 1.11.4 in a cluster on AWS also and have a similar issue with the 
provenance data; I can't ever view it. It's probably somehow misconfigured, 
but I haven't been able to figure it out.

From: Andy LoPresto 
Sent: Monday, August 10, 2020 1:11 PM
To: users@nifi.apache.org 
Subject: Re: Data Provenance Stops Working

Shawn,

I don’t know if this is specifically related, but there were a number of 
critical issues discovered in the 1.11.x release line that have been fixed in 
1.11.4. I would not recommend running any prior version on that release line.

1.12.0 should be coming imminently, so if you are going to upgrade anyway, you 
may want to wait a week or so and get the newest bits with hundreds of new 
features, but for stability alone, I would strongly recommend 1.11.4.

https://cwiki.apache.org/confluence/display/NIFI/Release+Notes


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

On Aug 10, 2020, at 10:00 AM, Shawn Weeks 
mailto:swe...@weeksconsulting.us>> wrote:

I’m running a three node NiFi Cluster on AWS EC2s using integrated Zookeeper 
and SSL Enabled. Version is 1.11.1, OS is RedHat 7.7, Java is 1.8.0_242. For 
some reason after a period of time Data Provenance goes blank, old records are 
no longer queryable and new data provenance doesn’t appear to get written. No 
Lucene or other exceptions are logged and restarting the node causes data 
provenance to go back to being written however old data provenance does not 
re-appear. No exceptions appear when querying data provenance. All tests have 
been run as the initial admin user, and roles and permissions appear to be 
correct. I've also checked the SELinux audit logs to see if something is being 
blocked, but nothing appears.

WriteAheadProvenanceRepository, max storage is set to 24 hours, 1GB, 30 seconds 
for rollover, 100mb rollover size, 2 query threads, 2 index threads, compress 
on rollover, and don’t sync always.
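
For anyone comparing against their own install, that configuration would 
normally be expressed with the standard provenance repository keys in 
nifi.properties - the values below simply restate the settings described above:

    nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
    nifi.provenance.repository.max.storage.time=24 hours
    nifi.provenance.repository.max.storage.size=1 GB
    nifi.provenance.repository.rollover.time=30 secs
    nifi.provenance.repository.rollover.size=100 MB
    nifi.provenance.repository.query.threads=2
    nifi.provenance.repository.index.threads=2
    nifi.provenance.repository.compress.on.rollover=true
    nifi.provenance.repository.always.sync=false

Raising max.storage.time and max.storage.size is usually the first thing to 
try when events seem to disappear sooner than expected.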

Thanks
Shawn Weeks



FetchS3Object throwing com.amazonaws.SdkClientException

2020-06-24 Thread Wyll Ingersoll
Using NiFi 1.11.4, the FetchS3Object processor occasionally throws the 
following error:

routing to failure: com.amazonaws.SdkClientException: Unable to execute HTTP 
request: The target server failed to respond.
The full stack trace looks like:


com.amazonaws.SdkClientException: Unable to execute HTTP request: The target server failed to respond
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1175)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1121)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4926)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4872)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1472)
    at org.apache.nifi.processors.aws.s3.FetchS3Object.onTrigger(FetchS3Object.java:159)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1297)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    ... 23 common frames omitted



The timeout period is plenty long and the server is definitely available. I'm 
not sure what is causing the problem, but there are some bugs in the Apache 
HttpClient library referenced in forums discussing a similar issue - 
https://issues.apache.org/jira/browse/HTTPCLIENT-1610

Is this a known problem in the NiFi processor, perhaps caused by a buggy 
HttpClient version? It only happens occasionally, usually when the processor 
is under heavy load. Looking for suggestions/workarounds.
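
For what it's worth, a NoHttpResponseException under load is very often a 
stale pooled connection being reused after S3 has silently closed it. 
FetchS3Object builds its S3 client internally, so the snippet below is not 
something you can drop into the processor - it is only a sketch of the AWS SDK 
(v1) knobs involved, showing how a standalone client could expire idle 
connections early and retry transient I/O failures:

    import com.amazonaws.ClientConfiguration;
    import com.amazonaws.retry.PredefinedRetryPolicies;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class S3StaleConnectionSketch {
        public static void main(String[] args) {
            // Expire pooled connections before the server-side idle timeout can bite,
            // and retry transient failures a few times instead of failing immediately.
            ClientConfiguration config = new ClientConfiguration()
                    .withConnectionTTL(60_000)           // max lifetime of a pooled connection, in ms
                    .withConnectionMaxIdleMillis(30_000) // drop idle connections early
                    .withRetryPolicy(PredefinedRetryPolicies
                            .getDefaultRetryPolicyWithCustomMaxRetries(5));

            AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                    .withClientConfiguration(config)
                    .build();

            // A GET through this client, e.g. s3.getObject("my-bucket", "my-key")
            // (bucket and key names here are placeholders), will retry transient
            // I/O failures such as NoHttpResponseException up to 5 times before
            // surfacing an SdkClientException.
        }
    }

On the NiFi side, a common way to absorb the occasional failure is to route 
FetchS3Object's failure relationship back to itself so penalized flowfiles are 
simply retried.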

Thanks
   Wyllys Ingersoll