Re: Issue with removal and re-add of a cluster node

2022-12-09 Thread Mark Payne
David,

I think you’re running into https://issues.apache.org/jira/browse/NIFI-10453, 
which was fixed in 1.19.

It results in the "Cannot set AnnotationData while processor is running" error.

Recommend upgrading to 1.19. In the meantime, though, you should be okay to 
shut down node 3, delete conf/flow.xml.gz and conf/flow.json.gz, and restart.
The node will then rejoin the cluster and inherit whatever the cluster's flow is.
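
If you want to script those steps on node 3, a minimal Python sketch would look
something like the following (the /opt/nifi install path is an assumption;
adjust to your layout):

import subprocess
from pathlib import Path

NIFI_HOME = Path("/opt/nifi")  # assumption: default install location on node 3

# 1. Shut down the node
subprocess.run([str(NIFI_HOME / "bin" / "nifi.sh"), "stop"], check=True)

# 2. Remove the local flow definitions so the node inherits the cluster flow on reconnect
for name in ("flow.xml.gz", "flow.json.gz"):
    flow_file = NIFI_HOME / "conf" / name
    if flow_file.exists():
        flow_file.unlink()
        print(f"removed {flow_file}")

# 3. Restart; the node rejoins and pulls the cluster's current flow
subprocess.run([str(NIFI_HOME / "bin" / "nifi.sh"), "start"], check=True)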

Thanks
-Mark



Re: Issue with removal and re-add of a cluster node

2022-12-09 Thread David Early via users
Forgot my version: 1.16.3

Dave

On Fri, Dec 9, 2022 at 11:22 AM David Early 
wrote:

> Hi all,
>
> I have a major issue and am not sure what to do about it.
>
> We have a 3 node cluster.  I was working on a one-off load for some data
> we were doing out of sequence and it resulted in build-up of some flowfiles
> in a queue.  In order to prevent a backpressure situation, I cleared one of
> the holding queues that had about 69k flow files.
>
> During the clear operation the node I was on (node 3 UI in this case)
> returned and stated that the node was no longer part of the cluster.  Not
> clear why that happened.
>
> This, by itself, is not really an issue.  Looking at the logs (at bottom
> of this note), you can see theflowfile drop and immediate adjustment to the
> node 3 to state of CONNECTING to the cluster.  Subsequently, an error
> occurred:  "*Disconnecting node due to Failed to properly handle
> Reconnection request due to
> org.apache.nifi.controller.serialization.FlowSynchronizationException:
> Failed to connect node to cluster because local flow controller partially
> updated. Administrator should disconnect node and review flow for
> corruption*".
>
> When I attempted to readd the node from the UI, it repeated this error.
>
> I compared users.xml and authroizations.xml on all three nodes, textually
> the same and identical md5sum on all (issues with users.xml and
> authorizations.xml were listed online as usual suspects).
>
> I then offloaded the node via the UI to make sure I didn't have anything
> stuck in queue on node 3 and hoped it would allow the node to rejoin.
> After offloading, I attempted to reconnect and what happened next gave me a
> heart attack:  Node 3 now showed as connected but in the UI (accessed via
> node 1), ALL PROCESSORS WERE SHOWN AS STOPPED.
>
> A quick review showed that traffic was actually flowing (View status
> history showed flowfiles moving, observing some of our observation queues
> showed traffic on nodes 1 and 2).  Removing node 3 from the cluster
> restored the UI to show everything running, but adding it back in showed
> everything as stopped.
>
> I tried to start some processors while node 3 was connected and while I
> could start individual processors, I could not do a "global" start by right
> clicking on canvas and trying "start".  I set up a sample processor to
> generate a file on all 3 nodes and it did generate a new flowfile on node
> 3.  All of that worked fine.
>
> We have 400+ processors that I would need to hand start and I am super
> nervous about the cluster deciding to make node 3 the primary which would
> affect some timed process that we are running on the primary node.  As long
> as I don't restart the http input feed, I COULD restart all the processors,
> but this seems like the wrong process.
>
> Anyone have any idea what I did wrong and how to fix it?  The errors show
> in the log attached happened before any offloading, but I wondered if the
> offloading caused part of this issue.
>
> Is there anything I can do to readd the node without having to restart all
> the processors manually?
>
> Should I clean up the node and add it as a "new" node and let it
> completely sync?
>
> Thanks for any insight!
>
>
> Dave
>
>
> ---
> Log:
> ---
> *2022-12-08 22:26:20,706 INFO [Drop FlowFiles for Connection
> 8b0ee741-0183-1000--68704c93] o.a.n.c.queue.SwappablePriorityQueue
> Successfully dropped 69615 FlowFiles (35496003 bytes) from Connection with
> ID 8b0ee741-0183-1000--68704c93 on behalf of u...@org.com
> *
> 2022-12-08 22:26:20,707 INFO [Process Cluster Protocol Request-29]
> o.a.n.c.c.node.NodeClusterCoordinator Status of prod-stsc2-3:8443 changed
> from NodeConnectionStatus[nodeId=prod-stsc2-3:8443, state=CONNECTED,
> updateId=108] to NodeConnectionStatus[nodeId=prod-stsc2-3:8443,
> state=CONNECTING, updateId=114]
> 2022-12-08 22:26:20,707 INFO [Process Cluster Protocol Request-29]
> o.a.n.c.p.impl.SocketProtocolListener Finished processing request
> 070fe65c-4a77-41d0-9d7f-8f08ede6ac71 (type=NODE_STATUS_CHANGE, length=1217
> bytes) from prod-stsc2-1.internal.cloudapp.net in 10 seconds, 842 millis
> 2022-12-08 22:26:20,750 INFO [Reconnect to Cluster]
> o.a.nifi.controller.StandardFlowService Setting Flow Controller's Node ID:
> prod-stsc2-3:8443
> 2022-12-08 22:26:20,751 INFO [Reconnect to Cluster]
> o.a.n.c.s.VersionedFlowSynchronizer Synchronizing FlowController with
> proposed flow: Controller Already Synchronized = true
> 2022-12-08 22:26:21,298 INFO [NiFi Web Server-1481911]
> o.a.c.f.imps.CuratorFrameworkImpl Starting
> 2022-12-08 22:26:21,298 INFO [NiFi Web Server-1481911]
> org.apache.zookeeper.ClientCnxnSocket jute.maxbuffer value is 4194304 Bytes
> 2022-12-08 22:26:21,304 INFO [NiFi Web Server-1481911]
> o.a.c.f.imps.CuratorFrameworkImpl Default schema
> 2022-12-08 22:26:21,314 INFO [NiFi Web Server-1481911-EventThread]
> o.a.c.f.state.ConnectionStateManager 

Issue with removal and re-add of a cluster node

2022-12-09 Thread David Early via users
Hi all,

I have a major issue and am not sure what to do about it.

We have a 3-node cluster.  I was working on a one-off load for some data we
were doing out of sequence, and it resulted in a build-up of flowfiles in
a queue.  In order to prevent a backpressure situation, I cleared one of
the holding queues, which had about 69k flowfiles.

During the clear operation, the node I was on (node 3's UI in this case)
came back and stated that the node was no longer part of the cluster.  Not
clear why that happened.

This, by itself, is not really an issue.  Looking at the logs (at the bottom of
this note), you can see the flowfile drop and the immediate change of node 3
to the CONNECTING state.  Subsequently, an error
occurred:  "Disconnecting node due to Failed to properly handle
Reconnection request due to
org.apache.nifi.controller.serialization.FlowSynchronizationException:
Failed to connect node to cluster because local flow controller partially
updated. Administrator should disconnect node and review flow for
corruption".

When I attempted to re-add the node from the UI, it repeated this error.

I compared users.xml and authorizations.xml on all three nodes; they are
textually identical with the same md5sum on all three (issues with users.xml
and authorizations.xml were listed online as the usual suspects).
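
(For anyone who wants to repeat the comparison, a quick Python sketch along
these lines is what I mean; the conf path is an assumption for your install:)

import hashlib
from pathlib import Path

CONF = Path("/opt/nifi/conf")  # assumption: adjust to your install

# Print a digest per file; run this on each node and compare the output.
for name in ("users.xml", "authorizations.xml"):
    digest = hashlib.md5((CONF / name).read_bytes()).hexdigest()
    print(f"{name}: {digest}")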

I then offloaded the node via the UI to make sure I didn't have anything
stuck in a queue on node 3, hoping it would allow the node to rejoin.
After offloading, I attempted to reconnect, and what happened next gave me a
heart attack:  node 3 now showed as connected, but in the UI (accessed via
node 1), ALL PROCESSORS WERE SHOWN AS STOPPED.

A quick review showed that traffic was actually flowing (View Status History
showed flowfiles moving, and watching some of our observation queues
showed traffic on nodes 1 and 2).  Removing node 3 from the cluster
restored the UI to show everything running, but adding it back in showed
everything as stopped.

I tried to start some processors while node 3 was connected, and while I
could start individual processors, I could not do a "global" start by
right-clicking on the canvas and selecting "start".  I set up a sample
processor to generate a file on all 3 nodes, and it did generate a new
flowfile on node 3.  All of that worked fine.

We have 400+ processors that I would need to hand-start, and I am super
nervous about the cluster deciding to make node 3 the primary, which would
affect some timed processes that we run on the primary node.  As long
as I don't restart the HTTP input feed, I COULD restart all the processors,
but this seems like the wrong process.
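
If it comes to that, I could probably script the restart against the REST API
instead of clicking through 400+ processors; something like this sketch (the
base URL, token handling, and group id are my assumptions, so treat it as a
starting point):

import requests

NIFI = "https://prod-stsc2-1:8443/nifi-api"   # placeholder base URL
TOKEN = "placeholder-access-token"             # placeholder access token
GROUP_ID = "root"                              # or a specific process group UUID

# Ask the cluster to schedule every component in the process group as RUNNING,
# assuming the PUT /flow/process-groups/{id} call used by the UI's "start".
resp = requests.put(
    f"{NIFI}/flow/process-groups/{GROUP_ID}",
    json={"id": GROUP_ID, "state": "RUNNING"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify="/path/to/ca.pem",                  # placeholder CA bundle for the cluster certs
)
resp.raise_for_status()
print(resp.status_code)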

Anyone have any idea what I did wrong and how to fix it?  The errors shown
in the log below happened before any offloading, but I wondered if the
offloading caused part of this issue.

Is there anything I can do to re-add the node without having to restart all
the processors manually?

Should I clean up the node and add it as a "new" node and let it completely
sync?

Thanks for any insight!


Dave


---
Log:
---
2022-12-08 22:26:20,706 INFO [Drop FlowFiles for Connection
8b0ee741-0183-1000--68704c93] o.a.n.c.queue.SwappablePriorityQueue
Successfully dropped 69615 FlowFiles (35496003 bytes) from Connection with
ID 8b0ee741-0183-1000--68704c93 on behalf of u...@org.com
2022-12-08 22:26:20,707 INFO [Process Cluster Protocol Request-29]
o.a.n.c.c.node.NodeClusterCoordinator Status of prod-stsc2-3:8443 changed
from NodeConnectionStatus[nodeId=prod-stsc2-3:8443, state=CONNECTED,
updateId=108] to NodeConnectionStatus[nodeId=prod-stsc2-3:8443,
state=CONNECTING, updateId=114]
2022-12-08 22:26:20,707 INFO [Process Cluster Protocol Request-29]
o.a.n.c.p.impl.SocketProtocolListener Finished processing request
070fe65c-4a77-41d0-9d7f-8f08ede6ac71 (type=NODE_STATUS_CHANGE, length=1217
bytes) from prod-stsc2-1.internal.cloudapp.net in 10 seconds, 842 millis
2022-12-08 22:26:20,750 INFO [Reconnect to Cluster]
o.a.nifi.controller.StandardFlowService Setting Flow Controller's Node ID:
prod-stsc2-3:8443
2022-12-08 22:26:20,751 INFO [Reconnect to Cluster]
o.a.n.c.s.VersionedFlowSynchronizer Synchronizing FlowController with
proposed flow: Controller Already Synchronized = true
2022-12-08 22:26:21,298 INFO [NiFi Web Server-1481911]
o.a.c.f.imps.CuratorFrameworkImpl Starting
2022-12-08 22:26:21,298 INFO [NiFi Web Server-1481911]
org.apache.zookeeper.ClientCnxnSocket jute.maxbuffer value is 4194304 Bytes
2022-12-08 22:26:21,304 INFO [NiFi Web Server-1481911]
o.a.c.f.imps.CuratorFrameworkImpl Default schema
2022-12-08 22:26:21,314 INFO [NiFi Web Server-1481911-EventThread]
o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2022-12-08 22:26:21,322 INFO [NiFi Web Server-1481911-EventThread]
o.a.c.framework.imps.EnsembleTracker New config event received:
{server.1=prod-zkpr-1:2888:3888:participant;0.0.0.0:2181, version=0,

Re: Disabling flows - nifi registry

2022-12-09 Thread Kevin Doran
Hi Deepak,

So far, we have been honoring the following policy for what constitutes a
change in version control:


   - stopped/started does not count as a "local change"
   - enabled/disabled does count as a change, and that state is captured in
   the flow snapshot JSON saved to the registry.


One reason for this is that some users want to set up CI/CD to deploy from
Registry and automatically start a flow in the target NiFi. If there are
components they don't want to start, the disabled state gives them a way to
capture that configuration in the flow.
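
If it helps to see where that state ends up, here's a rough Python sketch that
walks a snapshot downloaded from Registry and lists the components saved as
disabled (the field names are from memory, so check them against your own
snapshot JSON):

import json

def disabled_processors(group, path=""):
    # Recursively walk the versioned process group tree and yield the
    # names of processors whose saved state is DISABLED.
    here = f"{path}/{group.get('name', '')}"
    for proc in group.get("processors", []):
        if proc.get("scheduledState") == "DISABLED":
            yield f"{here}/{proc.get('name')}"
    for child in group.get("processGroups", []):
        yield from disabled_processors(child, here)

with open("snapshot.json") as f:   # placeholder: a flow version exported from Registry
    snapshot = json.load(f)

for name in disabled_processors(snapshot["flowContents"]):
    print(name)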

Is there a reason you prefer disable over stop in your lower environment? I
leave flows stopped in dev environments all the time and have never run
into an issue, but of course, everyone's workflow and use case is slightly
different, so I'm interested in hearing your perspective on this to see if
we need to consider something more flexible.

Cheers,
Kevin



Disabling flows - nifi registry

2022-12-09 Thread Chirthani, Deepak Reddy
Hey guys,

So, once I fully develop and parameterize my NiFi dataflow, let's say in the dev 
environment, I enable version control, import the flow into the higher 
environment, and turn on the dataflow. In most cases both the flows in the 
lower and higher environments will be running. Let's say dev NiFi connects to 
dev GCP Pub/Sub and prod NiFi connects to prod GCP Pub/Sub. However, in some 
cases we do want to stop and disable the flow in the lower env. When I do that, 
the registry identifies local changes to the flow, which are nothing but all the 
components being disabled. I don't want to keep the processors in the stopped 
state on the canvas (the registry does not flag stopping) but want to disable 
them instead. Is there any workaround so the registry does not identify local 
changes when the flow is disabled?

Thanks
Deepak

Deepak Reddy | Data Engineer
​IT Centers of Excellence
13736 Riverport Dr., Maryland Heights, MO 63043



NTLMv2 authentication for PutSMBFile

2022-12-09 Thread Hitimana, IA (Izi)
Hi,

On our system we've seen that PutSMBFile uses NTLM to authenticate to the 
domain. Is there a way to change this to NTLMv2? We are currently on an older 
version of NiFi (1.13.2).

I would love to hear your suggestions.

Best regards,

Izi Hitimana
DevOps Engineer | +31 (0)6 - 38 544 713

Eneco Energy Trade | Team Integration and Asset Control
Marten Meesweg 5 | 3068 AV | Rotterdam | eneco.com
