Oh, I think I see how it works for the API. You need to update the node's status
using this call:
/controller/cluster/nodes/{id}
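Something like this, I assume (untested sketch; the other node's host/port and the node id are placeholders, and I believe the node has to be disconnected before it can be offloaded):

NODE_ID="00000000-0000-0000-0000-000000000000"   # this node's id, from GET /nifi-api/controller/cluster
curl -X PUT -H 'Content-Type: application/json' \
  -d "{\"node\":{\"nodeId\":\"${NODE_ID}\",\"status\":\"OFFLOADING\"}}" \
  "http://other-node:8088/nifi-api/controller/cluster/nodes/${NODE_ID}"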
I will try this and see how it goes
thanks again
________________________________
From: Jean-Sebastien Vachon <[email protected]>
Sent: Thursday, August 29, 2019 7:54 AM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown
Thanks to both of you for getting back to me...
I didn't know about offloading a node. I will certainly look into this. I
quickly looked through the API and saw no mention of the word "offload".
Does that mean there is no equivalent function in the API?
Thanks
________________________________
From: Pierre Villard <[email protected]>
Sent: Thursday, August 29, 2019 3:41 AM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown
OK, I didn't understand when you initially said "prevent data loss". I thought
you meant gracefully stopping the instance to avoid data corruption of some sort.
Now that I better understand your situation, I see two options:
- the one mentioned by Jon: decommission the node [1] with the REST API (rough
sketch below) and hope for the best, but with no guarantee
- have the data on attached disks that you could re-attach to a new node at a
later time (I don't know the AWS specifics around that)
[1]
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#decommission-nodes
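Roughly, the decommission sequence described in [1] translates to something like this against the REST API (host/port and node id are placeholders; each step should complete before the next):

NODE_ID="00000000-0000-0000-0000-000000000000"   # from GET /nifi-api/controller/cluster
# 1) Disconnect the node
curl -X PUT -H 'Content-Type: application/json' \
  -d "{\"node\":{\"nodeId\":\"${NODE_ID}\",\"status\":\"DISCONNECTING\"}}" \
  "http://other-node:8088/nifi-api/controller/cluster/nodes/${NODE_ID}"
# 2) Once disconnected, offload its queued flowfiles to the other nodes
curl -X PUT -H 'Content-Type: application/json' \
  -d "{\"node\":{\"nodeId\":\"${NODE_ID}\",\"status\":\"OFFLOADING\"}}" \
  "http://other-node:8088/nifi-api/controller/cluster/nodes/${NODE_ID}"
# 3) Once offloading has finished, remove it from the cluster
curl -X DELETE "http://other-node:8088/nifi-api/controller/cluster/nodes/${NODE_ID}"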
On Thu, Aug 29, 2019 at 3:32 AM, Jon Logan <[email protected]> wrote:
Remember that spot instances are given shutdown notifications on a best-effort
basis [1]. You would have to disconnect the node, drain it, then shut it down
after draining, and hope you do so before you get killed. You could also
consider the new hibernation feature -- it'll hibernate your node instead of
terminating it, and then rehydrate it at a later time. Your cluster would have a
disconnected node in the meantime though. All of these scenarios introduce a
significant potential for data loss, so you should be sure you could reproduce
the data from a durable source if needed (e.g. Kafka), or be accepting of the
data loss.
[1]
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
While we make every effort to provide this warning as soon as possible, it is
possible that your Spot Instance is terminated before the warning can be made
available. Test your application to ensure that it handles an unexpected
instance termination gracefully, even if you are testing for interruption
notices. You can do so by running the application using an On-Demand Instance
and then terminating the On-Demand Instance yourself.
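A minimal sketch of catching the notice (assumes curl; what you do once it fires -- disconnect, drain, stop -- is up to you):

#!/bin/bash
# Poll the spot interruption endpoint every 5 seconds; it returns 404 until
# AWS issues the two-minute notice.
while true; do
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    http://169.254.169.254/latest/meta-data/spot/termination-time)
  if [ "$code" = "200" ]; then
    echo "Spot interruption notice received, draining this node..."
    # disconnect/offload/stop NiFi here
    break
  fi
  sleep 5
done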
On Wed, Aug 28, 2019 at 8:57 PM Jean-Sebastien Vachon <[email protected]> wrote:
Hi Craig,
I made some additional tests and I am afraid I lost flows... I used the same
flow I described earlier, generated around 30k flows, and load balanced them
across the three nodes forming my cluster.
I then shut down one of the machines. The result is that I lost the 10k flows
that were scheduled to be processed on that machine. This is a problem I need to
address and I'll be looking for ideas shortly.
For those interested in automating the removal of a spot instance from a
cluster... here is something to get you started.
AWS recommends monitoring the URL found in the if statement every 5 seconds or
so. Since cron only supports one-minute intervals and nothing smaller,
I accomplished what I wanted by adding multiple cron entries, each sleeping for a
different amount of time (example entries after the script).
You will need jq and curl installed on your machine for this to work.
The basic idea is to wait until the termination-time URL stops returning 404 and
then trigger a series of actions.
---
#!/bin/bash
# $1 is a sleep offset in seconds (see the example cron entries below).
sleep "$1"

# This node's private IP, from the EC2 instance metadata service.
NODE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Look up this node's cluster id and the address of another node in the cluster.
NODE_ID=$(curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address == $IP).nodeId')
OTHER_NODE=$(curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address != $IP).address' | head -1)

# The termination-time endpoint returns 404 until AWS issues the two-minute
# spot interruption notice; once it no longer returns 404, run the cleanup.
if [ -z "$(curl -Is http://169.254.169.254/latest/meta-data/spot/termination-time | head -1 | grep 404)" ]
then
  echo "Running shutdown hook."
  systemctl stop nifi
  sleep 5
  # Remove this node from the cluster via the API on another node.
  curl -s -X DELETE "http://${OTHER_NODE}:8088/nifi-api/controller/cluster/nodes/${NODE_ID}"
fi
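The staggered cron entries look something like this (the script path and name are just placeholders; add one entry per 5-second offset for ~5s granularity):

* * * * * /usr/local/bin/nifi-spot-drain.sh 0
* * * * * /usr/local/bin/nifi-spot-drain.sh 5
* * * * * /usr/local/bin/nifi-spot-drain.sh 10
# ...and so on, up to an offset of 55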
________________________________
From: Jean-Sebastien Vachon <[email protected]>
Sent: Wednesday, August 28, 2019 7:39 PM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown
Hi Craig,
First the generic stuff...
According to the tests I made, no flows are lost when a machine is removed from
the cluster. They seem to be requeued.
However, I only tested with a very basic flow and not with my whole flow, which
involves a lot of things.
Basically, I used a GenerateFlowFile processor to generate some data and a dummy
Python process to do something with it. The queue between the two processors was
configured to do load balancing using round robin. I must admit that I haven't
looked at whether the items were requeued and dispatched to another node.
The output of the Python module was split between success and failure, and not a
single flow reached the failure state.
Then the AWS-specific stuff...
I had to script a few things to clean up within the two-minute warning AWS gives
me.
Since I am using spot instances, I know the instance will not come back, so I
had to automate the cleanup of the cluster using an API call to remove the
machine from the cluster. To do that, I need to stop NiFi first and then remove
the node through a call to the API on a second node. I am still polishing the
script to accomplish this; I may share it once it is working as expected, in
case someone else has this issue.
Let me know if you need more details about anything...
________________________________
From: Craig Knell <[email protected]>
Sent: Wednesday, August 28, 2019 6:52 PM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown
Hi Jean-Sebastien,
I’d be interested to hear how this performs
Best regards
Craig
On 28 Aug 2019, at 22:28, Jean-Sebastien Vachon <[email protected]> wrote:
Hi Pierre,
Thanks for your input.
I am already intercepting the AWS termination notification, so I will add a few
steps and see how it reacts.
Thanks again
________________________________
From: Pierre Villard <[email protected]>
Sent: Wednesday, August 28, 2019 4:17 AM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown
Hi Jean-Sebastien,
When you stop NiFi, by default it will try to gracefully stop everything within 10
seconds, and if not all components are nicely stopped after that, it will force
shut down the NiFi process. This is configured with
"nifi.flowcontroller.graceful.shutdown.period" in the nifi.properties file. If you
have processors/controller services that might take longer to stop gracefully
(because of connections to external systems, for instance), you could increase this value.
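For example, something like this in nifi.properties (the default is 10 sec):

# Give components more time to finish their work before a forced shutdown
nifi.flowcontroller.graceful.shutdown.period=30 sec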
I'm not very familiar with AWS spot instances but I'd try to catch the spot
notification event to stop the NiFi service on the host before the instance is
stopped/killed.
Pierre
On Tue, Aug 27, 2019 at 8:05 PM, Jean-Sebastien Vachon <[email protected]> wrote:
Hi everybody,
I am working with AWS spot instances, and one thing that is giving me a hard
time is performing a clean (and quick) shutdown of NiFi in order to prevent
data loss.
AWS will give you about two minutes to clean up everything before the machine
is actually shut down.
Is there a way to stop/kill all processes running on the host without losing
anything? It is fine if all the flowfiles being processed are simply requeued.
Would simply killing the processes achieve this? (I doubt it)... Would it be
better to fetch a list of running processors and terminate them using NiFi's
API?
All ideas and thoughts are welcome
thanks