Oh, I think I see how it works for the API. You need to update the node's status using this call:

/controller/cluster/nodes/{id}
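As a rough sketch of that call (not tested against a live cluster; the host, port, and node ID below are placeholders), assuming the node's status is updated by PUTting a node entity to that endpoint:

```python
import json
from urllib.request import Request

# Placeholder values -- substitute your coordinator's address and the target node's ID.
BASE = "http://localhost:8088/nifi-api"
NODE_ID = "00000000-0000-0000-0000-000000000000"

def offload_request(base, node_id):
    """Build the PUT request asking the cluster to offload the given node."""
    body = {"node": {"nodeId": node_id, "status": "OFFLOADING"}}
    return Request(
        f"{base}/controller/cluster/nodes/{node_id}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = offload_request(BASE, NODE_ID)
# urllib.request.urlopen(req) would actually send it; omitted here since it
# needs a running cluster.
```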

I will try this and see how it goes.

thanks again


________________________________
From: Jean-Sebastien Vachon <[email protected]>
Sent: Thursday, August 29, 2019 7:54 AM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown

Thanks to both of you for getting back to me...

I didn't know about offloading a node; I will certainly look into it. I quickly looked through the API documentation and saw no mention of offloading. Does that mean there is no equivalent function in the API?


Thanks


________________________________
From: Pierre Villard <[email protected]>
Sent: Thursday, August 29, 2019 3:41 AM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown

OK, I didn't understand when you initially said "prevent data loss"; I thought you meant gracefully stopping the instance to avoid data corruption of some sort.

Now that I better understand your situation, I see two options:
- the one mentioned by Jon: decommission the node [1] with the REST API and 
hope for the best but with no guarantee
- have the data on attached disks that you could re-attach to a new node at a 
later time (I don't know the AWS specifics around that)

[1] 
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#decommission-nodes

On Thu, Aug 29, 2019 at 3:32 AM, Jon Logan <[email protected]> wrote:
Remember that spot instances are given shutdown notifications on a best-effort basis [1]. You would have to disconnect the node, drain it, then shut it down after draining, and hope you do so before you get killed. You could also consider the new hibernation feature: it will hibernate your node instead of terminating it, and then rehydrate it at a later time. Your cluster would have a disconnected node in the meantime, though. All of these scenarios introduce a significant potential for data loss; you should be sure you could reproduce the data from a durable source if needed (e.g. Kafka), or be accepting of the loss.


[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
"While we make every effort to provide this warning as soon as possible, it is possible that your Spot Instance is terminated before the warning can be made available. Test your application to ensure that it handles an unexpected instance termination gracefully, even if you are testing for interruption notices. You can do so by running the application using an On-Demand Instance and then terminating the On-Demand Instance yourself."

On Wed, Aug 28, 2019 at 8:57 PM Jean-Sebastien Vachon <[email protected]> wrote:
Hi Craig,

I made some additional tests and I am afraid I lost flowfiles... I used the same flow I described earlier, generated around 30k flowfiles, and load-balanced them across the three nodes forming my cluster. I then shut down one of the machines. The result is that I lost the 10k flowfiles that were scheduled to be processed on that machine. This is a problem I need to address, and I'll be looking for ideas shortly.

For those interested in automating the removal of a spot instance from a cluster... here is something to get you started.
AWS recommends monitoring the URL found in the if statement every 5 seconds or so. Since cron only supports one-minute intervals and nothing smaller, I accomplished what I wanted by adding multiple cron entries, each sleeping for a different amount of time.

You will need jq and curl installed on your machine for this to work.
The basic idea is to wait until the termination-notice page appears to exist and then trigger a series of actions.
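As a sketch, the multi-cron trick described above might look like this in the crontab (the script path and the 15-second spacing are just an example):

```cron
# Four entries fire every minute; each passes a different sleep offset,
# so the termination check effectively runs every 15 seconds.
* * * * * /opt/nifi/spot-shutdown.sh 0
* * * * * /opt/nifi/spot-shutdown.sh 15
* * * * * /opt/nifi/spot-shutdown.sh 30
* * * * * /opt/nifi/spot-shutdown.sh 45
```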

---

#!/bin/bash
# Stagger the start so several cron entries cover sub-minute intervals.
sleep "$1"

# This node's private IP, from the EC2 instance metadata service.
NODE_IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)

# Our NiFi node ID, and the address of another node in the cluster.
NODE_ID=$(curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq \
    --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address == $IP).nodeId')
OTHER_NODE=$(curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq \
    --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address != $IP).address' | head -1)

# The termination-time URL returns 404 until AWS schedules this spot
# instance for interruption.
if [ -z "$(curl -Is http://169.254.169.254/latest/meta-data/spot/termination-time | head -1 | grep 404 | cut -d' ' -f 2)" ]
then
    echo "Running shutdown hook."
    systemctl stop nifi
    sleep 5
    # Ask a surviving node to drop this machine from the cluster.
    curl -s -X DELETE "http://${OTHER_NODE}:8088/nifi-api/controller/cluster/nodes/${NODE_ID}"
fi

________________________________
From: Jean-Sebastien Vachon <[email protected]>
Sent: Wednesday, August 28, 2019 7:39 PM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown

Hi Craig,

First, the generic stuff...

According to the tests I made, no flowfiles are lost when a machine is removed from the cluster; they seem to be requeued. However, I only tested with a very basic flow, not with my whole flow, which involves a lot of things. Basically, I used a GenerateFlowFile processor to generate some data and a dummy Python process to do something with it. The queue between the two processors was configured to load-balance using round robin. I must admit that I haven't looked at whether the items were requeued and dispatched to another node. The output of the Python module was split between success and failure, and not a single flowfile reached the failure state.

Then, the AWS-specific stuff...

I had to script a few things to clean up within the two-minute warning AWS gives me. Since I am using spot instances, I know the instance will not come back, so I had to automate the cleanup of the cluster with an API call that removes the machine. To remove the machine from the cluster, I need to stop NiFi first and then remove it through a call to the API on a second node. I am still polishing the script to accomplish this. I may share it once it is working as expected, in case someone else has this issue.

Let me know if you need more details about anything...
________________________________
From: Craig Knell <[email protected]>
Sent: Wednesday, August 28, 2019 6:52 PM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown

Hi Jean-Sebastien,

I’d be interested to hear how this performs

Best regards

Craig

On 28 Aug 2019, at 22:28, Jean-Sebastien Vachon <[email protected]> wrote:

Hi Pierre,

Thanks for your input.

I am already intercepting the AWS termination notification, so I will add a few steps and see how it reacts.

Thanks again
________________________________
From: Pierre Villard <[email protected]>
Sent: Wednesday, August 28, 2019 4:17 AM
To: [email protected] <[email protected]>
Subject: Re: clean shutdown

Hi Jean-Sebastien,

When you stop NiFi, by default it will try to gracefully stop everything within 10 seconds; if not all components have stopped nicely after that, it will force-kill the NiFi process. This is configured with "nifi.flowcontroller.graceful.shutdown.period" in the nifi.properties file. If you have processors or controller services that might take longer to stop gracefully (because of connections to external systems, for instance), you could increase this value.
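For example, assuming the standard duration syntax for that property, raising the window to a minute would look like this in nifi.properties:

```properties
# Allow up to 60 seconds for components to stop before NiFi is force-killed.
nifi.flowcontroller.graceful.shutdown.period=60 sec
```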

I'm not very familiar with AWS spot instances but I'd try to catch the spot 
notification event to stop the NiFi service on the host before the instance is 
stopped/killed.

Pierre



On Tue, Aug 27, 2019 at 8:05 PM, Jean-Sebastien Vachon <[email protected]> wrote:
Hi everybody,

I am working with AWS spot instances, and one thing that is giving me a hard time is performing a clean (and quick) shutdown of NiFi in order to prevent data loss.

AWS gives you about two minutes to clean up everything before the machine is actually shut down.
Is there a way to stop/kill all processes running on the host without losing anything? It is fine if all the flowfiles being processed are simply requeued.

Would simply killing the processes achieve this? (I doubt it.) Would it be better to fetch a list of running processors and terminate them using NiFi's API?

All ideas and thoughts are welcome

thanks
