OK, I didn't understand when you initially said "prevent data loss". I thought you meant gracefully stopping the instance to avoid data corruption of some sort.
Now that I better understand your situation, I see two options:

- the one mentioned by Jon: decommission the node [1] with the REST API and
hope for the best, but with no guarantee (a sketch follows below)
- keep the data on attached disks that you could re-attach to a new node at
a later time (I don't know the AWS specifics around that)

[1]
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#decommission-nodes
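For option 1, the decommission sequence in the admin guide is: disconnect
the node, offload it, then delete it. A rough, untested sketch of driving
that with curl from another node, assuming an unsecured cluster on port
8088 as elsewhere in this thread (the exact JSON request bodies may vary
between NiFi versions):

---

#!/bin/bash
# Decommission one cluster node, driven from any other node.
API="http://localhost:8088/nifi-api"
NODE_ID="$1"   # node id, e.g. taken from GET ${API}/controller/cluster

# 1. Disconnect the node from the cluster.
curl -s -X PUT -H "Content-Type: application/json" \
  -d "{\"node\":{\"nodeId\":\"${NODE_ID}\",\"status\":\"DISCONNECTING\"}}" \
  "${API}/controller/cluster/nodes/${NODE_ID}"

# 2. Offload it: flowfiles queued on that node are rebalanced to the
#    remaining nodes. This is the step that needs time you may not have.
curl -s -X PUT -H "Content-Type: application/json" \
  -d "{\"node\":{\"nodeId\":\"${NODE_ID}\",\"status\":\"OFFLOADING\"}}" \
  "${API}/controller/cluster/nodes/${NODE_ID}"

# 3. Once offloaded, remove the node from the cluster entirely.
curl -s -X DELETE "${API}/controller/cluster/nodes/${NODE_ID}"

---

In practice you would poll the node status between steps and wait for
DISCONNECTED and then OFFLOADED; with a two-minute spot warning there is
no guarantee the offload completes, hence "hope for the best".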
On Thu, Aug 29, 2019 at 03:32, Jon Logan <[email protected]> wrote:

> Remember that spot instances are given shutdown notifications on a
> best-effort basis [1]. You would have to disconnect the node, drain it,
> then shut it down after draining, and hope you do so before you get
> killed. You could also consider the new hibernation feature -- it'll
> hibernate your node instead of terminating it, and then rehydrate it at a
> later time. Your cluster would have a disconnected node in the meantime,
> though. All of these scenarios introduce a significant potential for data
> loss, so you should be sure you can reproduce the data from a durable
> source if needed (e.g. Kafka), or be accepting of the data loss.
>
> [1]
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
> "While we make every effort to provide this warning as soon as possible,
> it is possible that your Spot Instance is terminated before the warning
> can be made available. Test your application to ensure that it handles an
> unexpected instance termination gracefully, even if you are testing for
> interruption notices. You can do so by running the application using an
> On-Demand Instance and then terminating the On-Demand Instance yourself."
>
> On Wed, Aug 28, 2019 at 8:57 PM Jean-Sebastien Vachon <
> [email protected]> wrote:
>
>> Hi Craig,
>>
>> I made some additional tests and I am afraid I lost flowfiles... I used
>> the same flow I described earlier, generated around 30k flowfiles and
>> load-balanced them across the three nodes forming my cluster.
>> I then shut down one of the machines. The result is that I lost the 10k
>> flowfiles that were scheduled to be processed on that machine. This is a
>> problem I need to address and I'll be looking for ideas shortly.
>>
>> For those interested in automating the removal of a spot instance from a
>> cluster... here is something to get you started.
>> AWS recommends monitoring the URL found in the if statement every 5s (or
>> so)... Since cron only supports one-minute intervals and nothing
>> smaller, I accomplished what I wanted by adding multiple cron entries,
>> each sleeping for a different amount of time (see the crontab sketch
>> after the script).
>>
>> You will need jq and curl to be installed on your machine for this to
>> work.
>> The basic idea is to wait until the web page appears to exist and then
>> trigger a series of actions.
>>
>> ---
>>
>> #!/bin/bash
>> # Stagger offset passed in from cron (see the crontab below).
>> sleep $1
>>
>> NODE_IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`
>> NODE_ID=`curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address == $IP).nodeId'`
>> OTHER_NODE=`curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" | jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address != $IP).address' | head -1`
>>
>> # The termination-time endpoint returns 404 until AWS schedules an
>> # interruption; once it stops returning 404 we have about two minutes.
>> if [ -z "$(curl -Is http://169.254.169.254/latest/meta-data/spot/termination-time | head -1 | grep 404 | cut -d' ' -f 2)" ]
>> then
>>     echo "Running shutdown hook."
>>     systemctl stop nifi
>>     sleep 5
>>     curl -s -X DELETE "http://${OTHER_NODE}:8088/nifi-api/controller/cluster/nodes/${NODE_ID}"
>> fi
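>>
>> The crontab side looks something like this; /opt/nifi/spot-watch.sh is
>> just an example name for wherever you install the script:
>>
>> ---
>>
>> # Five staggered copies of the same job give roughly 12-second polling
>> # granularity even though cron itself fires at most once per minute.
>> # More entries (e.g. twelve at 5-second offsets) get closer to the
>> # 5-second polling AWS recommends.
>> * * * * * /opt/nifi/spot-watch.sh 0
>> * * * * * /opt/nifi/spot-watch.sh 12
>> * * * * * /opt/nifi/spot-watch.sh 24
>> * * * * * /opt/nifi/spot-watch.sh 36
>> * * * * * /opt/nifi/spot-watch.sh 48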
>>
>> ------------------------------
>> *From:* Jean-Sebastien Vachon <[email protected]>
>> *Sent:* Wednesday, August 28, 2019 7:39 PM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: clean shutdown
>>
>> Hi Craig,
>>
>> First the generic stuff...
>>
>> According to the tests I made, no flowfiles are lost when a machine is
>> removed from the cluster. They seem to be requeued.
>> However, I only tested with a very basic flow and not with my whole
>> flow, which involves a lot of things.
>> Basically, I used a GenerateFlowFile processor to generate some data and
>> a dummy Python process to do something with it. The queue between the
>> two processors was configured to do load balancing using round robin. I
>> must admit that I haven't looked at whether the items were requeued and
>> dispatched to another node.
>> The output of the Python module was split between success and failure,
>> and not a single flowfile reached the failure state.
>>
>> Then to the AWS-specific stuff...
>>
>> I had to script a few things to clean up within the two-minute warning
>> AWS is giving me.
>> Since I am using spot instances, I know the instance will not come back,
>> so I had to automate the cleanup of the cluster by using an API call to
>> remove the machine from the cluster. In order to remove the machine from
>> the cluster, I need to stop NiFi first and then remove the machine
>> through a call to the API on a second node. I am still polishing the
>> script to accomplish this. I may share it once it is working as expected
>> in case someone else has this issue.
>>
>> Let me know if you need more details about anything...
>> ------------------------------
>> *From:* Craig Knell <[email protected]>
>> *Sent:* Wednesday, August 28, 2019 6:52 PM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: clean shutdown
>>
>> Hi Jean-Sebastien,
>>
>> I’d be interested to hear how this performs.
>>
>> Best regards
>>
>> Craig
>>
>> On 28 Aug 2019, at 22:28, Jean-Sebastien Vachon <[email protected]>
>> wrote:
>>
>> Hi Pierre,
>>
>> Thanks for your input.
>>
>> I am already intercepting the AWS termination notification, so I will
>> add a few steps and see how it reacts.
>>
>> Thanks again
>> ------------------------------
>> *From:* Pierre Villard <[email protected]>
>> *Sent:* Wednesday, August 28, 2019 4:17 AM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: clean shutdown
>>
>> Hi Jean-Sebastien,
>>
>> When you stop NiFi, by default, it will try to gracefully stop
>> everything within 10 seconds, and if not all components have stopped
>> nicely after that, it will force shut down the NiFi process. This is
>> configured with "nifi.flowcontroller.graceful.shutdown.period" in the
>> nifi.properties file (see the sketch at the end of this thread). If you
>> have processors/controller services that might take longer to stop
>> gracefully (because of connections to external systems, for instance),
>> you could increase this value.
>>
>> I'm not very familiar with AWS spot instances but I'd try to catch the
>> spot notification event to stop the NiFi service on the host before the
>> instance is stopped/killed.
>>
>> Pierre
>>
>> On Tue, Aug 27, 2019 at 20:05, Jean-Sebastien Vachon <
>> [email protected]> wrote:
>>
>> Hi everybody,
>>
>> I am working with AWS spot instances, and one thing that is giving me a
>> hard time is performing a clean (and quick) shutdown of NiFi in order to
>> prevent data loss.
>>
>> AWS will give you about two minutes to clean up everything before the
>> machine is actually shut down.
>> Is there a way to stop/kill all processes running on the host without
>> losing anything? It is fine if all the flowfiles being processed are
>> simply requeued.
>>
>> Would simply killing the processes achieve this? (I doubt it.) Would it
>> be better to fetch a list of running processors and terminate them using
>> NiFi's API?
>>
>> All ideas and thoughts are welcome.
>>
>> Thanks
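For reference, the graceful-shutdown setting Pierre mentions above lives in
conf/nifi.properties; a sketch of raising it from the default, where 60
seconds is just an example value:

---

# conf/nifi.properties (excerpt)
# How long NiFi waits for components to stop before force-killing the
# process. The default is 10 sec; raise it if processors need longer to
# wind down -- but remember the whole cleanup still has to fit inside the
# two-minute spot warning.
nifi.flowcontroller.graceful.shutdown.period=60 sec

---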
