OK, I didn't understand when you initially said "prevent data loss". I
thought you meant gracefully stop the instance to avoid data corruption of
some sort.

Now that I better understand your situation, I see two options:
- the one mentioned by Jon: decommission the node [1] with the REST API and
hope for the best, but with no guarantee (a rough sketch follows the link
below)
- have the data on attached disks that you could re-attach to a new node at
a later time (I don't know the AWS specifics around that)

[1]
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#decommission-nodes
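
For what it's worth, the decommission sequence described in [1] boils down
to three REST calls made against a node that stays in the cluster. Here is
a rough, untested sketch (the peer address and node ID are placeholders,
and in practice you would poll the cluster state and wait for each step to
complete before issuing the next one):

---

#!/bin/bash
# Untested sketch of the decommission sequence from [1].
PEER="healthy-node.example.com:8088"  # placeholder: any node staying in the cluster
NODE_ID="<uuid-of-node-to-remove>"    # placeholder: see GET /nifi-api/controller/cluster

# 1. Disconnect the node from the cluster
curl -s -X PUT "http://${PEER}/nifi-api/controller/cluster/nodes/${NODE_ID}" \
  -H 'Content-Type: application/json' \
  -d "{\"node\":{\"nodeId\":\"${NODE_ID}\",\"status\":\"DISCONNECTING\"}}"

# 2. Once disconnected, offload its flowfiles to the remaining nodes
curl -s -X PUT "http://${PEER}/nifi-api/controller/cluster/nodes/${NODE_ID}" \
  -H 'Content-Type: application/json' \
  -d "{\"node\":{\"nodeId\":\"${NODE_ID}\",\"status\":\"OFFLOADING\"}}"

# 3. Once offloading completes, remove the node from the cluster
curl -s -X DELETE "http://${PEER}/nifi-api/controller/cluster/nodes/${NODE_ID}"

---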

On Thu, Aug 29, 2019 at 03:32, Jon Logan <[email protected]> wrote:

> Remember that spot instances are given shutdown notifications on a
> best-effort basis [1]. You would have to disconnect the node, drain it, then
> shut it down after draining, and hope you do so before you get killed. You
> could also consider the new hibernation feature -- it'll hibernate your
> node instead of terminating it, and then rehydrate it at a later time. Your
> cluster would have a disconnected node in the meantime, though. All of
> these scenarios introduce a significant potential for data loss; you should
> be sure you can reproduce the data from a durable source if needed (e.g.
> Kafka), or be accepting of the data loss.
>
>
> [1]
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
>
> "While we make every effort to provide this warning as soon as possible, it
> is possible that your Spot Instance is terminated before the warning can be
> made available. Test your application to ensure that it handles an
> unexpected instance termination gracefully, even if you are testing for
> interruption notices. You can do so by running the application using an
> On-Demand Instance and then terminating the On-Demand Instance yourself."
>
> On Wed, Aug 28, 2019 at 8:57 PM Jean-Sebastien Vachon <
> [email protected]> wrote:
>
>> Hi Craig,
>>
>> I ran some additional tests and I am afraid I lost flowfiles... I used the
>> same flow I described earlier, generated around 30k flowfiles, and load
>> balanced them across the three nodes forming my cluster.
>> I then shut down one of the machines. The result is that I lost the 10k
>> flowfiles that were scheduled to be processed on that machine. This is a
>> problem I need to address, and I'll be looking for ideas shortly.
>>
>> For those interested in automating the removal of a spot instance from a
>> cluster... here is something to get you started.
>> AWS recommends polling the URL found in the if statement every 5 s (or
>> so). Since cron only supports one-minute intervals and nothing smaller,
>> I accomplished what I wanted by adding multiple cron entries, each sleeping
>> for a different amount of time before running the check.
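>>
>> Something like this, for illustration (the script path is hypothetical;
>> each entry passes a different sleep offset to the script shown below):
>>
>> * * * * * /opt/scripts/nifi-spot-check.sh 0
>> * * * * * /opt/scripts/nifi-spot-check.sh 5
>> * * * * * /opt/scripts/nifi-spot-check.sh 10
>> # ... one entry per 5-second offset, up to:
>> * * * * * /opt/scripts/nifi-spot-check.sh 55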
>>
>> You will need jq and curl installed on your machine for this to work.
>> The basic idea is to wait until the termination-notice page exists and
>> then trigger a series of actions.
>>
>> ---
>>
>> #!/bin/bash
>> # $1 is a sleep offset so the staggered cron entries spread this check
>> # across the minute.
>> sleep $1
>>
>> # This node's private IP, from the EC2 instance metadata service
>> NODE_IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`
>>
>> # Our node ID, and the address of another member of the cluster
>> NODE_ID=`curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" |
>>   jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address == $IP).nodeId'`
>> OTHER_NODE=`curl -s "http://${NODE_IP}:8088/nifi-api/controller/cluster" |
>>   jq --arg IP "${NODE_IP}" -r '.cluster.nodes[] | select(.address != $IP).address' | head -1`
>>
>> # The termination-time document returns 404 until a spot interruption is
>> # scheduled; once it stops returning 404, run the shutdown hook.
>> if [ -z "$(curl -Is http://169.254.169.254/latest/meta-data/spot/termination-time | head -1 | grep 404 | cut -d' ' -f 2)" ]
>> then
>>     echo "Running shutdown hook."
>>     systemctl stop nifi
>>     sleep 5
>>     curl -s -X DELETE "http://${OTHER_NODE}:8088/nifi-api/controller/cluster/nodes/${NODE_ID}"
>> fi
>>
>> ------------------------------
>> *From:* Jean-Sebastien Vachon <[email protected]>
>> *Sent:* Wednesday, August 28, 2019 7:39 PM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: clean shutdown
>>
>> Hi Craig,
>>
>> First the generic stuff...
>>
>> According to the tests I ran, no flowfiles were lost when a machine was
>> removed from the cluster. They seem to be requeued.
>> However, I only tested with a very basic flow and not with my whole flow,
>> which involves a lot of things.
>> Basically, I used a GenerateFlowFile processor to generate some data and a
>> dummy Python process to do something with it. The queue between the two
>> processors was configured to do load balancing using round robin. I
>> must admit that I haven't looked at whether the items were requeued and
>> dispatched to another node.
>> The output of the Python module was split between success and failure, and
>> not a single flowfile reached the failure state.
>>
>> then to AWS specific stuff...
>>
>> I had to script a few things to clean up within the two-minute warning
>> AWS gives me.
>> Since I am using spot instances, I know the instance will not come back,
>> so I had to automate the cleanup of the cluster through an API call.
>> To remove the machine from the cluster, I need to stop NiFi first and
>> then remove the machine via a call to the API on a second node. I am
>> still polishing the script to accomplish this. I may share it once it is
>> working as expected, in case someone else has this issue.
>>
>> Let me know if you need more details about anything...
>> ------------------------------
>> *From:* Craig Knell <[email protected]>
>> *Sent:* Wednesday, August 28, 2019 6:52 PM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: clean shutdown
>>
>> Hi Jean-Sebastien,
>>
>> I’d be interested to hear how this performs
>>
>> Best regards
>>
>> Craig
>>
>> On 28 Aug 2019, at 22:28, Jean-Sebastien Vachon <[email protected]>
>> wrote:
>>
>> Hi Pierre,
>>
>> thanks for your input.
>>
>> I am already intercepting the AWS termination notification, so I will add
>> a few steps and see how it reacts.
>>
>> Thanks again
>> ------------------------------
>> *From:* Pierre Villard <[email protected]>
>> *Sent:* Wednesday, August 28, 2019 4:17 AM
>> *To:* [email protected] <[email protected]>
>> *Subject:* Re: clean shutdown
>>
>> Hi Jean-Sebastien,
>>
>> When you stop NiFi, by default it will try to gracefully stop everything
>> within 10 seconds; if not all components are cleanly stopped by then, it
>> will force the NiFi process to shut down. This is configured with
>> "nifi.flowcontroller.graceful.shutdown.period" in the nifi.properties file.
>> If you have processors/controller services that might take longer to stop
>> gracefully (because of connections to external systems, for instance), you
>> could increase this value.
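>>
>> For example, in nifi.properties (an illustrative value; the default is
>> 10 sec):
>>
>> nifi.flowcontroller.graceful.shutdown.period=60 sec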
>>
>> I'm not very familiar with AWS spot instances but I'd try to catch the
>> spot notification event to stop the NiFi service on the host before the
>> instance is stopped/killed.
>>
>> Pierre
>>
>>
>>
>> On Tue, Aug 27, 2019 at 20:05, Jean-Sebastien Vachon <
>> [email protected]> wrote:
>>
>> Hi everybody,
>>
>> I am working with AWS spot instances, and one thing that is giving me a
>> hard time is performing a clean (and quick) shutdown of NiFi in order to
>> prevent data loss.
>>
>> AWS will give you about two minutes to clean up everything before the
>> machine is actually shut down.
>> Is there a way to stop/kill all processes running on the host without
>> losing anything? It is fine if all the flowfiles being processed are
>> simply requeued.
>>
>> Would simply killing the processes achieve this? (I doubt it)... Would it
>> be better to fetch a list of running processors and terminate them using
>> NiFi's API?
>>
>> All ideas and thoughts are welcome
>>
>> thanks
>>
>>
