Kevin Doran created NIFIREG-150:
-----------------------------------

             Summary: Maintenance mode switch via REST API for data backup
                 Key: NIFIREG-150
                 URL: https://issues.apache.org/jira/browse/NIFIREG-150
             Project: NiFi Registry
          Issue Type: New Feature
            Reporter: Kevin Doran


Currently, NiFi Registry does not offer High Availability (HA) out of the box. 
One has to configure an environment around one or more NiFi Registry instances 
to achieve the required level of recoverability and availability.

This is not a requirement in many deployment scenarios, as NiFi Registry is not 
on the critical path of most system architectures. That is, it is a place to save 
and retrieve versions of flows and extensions, but if NiFi Registry is 
temporarily offline, NiFi data flows deployed to NiFi and MiNiFi instances 
continue to function just fine. 

However, a bigger concern is data availability and backup; that is, the 
guarantee that data persisted to NiFi Registry is not lost due to an instance 
failure. Eventually, it will be nice to offer a NiFi Registry HA solution that 
allows for distributed/clustered data or external persistence providers (that 
themselves can be HA).

In the meantime, folks are looking for the best way to build their own data 
backup and recovery solutions for NiFi Registry. A lot of possible solutions 
and recommendations for backup and recovery or [cold-slave 
failover|http://www.sonatype.org/nexus/2015/07/10/high-availability-ha-and-continuous-integration-ci-with-nexus-oss/]
 require copying the NiFi Registry home directory from host storage to 
another location, where it could be used to create another NiFi Registry with 
the same data on demand, e.g., in a cloud migration or disaster recovery 
scenario.

If the NiFi Registry service is running when this copy operation is performed, 
one risks copying partially-written data/records/files that could be corrupted 
when later loaded/read from disk. One solution for this today is to stop the 
NiFi Registry, but this leaves it unavailable for users and scripts, which is 
not ideal. For example, continuous deployment scripts for NiFi data flows that 
read flows from NiFi Registry would not be able to access a required service.

In the long term, it would be nice to offer a proper HA NiFi Registry solution 
out of the box. However, in the short-term, it would be nice for users to be 
able to put a NiFi Registry instance into "read only maintenance mode", during 
which the contents of the NiFi Registry home directory could be more safely 
copied to a backup location or cold spare. (I say more safely because some 
files in the home directory, such as the default location for logs, would 
continue to be written to, but the most important files, such as the 
file-based database and persistence providers, would stabilize after existing 
write operations are flushed to disk.)
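
As a rough illustration of the backup flow described above, here is a minimal Java sketch. It assumes a hypothetical MaintenanceSwitch hook standing in for whatever call would eventually toggle maintenance mode (no such endpoint exists yet); RegistryBackup, backup, and copyTree are likewise names made up for this example. It does not wait for in-flight writes to flush, which a real script would probably need to do before copying.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Sketch only: copy the home directory while writes are (hypothetically) blocked. */
final class RegistryBackup {

    /** Hypothetical hook; a real one would call the maintenance mode endpoint once it exists. */
    interface MaintenanceSwitch {
        void set(boolean enabled) throws IOException;
    }

    /** Enables maintenance mode, copies the directory tree, then always disables the mode again. */
    static void backup(Path home, Path target, MaintenanceSwitch maintenance) throws IOException {
        maintenance.set(true);
        try {
            copyTree(home, target);
        } finally {
            // Leave maintenance mode even if the copy failed part-way.
            maintenance.set(false);
        }
    }

    /** Recursively copies source into target, overwriting files that already exist. */
    static void copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            // Files.walk visits a directory before its contents, so parents exist before files.
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p).toString());
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: RegistryBackup <home-dir> <backup-dir>");
            return;
        }
        // Placeholder switch that only logs; it does not talk to a real server.
        backup(Paths.get(args[0]), Paths.get(args[1]),
                enabled -> System.out.println("maintenance mode requested: " + enabled));
    }
}
```

The try/finally is the point of the sketch: the instance should return to normal operation even when the copy fails, so that users and scripts are not left with a read-only registry.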

Implementation thoughts:
 - endpoints for turning maintenance mode on/off would fit in nicely as custom 
endpoints under Actuator (NIFIREG-134), and therefore could be access 
controlled by Actuator authorization rules
 - when maintenance mode is enabled, a custom Spring filter could intercept any 
requests that modify persisted state (e.g., by resource path and HTTP method 
pattern matching) and return a "503 Service Unavailable" status code indicating 
that the resource is temporarily unavailable. A similar approach is used to 
authorize access to certain endpoints.
 - when maintenance mode is enabled, the /actuator/health endpoint could also 
indicate this, giving clients a way to check if a server is in maintenance mode 
or not.
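
The decision such a filter would make can be sketched in plain Java, independent of Spring or the servlet API. Everything here is an assumption for illustration: the class name MaintenanceModeGate, the choice of GET/HEAD/OPTIONS as the non-modifying methods, the /actuator/ exemption (so the mode can be switched off again), and the health strings. A real filter would wrap this logic and write the 503 response itself.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only: the accept/reject decision a maintenance mode filter might make. */
final class MaintenanceModeGate {

    /** Status code the filter would send for rejected requests. */
    static final int SC_SERVICE_UNAVAILABLE = 503;

    // Assumption: these HTTP methods never modify persisted state.
    private static final Set<String> READ_ONLY_METHODS = Set.of("GET", "HEAD", "OPTIONS");

    // Assumption: Actuator endpoints stay reachable so the mode can be turned off.
    private static final String ACTUATOR_PREFIX = "/actuator/";

    private final AtomicBoolean enabled = new AtomicBoolean(false);

    void setEnabled(boolean value) {
        enabled.set(value);
    }

    boolean isEnabled() {
        return enabled.get();
    }

    /** Returns true if the request should be answered with 503 instead of being processed. */
    boolean shouldReject(String method, String path) {
        if (!enabled.get()) {
            return false;
        }
        if (path.startsWith(ACTUATOR_PREFIX)) {
            return false;
        }
        return !READ_ONLY_METHODS.contains(method.toUpperCase(Locale.ROOT));
    }

    /** Value a health endpoint could expose so clients can detect the mode. */
    String healthDetail() {
        return enabled.get() ? "MAINTENANCE" : "NORMAL";
    }

    public static void main(String[] args) {
        MaintenanceModeGate gate = new MaintenanceModeGate();
        gate.setEnabled(true);
        // The paths below are illustrative, not a statement about the actual REST API.
        System.out.println("POST /buckets rejected: " + gate.shouldReject("POST", "/buckets"));
        System.out.println("GET /buckets rejected: " + gate.shouldReject("GET", "/buckets"));
        System.out.println("health: " + gate.healthDetail());
    }
}
```

Matching on the HTTP method alone is the simplest rule; path patterns could be added for any endpoints where a read-style method has side effects.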



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
