Hi Sudeep,

Answers to your questions:

1) In a clustered environment, every node has the same configuration for each processor. The flow is identical across all nodes in the cluster, and the NCM applies any change to every instance via a two-phase commit. With the 'On primary node' scheduling strategy you mentioned, only the primary node will receive invocations of the processor's #onTrigger.

2) To use clustering, a ZooKeeper ensemble is required. State Management offers a local and a cluster scope, and ZooKeeper is currently what provides consistent, cluster-scoped state across the entire cluster. It will also help form the basis for the clustering redesign [1] and possibly for the proposed HA Data feature [2]. NiFi can either start an embedded ZooKeeper server with each NiFi instance so that the instances form an ensemble, or defer to an external ensemble. For anyone else who may not have had the chance to review the associated docs, additional details are available in the Admin guide under State Management [3].

3) For now, the data lives with the node. If a node goes down, the NCM does not send its data to other nodes; the data stays in that node's repositories until the node is back. This is why production environments typically protect those repositories with RAID or, in cloud environments, durable block storage. As mentioned previously, this has been discussed and a feature proposal has been drafted [2].

[1] https://cwiki.apache.org/confluence/display/NIFI/Clustering+Redesign
[2] https://cwiki.apache.org/confluence/display/NIFI/High+Availability+Processing
[3] http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management

On Wed, Mar 16, 2016 at 1:46 AM, sudeep mishra wrote:

> Hi,
>
> I have few doubts regarding NiFi cluster.
>
> 1) In case we schedule a processor to run on only one node using 'On
> primary node' as 'Scheduling Strategy' then is the processor still
> configured on all nodes? If yes, then does then which is the actual step in
> processor lief cycle that takes place on the primary node?
>
> 2) I am starting to look into 'State Management' and do not see any
> Zookeeper quorum specified in default 'state-management.xml' and also
> 'nifi.state.management.embedded.zookeeper.start' is set as 'false' under
> 'nifi.properties' file. Do we have to configure a Zookeeper for cluster and
> how is Zookeepr being used?
>
> 3) All the nodes in the cluster work on the data flow. What happens to the
> data if in case the node processing it goes down? Does the NCM takes care
> of sending the failed data to other nodes for processing as the nodes do
> not communicate with each other.
>
> Appreciate if someone can guide for any documentation apart from those
> available under 'Administrator Guide' for NiFi clustering.
>
>
> Thanks & Regards,
>
> Sudeep
>
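
Following up on answer 2: here is a minimal sketch of turning on the embedded ZooKeeper for a three-node ensemble, assuming NiFi's default conf/ and state/ locations. The hostnames (nifi-node1.example.com and so on) are hypothetical placeholders, and the property set may vary between NiFi versions, so please check it against the State Management section of the Admin guide [3] before relying on it.

```properties
# conf/nifi.properties on each node that should host an embedded ZooKeeper server
nifi.state.management.configuration.file=./conf/state-management.xml
nifi.state.management.provider.cluster=zk-provider
nifi.state.management.embedded.zookeeper.start=true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

# conf/zookeeper.properties, identical on every ensemble member
# (typical timing values; hostnames below are hypothetical)
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=./state/zookeeper
# server.<id>=<host>:<quorum port>:<leader election port>
server.1=nifi-node1.example.com:2888:3888
server.2=nifi-node2.example.com:2888:3888
server.3=nifi-node3.example.com:2888:3888
```

Two more pieces are needed for this to work. Each member needs a myid file in its dataDir (./state/zookeeper/myid) containing only its own server id, e.g. 1 on nifi-node1. The "Connect String" property of the zk-provider in conf/state-management.xml must list the ensemble, e.g. nifi-node1.example.com:2181,nifi-node2.example.com:2181,nifi-node3.example.com:2181. An odd number of members is the usual choice, since the ensemble stays available only while a majority of its servers are up.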
