Hi Sudeep,

Answers to your questions:

1) In a clustered environment, each processor has the same configuration on
every node: the flow is identical across all nodes in the cluster, with the
NCM applying changes to each instance via a two-phase commit.  With the
'On primary node' scheduling strategy, only the primary node will receive
invocations of the processor's #onTrigger.
2) To use clustering, a ZooKeeper ensemble is required.  ZooKeeper is
currently used to provide consistent cluster-scoped state across the
entire cluster (local-scoped state is kept on each node).  It will also
help form the basis for the clustering redesign [1] and possibly for the
proposed HA Data feature [2].  NiFi lets you either run an embedded
ZooKeeper with each NiFi instance to form an ensemble or defer to an
external ensemble.  For anyone else who may not have had the chance to
review the associated docs, additional details are available in the Admin
guide on State Management [3].
3) Currently, the data lives with the node: the NCM does not hand a failed
node's data off to other nodes.  Typically, production environments protect
the repositories with RAID or, in cloud environments, durable block storage.
As mentioned above, this has been discussed and a feature proposal (HA
Data [2]) has been drafted.
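
For point 2, here is a rough sketch of what enabling the embedded ZooKeeper
on a three-node cluster might look like.  The hostnames (nifi-node1 through
nifi-node3) and the ports are placeholders, and the property names should be
checked against the Admin guide for the version you are running.

```properties
# nifi.properties (on every node): start the embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties

# conf/zookeeper.properties (identical on every node)
# Hostnames below are placeholders for your own nodes.
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
dataDir=./state/zookeeper
server.1=nifi-node1:2888:3888
server.2=nifi-node2:2888:3888
server.3=nifi-node3:2888:3888
```

Each node also needs a myid file in the ZooKeeper dataDir containing its own
server number (1, 2 or 3).  The 'Connect String' property of the ZooKeeper
cluster provider in state-management.xml should then list the ensemble, e.g.
nifi-node1:2181,nifi-node2:2181,nifi-node3:2181.  To use an external ensemble
instead, leave the embedded ZooKeeper disabled and point the Connect String
at that ensemble.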

[1] https://cwiki.apache.org/confluence/display/NIFI/Clustering+Redesign
[2] https://cwiki.apache.org/confluence/display/NIFI/High+Availability+Processing
[3] http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management


On Wed, Mar 16, 2016 at 1:46 AM, sudeep mishra <[email protected]> wrote:

> Hi,
>
> I have a few doubts regarding NiFi clustering.
>
> 1) If we schedule a processor to run on only one node using 'On
> primary node' as the 'Scheduling Strategy', is the processor still
> configured on all nodes? If yes, which step in the processor life cycle
> actually takes place only on the primary node?
>
> 2) I am starting to look into 'State Management' and do not see any
> ZooKeeper quorum specified in the default 'state-management.xml', and
> 'nifi.state.management.embedded.zookeeper.start' is set to 'false' in the
> 'nifi.properties' file. Do we have to configure ZooKeeper for a cluster, and
> how is ZooKeeper being used?
>
> 3) All the nodes in the cluster work on the data flow. What happens to the
> data if the node processing it goes down? Does the NCM take care of
> sending the failed data to other nodes for processing, given that the nodes
> do not communicate with each other?
>
> I'd appreciate it if someone could point me to any documentation on NiFi
> clustering apart from what is available in the 'Administrator Guide'.
>
>
> Thanks & Regards,
>
> Sudeep
>
