Hello Chakro,

When you create a cluster of NiFi instances, each node in the cluster acts
independently and in exactly the same way. That is, if you have 5 nodes, all
5 nodes will run exactly the same flow. However, each node pulls in different
data and therefore operates on different data.

So if you pull in ten 1 GB files from S3, each of those files will be
processed on the node that pulled it in. NiFi does not currently shuffle data
between nodes in the cluster (you can use Site-to-Site to do this if you want
to, but it won't happen automatically). If you set Concurrent Tasks to 5, you
will have up to 5 threads running for that processor on each node, so up to
25 threads across a 5-node cluster.
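To make the per-node behaviour concrete, here is a toy Python model. It is not
NiFi code: the node names, the file-to-node assignment, and the process()
function are all hypothetical. Each node only processes the files it pulled in
itself, using its own thread pool capped at the Concurrent Tasks setting.

```python
from concurrent.futures import ThreadPoolExecutor

CONCURRENT_TASKS = 5  # the processor's Concurrent Tasks setting (applies per node)

# Hypothetical assignment: which node happened to pull which S3 object.
# Nothing moves these files to another node automatically.
pulled_by = {
    "node1": ["file-01", "file-06"],
    "node2": ["file-02", "file-07"],
    "node3": ["file-03", "file-08"],
    "node4": ["file-04", "file-09"],
    "node5": ["file-05", "file-10"],
}

def process(node: str, name: str) -> str:
    # Stand-in for the real work done on that node.
    return f"{name}@{node}"

def run_node(node: str, files: list[str]) -> list[str]:
    # Each node has its own pool of up to CONCURRENT_TASKS threads.
    with ThreadPoolExecutor(max_workers=CONCURRENT_TASKS) as pool:
        return list(pool.map(lambda f: process(node, f), files))

# Real nodes run independently; this toy runs them one after another for simplicity.
results = {node: run_node(node, files) for node, files in pulled_by.items()}
max_threads_cluster_wide = CONCURRENT_TASKS * len(pulled_by)

for node, done in results.items():
    print(node, done)
print("max threads for this processor across the cluster:", max_threads_cluster_wide)
```

Every result carries the name of the node that pulled the file, which is the
point: the work stays where the data landed.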

The only exception to this is the Primary Node. You can schedule a Processor
to run only on the Primary Node by right-clicking the Processor and choosing
Configure. In the Scheduling tab, change the Scheduling Strategy to Primary
Node Only. That Processor will then be triggered only on whichever node is
elected the Primary Node. (You can change which node is the Primary Node in
the Cluster management screen, opened by clicking the appropriate icon in the
top-right corner of the UI.)

GetFile and PutFile will run on all nodes, unless you schedule them to run on
the Primary Node only.
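A tiny Python model of the scheduling rule above may help. The enum and
function names are hypothetical (only "Primary Node Only" mirrors the UI
option), and this is a sketch of the behaviour, not NiFi's implementation.

```python
from enum import Enum

class SchedulingStrategy(Enum):
    # Hypothetical labels for this sketch.
    ALL_NODES = "all nodes"
    PRIMARY_NODE_ONLY = "Primary Node Only"

def nodes_that_trigger(strategy: SchedulingStrategy, nodes: list[str], primary: str) -> list[str]:
    """Return the nodes on which a processor with this strategy gets triggered."""
    if primary not in nodes:
        raise ValueError("primary must be one of the cluster nodes")
    if strategy is SchedulingStrategy.PRIMARY_NODE_ONLY:
        return [primary]  # only whichever node is currently elected primary
    return list(nodes)    # otherwise every node runs the same flow

cluster = ["node1", "node2", "node3", "node4", "node5"]

# e.g. GetFile with default scheduling runs everywhere...
print(nodes_that_trigger(SchedulingStrategy.ALL_NODES, cluster, primary="node2"))
# ...but with Primary Node Only it runs just on the primary.
print(nodes_that_trigger(SchedulingStrategy.PRIMARY_NODE_ONLY, cluster, primary="node2"))
# Electing a different primary moves where that processor runs.
print(nodes_that_trigger(SchedulingStrategy.PRIMARY_NODE_ONLY, cluster, primary="node4"))
```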

If you want a single HTTP input that then pushes the data out across the
entire cluster for processing, you have a couple of options. First, you could
put an HTTP load balancer in front of NiFi. Alternatively, you could run a
ListenHTTP processor on the Primary Node only and then use Site-to-Site to
distribute the data to the other nodes.
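Both options come down to spreading incoming requests over the nodes. As an
illustration only, here is a minimal Python sketch of round-robin, one common
load-balancing strategy. It is not how NiFi's Site-to-Site protocol chooses
nodes, and the node URLs are hypothetical.

```python
from itertools import cycle

# Hypothetical NiFi node endpoints behind a load balancer.
NODES = ["http://node1:8081", "http://node2:8081", "http://node3:8081"]

class RoundRobinBalancer:
    """Hands each incoming request to the next node in a fixed rotation."""

    def __init__(self, nodes: list[str]) -> None:
        if not nodes:
            raise ValueError("need at least one node")
        self._next = cycle(nodes)

    def route(self) -> str:
        # Round-robin ignores request contents; it just picks the next node.
        return next(self._next)

balancer = RoundRobinBalancer(NODES)
assignments = {f"req-{i}": balancer.route() for i in range(7)}
for req, node in assignments.items():
    print(req, "->", node)
```

With 7 requests and 3 nodes, node1 gets 3 requests and the other two get 2
each, so every node ends up doing a share of the processing.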

For more info on Site-to-Site, see the Site-to-Site section of the User Guide
at http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site

If you have any more questions, let us know!

Thanks
-Mark

> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla 
> <chakrader.dewaraga...@lifelock.com> wrote:
> 
> Nifi Team – I would like to understand the advantages of Nifi clustering 
> setup. 
> 
> Questions : 
> 
>  - How does the workflow work on multiple nodes? Does it share resources
> between nodes?
> Let's say I need to pull 10 1 GB files from S3: how does the workload get
> distributed? If I set Concurrent Tasks to 5, does it spawn 5 tasks per node?
> 
>  - How to “isolate” the processor to the master node (or one node)?
> 
> - In a cluster setup, do the GetFile/PutFile processors get/put on the
> primary node? How do I force a processor to look on one of the slave nodes?
> 
> - How can we have a workflow where the input side receives requests (HTTP)
> and the rest of the pipeline runs in parallel on all the nodes?
> 
> Thanks,
> -Chakro
