Mans,

You are correct that nodes in a cluster work independently of one another and do
not know about each other. Each node in a cluster runs the same flow. Typically,
if you want to pull from HDFS and partition that data across the cluster, you
would run ListHDFS on the Primary Node only, and then use Site-to-Site [1] to
distribute that listing to all nodes in the cluster. Each node would then pull
the data that it is responsible for and begin working on it. We realize that
having to set it up this way is not ideal, and we are working on making it much
easier to have that listing automatically distributed across the cluster.
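
To make the list-then-distribute idea concrete, here is a rough sketch in plain
Python. It is not NiFi code and does not use any NiFi API. The node names, the
file paths, and the round-robin policy are all made up for illustration; in a
real flow Site-to-Site decides which node receives which entry.

```python
from itertools import cycle


def distribute_listing(paths, nodes):
    """Assign each listed path to exactly one node (hypothetical round robin)."""
    assignment = {node: [] for node in nodes}
    for path, node in zip(paths, cycle(nodes)):
        assignment[node].append(path)
    return assignment


# The listing is produced once, as if by the Primary Node only.
listing = [f"/data/in/part-{i:04d}" for i in range(7)]
nodes = ["node-1", "node-2", "node-3"]

# Every node ends up with its own share and pulls only those files.
shares = distribute_listing(listing, nodes)
for node, paths in shares.items():
    print(node, paths)
```

The point is only that the listing happens once and each file ends up on a
single node, so no two nodes pull the same data.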

I'm not sure that I understand your #3 (how do we design the workflow so that
the nodes work on one file at a time?). For each Processor, you can configure
how many threads (Concurrent Tasks) are used in the Scheduling tab of the
Processor Configuration dialog. You can certainly configure that to run only a
single Concurrent Task. Note that this is the number of Concurrent Tasks that
will run on each node in the cluster, not the total number of concurrent tasks
that would run across the entire cluster.
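
As a quick illustration of the per-node versus cluster-wide distinction, here is
a small arithmetic sketch in Python. The function name and the numbers are
hypothetical; it simply multiplies the per-node setting by the node count.

```python
def max_cluster_tasks(node_count, concurrent_tasks_per_node):
    """Upper bound on tasks one processor can run at once across the cluster."""
    return node_count * concurrent_tasks_per_node


# With 3 nodes and Concurrent Tasks set to 1, up to 3 tasks can run at once,
# one on each node.
print(max_cluster_tasks(3, 1))  # 3
```

So setting Concurrent Tasks to 1 gives you one file at a time per node, not one
file at a time for the cluster as a whole.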

I am not sure that I understand your #4 either. Are you saying that you want to
configure each node in the cluster with a different value for a processor
property?

Does this help?

Thanks
-Mark

[1] http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site


> On Oct 14, 2015, at 4:49 PM, M Singh <[email protected]> wrote:
> 
> Hi:
> 
> 
> 
> A few questions about NiFi cluster:
> 
> 1. If we have multiple worker nodes in the cluster, do they partition the 
> work if the source allows partitioning - eg: HDFS, or do all the nodes work 
> on the same data ?
> 2. If the nodes partition the work, then how do they coordinate the work 
> distribution and recovery etc ?  From the documentation it appears that the 
> workers are not aware of each other.
> 3. If I need to process multiple files - how do we design the work flow so 
> that the nodes work on one file at a time ?
> 4. If I have multiple arguments and need to pass one parameter to each 
> worker, how can I do that ?
> 5. Is there any way to control how many workers are involved in processing 
> the flow ?
> 6. Does specifying the number of threads in the processor distribute work on 
> multiple workers ?  Does it split the task across the threads or is it the 
> responsibility of the application ?
> 
> I tried to find some answers from the documentation and users list but could 
> not get a clear picture.
> 
> Thanks
> 
> Mans
> 
> 
> 
> 
