Hi all,

In a clustered NiFi setup I have a flow (see below) that consists of a
GetSQS processor that receives notifications of files added to an S3
bucket, a FetchS3Object processor that downloads the files, and a custom
processor that parses the data. Because the files are large, the fetch and
parse processors take minutes to run.



My goal is to get each node in the cluster to process one file at a time.
However, when I set the back pressure object threshold of the two
connections to 1, the following can happen: two files become available while
one node is parsing the first file, and that same node also picks up the
second file because its first connection is empty. In this case I want
another node to pick up the file, as the other nodes have more resources
available. The problem gets worse when other short-running processors are
added to the flow (for instance UpdateAttribute processors), as each of the
connections required to fit these processors in the flow can then be filled
by a flowfile, even though other nodes are idle.
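
To make the situation above concrete, here is a toy Python model. It is not
NiFi code: the function name, its parameters and the simplifications are
assumptions made purely for illustration. It treats the threshold as a hard
limit per connection, assumes the last processor in the chain is busy with
one file and never finishes, and assumes every other processor hands files
on instantly. It then counts how many files a single node accepts before
back pressure stops its source processor.

```python
from collections import deque


def files_accepted_by_one_node(num_connections, threshold, offered_files):
    """Toy model: count files one node takes before back pressure halts it.

    Hypothetical helper, not part of NiFi. The final processor holds one
    file forever; intermediate processors move files downstream instantly.
    """
    queues = [deque() for _ in range(num_connections)]
    busy_with = None  # file held by the long-running final processor
    accepted = 0

    for f in range(offered_files):
        # The source is halted once its outgoing connection is full.
        if len(queues[0]) >= threshold:
            break
        queues[0].append(f)
        accepted += 1

        # Push flowfiles downstream until nothing can move any more.
        moved = True
        while moved:
            moved = False
            # The final processor picks up a file only if it is idle.
            if busy_with is None and queues[-1]:
                busy_with = queues[-1].popleft()
                moved = True
            # An intermediate processor runs only if the connection
            # after it is below its threshold.
            for i in range(num_connections - 2, -1, -1):
                if queues[i] and len(queues[i + 1]) < threshold:
                    queues[i + 1].append(queues[i].popleft())
                    moved = True

    return accepted


# Two connections with threshold 1, as in the flow described above.
print(files_accepted_by_one_node(2, 1, offered_files=10))
# Four connections, e.g. after adding two short-running processors.
print(files_accepted_by_one_node(4, 1, offered_files=10))
```

Under these assumptions one node ends up holding one file per connection
plus the file being parsed, so every extra connection lets a busy node queue
up one more file while other nodes may still be idle.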

I tried setting the threshold of the connections to 0, but this does not
seem to work: NiFi appears to ignore this value (the processor before such a
connection is not halted). Does anyone know a way to achieve this behaviour?


