Hi all,

In a clustered NiFi setup I have a flow (see below) that basically consists of a GetSQS processor that receives notifications of files added to an S3 bucket, a FetchS3Object processor that downloads the files, and a custom processor that parses the data. Because of the size of the files, the fetch and parse processors take minutes to run.
My goal is to get each node in the cluster to process one file at a time. However, when I set the back pressure object threshold of the two connections to 1, the following can happen: two files become available while one node is parsing the first file, and that same node also picks up the second file because its first connection is empty. In this case I want another node to pick up the file, as it has more resources available.

This problem becomes even bigger when other short-running processors are added to the flow (for instance UpdateAttribute processors), as each of the connections required to fit these processors into the flow can then be filled by a flowfile, even though other nodes are idle.

I tried setting the threshold of the connections to 0, but this does not seem to work: NiFi seems to ignore this value (the processor before such a connection is not halted).

Does anyone know a way to achieve this behaviour?
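
To make the problem concrete, here is a rough back-of-the-envelope model in Python. It is not NiFi code: the function name is made up, and it assumes that every processor after GetSQS runs a single concurrent task and works on one flowfile at a time. Under those assumptions it counts how many files a single node can end up holding while other nodes sit idle.

```python
def max_files_on_one_node(downstream_processors: int,
                          connections: int,
                          threshold: int) -> int:
    """Upper bound on the flowfiles one node can hold at once.

    Simplified model (my assumption, not NiFi internals):
    - each downstream processor works on at most one flowfile at a time
    - each connection can queue up to `threshold` flowfiles before
      back pressure stops the processor feeding it
    """
    in_flight = downstream_processors   # one flowfile per busy processor
    queued = connections * threshold    # flowfiles waiting in the queues
    return in_flight + queued


# Current flow: GetSQS -> [conn] -> FetchS3Object -> [conn] -> parser
print(max_files_on_one_node(downstream_processors=2, connections=2, threshold=1))

# Same flow with one extra short-running processor (and its connection)
print(max_files_on_one_node(downstream_processors=3, connections=3, threshold=1))
```

So even with a threshold of 1 everywhere, one node can hold several files in this model, and every extra processor and connection raises that number.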
