Hi Bas,

I think I was able to create a NiFi flow that works as you expected: multiple GetSQS processors share the workload using NiFi back pressure.
Did you set Batch Size of GetSQS to 1? I've put a flow template and detailed description on this Gist.
https://gist.github.com/ijokarumawak/4a9189ac630cf6cf6cd2d35c19b43fd8

Hope this helps, thanks!
Koji

On Mon, Feb 6, 2017 at 6:50 PM, Bas van Kortenhof <[email protected]> wrote:
> Hi all,
>
> In a clustered NiFi setup I have a flow (see below) which basically consists
> of a GetSQS processor that receives notifications of added files in a S3
> bucket, a FetchS3Object processor that downloads the files and a custom
> processor that parses the data. Because of the size of the files the fetch
> and parse processors take minutes to run.
>
>
>
> My goal is to get the nodes in the cluster to each process one file at the
> time. However, when I set the file threshold of the two connections to 1 it
> can happen that when two files become available and one node is currently
> parsing the first file, that it also picks up the second file on that node
> because the first connection is empty. However, in this case I want another
> node to pick up the file as they have more resources available. This problem
> becomes even bigger when other short running processors are added to the
> flow (for instance UpdateAttributes processors) as each of the connections
> required to fit these connections in the flow can then be filled by a
> flowfile, even though other nodes are idle.
>
> I tried setting the threshold of the connections to 0 but this does not seem
> to work as NiFi then seems to ignore this value (the processor before such a
> connection is not halted). Does anyone know a way to achieve this behaviour?
>
>
>
> --
> View this message in context:
> http://apache-nifi-users-list.2361937.n4.nabble.com/Problem-when-using-backpressure-to-distribute-load-over-nodes-in-a-cluster-tp863.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
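
P.S. For anyone reading this thread later, the behaviour described in the quoted message can be pictured with a small toy simulation. This is only a sketch in Python, not NiFi code: the Node class, the simulate function, and the fixed polling order (node A always gets to poll before node B) are assumptions made purely for illustration. In a real cluster, which node receives a given SQS message is not deterministic.

```python
from collections import deque


class Node:
    """Toy model of one node: GetSQS -> connection -> slow worker (fetch + parse)."""

    def __init__(self, name, threshold=1):
        self.name = name
        self.threshold = threshold  # back pressure object threshold of the connection
        self.queue = deque()        # the connection right after GetSQS
        self.working_on = None      # file currently being fetched/parsed
        self.handled = []           # every file this node has taken from SQS

    def can_poll(self):
        # Back pressure: the source only runs while the connection is below its threshold.
        return len(self.queue) < self.threshold

    def step(self):
        # The slow worker takes the next queued file when it is free.
        # It never finishes within this short run (the work takes minutes).
        if self.working_on is None and self.queue:
            self.working_on = self.queue.popleft()


def simulate(arrivals, nodes):
    """arrivals[i] is the list of files that show up in SQS at tick i."""
    sqs = deque()
    for tick_files in arrivals:
        sqs.extend(tick_files)
        # Assumption: nodes poll in list order, so the first node always wins a tie.
        for node in nodes:
            if sqs and node.can_poll():
                f = sqs.popleft()
                node.queue.append(f)
                node.handled.append(f)
        for node in nodes:
            node.step()
    return {node.name: node.handled for node in nodes}


result = simulate([["file1"], ["file2"]], [Node("A"), Node("B")])
print(result)
```

In this model node A ends up with both files while node B stays idle: once A's worker has pulled file1 out of the connection, the connection is empty again, so back pressure no longer stops A's source from taking file2. Every additional connection in the flow gives a node one more slot to fill in the same way.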
