Hi Bas,

I think I was able to create a NiFi flow that works the way you expected.
Multiple GetSQS processors share the workload using NiFi back pressure.

Did you set the Batch Size of GetSQS to 1?

I've put a flow template and a detailed description in this Gist:
https://gist.github.com/ijokarumawak/4a9189ac630cf6cf6cd2d35c19b43fd8

Hope this helps, thanks!
Koji

On Mon, Feb 6, 2017 at 6:50 PM, Bas van Kortenhof
<[email protected]> wrote:
> Hi all,
>
> In a clustered NiFi setup I have a flow (see below) which basically consists
> of a GetSQS processor that receives notifications of files added to an S3
> bucket, a FetchS3Object processor that downloads the files, and a custom
> processor that parses the data. Because of the size of the files, the fetch
> and parse processors take minutes to run.
>
>
>
> My goal is to get the nodes in the cluster to each process one file at a
> time. However, when I set the object threshold of the two connections to 1,
> the following can happen: two files become available while one node is
> parsing the first file, and that same node also picks up the second file
> because its first connection is empty. In this case I want another node to
> pick up the file, as it has more resources available. This problem becomes
> even bigger when other short-running processors are added to the flow (for
> instance UpdateAttribute processors), as each of the connections required to
> fit these processors in the flow can then be filled by a flowfile, even
> though other nodes are idle.
>
> I tried setting the threshold of the connections to 0 but this does not seem
> to work as NiFi then seems to ignore this value (the processor before such a
> connection is not halted). Does anyone know a way to achieve this behaviour?
>
>
>
> --
> View this message in context: 
> http://apache-nifi-users-list.2361937.n4.nabble.com/Problem-when-using-backpressure-to-distribute-load-over-nodes-in-a-cluster-tp863.html
> Sent from the Apache NiFi Users List mailing list archive at Nabble.com.
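
The situation described in the quoted message can be sketched with a toy
model. This is not NiFi code: the Connection and Node classes and all of
their methods are hypothetical names invented for illustration. The sketch
also assumes that a threshold of 0 means "no limit", which matches the
observation above that NiFi appears to ignore that value.

```python
from collections import deque


class Connection:
    """Toy stand-in for a connection with an object-count threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.queue = deque()

    def is_full(self):
        # Assumption: a threshold of 0 disables back pressure entirely.
        return self.threshold > 0 and len(self.queue) >= self.threshold


class Node:
    """Toy cluster node: one incoming connection feeding one slow parser."""

    def __init__(self, name, threshold=1):
        self.name = name
        self.incoming = Connection(threshold)
        self.parsing = None  # flowfile currently being parsed, if any

    def can_receive(self):
        # The upstream processor is only held back while the queue is full.
        return not self.incoming.is_full()

    def receive(self, flowfile):
        if not self.can_receive():
            raise RuntimeError("back pressure applied")
        self.incoming.queue.append(flowfile)

    def start_parsing(self):
        # Taking the flowfile off the queue empties the connection again.
        if self.parsing is None and self.incoming.queue:
            self.parsing = self.incoming.queue.popleft()


node1, node2 = Node("node1"), Node("node2")

node1.receive("file-1")
blocked_while_queued = not node1.can_receive()  # queue holds 1 of 1

node1.start_parsing()
accepts_while_busy = node1.can_receive()  # queue is empty, parser is busy

node1.receive("file-2")  # node1 takes a second file while node2 sits idle
print(node1.name, "parsing", node1.parsing, "queued", list(node1.incoming.queue))
print(node2.name, "idle:", node2.parsing is None)
```

In this model the threshold only looks at what is waiting in the queue, not
at whether the downstream processor is busy, so a node that is already
parsing can still accept the next file.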
