Fun with DistributeLoad

Martijn Dekkers Tue, 12 Jun 2018 23:12:55 -0700

All,

I have a more general question. We will be uploading files to an S3
compatible storage system. In our case, this system presents 24 endpoints
to upload to. Given the volume of data we are sending to this device, we
want to avoid using a load balancer like HAProxy for some use cases, to
avoid bottlenecks. The goal is to enable direct, parallel uploads
against all S3 endpoints. We will have roughly 100 systems generating data,
each running this flow.


We did consider running a local load balancer on each of the systems, but
this is unfortunately not possible due to other constraints.

We have set up a flow that uses DistributeLoad to distribute flowfiles to
24 PutS3Object processors, and it works like a charm.
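To make the distribution step concrete, here is a small plain-Python model of what a round-robin split across the 24 endpoints amounts to. It is a sketch, not NiFi code: it assumes DistributeLoad is set to a simple round-robin strategy with equal weights, and `round_robin` is a hypothetical helper name.

```python
from itertools import cycle

NUM_ENDPOINTS = 24  # the storage system exposes 24 S3 endpoints


def round_robin(files, num_endpoints=NUM_ENDPOINTS):
    """Pair each file with an endpoint index in strict rotation.

    Hypothetical model of an equal-weight round-robin distribution;
    endpoint i stands in for the i-th PutS3Object processor.
    """
    targets = cycle(range(num_endpoints))
    return [(f, next(targets)) for f in files]


if __name__ == "__main__":
    files = [f"file-{i}" for i in range(30)]
    for name, endpoint in round_robin(files)[:3]:
        print(name, "->", endpoint)
```

With 30 files, endpoints 0 to 5 each receive two files and the rest receive one, so the load stays even to within one file per endpoint.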

We did some gymnastics with counters and failure routing to handle
endpoints that become unavailable for whatever reason: the failure
connection of each PutS3Object processor is routed to the next PutS3Object
processor in line, all the way around the 24, and at the end we route to a
dedicated failure-management flow. The only downside is a very busy UI, as
we need to fit all these processors and their success and failure
connections.
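The failover chain described above behaves like a bounded walk around a ring of endpoints. The sketch below models that logic in plain Python; it is not NiFi code. `upload_with_failover` and the `put` callback (standing in for a single PutS3Object attempt) are hypothetical names, and the assumption is that the counter stops the walk once every endpoint has been tried once.

```python
from typing import Callable, Optional

NUM_ENDPOINTS = 24  # one PutS3Object processor per endpoint


def upload_with_failover(
    payload: bytes,
    start: int,
    put: Callable[[int, bytes], bool],
    num_endpoints: int = NUM_ENDPOINTS,
) -> Optional[int]:
    """Try endpoint `start`, then each following endpoint in ring order.

    Returns the index of the endpoint that accepted the upload, or None
    once every endpoint has failed once (the point where the flow would
    hand off to the dedicated failure-management flow).
    """
    for attempt in range(num_endpoints):  # attempt counter bounds the walk
        endpoint = (start + attempt) % num_endpoints  # wrap 23 -> 0
        if put(endpoint, payload):
            return endpoint
    return None


if __name__ == "__main__":
    down = {5, 6, 7}  # pretend these endpoints are unavailable
    fake_put = lambda ep, data: ep not in down
    print(upload_with_failover(b"data", 5, fake_put))  # prints 8
```

Keeping the retry order as `(start + attempt) % num_endpoints` means a file that starts at endpoint 23 falls over to endpoint 0, which matches the "all the way around" routing.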

Does anyone have any ideas on how to clean this up somewhat, or streamline
something like this?

Thanks

Martijn
