I have a more general question. We will be uploading files to an
S3-compatible storage system. In our case, this system presents 24 endpoints
to upload to. Given the volume of data we are sending to this device, we
want to avoid using a load balancer like HAProxy for some use cases, to
avoid bottlenecks. The use case is to enable direct, parallel uploads
against all 24 S3 endpoints. We will have roughly 100 systems generating
data, each running this Flow.

We did consider running a local load balancer on each of the systems, but
this is unfortunately not possible due to other constraints.

We have set up a flow that uses DistributeLoad to distribute flowfiles to
24 PutS3Object processors, and it works like a charm.

We did some gymnastics with counters and failure routing to deal with
endpoints becoming unavailable for some reason: the failure connection for
each PutS3Object processor is routed to the next PutS3Object processor in
line, all the way around the 24 of them, until at the end we route to a
dedicated failure-management flow. The only downside is a very busy UI, as
we need to fit all these processors and their success and failure
connections.
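For context, the routing logic described above (round-robin distribution with failover to the next endpoint in line, and a final failure-management step once every endpoint has been tried) can be sketched roughly like this. This is only an illustration of the flow's behavior, not NiFi code; `put_s3` stands in for a PutS3Object attempt and is a hypothetical placeholder:

```python
# Sketch (assumption, not the actual NiFi implementation):
# round-robin across 24 endpoints; on failure, try the next
# endpoint in line until all have been attempted, then hand
# the flowfile off to failure management (here: raise).

from itertools import count

NUM_ENDPOINTS = 24      # one per S3-compatible endpoint
_counter = count()      # plays the role of the DistributeLoad counter


def upload(flowfile, endpoints, put_s3):
    """Try the round-robin endpoint first, then each subsequent one.

    `put_s3(endpoint, flowfile)` is a placeholder for a PutS3Object
    attempt; it returns True on success.
    """
    start = next(_counter) % NUM_ENDPOINTS
    for offset in range(NUM_ENDPOINTS):
        endpoint = endpoints[(start + offset) % NUM_ENDPOINTS]
        if put_s3(endpoint, flowfile):
            return endpoint
    raise RuntimeError("all endpoints failed; route to failure management")
```

The busy-UI problem comes from expressing exactly this loop as 24 explicit processors and their chained failure connections.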

Does anyone have any ideas on how to clean this up somewhat, or streamline
something like this?
