Martijn,

One clean-up approach that comes immediately to mind is to use 'Process
Groups', which let you group together processors that perform a related
sequence of actions. You can think of them as 'functions' or 'methods' in
programming terms. Since you mentioned that you are using 24 PutS3Object
processors, you could place them all inside a Process Group named something
like 'Write to S3' and keep them abstracted away from the developer's
immediate view, so the flow would look like:
DistributeLoad -> Write to S3 [Process Group] -> failure handling flow. If
the failure handling involves a sequence of steps, you can likewise put
those together under a Process Group named 'Failure Handler'.
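For what it's worth, the failover chain Martijn described (each endpoint's
failure relationship routed to the next endpoint in line, wrapping around all
24 before falling through to the failure handler) boils down to a simple
ring-with-failover pattern. A minimal sketch in plain Python, purely to
illustrate the logic; the endpoint list and `upload` callable here are
hypothetical stand-ins for the 24 PutS3Object processors, not NiFi API:

```python
def upload_with_failover(item, endpoints, upload, start=0):
    """Try each endpoint in ring order beginning at `start`.

    Returns the index of the endpoint that accepted the upload, or
    raises RuntimeError after one full loop, which corresponds to
    routing the flowfile to the dedicated 'Failure Handler' flow.
    """
    n = len(endpoints)
    for step in range(n):
        idx = (start + step) % n
        try:
            upload(endpoints[idx], item)  # stand-in for PutS3Object
            return idx
        except IOError:
            continue  # failure relationship -> next endpoint in line
    raise RuntimeError("all endpoints failed; route to Failure Handler")
```

Hiding the equivalent of this loop inside a 'Write to S3' Process Group is
what keeps the top-level canvas down to three components.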

Thanks,
Sivaprasanna


On Wed, Jun 13, 2018 at 11:41 AM, Martijn Dekkers <mart...@dekkers.org.uk>
wrote:

> All,
>
> I have a more general question. We will be uploading files to an S3
> compatible storage system. In our case, this system presents 24 endpoints
> to upload to. Given the volume of data we are sending to this device, we
> want to avoid using a loadbalancer like HAProxy for some use-cases, to
> avoid bottlenecks. The use-case is to enable direct, parallel uploads
> against all S3 endpoints. We will have roughly 100 systems generating data,
> each running this Flow.
>
> We did consider running a local loadbalancer on each of the systems, but
> this is unfortunately not possible due to other constraints.
>
> We have set up a flow that uses DistributeLoad to distribute flowfiles to
> 24 PutS3Object processors, and it works like a charm.
>
> We did some gymnastics with counters and failure routing to deal with
> endpoints becoming unavailable for some reason - the failure connection for
> each PutS3Object processor is routed to the next PutS3Object processor in
> line, all around these 24, until at the end we route to a dedicated failure
> management flow. The only downside is a very busy UI, as we need to fit all
> these processors, and their success and failure connectors.
>
> Does anyone have any ideas on how to clean this up somewhat, or streamline
> something like this?
>
> Thanks
>
> Martijn
>
