Brian,

Great use case, and you're right: we don't have an easy way of handling
this now.  If you do have a load balancer in front of the receiving NiFi
cluster and it supports affinity of some kind, then I believe you can set
a header on the HTTP POST from a flowfile attribute.  That attribute would
be present on each split and would hold the hash of the full object.  If
the load balancer ensured that all splits with a matching header value
landed on the same machine, you'd be in business.  There are some load
balancers that do this (I'm thinking of a commercial one).  But I admit
that is a lot of moving parts to keep in mind.  We need to improve our
site-to-site feature to do things like automatically split content for
you and handle the partitioning/affinity logic I suggested.  You might
also consider avoiding the splitting for now to keep things super simple,
though I recognize that brings its own tradeoffs.
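
To make the affinity idea concrete, here is a rough sketch in Python of
the logic only, not of NiFi or of any particular load balancer.  The
header name, the node names, and both helper functions are made up for
illustration.  Each split carries the hash of the full object, and a
stand-in for the load balancer maps equal header values to the same node.

```python
import hashlib

# Hypothetical names, used only for this illustration.
NODES = ["nifi-1", "nifi-2", "nifi-3", "nifi-4"]
AFFINITY_HEADER = "X-Object-Hash"


def split_with_affinity(payload: bytes, max_size: int):
    """Split payload into fragments that all carry the full object's hash."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    object_hash = hashlib.sha256(payload).hexdigest()
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)]
    return [
        {
            "headers": {
                AFFINITY_HEADER: object_hash,   # same value on every split
                "X-Fragment-Index": str(idx),   # hypothetical header
                "X-Fragment-Count": str(len(chunks)),  # hypothetical header
            },
            "body": chunk,
        }
        for idx, chunk in enumerate(chunks)
    ]


def pick_node(headers, nodes=NODES):
    """Stand-in for the load balancer: equal header values pick the same node."""
    digest = hashlib.sha256(headers[AFFINITY_HEADER].encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


if __name__ == "__main__":
    for frag in split_with_affinity(b"x" * 2500, 1000):
        print(frag["headers"]["X-Fragment-Index"], pick_node(frag["headers"]))
```

Because the node is chosen from the header value alone, every split of
one object lands on the same node, while different objects still spread
across the cluster.  Whether a given load balancer can route on a custom
header like this is something you would need to check for your product.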

Great case for us to work on/rally around though.

Thanks
Joe

On Wed, Feb 15, 2017 at 4:29 PM, Kiran <[email protected]>
wrote:

> Hello,
>
> I need to send data from one organisation to another but there are data
> size limits between them (this isn't my choice and has been enforced on
> me). I've got a 4 node NiFi cluster in each organisation.
>
> The sending NiFi cluster has the following data flow:
> Ingest the data by various means
>    -> Compress Data using CompressContent
>      -> If file size > X amount I use SplitContent
>        -> HTTPS POST to load balancer sitting in front of the NiFi
> cluster in the other organisation
>
> On the receiving NiFi cluster I wanted to:
> -> Receive the data
>    -> MergeContent
>      -> Do whatever else with the data...
>
> The problem I can't get round is that if I split the content into 3
> fragments and send them to the receiving NiFi instance, then because
> it's behind a load balancer I can't guarantee that the 3 fragments are
> received by the same node.
>
> Q1) I'm assuming that for MergeContent to work all the fragments of a
> single piece of data have to arrive on the same NiFi node, or is there
> an option to have it working across a cluster?
>
> Q2) How long does the MergeContent processor wait for all the fragments?
> If one of the fragments gets lost, does it time out after a certain
> period?
>
> I was thinking one way to solve this is to have the HTTPListener on the
> receiving NiFi only listening on the primary node, which would ensure all
> the fragments arrive on the same node. The downside would be that I end
> up with idle NiFi nodes.
>
> Is there anything obvious that I've missed that would solve my issue?
>
> Thanks in advance,
>
> Brian