Brian,

Great use case, and you're right that we don't have an easy way of handling this today.

If you do have a load balancer in front of the receiving NiFi cluster and it supports affinity of some kind, then I believe you can set a header on the HTTP POST from a flowfile attribute. That attribute would be present on each split and would hold the hash of the full original object. If the load balancer ensures that all splits with a matching header value land on the same machine, you'd be in business. Some load balancers do this (I'm thinking of a commercial one). But I admit that is a lot of moving parts to keep in mind.

We need to improve our site-to-site feature to do things like automatically split content for you and handle the partitioning/affinity logic I suggested. You might also consider avoiding the splitting for now to keep things super simple, though I recognize that brings its own tradeoffs.
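To make the affinity idea concrete, here is a rough sketch in plain Python, not NiFi code. It assumes the header is a SHA-256 hash of the full object and that the load balancer routes by hashing that header onto a node. The header names (`X-Affinity-Key`, `X-Fragment-Index`, `X-Fragment-Count`) and the function names are made up for illustration, and a real load balancer will have its own way of configuring this.

```python
import hashlib


def affinity_key(payload: bytes) -> str:
    """Hash of the full (unsplit) object; shared by all of its fragments."""
    return hashlib.sha256(payload).hexdigest()


def split_with_affinity(payload: bytes, max_size: int):
    """Split payload into chunks of at most max_size bytes.

    Returns a list of (headers, chunk) pairs. Every chunk carries the same
    hypothetical X-Affinity-Key header, computed once from the full payload.
    """
    key = affinity_key(payload)
    chunks = [payload[i:i + max_size] for i in range(0, len(payload), max_size)]
    return [
        (
            {
                "X-Affinity-Key": key,            # hypothetical header name
                "X-Fragment-Index": str(index),   # hypothetical header name
                "X-Fragment-Count": str(len(chunks)),
            },
            chunk,
        )
        for index, chunk in enumerate(chunks)
    ]


def pick_node(key: str, nodes: list) -> str:
    """Stand-in for the load balancer: the same key always maps to the same node."""
    return nodes[int(key, 16) % len(nodes)]
```

Since every fragment of one object carries the same key, `pick_node` sends them all to the same receiving node, and that node can merge them locally. Different objects hash to different keys, so the load still spreads across the cluster.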
Great case for us to work on/rally around though.

Thanks
Joe

On Wed, Feb 15, 2017 at 4:29 PM, Kiran <[email protected]> wrote:
> Hello,
>
> I need to send data from one organisation to another but there are data
> size limits between them (this isn't my choice and has been enforced on
> me). I've got a 4 node NiFi cluster in each organisation.
>
> The sending NiFi cluster has the following data flow:
> Ingest the data by various means
> -> Compress Data using CompressContent
> -> If file size > X amount I use SplitContent
> -> HTTPS POST to load balancer sitting in front of the NiFi
> cluster in the other organisation
>
> On the receiving NiFi cluster I wanted to:
> -> Receive the data
> -> MergeContent
> -> Do what ever else with the data...
>
> The problem I can't get round is that if I split the content into 3
> fragments and send them to the receiving NiFi instance because it's
> behind a load balancer I can't guarantee that the 3 fragments are
> received by the same node.
>
> Q1) I'm assuming that for MergeContent to work all the fragments of a
> single piece of data have to arrive on the same NiFi node or is there a
> option to have it working across a cluster?
>
> Q2) How long does the MergeContent processor wait for all the fragments?
> If one of the fragments gets lost does it timeout after a certain
> period?
>
> I was thinking one way to solve this of to have the HTTPListener on the
> receiving NiFi only listening on the primary node which would ensure all
> the fragments arrive on the same node. The downside would be that I end
> up with idle NiFi nodes.
>
> Is there anything obvious that I'm missed that would solve my issue?
>
> Thanks in advance,
>
> Brian
