Chakri,

Glad you got site-to-site working.

Regarding the data distribution, I'm not sure why it is behaving that way.
I just ran a similar test with the NCM, node1, and node2 all on my local
machine, with GenerateFlowFile running every 10 seconds and an Input Port
going to a LogAttribute, and I see the output alternating between the node1
and node2 logs every 10 seconds.

Is there anything in your primary node logs
(primary_node/logs/nifi-app.log) when you see the data on the other node?

-Bryan

On Sun, Jan 10, 2016 at 3:44 PM, Joe Witt <joe.w...@gmail.com> wrote:
> Chakri,
>
> Would love to hear what you've learned and how that differed from the
> docs themselves. Site-to-site has proven difficult to set up, so we're
> clearly not there yet in having the right operator/admin experience.
>
> Thanks
> Joe
>
> On Sun, Jan 10, 2016 at 3:41 PM, Chakrader Dewaragatla
> <chakrader.dewaraga...@lifelock.com> wrote:
> > I was able to get site-to-site to work.
> > I tried to follow your instructions to distribute data across the
> > nodes.
> >
> > GenerateFlowFile (on primary) -> RPG
> > RPG -> Input Port -> PutFile (Timer driven scheduling)
> >
> > However, data is only written to one slave (the secondary slave). The
> > primary slave has no data.
> >
> > Image screenshot:
> > http://tinyurl.com/jjvjtmq
> >
> > From: Chakrader Dewaragatla <chakrader.dewaraga...@lifelock.com>
> > Date: Sunday, January 10, 2016 at 11:26 AM
> > To: "users@nifi.apache.org" <users@nifi.apache.org>
> > Subject: Re: Nifi cluster features - Questions
> >
> > Bryan – Thanks – I am trying to set up site-to-site.
> > I have two slaves and one NCM.
> >
> > My properties are as follows:
> >
> > On both slaves:
> >
> > nifi.remote.input.socket.port=10880
> > nifi.remote.input.secure=false
> >
> > On NCM:
> > nifi.remote.input.socket.port=10880
> > nifi.remote.input.secure=false
> >
> > When I try to drop a remote process group (with
> > http://<NCM IP>:8080/nifi), I see the following error for the two nodes.
> >
> > [<Slave1 ip>:8080] - Remote instance is not allowed for Site to Site communication
> > [<Slave2 ip>:8080] - Remote instance is not allowed for Site to Site communication
> >
> > Do you have any insight into why it is trying to connect to 8080 on the
> > slaves? When does port 10880 come into the picture? I remember setting
> > up site-to-site a few months back and succeeding.
> >
> > Thanks,
> > -Chakri
> >
> > From: Bryan Bende <bbe...@gmail.com>
> > Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> > Date: Saturday, January 9, 2016 at 11:22 AM
> > To: "users@nifi.apache.org" <users@nifi.apache.org>
> > Subject: Re: Nifi cluster features - Questions
> >
> > The sending node (where the remote process group is) will distribute the
> > data evenly across the two nodes, so an individual file will only be
> > sent to one of the nodes. You could think of it as if a separate NiFi
> > instance were sending directly to a two-node cluster; it would be evenly
> > distributing the data across the two nodes. In this case it just so
> > happens to all be within the same cluster.
> >
> > The most common use case for this scenario is the List and Fetch
> > processors, like HDFS. You can perform the listing on the primary node,
> > and then distribute the results so the fetching takes place on all
> > nodes.
> >
> > On Saturday, January 9, 2016, Chakrader Dewaragatla
> > <chakrader.dewaraga...@lifelock.com> wrote:
> >>
> >> Bryan – Thanks, how do the nodes distribute the load for an input port?
> >> As the port is open and listening on two nodes, does it copy the same
> >> files to both nodes?
> >> I need to try this setup to see the results; I appreciate your help.
> >>
> >> Thanks,
> >> -Chakri
> >>
> >> From: Bryan Bende <bbe...@gmail.com>
> >> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> >> Date: Friday, January 8, 2016 at 3:44 PM
> >> To: "users@nifi.apache.org" <users@nifi.apache.org>
> >> Subject: Re: Nifi cluster features - Questions
> >>
> >> Hi Chakri,
> >>
> >> I believe the DistributeLoad processor is more for load balancing when
> >> sending to downstream systems. For example, if you had two HTTP
> >> endpoints, you could have the first relationship from DistributeLoad
> >> going to a PostHTTP that posts to endpoint #1, and the second
> >> relationship going to a second PostHTTP that posts to endpoint #2.
> >>
> >> If you want to distribute the data within the cluster, then you need to
> >> use site-to-site. The way you do this is the following...
> >>
> >> - Add an Input Port connected to your PutFile.
> >> - Add GenerateFlowFile scheduled on primary node only, connected to a
> >>   Remote Process Group. The Remote Process Group should be connected to
> >>   the Input Port from the previous step.
> >>
> >> So both nodes have an input port listening for data, but only the
> >> primary node produces a FlowFile and sends it to the RPG, which then
> >> redistributes it back to one of the Input Ports.
> >>
> >> In order for this to work you need to set nifi.remote.input.socket.port
> >> in nifi.properties to some available port, and you probably want
> >> nifi.remote.input.secure=false for testing.
> >>
> >> -Bryan
> >>
> >>
> >> On Fri, Jan 8, 2016 at 6:27 PM, Chakrader Dewaragatla
> >> <chakrader.dewaraga...@lifelock.com> wrote:
> >>>
> >>> Mark – I have set up a two-node cluster and tried the following:
> >>> GenerateFlowFile processor (run only on primary node) ->
> >>> DistributeLoad processor (RoundRobin) -> PutFile
> >>>
> >>> >> The GetFile/PutFile will run on all nodes (unless you schedule it to
> >>> >> run on primary node only).
> >>> From your comment above, it should put files on both nodes. It puts
> >>> files on the primary node only. Any thoughts?
> >>>
> >>> Thanks,
> >>> -Chakri
> >>>
> >>> From: Mark Payne <marka...@hotmail.com>
> >>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> >>> Date: Wednesday, October 7, 2015 at 11:28 AM
> >>> To: "users@nifi.apache.org" <users@nifi.apache.org>
> >>> Subject: Re: Nifi cluster features - Questions
> >>>
> >>> Chakri,
> >>>
> >>> Correct - when NiFi instances are clustered, they do not transfer data
> >>> between the nodes. This is very different from what you might expect
> >>> from something like Storm or Spark, as the key goals and design are
> >>> quite different. We have discussed providing the ability to allow the
> >>> user to indicate that they want to have the framework do load
> >>> balancing for specific connections in the background, but it's still
> >>> in more of a discussion phase.
> >>>
> >>> Site-to-Site is simply the capability that we have developed to
> >>> transfer data between one instance of NiFi and another instance of
> >>> NiFi. So currently, if we want to do load balancing across the
> >>> cluster, we would create a site-to-site connection (by dragging a
> >>> Remote Process Group onto the graph) and give that site-to-site
> >>> connection the URL of our cluster. That way, you can push data to your
> >>> own cluster, effectively providing a load balancing capability.
> >>>
> >>> If you were to just run ListenHTTP without setting it to Primary Node,
> >>> then every node in the cluster would be listening for incoming HTTP
> >>> connections. So you could then use a simple load balancer in front of
> >>> NiFi to distribute the load across your cluster.
> >>>
> >>> Does this help? If you have any more questions we're happy to help!
> >>>
> >>> Thanks
> >>> -Mark
> >>>
> >>>
> >>> On Oct 7, 2015, at 2:32 PM, Chakrader Dewaragatla
> >>> <chakrader.dewaraga...@lifelock.com> wrote:
> >>>
> >>> Mark - Thanks for the notes.
> >>>
> >>> >> The other option would be to have a ListenHTTP processor run on
> >>> >> Primary Node only and then use Site-to-Site to distribute the data
> >>> >> to other nodes.
> >>> Let's say I have a 5-node cluster and a ListenHTTP processor on the
> >>> primary node. Is the data collected on the primary node not
> >>> transferred to the other nodes by default for processing, even though
> >>> all nodes are part of one cluster?
> >>> If the ListenHTTP processor is running with the default scheduling
> >>> (without an explicit setting to run on the primary node), how is the
> >>> data transferred to the rest of the nodes? Does site-to-site come into
> >>> play when I make one processor run on the primary node?
> >>>
> >>> Thanks,
> >>> -Chakri
> >>>
> >>> From: Mark Payne <marka...@hotmail.com>
> >>> Reply-To: "users@nifi.apache.org" <users@nifi.apache.org>
> >>> Date: Wednesday, October 7, 2015 at 7:00 AM
> >>> To: "users@nifi.apache.org" <users@nifi.apache.org>
> >>> Subject: Re: Nifi cluster features - Questions
> >>>
> >>> Hello Chakro,
> >>>
> >>> When you create a cluster of NiFi instances, each node in the cluster
> >>> is acting independently and in exactly the same way. I.e., if you have
> >>> 5 nodes, all 5 nodes will run exactly the same flow. However, they
> >>> will be pulling in different data and therefore operating on different
> >>> data.
> >>>
> >>> So if you pull in 10 1-gig files from S3, each of those files will be
> >>> processed on the node that pulled the data in. NiFi does not currently
> >>> shuffle data around between nodes in the cluster (you can use
> >>> site-to-site to do this if you want to, but it won't happen
> >>> automatically). If you set the number of Concurrent Tasks to 5, then
> >>> you will have up to 5 threads running for that processor on each node.
> >>>
> >>> The only exception to this is the Primary Node. You can schedule a
> >>> Processor to run only on the Primary Node by right-clicking on the
> >>> Processor and going to the Configure menu. In the Scheduling tab, you
> >>> can change the Scheduling Strategy to Primary Node Only. In this case,
> >>> that Processor will only be triggered to run on whichever node is
> >>> elected the Primary Node (this can be changed in the Cluster
> >>> management screen by clicking the appropriate icon in the top-right
> >>> corner of the UI).
> >>>
> >>> The GetFile/PutFile will run on all nodes (unless you schedule it to
> >>> run on primary node only).
> >>>
> >>> If you are attempting to have a single input running HTTP and then
> >>> push that out across the entire cluster to process the data, you would
> >>> have a few options. First, you could just use an HTTP Load Balancer in
> >>> front of NiFi. The other option would be to have a ListenHTTP
> >>> processor run on Primary Node only and then use Site-to-Site to
> >>> distribute the data to other nodes.
> >>>
> >>> For more info on site-to-site, you can see the Site-to-Site section of
> >>> the User Guide at
> >>> http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#site-to-site
> >>>
> >>> If you have any more questions, let us know!
> >>>
> >>> Thanks
> >>> -Mark
> >>>
> >>> On Oct 7, 2015, at 2:33 AM, Chakrader Dewaragatla
> >>> <chakrader.dewaraga...@lifelock.com> wrote:
> >>>
> >>> Nifi Team – I would like to understand the advantages of a NiFi
> >>> clustering setup.
> >>>
> >>> Questions:
> >>>
> >>> - How does a workflow work on multiple nodes? Does it share resources
> >>>   between nodes? Let's say I need to pull 10 1-gig files from S3; how
> >>>   is the workload distributed? With concurrent tasks set to 5, does it
> >>>   spawn 5 tasks per node?
> >>>
> >>> - How do I “isolate” a processor to the master node (or one node)?
> >>>
> >>> - GetFile/PutFile processors in a cluster setup: do they get/put on
> >>>   the primary node? How do I force a processor to look at one of the
> >>>   slave nodes?
> >>>
> >>> - How can we have a workflow where the input side receives (HTTP)
> >>>   requests and the rest of the pipeline runs in parallel on all the
> >>>   nodes?
> >>>
> >>> Thanks,
> >>> -Chakro
> >>>
> >>> ________________________________
> >>> The information contained in this transmission may contain privileged
> >>> and confidential information. It is intended only for the use of the
> >>> person(s) named above. If you are not the intended recipient, you are
> >>> hereby notified that any review, dissemination, distribution or
> >>> duplication of this communication is strictly prohibited. If you are
> >>> not the intended recipient, please contact the sender by reply email
> >>> and destroy all copies of the original message.
> >>> ________________________________
> >
> > --
> > Sent from Gmail Mobile
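
For anyone following this thread later, the two site-to-site settings discussed above can be sanity-checked with a short script before starting the nodes. This is only a minimal sketch and not part of NiFi: the function names `parse_properties` and `check_site_to_site` are hypothetical, and it looks only at the two properties named in the thread (`nifi.remote.input.socket.port` and `nifi.remote.input.secure`), not at anything else site-to-site may depend on.

```python
# Hypothetical helper, not part of NiFi: checks the two site-to-site
# properties mentioned in this thread in the text of a nifi.properties file.


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_site_to_site(props):
    """Return a list of problems; an empty list means both settings look usable."""
    problems = []

    # The thread says the port must be set to some available port.
    port_text = props.get("nifi.remote.input.socket.port", "")
    try:
        port_ok = 1 <= int(port_text) <= 65535
    except ValueError:
        port_ok = False
    if not port_ok:
        problems.append("nifi.remote.input.socket.port must be a port number")

    # The thread suggests false for testing; either boolean is accepted here.
    secure = props.get("nifi.remote.input.secure", "").lower()
    if secure not in ("true", "false"):
        problems.append("nifi.remote.input.secure must be true or false")

    return problems


if __name__ == "__main__":
    sample = """
    nifi.remote.input.socket.port=10880
    nifi.remote.input.secure=false
    """
    for problem in check_site_to_site(parse_properties(sample)):
        print(problem)
```

Running the same check against the nifi.properties of the NCM and of each node is one way to confirm that every instance has the settings in place, since the thread sets them on all three.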