Hi Jeff,

I saw this and looked into it. The data in those nodes are the 
nifi.cluster.node.address and nifi.cluster.node.protocol.port values. In order 
to get the nifi.web.http.host and nifi.web.http.port values, it seems I would 
have to connect first using the cluster node protocol and pretend to be a NiFi 
node so that I can query the cluster coordinator for the list of NodeIdentifier 
objects. Is this cluster node protocol stable enough to use in a production 
application? It doesn’t seem to be documented anywhere so I was assuming it may 
change in a minor release without much notice.

Thanks!
-Greg

From: Jeff <[email protected]>
Reply-To: [email protected]
Date: Tuesday, December 20, 2016 at 11:10 AM
To: [email protected]
Subject: Re: Load-balancing web api in cluster

Greg,

That first statement in my previous email should read "which nodes can be the 
primary or cluster coordinator".  I apologize for any confusion!

- Jeff

On Tue, Dec 20, 2016 at 2:04 PM Jeff <[email protected]> wrote:
Greg,

NiFi does store which nodes are the primary and coordinator.  Relevant nodes in 
ZK are (for instance, in a cluster I'm running locally):
/nifi/leaders/Primary Node/_c_c94f1eb8-e5ac-443c-9643-2668b6f685b2-lock-0000000553,
/nifi/leaders/Primary Node/_c_7cd14bd5-85f5-4ea9-b849-121496269ef4-lock-0000000554,
/nifi/leaders/Primary Node/_c_99b79311-495f-4619-b316-9e842d445a8d-lock-0000000552,
/nifi/leaders/Cluster Coordinator/_c_dc449a75-1a14-42d6-98ab-2cef3e74d616-lock-0000005967,
/nifi/leaders/Cluster Coordinator/_c_2fbc68df-c9cd-4ecd-99d2-234b7b801110-lock-0000005966,
/nifi/leaders/Cluster Coordinator/_c_a2b9c2be-c0fd-4bf7-a479-e011a7792fc3-lock-0000005968

The data on each of these nodes should contain the host:port.  These are the 
candidate nodes for election as Primary or Cluster Coordinator.  I don't 
think that the current active Primary and Cluster Coordinator are stored in ZK, 
just the nodes that are candidates to fulfill those roles.  I'll have to get 
back to you to know that for sure, though.
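For reference, here is a minimal sketch of how a client might interpret that lock-node payload once it has been read from ZooKeeper, assuming (per the above) the data is a plain UTF-8 host:port string; the example hostname is made up:

```python
def parse_leader_candidate(data: bytes) -> tuple[str, int]:
    """Split a ZooKeeper lock-node payload of the form b"host:port".

    Assumes the payload is a UTF-8 host:port string as described above.
    Note this yields the cluster-protocol address/port pair, not the
    web API host/port.
    """
    host, _, port = data.decode("utf-8").rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"unexpected payload: {data!r}")
    return host, int(port)
```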

- Jeff

On Tue, Dec 20, 2016 at 1:45 PM Hart, Greg <[email protected]> wrote:
Hi Jeff,

My application communicates with the NiFi REST API to import templates, 
instantiate flows from templates, edit processor properties, and a few other 
things. I’m currently using Jersey to send calls to one NiFi node but if that 
node goes down then my application has to be manually reconfigured with the 
hostname and port of another NiFi node. HAProxy would handle failover but it 
still must be manually reconfigured when a NiFi node is added or removed from 
the cluster.
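For what it's worth, the HAProxy side is a short config. This is only a sketch: the hostnames, ports, and the health-check path are placeholder assumptions (plain HTTP, two nodes), and it has exactly the drawback described above in that the server list is static:

```
# Sketch: round-robin the NiFi REST API across known nodes, with
# health checks so a down node is taken out of rotation automatically.
frontend nifi_api
    bind *:8080
    default_backend nifi_nodes

backend nifi_nodes
    balance roundrobin
    option httpchk GET /nifi-api/flow/about
    server nifi1 nifi-node1.example.com:8080 check
    server nifi2 nifi-node2.example.com:8080 check
```

Adding or removing a node still means editing this file and reloading HAProxy.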

I was hoping that NiFi would use ZooKeeper similarly to other applications 
(Hive or HBase) where a client can easily get the hostname and port of the 
cluster coordinator (or active master). Unfortunately, the information in 
ZooKeeper does not include the nifi.web.http.host and 
nifi.web.http.port values of any NiFi node.

It sounds like HAProxy might be the better solution for now. Luckily, adding or 
removing nodes from a cluster shouldn’t be a daily occurrence. If you have any 
other ideas please let me know.
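One client-side alternative is to configure the application with the full list of node URLs and fail over in code, trying each node until one answers. A minimal sketch (the hosts and the "about" endpoint usage are illustrative assumptions, and the node list still has to be updated by hand when the cluster changes):

```python
from urllib.request import urlopen


def call_with_failover(nodes, request):
    """Apply `request` to each node base URL until one succeeds.

    `request` is any callable that takes a base URL and either returns
    a result or raises OSError (urllib's network errors subclass
    OSError).  Re-raises the last error if every node fails.
    """
    last_error = None
    for base in nodes:
        try:
            return request(base)
        except OSError as exc:
            last_error = exc  # this node is unreachable; try the next
    raise last_error


def fetch_about(base):
    # One possible `request`: GET a NiFi REST endpoint on that node.
    with urlopen(base + "/nifi-api/flow/about", timeout=5) as resp:
        return resp.read()


# Usage (hypothetical hosts):
# call_with_failover(["http://nifi1:8080", "http://nifi2:8080"], fetch_about)
```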

Thanks!
-Greg

From: Jeff <[email protected]>
Reply-To: [email protected]
Date: Tuesday, December 20, 2016 at 8:56 AM
To: [email protected]
Subject: Re: Load-balancing web api in cluster

Hello Greg,

You can use the REST API on any of the nodes in the cluster.  Could you provide 
more details on what you're trying to accomplish?  If, for instance, you are 
posting data to a ListenHTTP processor and you want to balance POSTs across the 
instances of ListenHTTP on your cluster, then haproxy would probably be a good 
idea.  If you're trying to distribute the processing load once the data is 
received, you can use a Remote Process Group to distribute the data across the 
cluster.  Pierre Villard has written a nice blog about setting up a cluster and 
configuring a flow using a Remote Process Group to distribute the processing 
load [1].  It details creating a Remote Process Group to send data back to an 
Input Port in the same NiFi cluster, and allows NiFi to distribute the 
processing load across all the nodes in your cluster.

You can use a combination of haproxy and Remote Process Group to load balance 
connections to the REST API on each NiFi node and to balance the processing 
load across the cluster.

[1] https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/

- Jeff

On Mon, Dec 19, 2016 at 9:25 PM Hart, Greg <[email protected]> wrote:
Hi all,

What's the recommended way for communicating with the NiFi REST API in a
cluster? I see that NiFi uses ZooKeeper so is it possible to get the
Cluster Coordinator hostname and API port from ZooKeeper, or should I use
something like haproxy?

Thanks!
-Greg
