Hello,

Generally the RPG approach should be better, because it handles load balancing and failover for you.

For testing the RPG, you shouldn't need to use the DistributeLoad processor. You should be able to have GenerateFlowFile -> RPG (with the URL of any node), and then an Input Port -> some processor. The RPG will figure out all of the nodes in the cluster and send data to all of them.

On the RPG, if you go into the remote ports menu, each port has some settings that control how many flow files get sent per transaction. Generally you will probably get a more even distribution with a smaller batch size, but you will get much better performance with a larger batch size.

The RPG also factors in the number of flow files on each node when determining where to send them, so if node 1 is the primary node and has more flow files queued, the RPG will likely send more flow files to node 2.

Hope that helps.

-Bryan

On Mon, Jun 19, 2017 at 10:38 PM, 진유리 <[email protected]> wrote:

> Hi All,
>
> I have some questions about load balancing for clustered NiFi v1.0.0
> (2 nodes).
>
> I am considering 2 ways:
>
> 1) RPG way  : Remote Process Group (HTTP/RAW) + InputPort
> 2) HTTP way : PostHTTP + ListenHTTP
>
> Could you tell me which one is better and explain why?
> In my test case, the HTTP way is faster than the RPG way, but the HTTP way
> has the disadvantage that a unique port number must be assigned to each
> ListenHTTP processor.
> (Actually, I don't understand why the HTTP way is faster than the RPG way.)
>
> Moreover, I found some strange things in my workflow.
> This is my NiFi workflow to compare performance between the RPG way and
> the HTTP way:
>
> ----------------------------------------------------------------------
> GenerateFlowFile (On Primary Node) -> DistributeLoad -> RPG (to node 1)
>                                                      -> RPG (to node 2)
>                                    -> DistributeLoad -> PostHTTP (to node 1)
>                                                      -> PostHTTP (to node 2)
> ----------------------------------------------------------------------
> InputPort  -> PutFile
> ListenHTTP -> PutFile
> ----------------------------------------------------------------------
>
> First, I got a SocketException on the 'PostHTTP' processor.
> (java.net.SocketException: Connection reset, Broken pipe (Write failed))
> I guess it is caused by calling it too many times.
>
> The PostHTTP processor shows an error mark and logs, but the RPG does not
> show anything.
> I guess both have the same problem, because neither works anymore.
>
> And I selected the 'Round Robin' strategy for all 'DistributeLoad'
> processors.
> But the result of the above 2 ways is different for each node.
>
> - RPG way  : One node wrote 2 times more files than the other node
> - HTTP way : Each node wrote almost the same number of files
>
> Please share your opinions or tips for load balancing.
>
> Thanks
> -Yuri Jin-
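
As a rough illustration of the batch size trade-off mentioned in the reply (smaller batches tend to give a more even distribution, larger batches give better throughput), here is a toy Python model. It is only a sketch and does not reproduce how the RPG actually selects nodes: the function name `distribute`, the strict rotation between nodes, and the file counts are all assumptions made for the example.

```python
def distribute(total_files, batch_size, nodes=2):
    """Toy model: send files in whole batches, rotating over the nodes.

    Assumption: one batch goes to one node per transaction, and the
    sender simply takes turns. The real RPG also weighs how many flow
    files each node already has queued, which is not modelled here.
    """
    received = [0] * nodes
    sent = 0
    turn = 0
    while sent < total_files:
        # The last batch may be smaller than batch_size.
        batch = min(batch_size, total_files - sent)
        received[turn % nodes] += batch
        sent += batch
        turn += 1
    return received


if __name__ == "__main__":
    for size in (100, 10, 1):
        print(size, distribute(150, size))
```

In this model the difference between nodes can never exceed one batch, so lowering the per-transaction count on the remote port narrows the worst-case gap, at the cost of more transactions for the same number of flow files.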
