Thanks Joe, just a few follow-up questions:

re: durability -- is this something that people have just been accepting
as a risk and hoping for the best? Or is it something people build their
applications around -- i.e., keeping durability outside of the NiFi
system boundary and pushing it into a database, etc.?
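To make that concrete, the pattern I have in mind is roughly the
following -- a hypothetical sketch, nothing NiFi-specific, where Record
and Source are just stand-ins for whatever the upstream system provides
(e.g. a Kafka record plus its consumer's offset commit):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Hypothetical stand-ins for the upstream system.
    interface Record { String id(); byte[] payload(); }
    interface Source { void acknowledge(Record r); }

    class DurableHandoff {
        // At-least-once handoff: acknowledge upstream only after the
        // database commit, so a lost node means redelivery, not loss.
        static void handle(Record record, Source source, Connection db)
                throws SQLException {
            try (PreparedStatement ps = db.prepareStatement(
                    "INSERT INTO events (id, payload) VALUES (?, ?)")) {
                ps.setString(1, record.id());
                ps.setBytes(2, record.payload());
                ps.executeUpdate();
            }
            db.commit();                // durable in the DB first (autocommit off)
            source.acknowledge(record); // only then advance the source
        }
    }

i.e., is the expectation that durability lives entirely in the endpoints
like this, with NiFi treated as a best-effort pipe in between?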
re: heterogeneous -- you can join nodes of differing hardware specs, but
it seems like you will end up causing your lighter-weight nodes to
explode, as there's no way to configure how many tasks, and how much
processing to have "in-flight", on one node differently than on the
others? I.e., if I know my large nodes can handle 3 concurrent instances
of a CPU-intensive task, that same setting is going to cause issues for
the smaller nodes. This is an even bigger problem for differing memory
sizes.

And an unrelated question to the previous -- is there a way to skew or
influence how an RPG distributes its tasks? Say you wanted to do a
group-by type distribution?

Thanks again!
Jon

On Fri, Apr 13, 2018 at 2:17 PM, Joe Witt <[email protected]> wrote:
> Jon,
>
> Node Failure:
> You have to care about two things, generally speaking. The first is
> the flow execution and the second is data in-flight.
> For flow execution, NiFi clustering will take care of re-assigning
> the primary node and cluster coordinator as needed.
> For data, we do not at present offer distributed data durability. The
> current model is predicated on using reliable storage such as RAID,
> EBS, etc.
> There is a very clear and awesome-looking K8s-based path, though,
> that will make this work really nicely with persistent volumes and
> elastic scaling. No clear timeline, but I hope to start or
> participate in discussions/JIRAs/contributions soon.
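>
> For example, in nifi.properties you'd point the repositories at
> whatever reliably-backed storage you have (paths here are purely
> illustrative):
>
>     nifi.flowfile.repository.directory=/mnt/ebs0/flowfile_repository
>     nifi.content.repository.directory.default=/mnt/ebs0/content_repository
>     nifi.provenance.repository.directory.default=/mnt/ebs0/provenance_repository
>
> If the volume outlives the node, the repositories -- and so the data
> in-flight -- can be re-attached to a replacement node, which is
> essentially what the persistent-volume approach would give you.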
>
> How scalable is the NiFi scaling model:
> Usually NiFi clusters are a few nodes to maybe 10-20 or so. Some have
> been larger, but generally, if you need that much flow management, it
> often makes more sense to have clusters dedicated along various
> domains of expertise anyway. So say 3-10 nodes, with each handling
> 100,000 events per second at around 100 MB per second
> (conservatively), and you can see why a single fairly small cluster
> can handle pretty massive volumes.
>
> RPGs feeding back:
> - This caused issues previously, but I believe it has improved
> significantly in recent releases.
>
> UI actions causing issues:
> There have been reports similar to this, especially for some of the
> really massive flows we've seen in terms of number of components and
> concurrent users. These JIRAs, once sorted out, will help a lot:
> [1], [2], [3].
>
> Heterogeneous cluster nodes:
> - This should work quite well, actually, and is a major reason why
> NiFi and the S2S protocol support/honor backpressure. Nodes that can
> take on more work take on more work, and nodes that cannot push back.
> You also want to ensure you're using good and scalable protocols to
> source data into the cluster. If you find you're using a lot of
> protocols that force many data-sourcing steps to run 'primary node
> only', then that primary node has to do more work than the others,
> and I have seen uneven behavior in such cases. Yes, you can then
> route using S2S/RPG, which we recommend, but still... try to design
> away from 'primary node only' when possible.
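>
> To build some intuition for why this balances out, here's an
> illustrative sketch -- in the spirit of, not literally, what S2S
> does: a sender round-robins over its peers and simply skips any peer
> whose queue has hit its backpressure threshold, so a small node that
> is full gets passed over until it drains rather than being
> overloaded:
>
>     import java.util.List;
>
>     class Peer {
>         int queued;           // flowfiles currently queued on this peer
>         final int threshold;  // backpressure threshold for its queue
>         Peer(int threshold) { this.threshold = threshold; }
>         boolean accepting() { return queued < threshold; }
>     }
>
>     class Sender {
>         private int next = 0;
>         // Hand each flowfile to the next peer that is accepting work.
>         void distribute(List<Peer> peers, List<byte[]> batch) {
>             for (byte[] flowFile : batch) {
>                 for (int tries = 0; tries < peers.size(); tries++) {
>                     Peer p = peers.get(next++ % peers.size());
>                     if (p.accepting()) {
>                         p.queued++;  // i.e. send(p, flowFile)
>                         break;
>                     }
>                 }
>                 // (a real sender would wait and retry if no peer is
>                 // accepting, rather than drop the flowfile)
>             }
>         }
>     }
>
> The bigger nodes naturally take more of each batch because they drain
> faster and so are accepting more often.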
>
> Thanks
> Joe
>
> [1] https://issues.apache.org/jira/browse/NIFI-950
> [2] https://issues.apache.org/jira/browse/NIFI-5064
> [3] https://issues.apache.org/jira/browse/NIFI-5066
>
> On Fri, Apr 13, 2018 at 5:49 PM, Jon Logan <[email protected]> wrote:
> > All, I had a few general questions regarding clustering, and was
> > looking for any sort of advice or best-practices information --
> >
> > - The documentation discusses failure handling primarily from a
> > NiFi crash scenario, but I don't recall seeing any information on
> > entire node-failure scenarios. Is there a way this is supposed to
> > be handled?
> > - At what point should we expect pain in scaling? I am particularly
> > concerned about the all-to-all relationship that seems to exist if
> > you connect a cluster RPG to itself, as all nodes need to
> > distribute all data to all other nodes. We have also been having
> > some issues when things are not as responsive as NiFi would like --
> > namely, the UI seems to get very upset and crash.
> > - Do UI actions (incl. read-only) require delegation to all nodes
> > underneath? I suspect this is the case, as otherwise you wouldn't
> > be able to determine queue sizes?
> > - Is there a way to have a cluster with heterogeneous node sizes?
> >
> > Thanks in advance!