Thanks Joe, just a few follow-up questions:

re: durability -- is this something that people have just been accepting as
a risk and hoping for the best? Or is it something people build their
applications around -- i.e., handling durability outside of the NiFi system
boundary and pushing it into a database, etc.?
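
To make sure I'm describing the same pattern, here's a rough sketch of
what I mean (the store, table, and sender are all hypothetical, just for
illustration): the application persists each event in an external durable
store before handing it to NiFi, so anything lost to a node failure can be
replayed from the store.

    import sqlite3

    # Hypothetical durable store outside the NiFi boundary; any
    # database or durable log (e.g. Kafka) plays the same role.
    db = sqlite3.connect("events.db")
    db.execute("CREATE TABLE IF NOT EXISTS events "
               "(id TEXT PRIMARY KEY, payload BLOB, acked INTEGER DEFAULT 0)")

    def send_to_nifi(payload):
        # Stand-in for the real handoff, e.g. an HTTP POST to a
        # ListenHTTP port on the cluster.
        pass

    def record_then_send(event_id, payload):
        # 1. Persist first, so the event survives losing a NiFi node.
        db.execute("INSERT OR IGNORE INTO events (id, payload) VALUES (?, ?)",
                   (event_id, payload))
        db.commit()
        # 2. Hand off to NiFi; only then mark it acknowledged. Unacked
        #    rows can be replayed after a failure.
        send_to_nifi(payload)
        db.execute("UPDATE events SET acked = 1 WHERE id = ?", (event_id,))
        db.commit()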

re: heterogeneous -- you can join nodes of differing hardware specs, but it
seems like you will end up overwhelming your lighter-weight nodes, as
there's no way to configure the number of tasks or the amount of
"in-flight" processing on one node differently than on the others? I.e., if
I know my large nodes can handle 3 instances of a CPU-intensive task, that
same setting is going to cause issues for smaller nodes. This is an even
bigger problem for differing memory sizes.
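
Rough numbers to illustrate the concern (all figures hypothetical, and
assuming, as I understand it, that concurrent tasks are configured per
processor for the whole cluster):

    # The same task count runs on every node regardless of hardware.
    concurrent_tasks = 3      # tuned for the large nodes
    mem_per_task_gb = 4       # assumed working set of the heavy task

    for name, node_mem_gb in {"large": 32, "small": 8}.items():
        needed = concurrent_tasks * mem_per_task_gb
        status = "ok" if node_mem_gb >= needed else "OOM risk"
        print(f"{name}: needs {needed} GB, has {node_mem_gb} GB -> {status}")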

And a question unrelated to the previous ones -- is there a way to skew or
influence how an RPG distributes its data? Say, if you wanted to do a
group-by-type distribution?
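
By group-by I mean deterministic key-based routing, something like:

    import hashlib

    def node_for(key, nodes):
        # Every record sharing a grouping key lands on the same node.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return nodes[h % len(nodes)]

    nodes = ["node-1", "node-2", "node-3"]
    print(node_for("customer-42", nodes))  # stable across calls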


Thanks again!
Jon


On Fri, Apr 13, 2018 at 2:17 PM, Joe Witt <[email protected]> wrote:

> Jon,
>
> Node Failure:
>  You have to care about two things, generally speaking: first, flow
> execution, and second, data in-flight.
>  For flow execution, NiFi clustering will take care of reassigning the
> primary node and cluster coordinator as needed.
>  For data, we do not at present offer distributed data durability.  The
> current model is predicated on using reliable storage such as RAID,
> EBS, etc.
>   There is a very clear and awesome-looking K8s-based path, though, that
> will make this work really nicely with persistent volumes and elastic
> scaling.  No clear timeline yet, but discussions/JIRAs/contributions I
> hope to start or participate in soon.
>
> How scalable is the NiFi scaling model:
>   Usually NiFi clusters are a few nodes to maybe 10-20 or so.  Some
> have been larger, but generally, if you need that much flow
> management, it often makes more sense to have clusters dedicated
> along various domains of expertise anyway.  So say 3-10 nodes, each
> handling 100,000 events per second at around 100 MB per second
> (conservatively), and you can see why a single fairly small cluster
> can handle pretty massive volumes.
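>
>   For example, the rough arithmetic (a sketch, not a benchmark):
>
>     nodes = 10
>     events_per_node = 100_000          # events/sec per node
>     mb_per_node = 100                  # MB/sec per node
>     print(nodes * events_per_node)     # 1,000,000 events/sec cluster-wide
>     print(nodes * mb_per_node * 86_400 / 1e6)   # 86.4 TB/day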
>
> RPGs feeding back:
> - This caused issues previously, but I believe it has improved
> significantly in recent releases.
>
> UI Actions Causing issues:
> There have been reports similar to this, especially for some of the
> really massive flows we've seen in terms of number of components and
> concurrent users.  These JIRAs, once sorted, will help a lot: [1],
> [2], [3].
>
> Heterogeneous cluster nodes:
> - This should work quite well, actually, and is a major reason why NiFi
> and the S2S protocol support/honor backpressure.  Nodes that can
> take on more work take on more work, and nodes that cannot push back
> (rough sketch below).  You also want to ensure you're using good,
> scalable protocols to source data into the cluster.  If you find
> you're using a lot of protocols that require many data-sourcing steps
> to run 'primary node only', then that primary node will have to do
> more work than the others, and I have seen uneven behavior in such
> cases.  Yes, you can then route using S2S/RPG, which we recommend,
> but still...try to design away from 'primary node only' when possible.
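>
> A rough sketch of the backpressure principle (not NiFi code, and the
> per-node limits are hypothetical): senders skip nodes whose queues have
> hit their threshold, so the bigger nodes naturally absorb more work:
>
>     queues = {"big": [], "small": []}
>     limits = {"big": 9, "small": 3}   # hypothetical per-node capacity
>     for i in range(12):
>         # deliver round-robin, but only to nodes with queue room left
>         open_nodes = [n for n, q in queues.items() if len(q) < limits[n]]
>         queues[open_nodes[i % len(open_nodes)]].append(i)
>     print({n: len(q) for n, q in queues.items()})  # {'big': 9, 'small': 3}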
>
>
> Thanks
> Joe
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-950
> [2] https://issues.apache.org/jira/browse/NIFI-5064
> [3] https://issues.apache.org/jira/browse/NIFI-5066
>
> On Fri, Apr 13, 2018 at 5:49 PM, Jon Logan <[email protected]> wrote:
> > All, I had a few general questions regarding Clustering, and was
> > looking for any sort of advice or best-practices information --
> >
> > - documentation discusses failure handling primarily from a NiFi crash
> > scenario, but I don't recall seeing any information on entire
> > node-failure scenarios. Is there a way that this is supposed to be
> > handled?
> > - at what point should we expect pain in scaling? I am particularly
> > concerned about the all-to-all relationship that seems to exist if you
> > connect a cluster RPG to itself, as all nodes need to distribute all
> > data to all other nodes. We have also been having some issues when
> > things are not as responsive as NiFi would like -- namely, the UI
> > seems to get very upset and crash
> > - do UI actions (incl. read-only) require delegation to all nodes
> > underneath? I suspect this is the case, as otherwise you wouldn't be
> > able to determine queue sizes?
> > - is there a way to have a cluster with heterogeneous node sizes?
> >
> >
> > Thanks in advance!
>
