Re: Clustering Questions

2018-04-18 Thread Pierre Villard
Hi Jon,

Just a note on your unrelated question:
I opened NIFI-4026 [1] a few months ago but haven't had time to work on it so far.

[1] https://issues.apache.org/jira/browse/NIFI-4026



Re: Clustering Questions

2018-04-17 Thread Jon Logan
Thanks Joe, just a few follow-up questions:

re: durability -- is this something that people have just been accepting as
a risk and hoping for the best? Or is this something people build their
applications around, i.e., pushing durability outside of the NiFi system
boundary and into a database, etc.?

re: heterogeneous -- you can join nodes of differing hardware specs, but it
seems like you will end up causing your lighter-weight nodes to explode, as
there's no way to configure how many concurrent tasks a node runs, or how much
it has "in-flight", differently from the other nodes? i.e., if I know my
large nodes can handle 3 concurrent instances of a CPU-intensive task, that
setting is going to cause issues for smaller nodes. This is an even bigger
problem for differing memory sizes.

And an unrelated question to the previous ones -- is there a way to skew or
influence how an RPG distributes its tasks? Say, you wanted to do a group-by
type of distribution?
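To make the "group-by type distribution" idea concrete, here is a minimal Python sketch of key-based (hash) partitioning: every record carrying the same grouping key is routed to the same node. This is not an RPG or Site-to-Site feature; the node names, the `customer` attribute, and the helpers `node_for_key` and `partition` are hypothetical, and crc32-modulo is just one simple choice of partitioning function.

```python
import zlib
from collections import defaultdict


def node_for_key(key: str, nodes: list[str]) -> str:
    """Deterministically map a grouping key to one node.

    zlib.crc32 is used instead of the built-in hash(), because str hashes are
    randomized per process and separate nodes would disagree on placement.
    """
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def partition(records, key_fn, nodes: list[str]) -> dict[str, list]:
    """Assign records to nodes so that all records sharing a key end up together."""
    assignment = defaultdict(list)
    for record in records:
        assignment[node_for_key(key_fn(record), nodes)].append(record)
    return dict(assignment)


# Hypothetical node names and records, grouped by a "customer" attribute.
nodes = ["node-1", "node-2", "node-3"]
records = [{"customer": c, "amount": i} for i, c in enumerate("abcabcaab")]
for node, rs in sorted(partition(records, lambda r: r["customer"], nodes).items()):
    print(node, sorted({r["customer"] for r in rs}))
```

Plain modulo hashing remaps most keys whenever a node joins or leaves; consistent hashing is the usual alternative if that matters.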


Thanks again!
Jon



Re: Clustering Questions

2018-04-13 Thread Joe Witt
Jon,

Node Failure:
 You generally have to care about two things. First is the
flow execution and second is data in-flight.
 For flow execution, NiFi clustering will take care of re-assigning the
primary node and cluster coordinator as needed.
 For data, we do not at present offer distributed data durability. The
current model is predicated on using reliable storage such as RAID,
EBS, etc.
  There is a very clear and awesome-looking K8s-based path, though, that
will make this work really nicely with persistent volumes and elastic
scaling. No clear timeline, but I hope to start or participate in
discussions/JIRAs/contributions on it soon.

How scalable is the NiFi scaling model:
  Usually NiFi clusters range from a few nodes to maybe 10-20 or so. Some
have been larger, but generally if you need that much flow
management, it often makes more sense to have clusters dedicated
along various domains of expertise anyway. So take, say, 3-10 nodes with
each handling 100,000 events per second at around 100 MB per second
(conservatively), and you can see why a single fairly small cluster can
handle pretty massive volumes.
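The arithmetic behind that estimate, as a small Python sketch. It assumes throughput scales linearly with node count and ignores the cost of redistributing data between nodes; the per-node figures are the conservative "say" numbers above, not measurements, and `cluster_throughput` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def cluster_throughput(nodes: int, events_per_node_s: int, mb_per_node_s: int) -> tuple[int, int]:
    """Aggregate event and byte rates, assuming perfectly linear scaling."""
    return nodes * events_per_node_s, nodes * mb_per_node_s


for n in (3, 10):
    events, mb = cluster_throughput(n, 100_000, 100)
    tb_per_day = mb * SECONDS_PER_DAY / 1_000_000  # decimal MB -> TB
    print(f"{n} nodes: {events:,} events/s, {mb} MB/s, ~{tb_per_day:.1f} TB/day")
```

Under those assumptions a 3-node cluster lands around 300,000 events/s (about 26 TB/day) and a 10-node cluster around 1,000,000 events/s (about 86 TB/day), in decimal terabytes.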

RPGs feeding back:
- This caused issues previously, but I believe it has improved
significantly in recent releases.
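The all-to-all concern from the original question can be quantified with a back-of-the-envelope model. This is a hypothetical sketch, not a description of how Site-to-Site actually schedules transfers: it assumes each node spreads its data evenly over all n nodes (itself included) and reuses 100 MB/s as a per-node ingest figure; `all_to_all` is an illustrative helper.

```python
def all_to_all(n: int, ingest_mb_s_per_node: float) -> tuple[int, float, float]:
    """Model a cluster RPG pointed back at its own cluster.

    Assumes every node spreads its data evenly across all n nodes
    (itself included), so each node keeps 1/n locally and ships the rest.
    """
    links = n * (n - 1)  # directed node-to-node paths
    egress_per_node = ingest_mb_s_per_node * (n - 1) / n
    total_cross_node = egress_per_node * n
    return links, egress_per_node, total_cross_node


for n in (3, 10, 20):
    links, egress, total = all_to_all(n, 100.0)
    print(f"{n:>2} nodes: {links:>3} links, {egress:.1f} MB/s egress per node, "
          f"{total:.0f} MB/s cross-node")
```

Under that assumption the number of directed node-to-node paths grows quadratically (6, 90 and 380 for 3, 10 and 20 nodes), while each node's egress only approaches its own ingest rate ((n-1)/n of it). The per-node network load stays bounded; the connection count and total cross-node traffic are what keep growing with cluster size.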

UI Actions Causing Issues:
There have been reports similar to this, especially for some of the
really massive flows we've seen in terms of number of components and
concurrent users. Once these JIRAs are sorted out, they will help a lot
[1], [2], [3].

Heterogeneous cluster nodes:
- This should actually work quite well, and it is a major reason why NiFi
and the S2S protocol support/honor backpressure. Nodes that can
take on more work take on more work, and nodes that cannot push back.
You also want to ensure you're using good, scalable protocols to
source data into the cluster. If you find you're using a lot of
protocols that require many data-sourcing steps to run 'primary
node only', then the primary node will have to do more work
than the others, and I have seen uneven behavior in such cases. Yes, you
can then route using S2S/RPG, which we recommend, but still... try to
design away from 'primary node only' when possible.
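The backpressure argument can be illustrated with a toy simulation in Python. It is not NiFi's Site-to-Site implementation: the `Node` class, the per-tick `rate`, the round-robin-with-skip sender, and the numbers in the usage lines are all hypothetical. It only shows the mechanism described above: once a node's inbound queue reaches its threshold, the node stops accepting, so work shifts toward nodes that drain faster.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    rate: int       # items this node can process per tick (hypothetical)
    threshold: int  # backpressure threshold on its inbound queue
    queued: int = 0
    processed: int = 0

    def accepts(self) -> bool:
        # A node under backpressure refuses new work instead of overflowing.
        return self.queued < self.threshold


def simulate(nodes: list[Node], offered_per_tick: int, ticks: int) -> dict[str, int]:
    """Offer work round-robin, skipping nodes that are applying backpressure."""
    cursor = 0
    for _ in range(ticks):
        for _ in range(offered_per_tick):
            # Try each node at most once for this item.
            for attempt in range(len(nodes)):
                node = nodes[(cursor + attempt) % len(nodes)]
                if node.accepts():
                    node.queued += 1
                    cursor = (cursor + attempt + 1) % len(nodes)
                    break
            else:
                break  # every node is full: the sender itself has to wait
        for node in nodes:
            done = min(node.rate, node.queued)
            node.queued -= done
            node.processed += done
    return {node.name: node.processed for node in nodes}


nodes = [Node("large", rate=30, threshold=50), Node("small", rate=10, threshold=50)]
print(simulate(nodes, offered_per_tick=60, ticks=100))
```

With the same queue threshold on both nodes and more work offered than the cluster can absorb, the node that drains three times faster ends up processing three times as much (3,000 vs 1,000 items over 100 ticks here). The model covers only the queueing side, not per-node concurrent-task settings.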


Thanks
Joe


[1] https://issues.apache.org/jira/browse/NIFI-950
[2] https://issues.apache.org/jira/browse/NIFI-5064
[3] https://issues.apache.org/jira/browse/NIFI-5066

On Fri, Apr 13, 2018 at 5:49 PM, Jon Logan wrote:
> All, I had a few general questions regarding Clustering, and was looking for
> any sort of advice or best-practices information --
>
> - documentation discusses failure handling primarily from a NiFi crash
> scenario, but I don't recall seeing any information on entire node-failure
> scenarios. Is there a way that this is supposed to be handled?
> - at what point should we expect pain in scaling? I am particularly
> concerned about the all-to-all relationship that seems to exist if you
> connect a cluster RPG to itself, as all nodes need to distribute all data to
> all other nodes. We have also been having some issues when things are
> not as responsive as NiFi would like -- namely, the UI seems to get very
> upset and crash.
> - do UI actions (including read-only ones) require delegation to all nodes underneath?
> I suspect this is the case, as otherwise you wouldn't be able to determine
> queue sizes?
> - is there a way to have a cluster with heterogeneous node sizes?
>
>
> Thanks in advance!