Re: NiFi/Kafka - Data loss possibility with node failures
Thanks Joe. This is really helpful.

On Tue, Aug 11, 2020 at 9:33 AM Joe Witt wrote:
Re: NiFi/Kafka - Data loss possibility with node failures
Asmath,

In a traditional installation, regardless of how a NiFi cluster obtains data (Kafka, FTP, HTTP calls, TCP listening, etc.), once NiFi is responsible for the data it has acknowledged receipt to the source(s).

If that NiFi node goes offline, the data it owns is delayed. If the node becomes unrecoverably offline, that data is likely to be lost.

If you are going to run in environments with more powerful storage alignment options, as in many Kubernetes-based deployments, then there are definitely options to reduce the possibility of loss to a very high degree and to ensure there is only minimal data delay in the worst case.

In a Hadoop-style environment, though, the traditional model I describe works very well, leverages appropriate RAID, and is proven highly reliable and durable.

Thanks

On Tue, Aug 11, 2020 at 7:26 AM KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
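The ownership handoff Joe describes can be sketched in a few lines of plain Python (hypothetical names; this is a simplified model, not NiFi's or Kafka's actual API). The key point: the source redelivers anything that was never acknowledged, but once the node acks, its local repository holds the only copy.

```python
class Source:
    """Stand-in for Kafka: redelivers records until they are acked."""
    def __init__(self, records):
        self.unacked = list(records)

    def poll(self):
        return list(self.unacked)

    def ack(self, record):
        self.unacked.remove(record)


class Node:
    """Stand-in for a NiFi node with a local content repository."""
    def __init__(self):
        self.repository = []

    def consume(self, source):
        for record in source.poll():
            self.repository.append(record)  # persist locally first...
            source.ack(record)              # ...then ack the source


source = Source(["r1", "r2", "r3"])
node = Node()
node.consume(source)

# After the ack, the source no longer holds the records; the node does.
assert source.unacked == []
assert node.repository == ["r1", "r2", "r3"]

# If the node's storage is unrecoverably lost after the ack, the
# records are gone -- the source will not redeliver them.
node.repository.clear()
print(sorted(set(source.unacked) | set(node.repository)))  # -> []
```

This is why the RAID (or Kubernetes persistent-storage) choice matters: after acknowledgment, the node's repositories are the sole line of defense against loss.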
NiFi/Kafka - Data loss possibility with node failures
Hi,

[image: image.png]

We have a 3-node NiFi cluster, and for some reason NODE 2 and NODE 3 were disconnected while the flow was running. ConsumeKafka was reading data on all nodes and loading it into the database.

In the above scenario, is there a possibility of data loss? Distributed processing in Hadoop handles this automatically and assigns the task to other active nodes. Will it be the same with the NiFi cluster?

Thanks,
Asmath
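The disconnect scenario in the question can be sketched as a consumer-group rebalance (a deliberate simplification; real Kafka group coordination is richer). When ConsumeKafka runs on every node with the same group id, partitions owned by departed nodes are reassigned to the remaining members, so new records keep flowing and uncommitted offsets are re-consumed. What is not redelivered is data already acknowledged into a lost node's repository.

```python
def assign_partitions(partitions, members):
    """Round-robin partition assignment across live group members
    (simplified stand-in for Kafka's group coordinator)."""
    assignment = {m: [] for m in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]

# All three nodes healthy: work is spread across the cluster.
print(assign_partitions(partitions, ["node1", "node2", "node3"]))
# -> {'node1': [0, 3], 'node2': [1, 4], 'node3': [2, 5]}

# NODE 2 and NODE 3 disconnect: their partitions move to node1, so
# consumption continues from the last committed offsets. Only records
# already acked into a lost node's repository are at risk.
print(assign_partitions(partitions, ["node1"]))
# -> {'node1': [0, 1, 2, 3, 4, 5]}
```

So the Hadoop analogy holds for *unread* Kafka data (the group rebalances and other nodes pick up the partitions), but not for flowfiles already sitting on a permanently lost node.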