Re: NIFI /Kafka - Data loss possibility with node failures

2020-08-11 Thread KhajaAsmath Mohammed
Thanks Joe. This is really helpful.



Re: NIFI /Kafka - Data loss possibility with node failures

2020-08-11 Thread Joe Witt
Asmath

In a traditional installation, regardless of how a NiFi cluster obtains
data (Kafka, FTP, HTTP calls, TCP listening, etc.), once it is
responsible for the data it has ack'd its receipt to the source(s).

If that NiFi node goes offline, the data it owns is delayed. If that
node becomes unrecoverably offline, the data is likely to be lost.
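
To make the ownership model described above concrete, here is a minimal toy
sketch in Python. It is not NiFi code and does not use any NiFi or Kafka API;
the ToyNode class, its methods, and the demo function are all hypothetical
names invented for illustration. It only assumes what the paragraphs above
state: a node acks once it holds the data, an offline node delays what it
owns, and an unrecoverable node likely loses it.

```python
# Toy model only: hypothetical names, no NiFi or Kafka APIs involved.
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    name: str
    online: bool = True
    owned: list = field(default_factory=list)  # stand-in for the node's local storage
    lost: list = field(default_factory=list)   # records gone with an unrecoverable node

    def receive(self, record):
        """Store the record locally, then ack. After the ack the node owns it."""
        if not self.online:
            return False  # no ack, so the source still holds the record
        self.owned.append(record)
        return True

    def go_offline(self, recoverable):
        """Take the node down; an unrecoverable failure loses its local storage."""
        self.online = False
        if not recoverable:
            self.lost.extend(self.owned)
            self.owned.clear()

    def come_back(self):
        """Bring the node back with whatever local storage survived."""
        self.online = True

    def deliver(self, sink):
        """Push owned records downstream; nothing moves while the node is offline."""
        if not self.online:
            return 0
        count = len(self.owned)
        sink.extend(self.owned)
        self.owned.clear()
        return count


def demo():
    database = []
    node2 = ToyNode("node2")
    node3 = ToyNode("node3")
    node2.receive("a")  # both records are ack'd to the source here
    node3.receive("b")

    node2.go_offline(recoverable=True)   # storage survives
    node3.go_offline(recoverable=False)  # storage is gone

    node2.deliver(database)              # delivers nothing while offline
    delayed = list(node2.owned)          # "a" is delayed, not lost

    node2.come_back()
    node2.deliver(database)              # "a" arrives late
    return {"delayed": delayed, "database": database, "lost": node3.lost}


if __name__ == "__main__":
    print(demo())
```

In this sketch no other node picks up records that an offline node has
already ack'd; they wait on that node's storage, which is the behavior the
reply contrasts with Hadoop-style task reassignment.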

If you're going to run in environments where there are more powerful
storage alignment options, as in many Kubernetes-based deployments, then
there are definitely options that reduce the possibility of loss to a
very high degree and ensure there is only minimal data delay in the
worst case.

In a Hadoop-style environment, though, the traditional model I describe
works very well, leverages appropriate RAID, and has proven highly
reliable and durable.

Thanks



NIFI /Kafka - Data loss possibility with node failures

2020-08-11 Thread KhajaAsmath Mohammed
Hi,

[image: image.png]

We have a 3-node NiFi cluster, and for some reason NODE 2 and NODE 3 were
disconnected while the flow was running. ConsumeKafka was set to run on
all nodes and was loading the data into the database.

In the above scenario, is there a possibility of data loss? Distributed
processing in Hadoop handles this automatically and reassigns the task to
other active nodes. Will it be the same with a NiFi cluster?

Thanks,
Asmath