Re: DistributedMapCacheServer persistent directory - Cluster wide same values after primary node changes

2022-10-18 Thread Mark Payne
I wouldn’t say that it should never be used in production. In fact, it’s used 
quite heavily in production. But it does have some limitations. Where those 
limitations are acceptable, it’s still very reasonable to use. Redis offers a 
lot on top, but it comes with complexity also, having to manage another 
service, etc.

When DistributedMapCacheServer is used in a cluster, it is run on all nodes. 
But you need to point to just a single node. It can be any node in the cluster. 
But the client on every node should point to the same node in your cluster 
(nifi-0, for instance), not localhost. In this way, Primary Node doesn’t 
matter. Primary Node can switch 1,000 times and you’ll still be pointing at the 
same node that has all of the data. But it does mean that the service has a 
very real limitation, in that it’s a single point of failure. If that 
particular node goes down, the DistributedMapCache clients won’t be able to 
communicate with it until that node comes back up. If that limitation is okay 
for you, then by all means you can use it in production. But if you need 
something that provides High Availability, most people turn to Redis.
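
To make the difference concrete, here is a toy model in Python. It is not NiFi 
code and not how the server is implemented, just a sketch under some 
assumptions: a three-node cluster with hypothetical hostnames nifi-0 to nifi-2, 
one independent in-memory map per node, and no replication between them. The 
function names are made up for illustration; server_hostname stands in for the 
hostname the client is configured with.

```python
# Toy model only: each node runs its own independent cache server,
# and nothing replicates data between them.
NODES = ["nifi-0", "nifi-1", "nifi-2"]  # hypothetical hostnames


def new_cluster():
    """Create one private in-memory map per node."""
    return {node: {} for node in NODES}


def resolve(server_hostname, running_on):
    """'localhost' means the node the client itself is running on."""
    return running_on if server_hostname == "localhost" else server_hostname


def put(cluster, server_hostname, running_on, key, value):
    cluster[resolve(server_hostname, running_on)][key] = value


def get(cluster, server_hostname, running_on, key):
    return cluster[resolve(server_hostname, running_on)].get(key)


def write_then_read_after_switch(server_hostname):
    """Write while nifi-0 is primary, then read after nifi-1 takes over."""
    cluster = new_cluster()
    put(cluster, server_hostname, "nifi-0", "last-run", "2022-10-14")
    return get(cluster, server_hostname, "nifi-1", "last-run")


print(write_then_read_after_switch("localhost"))  # None: nifi-1 never saw the write
print(write_then_read_after_switch("nifi-0"))     # 2022-10-14
```

With localhost, the read lands on whichever node is currently primary, so the 
value written via the old primary is not found. With a fixed hostname, reads 
and writes always hit the same map, at the cost of depending on that one node.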

Thanks
-Mark





RE: Re: DistributedMapCacheServer persistent directory - Cluster wide same values after primary node changes

2022-10-15 Thread Jörg Hammerbacher

Hi Chris,

Thanks a lot for your information.

I was not aware that "DistributedMapCacheServer" should not be used in 
production. Maybe a short hint in the Controller Service documentation 
would also be helpful for other users.


Your pointer to RedisDistributedMapCacheClientService led us to the 
decision to use Redis for distributed caching in the future (we are using 
it for the first time now). Which Redis persistence type to use (RDB 
and/or AOF) will be important for balancing data loss against performance.


In general, I would like to say thank you to all the people who 
constantly develop the NiFi ecosystem! Well done.


regards,
Jörg



--
Kind regards,
Jörg Hammerbacher
http://www.hammerbacher-it.de
j...@hammerbacher-it.de



Re: DistributedMapCacheServer persistent directory - Cluster wide same values after primary node changes

2022-10-14 Thread Chris Sampson
The DistributedMapCacheServer is, I believe, meant as a reference
implementation of the service to be used as an example rather than in
production. The kind of scenario you describe is exactly the reason not to
use this in-memory (optionally locally persisted on disk) implementation in a
clustered production environment.

That said, it can be used if the use case of the Flow doesn't have problems
if a node goes offline, etc.

The recommended approach is to use an external service such as Redis with
the RedisDistributedMapCacheClientService [1]. This can interface with your
external Redis cluster/instance using the same API. Other external services
can be used; see the selection of related Controller Services in the NiFi
docs [2] (e.g. search for "cache").

[1]:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-redis-nar/1.18.0/org.apache.nifi.redis.service.RedisDistributedMapCacheClientService/index.html

[2]: https://nifi.apache.org/docs/nifi-docs



DistributedMapCacheServer persistent directory - Cluster wide same values after primary node changes

2022-10-14 Thread Jörg Hammerbacher

Hi,


I have a problem that I am looking for a solution to. Maybe someone can 
help me out or give me a hint on how to proceed.



Problem:

I often use NiFi clusters with "DistributedMapCacheClientService", 
which uses a "DistributedMapCacheServer" for cluster-wide key/value 
storage. By default the DMCS keeps its data in memory and uses sockets for 
synchronization. We use a persistence directory to make the data 
persistent, so that it is not lost when the entire cluster is restarted. 
But when the primary node changes, I think the data will be outdated, or 
will be served from another node whose copy may be outdated. If this other 
node takes over the primary node role, old data will be used for the next 
FetchDistributedMapCache. The latest updates made via the old primary node 
are gone.


Is there a service that uses e.g. ZooKeeper "in the background" to provide 
a truly distributed persistent cache, even after restarting the entire 
cluster / all nodes?



I know the standard cache can provide a very high-frequency read/update 
service when the data is in memory. But if we need just one or at most a 
few updates per minute ...


Yes, using another system such as a database (as a persistent singleton) 
can be a solution, though not a really fitting one. Why is there no 
standard service in NiFi for this? Is it not a good idea, or am I the only 
one who has had this problem?



Thanks in advance for answers,

Jörg (Hammerbacher)