Re: DistributedMapCacheServer persistent directory - Cluster wide same values after primary node changes
I wouldn’t say that it should never be used in production. In fact, it’s used quite heavily in production. But it does have some limitations. Where those limitations are acceptable, it’s still very reasonable to use. Redis offers a lot on top, but it also comes with complexity: having to manage another service, etc.

When DistributedMapCacheServer is used in a cluster, it runs on all nodes. But you need to point to just a single node. It can be any node in the cluster. But the client on every node should point to the same node in your cluster (nifi-0, for instance), not localhost. In this way, Primary Node doesn’t matter. Primary Node can switch 1,000 times and you’ll still be pointing at the same node that has all of the data.

But it does mean that the service has a very real limitation, in that it’s a single point of failure. If that particular node goes down, the DistributedMapCache clients won’t be able to communicate with it until that node comes back up.

If that limitation is okay for you, then by all means you can use it in production. But if you need something that provides High Availability, most people turn to Redis.

Thanks
-Mark

> On Oct 15, 2022, at 11:19 AM, Jörg Hammerbacher wrote:
>
> I was not aware that "DistributedMapCacheServer" should not be used in
> production. Maybe a short hint in the Controller Service Documentation would
> be helpful also for other users.
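To make the "point every client at the same node" advice concrete, here is a minimal toy model in Python. It is not NiFi code: the classes `CacheServer`, `Cluster` and `CacheClient` and the key `last-run` are made up for illustration, and the only thing taken from the message above is the idea that every node runs its own server while each client is configured with a hostname. The Primary Node role does not appear in the model at all, because under this reading it plays no part in where the data lives.

```python
class CacheServer:
    """Stand-in for one node's cache server (toy model, not NiFi code)."""

    def __init__(self):
        self.data = {}
        self.up = True


class Cluster:
    """Every node runs its own, independent cache server."""

    def __init__(self, node_names):
        self.servers = {name: CacheServer() for name in node_names}

    def resolve(self, client_node, server_hostname):
        # "localhost" resolves to whichever node the client itself runs on.
        target = client_node if server_hostname == "localhost" else server_hostname
        server = self.servers[target]
        if not server.up:
            raise ConnectionError(f"cache server on {target} is unreachable")
        return server


class CacheClient:
    """Stand-in for the client service as configured on one node."""

    def __init__(self, cluster, client_node, server_hostname):
        self.cluster = cluster
        self.client_node = client_node
        self.server_hostname = server_hostname

    def put(self, key, value):
        server = self.cluster.resolve(self.client_node, self.server_hostname)
        server.data[key] = value

    def get(self, key):
        server = self.cluster.resolve(self.client_node, self.server_hostname)
        return server.data.get(key)


nodes = ["nifi-0", "nifi-1", "nifi-2"]

# Clients pointing at localhost: each node ends up with its own private cache.
cluster = Cluster(nodes)
local_clients = {n: CacheClient(cluster, n, "localhost") for n in nodes}
local_clients["nifi-1"].put("last-run", "2022-10-14")
stale_read = local_clients["nifi-2"].get("last-run")  # nifi-2 never saw the write

# Clients pinned to nifi-0: all nodes read and write the same cache.
cluster = Cluster(nodes)
pinned_clients = {n: CacheClient(cluster, n, "nifi-0") for n in nodes}
pinned_clients["nifi-1"].put("last-run", "2022-10-14")
shared_read = pinned_clients["nifi-2"].get("last-run")

# The trade-off: the pinned node is a single point of failure.
cluster.servers["nifi-0"].up = False
try:
    pinned_clients["nifi-2"].get("last-run")
    outage_error = None
except ConnectionError as exc:
    outage_error = str(exc)
```

In this sketch the localhost setup loses the write from another node's point of view, the pinned setup shares it, and taking nifi-0 down makes every client fail until it returns, which is the limitation described above.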
RE: Re: DistributedMapCacheServer persistent directory - Cluster wide same values after primary node changes
Hi Chris,

thanks a lot for your information.

I was not aware that "DistributedMapCacheServer" should not be used in production. Maybe a short hint in the Controller Service documentation would also be helpful for other users.

Your pointer to RedisDistributedMapCacheClientService led us to the decision to use Redis for distributed caching of data in the future (we are using it for the first time now). Which type of Redis persistence to use (RDB and/or AOF) will be important for balancing data loss against performance.

In general I would like to say thank you to all the people who constantly develop the NiFi ecosystem! Well done.

regards,
Jörg

On 2022/10/14 16:22:14 Chris Sampson wrote:
> The DistributedMapCacheServer is, I believe, meant as a reference
> implementation of the service to be used as an example rather than in
> production.
>
> The recommended approach is to use an external service such as Redis with
> the RedisDistributedMapCacheClientService [1].

--
Kind regards,
Jörg Hammerbacher
http://www.hammerbacher-it.de
j...@hammerbacher-it.de
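As a rough illustration of the RDB versus AOF choice raised above, the following Python helper renders the `redis.conf` directives for each option. The function name `redis_persistence_conf` and the default values (`save 900 1`, `appendfsync everysec`) are illustrative assumptions, not a recommendation from this thread; `save`, `appendonly` and `appendfsync` themselves are standard Redis configuration directives.

```python
def redis_persistence_conf(mode, snapshot_seconds=900, snapshot_changes=1):
    """Return redis.conf lines for 'rdb', 'aof' or 'both' (illustrative values)."""
    if mode not in ("rdb", "aof", "both"):
        raise ValueError(f"unknown mode: {mode}")

    lines = []
    if mode in ("rdb", "both"):
        # RDB: write a snapshot once at least `snapshot_changes` keys changed
        # within `snapshot_seconds`; writes since the last snapshot can be lost.
        lines.append(f"save {snapshot_seconds} {snapshot_changes}")
    else:
        # An empty save argument turns RDB snapshots off.
        lines.append('save ""')

    if mode in ("aof", "both"):
        # AOF: log every write; with everysec the log is fsynced once per
        # second, so roughly the last second of writes is at risk on a crash.
        lines.append("appendonly yes")
        lines.append("appendfsync everysec")
    else:
        lines.append("appendonly no")

    return "\n".join(lines)


if __name__ == "__main__":
    for mode in ("rdb", "aof", "both"):
        print(f"# {mode}")
        print(redis_persistence_conf(mode))
```

For a cache that sees only a few updates per minute, the cost of AOF is likely to be small, but that is something to measure against your own workload rather than take from this sketch.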
Re: DistributedMapCacheServer persistent directory - Cluster wide same values after primary node changes
The DistributedMapCacheServer is, I believe, meant as a reference implementation of the service to be used as an example rather than in production. The kind of scenario you describe is exactly the reason not to use this in-memory (optionally locally persisted on disk) implementation in a clustered production environment.

That said, it can be used if the use case of the Flow doesn't have problems if a node goes offline, etc.

The recommended approach is to use an external service such as Redis with the RedisDistributedMapCacheClientService [1]. This can interface with your external Redis cluster/instance using the same API. Other external services can be used; see the selection of related Controller Services in the NiFi docs [2] (e.g. search for "cache").

[1]: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-redis-nar/1.18.0/org.apache.nifi.redis.service.RedisDistributedMapCacheClientService/index.html

[2]: https://nifi.apache.org/docs/nifi-docs

On Fri, 14 Oct 2022, 17:13 Jörg Hammerbacher, wrote:

> But if the primary node changes, I think the data will be outdated, or
> will be read from another, potentially outdated node.
>
> Is there a service that uses e.g. ZooKeeper "in the background" to provide
> a truly distributed, persistent cache, even after restarting the entire
> cluster / all nodes?
DistributedMapCacheServer persistent directory - Cluster wide same values after primary node changes
Hi,

I have one thing I am looking for a solution to. Maybe someone can help me out or give me a hint on how to do it.

Problem:

I often use NiFi clusters with a "DistributedMapCacheClientService", which uses a "DistributedMapCacheServer" for cluster-wide key/value storage. By default the DMCS keeps its data in memory and uses sockets for synchronization. We use a persistence directory to make the data persistent and to avoid losing it when the entire cluster restarts. But if the primary node changes, I think the data will be outdated, or will be read from another, potentially outdated node. If this other node takes over the primary node role, old data will be used for the next FetchDistributedMapCache. The latest updates made via the old primary node are gone.

Is there a service that uses e.g. ZooKeeper "in the background" to provide a truly distributed, persistent cache, even after restarting the entire cluster / all nodes?

I know the standard cache can serve a huge volume of frequent reads/updates when the data is in memory. But if we need just one or at most a few updates per minute ...

Yes, using another system such as a database (as a persistent singleton) can be a solution, though not a really fitting one. Why is there no standard service in NiFi for this? Isn't it a good idea, or am I the only one who has had this problem?

Thanks in advance for answers,

Jörg (Hammerbacher)