Hi Isha

We are using Redis in a 3-node Redis Sentinel cluster for HA purposes. It
works fine.
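
A minimal sentinel.conf sketch for a setup like this (placeholder host,
port, and timing values; the quorum of 2 means two of the three sentinels
must agree before a failover):

```conf
# Sketch only -- placeholder values, not an exact production config.
# "mymaster" names the monitored master; 2 is the failure quorum.
sentinel monitor mymaster 10.0.0.10 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1
```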

Kind regards
Jens M. Kofoed

On Wed, 22 Feb 2023 at 11:36, Isha Lamboo <[email protected]> wrote:

> Hi Simon,
>
> Thanks for your explanation. It will help me manage expectations with the
> team that developed the flow. We were hoping to do exactly as you suggest,
> drop in a redundant cache without the time and resource investment of
> setting up an external cluster like Redis or Hazelcast. And in fact, it
> runs fine on most days, but as currently set up it doesn't play nice when
> the load on the cluster gets too high or nodes disconnect.
>
> If I get the time to run some tests I'll share the results, but for now
> I'll advise the devs to accept a longer run and schedule the
> DetectDuplicate less often or to revert to using the
> DistributedMapCacheServer on a single node again. If neither is acceptable
> they can request an external cache service cluster.
>
> Thank you very much,
>
> Isha
>
> -----Original message-----
> From: Simon Bence <[email protected]>
> Sent: Wednesday, 22 February 2023 10:47
> To: [email protected]
> Subject: Re: Embedded Hazelcast Cachemanager
>
> Hi Isha,
>
> Without a deeper understanding of the situation I am not sure whether the
> load comes entirely from this part of the batch processing, but for the
> scope of this discussion I will assume it does, and also that it contrasts
> sharply with the same measurements taken using DistributedMapCache as the
> cache.
>
> The EmbeddedHazelcastCacheManager was primarily added as an out-of-the-box
> solution for simpler scenarios, something that can be "grabbed to the
> canvas" without much fuss. Because of this, it has very limited
> customisation capabilities. As your scenario looks to utilise Hazelcast
> heavily, it might not be the ideal tool. It is also important to mention
> that with the embedded approach the Hazelcast instances run on the same
> servers as NiFi, so they add to the load already produced by other parts
> of the flow.
>
> Using ExternalHazelcastCacheManager can provide much more flexibility: as
> it works with standalone Hazelcast instances, this approach opens up the
> whole range of Hazelcast's performance optimisation capabilities. You can
> use either a single instance shared by all the nodes (which avoids
> synchronisation between Hazelcast nodes but might become a bottleneck at
> some point) or build up a separate cluster. Of course, the results depend
> highly on network topology and other factors specific to your use case.
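
A standalone cluster of the kind described above is configured on the
Hazelcast side; a minimal hazelcast.yaml sketch, with hypothetical
hostnames and cluster name, disabling multicast in favour of an explicit
member list:

```yaml
# Sketch only -- hostnames and cluster name are placeholders.
hazelcast:
  cluster-name: nifi-cache
  network:
    join:
      multicast:
        enabled: false
      tcp-ip:
        enabled: true
        member-list:
          - hz-node1:5701
          - hz-node2:5701
```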
>
> I am also not sure of the details of your flows, or whether you prioritise
> processing time over throughput, but distributing the batch over time so
> that it produces smaller peaks is another possible optimisation.
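
The batch-spreading idea above can be sketched as a small scheduling helper
(hypothetical illustration, not a NiFi API; in a real flow, processor
scheduling or rate limiting would play this role):

```python
def pace(items, window_seconds):
    """Spread a batch evenly over a time window.

    Returns (offset_seconds, item) pairs; a dispatcher would release
    each item at its offset instead of all at once, flattening the peak.
    """
    step = window_seconds / max(len(items), 1)
    return [(i * step, item) for i, item in enumerate(items)]
```

For example, pacing 400,000 messages over an hour releases roughly 111 per
second rather than one large burst.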
>
> Best regards,
> Bence
>
>
> > On 2023. Feb 21., at 21:45, Isha Lamboo <[email protected]>
> wrote:
> >
> > Hi Simon,
> >
> > The Hazelcast cache is being used by a DetectDuplicate processor to
> cache and eliminate message ids. These arrive in large daily batches with
> 300-500k messages, most (90+%) of which are actually duplicates. This was
> previously done with a DistributedMapCacheServer, but that involved using
> only one of the nodes (hardcoded in the MapCacheClient controller), giving
> us a single point of failure for the flow. We had hoped to use Hazelcast to
> have a redundant cacheserver, but I’m starting to think that this scenario
> causes too many concurrent updates of the cache, on top of the already
> heavy load from other processing on the batch.
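> >

The DetectDuplicate pattern described above boils down to a put-if-absent
check per message id; a minimal single-process sketch, using a plain dict
in place of the distributed cache (in NiFi the cache client performs this
check atomically across nodes):

```python
def dedupe(message_ids, cache):
    """Split ids into first-seen and duplicates via put-if-absent."""
    fresh, dupes = [], []
    for mid in message_ids:
        if mid in cache:       # already cached -> duplicate
            dupes.append(mid)
        else:
            cache[mid] = True  # first sighting: record and pass through
            fresh.append(mid)
    return fresh, dupes
```

With 90+% duplicates, most ids take the lookup branch, so the cost is
dominated by cache round-trips; that is why concurrent updates from three
nodes against one cache can hurt under load.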
> >
> > What was new to me was the CPU load on the cluster in question going
> > through the roof, on all 3 nodes. I have no idea how a 16 vCPU server
> > gets to a load of 100+.
> >
> > The start roughly coincides with the arrival of the daily batch, though
> there may have been other batch processes going on since it’s a Sunday.
> However, the queues were pretty much empty again in an hour and yet the
> craziness kept going until I finally decided to restart all nodes.
> > <image001.png>
> >
> > The hazelcast troubles might well be a side-effect of the NiFi servers
> being overloaded. There could have been issues at the Azure VM level etc.
> But activating the Hazelcast controller is the only change I *know* about.
> And it doesn’t seem farfetched that it got into a loop trying to
> migrate/copy partitions “lost” on other nodes.
> >
> > I’ve attached a file with selected hazelcast warnings and errors from
> the nifi-app.log files, trying to include as many unique ones as possible.
> >
> > The errors that kept repeating were these (always together):
> >
> > 2023-02-19 08:58:39,899Z (UTC+0) ERROR
> [hz.68e948cb-6e3f-445e-b1c8-70311cae9b84.cached.thread-47]
> c.h.i.c.i.operations.LockClusterStateOp [su20cnifi103-ap.REDACTED.nl]:5701
> [nifi] [4.2.5] Still have pending migration tasks, cannot lock cluster
> state! New state: ClusterStateChange{type=class
> com.hazelcast.cluster.ClusterState, newState=FROZEN}, current state: ACTIVE
> > 2023-02-19 08:58:39,900Z (UTC+0) WARN
> [hz.68e948cb-6e3f-445e-b1c8-70311cae9b84.cached.thread-47]
> c.h.internal.cluster.impl.TcpIpJoiner [su20cnifi103-ap.REDACTED.nl]:5701
> [nifi] [4.2.5] While changing cluster state to FROZEN!
> java.lang.IllegalStateException: Still have pending migration tasks, cannot
> lock cluster state! New state: ClusterStateChange{type=class
> com.hazelcast.cluster.ClusterState, newState=FROZEN}, current state: ACTIVE
> >
> > Thanks,
> >
> > Isha
> >
> > From: Simon Bence <[email protected]>
> > Sent: Tuesday, 21 February 2023 08:52
> > To: [email protected]
> > Subject: Re: Embedded Hazelcast Cachemanager
> >
> > Hi Isha,
> >
> > Could you please share the error messages? They might shed light on
> > something that affects the performance.
> >
> > On the other hand, I am not aware of exhaustive performance tests for
> > the Hazelcast cache. In general it should not be the bottleneck, but if
> > you could give some details about the errors and the intended usage, it
> > would help to find a more specific answer.
> >
> > Best regards,
> > Bence Simon
> >
> >
> > On 2023. Feb 20., at 15:19, Isha Lamboo <[email protected]>
> wrote:
> >
> > Hi all,
> >
> > This morning I had to fix up a cluster of NiFi 1.18.0 servers where the
> primary was constantly crashing and moving to the next server.
> >
> > One of the recent changes was activating an Embedded Hazelcast Cache,
> > and I did see errors reported about promotions going wrong. I can't
> > tell if this is cause or effect, so I'm trying to get a feel for the
> > performance demands of Hazelcast, but there is nothing to tune, only a
> > time to live for cache items. The diagnostics dump also didn't give me
> > anything on this controller service.
> >
> > Does anyone have experience with tuning/diagnosing the Hazelcast
> components within NiFi?
> >
> > Kind regards,
> >
> > Isha Lamboo
> > Data Engineer
> >  <image001.png>
> >
> > <nifi_hazelcast_log.txt>
>
>
