RE: Embedded Hazelcast Cachemanager

2023-02-21 Thread Isha Lamboo
Hi Simon,

The Hazelcast cache is being used by a DetectDuplicate processor to cache and 
eliminate message ids. These arrive in large daily batches with 300-500k 
messages, most (90+%) of which are actually duplicates. This was previously 
done with a DistributedMapCacheServer, but that involved using only one of the 
nodes (hardcoded in the MapCacheClient controller), giving us a single point of 
failure for the flow. We had hoped to use Hazelcast to have a redundant 
cacheserver, but I’m starting to think that this scenario causes too many 
concurrent updates of the cache, on top of the already heavy load from other 
processing on the batch.
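
For readers less familiar with this pattern, here is a minimal sketch of what the duplicate check amounts to. It is a toy in-memory stand-in, not the DetectDuplicate processor or the Hazelcast API; the class and function names are hypothetical, and the time to live is the only setting modelled.

```python
import time


class TtlDuplicateDetector:
    """Toy in-memory stand-in for a shared cache with a time to live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._seen = {}  # message id -> time it was (re)cached

    def is_duplicate(self, message_id):
        now = self.clock()
        cached_at = self._seen.get(message_id)
        if cached_at is not None and now - cached_at < self.ttl_seconds:
            return True
        # New or expired id: cache it, so only non-duplicates cause a write.
        self._seen[message_id] = now
        return False


def split_batch(detector, message_ids):
    """Separate a batch into first-seen ids and duplicates, keeping order."""
    unique, duplicates = [], []
    for message_id in message_ids:
        target = duplicates if detector.is_duplicate(message_id) else unique
        target.append(message_id)
    return unique, duplicates
```

In this sketch every message costs one cache lookup, while only first-seen ids cost a write, so a batch that is mostly duplicates is read-heavy rather than write-heavy.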

What was new to me is the CPU load on the cluster in question going through the 
roof, on all 3 nodes. I have no idea how a 16 vCPU server gets to a load of 
100+.

The start roughly coincides with the arrival of the daily batch, though there 
may have been other batch processes going on since it’s a Sunday. However, the 
queues were pretty much empty again in an hour and yet the craziness kept going 
until I finally decided to restart all nodes.

The Hazelcast troubles might well be a side effect of the NiFi servers being 
overloaded. There could have been issues at the Azure VM level etc. But 
activating the Hazelcast controller is the only change I *know* about. And it 
doesn’t seem far-fetched that it got into a loop trying to migrate/copy 
partitions “lost” on other nodes.

I’ve attached a file with selected hazelcast warnings and errors from the 
nifi-app.log files, trying to include as many unique ones as possible.

The errors that kept repeating were these (always together):

2023-02-19 08:58:39,899Z (UTC+0) ERROR 
[hz.68e948cb-6e3f-445e-b1c8-70311cae9b84.cached.thread-47] 
c.h.i.c.i.operations.LockClusterStateOp [su20cnifi103-ap.REDACTED.nl]:5701 
[nifi] [4.2.5] Still have pending migration tasks, cannot lock cluster state! 
New state: ClusterStateChange{type=class com.hazelcast.cluster.ClusterState, 
newState=FROZEN}, current state: ACTIVE
2023-02-19 08:58:39,900Z (UTC+0) WARN 
[hz.68e948cb-6e3f-445e-b1c8-70311cae9b84.cached.thread-47] 
c.h.internal.cluster.impl.TcpIpJoiner [su20cnifi103-ap.REDACTED.nl]:5701 
[nifi] [4.2.5] While changing cluster state to FROZEN! 
java.lang.IllegalStateException: Still have pending migration tasks, cannot 
lock cluster state! New state: ClusterStateChange{type=class 
com.hazelcast.cluster.ClusterState, newState=FROZEN}, current state: ACTIVE
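
A rough sketch of how repeated entries like these could be counted per log level and logger. It assumes each entry is a single unwrapped line in nifi-app.log with the layout shown above (timestamp, "(UTC+0)", level, thread in brackets, logger, message) and that Hazelcast threads start with "hz."; the function name is hypothetical.

```python
import re
from collections import Counter

# Layout assumed from the excerpts above; adjust to the actual logback pattern.
LINE = re.compile(
    r"^(?P<ts>\S+ \S+) \(UTC\+0\) (?P<level>WARN|ERROR) "
    r"\[(?P<thread>[^\]]+)\] (?P<logger>\S+) (?P<message>.*)$"
)


def count_hazelcast_messages(lines):
    """Count WARN/ERROR entries from Hazelcast threads by (level, logger)."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match is None or not match["thread"].startswith("hz."):
            continue
        counts[(match["level"], match["logger"])] += 1
    return counts
```

Passing an open file object for nifi-app.log as `lines` and calling `most_common()` on the result would list the noisiest loggers first.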

Thanks,

Isha

From: Simon Bence 
Sent: Tuesday, 21 February 2023 08:52
To: users@nifi.apache.org
Subject: Re: Embedded Hazelcast Cachemanager

Hi Isha,

Could you please share the error messages? They might shed light on something 
that is affecting the performance.

On the other hand, I am not aware of exhaustive performance tests for the 
Hazelcast Cache. In general it should not be the bottleneck, but if you could 
please give some details about the error and possibly the intended way of 
usage, it could help to find a more specific answer.

Best regards,
Bence Simon


On 2023. Feb 20., at 15:19, Isha Lamboo 
<isha.lam...@virtualsciences.nl> wrote:

Hi all,

This morning I had to fix up a cluster of NiFi 1.18.0 servers where the primary 
was constantly crashing and moving to the next server.

One of the recent changes was activating an Embedded Hazelcast Cache, and I did 
see errors reported about promotions going wrong. I can’t tell if this is 
cause or effect, so I’m trying to get a feeling for the performance demands of 
Hazelcast, but there is nothing to tune, only a time to live for cache items. The 
diagnostics dump also didn’t give me anything on this controller service.

Does anyone have experience with tuning/diagnosing the Hazelcast components 
within NiFi?

Kind regards,

Isha Lamboo
Data Engineer

# First there are warnings about slow operations and time jumps, probably 
because the system is overloaded

2023-02-19 01:51:22,847Z (UTC+0) WARN 
[hz.68e948cb-6e3f-445e-b1c8-70311cae9b84.SlowOperationDetectorThread] 
c.h.s.i.o.s.SlowOperationDetector [su20cnifi103-ap.REDACTED.nl]:5701 [nifi] 
[4.2.5] Slow operation detected: com.hazelcast.internal.nio.Packet
2023-02-19 01:51:24,077Z (UTC+0) WARN 
[hz.68e948cb-6e3f-445e-b1c8-70311cae9b84.cached.thread-36] 
c.h.i.c.impl.ClusterHeartbeatManager [su20cnifi103-ap.REDACTED.nl]:5701 [nifi] 
[4.2.5] Resetting heartbeat timestamps because of huge system clock jump! 
Clock-Jump: 35000 ms, Heartbeat-Timeout: 6 ms
2023-02-19 01:53:13,991Z (UTC+0) WARN 
[hz.68e948cb-6e3f-445e-b1c8-70311cae9b84.priority-generic-operation.thread-0] 
c.h.i.c.impl.ClusterHeartbeatManager [su20cnifi103-ap.REDACTED.nl]:5701 [nifi] 
[4.2.5] Ignoring heartbeat from Member [su20cnifi101-ap.REDACTED.nl]:5701 - 
91c9a123-a1b9-4db5-9a87-9bb4a229d5bd since it is expired (now: 2023-02-19 
02:51:34.755, timestamp: 2023-02-19 02:50:59.107)

# This continues for several hours until stuff gets worse:

2023-02-19

Re: Processor with cron scheduling in middle of flow

2023-02-21 Thread Mark Payne
John,

You should not be using CRON driven for any processors in the middle of a flow. 
In fact, we really should probably just disable that altogether.
It’s exceedingly rare that you’d want anything other than Timer-Driven 
with a Run Schedule of 0 sec.
MergeContent will not create any merged output on its first iteration after 
it’s scheduled to run. It requires at least a second iteration before anything 
is transferred. Its algorithm has evolved over time, and it may well have 
happened to work previously but it’s really not being configured as intended.

When you say that you’re retrieving data from a few sources and then “merges 
that all back into a single file” - does that mean that you started with one 
FlowFile, split it up, and then want to re-assemble the data after performing 
enrichment? If so you’ll want to use a Merge Strategy of Defragment.
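
As a rough illustration of what Defragment does, here is a minimal sketch that reassembles pieces using the fragment.identifier, fragment.index and fragment.count attributes. It is not NiFi's implementation; the function name and the representation of a FlowFile as an (attributes, content) pair are assumptions made for the example.

```python
from collections import defaultdict


def defragment(flowfiles):
    """Yield (identifier, content) once every fragment of an original arrived.

    `flowfiles` is an iterable of (attributes, content) pairs, a hypothetical
    stand-in for FlowFiles carrying the fragment.* attributes.
    """
    pending = defaultdict(dict)  # identifier -> {index: content}
    for attributes, content in flowfiles:
        ident = attributes["fragment.identifier"]
        pending[ident][int(attributes["fragment.index"])] = content
        if len(pending[ident]) == int(attributes["fragment.count"]):
            parts = pending.pop(ident)
            # Reassemble in index order, whatever order the pieces arrived in.
            yield ident, "".join(parts[i] for i in sorted(parts))
```

The point of the sketch is that completion depends on the fragment count of the original, not on time or bin size, which is why this strategy suits a split, enrich and re-assemble flow.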

If you’re trying to just bring in some data and merge it together by 
correlation attribute, then Bin Packing makes sense. Here, you have a few 
properties that you can use to try to get the best bin packing. In short, a bin 
will be merged when any of these conditions is met:

- The Minimum Group Size is reached AND the Minimum Number of Entries is met
- The Maximum Group Size OR the Maximum Number of Entries is met
- A bin has been sitting for “Max Bin Age” amount of time
- If a correlation attribute is used, and a FlowFile comes in that can’t go 
into any existing bin, the oldest bin is evicted.
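
The first three conditions above can be sketched as a small decision function. This mirrors the list as written here and is not MergeContent's actual code; the class, field and function names are hypothetical, and eviction of the oldest bin is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinLimits:
    min_size: int                      # Minimum Group Size, in bytes
    min_entries: int                   # Minimum Number of Entries
    max_size: Optional[int] = None     # Maximum Group Size, if set
    max_entries: Optional[int] = None  # Maximum Number of Entries, if set
    max_bin_age_s: Optional[float] = None  # Max Bin Age in seconds, if set


def merge_reason(limits, size, entries, age_s):
    """Return why a bin would be merged now, or None if it keeps waiting."""
    if limits.max_size is not None and size >= limits.max_size:
        return "maximum group size reached"
    if limits.max_entries is not None and entries >= limits.max_entries:
        return "maximum number of entries reached"
    if size >= limits.min_size and entries >= limits.min_entries:
        return "minimum size and entries reached"
    if limits.max_bin_age_s is not None and age_s >= limits.max_bin_age_s:
        return "max bin age reached"
    return None
```

With low minimums, the third branch fires almost immediately, which is one way to end up with many small merged files.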

If you’re seeing bins smaller than expected, you can look at the Data 
Provenance for the merged FlowFile, and it will tell you exactly which of the 
conditions above triggered the data to be merged. This may help to adjust these 
settings.

Hope this is helpful.

Thanks
-Mark


> On Feb 17, 2023, at 1:39 PM, John McGinn via users  
> wrote:
> 
> Hello,
> 
> NiFi 1.19.0 - I need some help in trying to make my idea work, or figure out 
> the better way to do this.
> 
> I've got a flow that retrieves data from a few data sources, enhances 
> individual flow files, converts attributes to CSV and then merges that all 
> back into a single file. It takes roughly 20 minutes for the process to run 
> from start to the MergeContent part, so when I do it manually, I stop the 
> MergeContent processor until all flowfiles are in the queue waiting, and then 
> I start the MergeContent processor. (Run One Time doesn't work for some 
> reason.) That works fine, manually. 
> 
> When I try to put cron scheduling in, it never kicks off. For instance, the 
> initial processor in the flow has a cron schedule of the top of the hour. (0 
> 0 * * * ?) I then put 25 past the hour for Merge Content (0 25 * * * ?). When 
> I start the flow, the flowfiles are generated and queue up in front of 
> MergeContent by 25 minutes past the hour, but the MergeContent never kicks 
> off.
> 
> I added a correlation attribute recently and removed the cron entry, but the 
> MergeContent just creates small bunches of merged files.
> 
> I even attempted to put a cron on the AttributesToCSV with a maximum bin age 
> on the Merge Content, since it takes less than a minute for the 
> AttributesToCSV to process the flowfiles at that point, but the cron didn't 
> kick off there either.
> 
> Any ideas on how to get this to work?
> 
> Thanks,
> John



Re: Processor with cron scheduling in middle of flow

2023-02-21 Thread Joe Witt
John

MergeContent cannot reliably work well with cron scheduling.  That
component is designed to get threads consistently so it can perform
its bin packing function and time and size based kick out functions.

If it ever worked with cron scheduling, that was mostly by accident, I'd say.

Thanks

Joe



Re: Processor with cron scheduling in middle of flow

2023-02-21 Thread John McGinn via users
 I've created an XML template as a GitHub Gist of the possible MergeContent 
cron scheduling bug. The template was created in 1.12.0, and uploaded to 
1.18.0. Worked in 1.12, didn't work in 1.18.0. I downloaded 1.20.0 locally, and 
it doesn't work there either. I have the initial GenerateFlowFile and the 
MergeContent set for cron scheduling with a 25 minute break in between the two. 
(I know the file extension is .js, but it's an XML file.)

https://gist.github.com/Figgie123/245e49ca29135ef6e4db50a7b4f5d5b7.js

Re: Web Based Nifi for Python

2023-02-21 Thread Joe Witt
Darren

Again, please stop using the Apache NiFi mailing lists as an
advertising platform for your unrelated activities.

What you're doing surely looks cool.  You're clearly passionate about
it.  You were clearly very inspired by what this community has built
and is doing. That is all great.  Go have fun.

Thanks


Re: Web Based Nifi for Python

2023-02-21 Thread Patrick Timmins
I guess Joe already responded on this front ... more than a year and a 
half ago!




Forwarded Message
From: Joe Witt
Date: Sat, 10 Jul 2021 09:44:53 -0700
Subject: Re: PYFI Python Nifi Clone
To: users@nifi.apache.org


Sounds fun and looks cool.

But do not violate the marks; for example, do not use the Apache NiFi logo.

Thanks


On Sat, Jul 10, 2021 at 9:38 AM Darren Govoni  wrote:

    Hi!,
   Just sharing a fun project I'll post on github soon. I'm 
creating a pure python clone of Nifi that separates the UI (Vue/NodeJS 
implementation) from the backend distributed messaging layer (RabbitMQ, 
Redis, AMQP, SQS). It will allow for runtime scripting of processors 
using python and leverage a variety of transactional message brokers and 
distributed topologies (e.g. AMQP).



    Cheers!

    Darren




Re: Web Based Nifi for Python

2023-02-21 Thread Patrick Timmins
So this looks a lot like the "PyFi" related projects that you put out on 
GitHub back in ~ July 2021.  But the GitHub stuff that I'm seeing for 
ElasticCode are just all stub projects ... and they look like they were 
created just a few weeks ago:


https://github.com/elasticcode-ai/elasticcode

So I'm assuming the ElasticCode source is *not* Apache V2 licensed?

But it also looks like the Apache NiFi logo and color scheme is still 
being used in the ElasticCode GUI ... just like in the "PyFi" GUI:


https://www.youtube.com/watch?v=h334byeynFo

... probably should not be advertising this on the NiFi mailing list ... 
but Joe W. can probably make that call.



Pat




Forwarded Message

From: Darren Govoni
To: users@nifi.apache.org
Subject: Re: PYFI Python Nifi Clone
Date: Tue, 13 Jul 2021 12:08:27 +


Hello!

Here is my github repo for PYFI NodeJS/Vue implementation of Nifi UI. Enjoy!

https://github.com/radiantone/pyfi-ui

Darren

From: Darren Govoni
Sent: Saturday, July 10, 2021 12:38 PM
To: users@nifi.apache.org 
Subject: PYFI Python Nifi Clone

Hi!,
   Just sharing a fun project I'll post on github soon. I'm creating a 
pure python clone of Nifi that separates the UI (Vue/NodeJS 
implementation) from the backend distributed messaging layer (RabbitMQ, 
Redis, AMQP, SQS). It will allow for runtime scripting of processors 
using python and leverage a variety of transactional message brokers and 
distributed topologies (e.g. AMQP).


Here is a sneak peek at my port of the UI to Vue/NodeJS which I'll share 
on github soon (minified). It's a fully MVC/Node/Vue reactive and 
responsive UI that adheres to Material Design 2.0 standard. Also uses 
webpack build and is minified, etc.


Makes a number of improvements such as tabs for multiple flow renders 
and will interface directly with git for flow versioning.


Cheers!
Darren





Web Based Nifi for Python

2023-02-21 Thread Darren Govoni
Hi,
   This looks like a promising spiritual successor to Java Nifi. Pure python. 
Processors scale individually, directly on the metal.
Also runs purely in browser for testing.

https://elasticcode.ai

Pretty neat!