[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2024-02-08 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815864#comment-17815864
 ] 

ASF subversion and git services commented on NIFI-12236:


Commit a519585b02f37c654638ece4681928fd3abbe355 in nifi's branch 
refs/heads/main from Simon Bence
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a519585b02 ]

NIFI-12236 Improved Fault Tolerance in QuestDB Status Repository

- Moved QuestDB components to nifi-questdb-bundle

This closes #8152

Signed-off-by: David Handermann 


> Improving fault tolerancy of the QuestDB backed metrics repository
> --
>
> Key: NIFI-12236
> URL: https://issues.apache.org/jira/browse/NIFI-12236
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Simon Bence
> Assignee: Simon Bence
> Priority: Major
> Time Spent: 9h
> Remaining Estimate: 0h
>
> Based on the related discussion on the dev email list, the QuestDB handling 
> of the metrics repository needs better fault tolerance before it can be 
> considered a viable default metrics data store. The work should primarily 
> focus on handling unexpected database events such as a corrupted database or 
> running out of disk space. Any issue should be handled with an attempt to keep 
> the database service healthy, but if that is impossible, the priority is to 
> keep NiFi and its core services running, even at the price of an outage in 
> metrics collection / presentation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-07 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794395#comment-17794395
 ] 

Pierre Villard commented on NIFI-12236:
---

Thanks David, this is definitely a nice improvement and will make things much 
better.



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-07 Thread David Handermann (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794392#comment-17794392
 ] 

David Handermann commented on NIFI-12236:
-

To help move things forward, I created a separate Jira issue NIFI-12492 for 
moving QuestDB to its own NAR and submitted a pull request for review.

I realize that moving the components around would require rebasing the changes, 
but the goal is to address one of the concerns about continued maintenance of 
the QuestDB implementation. With the StatusHistoryRepository as an existing 
extension point, having a separate NAR makes it much easier to iterate and 
improve the implementation without changes to the framework itself.

I was able to incorporate upgrading from QuestDB 7.2 to 7.3.7 for NIFI-12435, 
as the differences were minimal.

The move to a separate NAR did not require any code changes, other than the 
minor adjustments for updating to QuestDB 7.3.7.

Although I do not want to slow progress on the more fundamental improvements, 
moving the implementation to a separate NAR should address some general 
concerns and provide a better path forward.



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-07 Thread David Handermann (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794320#comment-17794320
 ] 

David Handermann commented on NIFI-12236:
-

Regarding the particular point of moving QuestDB support to a separate NAR, it 
is important to note that the StatusHistoryRepository is already an extension 
point, so it is open to custom implementations. That highlights the importance 
of avoiding API changes for this specific implementation. That also means 
moving the QuestDB implementation to a separate NAR should be straightforward. 
On that point in particular, I would be willing to move the existing 
implementation to a separate NAR, which would help keep these changes more 
focused.
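
To illustrate why an existing extension point makes the move to a separate NAR straightforward, here is a minimal Java sketch of the general pattern: the framework depends only on an interface and picks an implementation by configured name. All names in it (`MetricsRepository`, `InMemoryMetricsRepository`, `RepositoryLoader`) are hypothetical and chosen for illustration; this is not the actual `StatusHistoryRepository` API, nor how NiFi loads NARs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for an extension point. The real
// StatusHistoryRepository interface in NiFi has different methods.
interface MetricsRepository {
    void capture(String componentId, long value);

    List<Long> history(String componentId);
}

// One possible implementation that keeps everything in memory.
class InMemoryMetricsRepository implements MetricsRepository {
    private final Map<String, List<Long>> values = new HashMap<>();

    @Override
    public synchronized void capture(String componentId, long value) {
        values.computeIfAbsent(componentId, id -> new ArrayList<>()).add(value);
    }

    @Override
    public synchronized List<Long> history(String componentId) {
        // Return a copy so callers cannot modify internal state.
        return new ArrayList<>(values.getOrDefault(componentId, new ArrayList<>()));
    }
}

// The framework side depends only on the interface. Implementations are
// registered by name, so one can be packaged and shipped separately
// without changing this class.
class RepositoryLoader {
    private final Map<String, Supplier<MetricsRepository>> providers = new HashMap<>();

    void register(String name, Supplier<MetricsRepository> provider) {
        providers.put(name, provider);
    }

    MetricsRepository load(String configuredName) {
        Supplier<MetricsRepository> provider = providers.get(configuredName);
        if (provider == null) {
            throw new IllegalArgumentException(
                    "Unknown repository implementation: " + configuredName);
        }
        return provider.get();
    }
}
```

Because callers only ever see `MetricsRepository`, swapping or relocating an implementation in this sketch requires no change to the interface, which is the property the comment above relies on.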



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-07 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794312#comment-17794312
 ] 

Joe Witt commented on NIFI-12236:
-

Regarding defaults - thank you.

API Changes: 
These should get the highest degree of scrutiny by reviewers and the most 
effort and thought by authors.  At this stage I do not understand the intention 
for the change.  It appears that concern is shared. If there is an option that 
avoids API changes and is specific to this implementation, that should be 
pursued instead.

Suggestion about the nar:
I am not aware of other cases where the optional implementation of a thing is 
the light in terms of dependencies path.  In this case the optional 
implementation (QuestDB backed stats) is the heavier one in terms of 
dependency/operations.  To be clear, in suggesting it be in its own NAR I am 
not of the view that it is a lot more work.  I could be wrong.  But the 'more 
work' consideration is also not a strong one.  The 'it is already in there...' 
argument is better since, well, that is true.  I'm just asking that the effort 
to do that be strongly considered, because then those who would not use this 
option don't have to carry the component into deployment, nor the potential 
vulnerabilities the libraries might bring.  Those that want it are good to go 
and can use it.  If it is baked into the framework NAR this is not an option.

As evidence of my concern regarding vulnerabilities, it does give pause that 
the version of QuestDB was not considered in this PR to this point.  Keeping 
dependencies up to date is *everyone's* responsibility.  

The NiFi 2.0 line is by the way super wildly close to being entirely free of 
static coding practice and dependency vulnerabilities (excluding hadoop related 
components) which is in itself a miracle.  Please ensure the dependencies in 
this PR are up to date.  On that note, do we know what QuestDB version changes 
mean in terms of breaking changes, changes that might require users to do 
something to keep their state, etc.?  I ask because we learned our lessons the 
hard way here with H2.  H2 served us extremely well over the years but 
eventually it created some very complicated compelling events.

Anyway - I don't want to come off like this thing can't happen.  The root idea 
of this I think we all agree is useful. There is some debate on whether this 
implementation approach is desirable vs another but even that does not matter.  
He who does the work for a given implementation ultimately wins.  What I am 
asking then though is please make that implementation as narrow and specific as 
possible to give others choice on whether they are exposed to it.  That is the 
heart of the replies I am making.

Thanks



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-07 Thread Simon Bence (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794244#comment-17794244
 ] 

Simon Bence commented on NIFI-12236:


Hi [~pvillard]!

Thanks for mentioning the admin guide! I indeed forgot to add the newly exposed 
properties. I will add those during the review.

1. I think we agreed on the defaulting; with the review changes it will be 
changed back as well.
2. As this looks to be a more nuanced question and I already have a discussion 
about it on the PR with David Handermann, I suggest continuing this topic 
there. I am open to reverting those changes, but before just doing that, I 
would like to be sure that the intentions for the change are clear.
3. I am not sure what Joe meant by moving it into a NAR. If it is the 
already mentioned pluggability (which I am totally in line with in the long 
run), I think it is not something to add to this PR, which is more like a 
glorified refactor story. If it is about something else, could you please 
specify it, [~joewitt]? (Note: the actual QuestDB-related code is in a separate 
jar. All the code that remained in `nifi-framework-core` is related to the 
actual repository implementation (the counterpart of the Volatile 
implementation).)



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-07 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794214#comment-17794214
 ] 

Pierre Villard commented on NIFI-12236:
---

I definitely agree on the documentation aspect and it'd be nice to have the PR 
update the admin-guide with as much documentation as possible regarding this 
implementation (its value, its configuration, etc).

There is already a bit here: 
[https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-repository]
 

Regarding your points:
{quote}1. Avoid making this the default. It can be opted into.{quote}
Definitely agree here. Let's not change the default at this point.
{quote}2. Avoid making nifi-api changes.
{quote}
I believe [~simonbence] gave the reasons why the changes make sense, and since 
we're still working towards NiFi 2.0, I believe this is worth discussing, but I 
believe the current work can also be done without API changes if that's the 
real blocker.
{quote}3. Move this into its own nar instead of the core framework nar. It can 
certainly still be included as it is today.
{quote}
Yeah, ideally every repository in NiFi could be a pluggable endpoint where a 
NAR can be provided with a given implementation. I'd see a lot of value there 
and I believe it's been mentioned a few times in the community. However, making 
the repositories a pluggable endpoint would likely be a significant effort and, 
in my opinion, it should be a follow up effort, not a prerequisite for the 
current work to be included (which is what you said, if I understood you 
correctly).



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-06 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793842#comment-17793842
 ] 

Joe Witt commented on NIFI-12236:
-

I think it would help to better explain what the properties/tunables are.  
Maybe that is made available elsewhere, but I'm not sure what batch size means 
for this, and I am also curious whether the user can control how much data is 
retained and what the default is for that.

A path forward that I think keeps things moving but doesn't create new risks 
that are currently not well enough explored:
1. Avoid making this the default.  It can be opted into.
2. Avoid making nifi-api changes.
3. Move this into its own nar instead of the core framework nar.  It can 
certainly still be included as it is today.

Remaining concerns can be addressed later and based on needs and findings from 
actual usage.

Thanks



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-06 Thread Simon Bence (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793816#comment-17793816
 ] 

Simon Bence commented on NIFI-12236:


Thanks for the clarification! Let me give some context for the different 
questions.

The idea:
- The functionality this provides is almost identical to the original 
implementation, apart from error management. The same is true of the possible 
parametrization, which however was not fully exposed as properties before. Now 
some of these settings (like the mentioned batch size) are exposed so that they 
can be tuned if needed.

The implementation options:
- I completely agree with you that in the long run it would be beneficial to 
have a variety of possible storage mechanisms. I also think opening up the 
Status History Registry as a pluggable service would be a good step forward. 
That was not, however, the aim of this PR. The PR focused mainly on 
"bulletproofing" the existing implementation, not on extending its capabilities.

The implementation:
- I think this is the same case as the previous one. The PR did not aim to 
extend the feature but to make it safer for production use. Personally I 
think the changes that were added can be used as a good basis for future 
improvements, but those should be the subject of further PRs.
- In the PR I provided a short reasoning for the API changes to 
[~exceptionfactory] and I consider it an ongoing discussion. The same result 
might be achieved without API changes, but please find my thinking process 
there. I do not insist on this approach; however, I see some merits in it.
- The Retry mechanism is actively used by the QuestDB implementation as part of 
error management. It is possible that I was not clear about that in the PR 
summary, but it is not something for a later commit. I separated it from the 
QuestDB-related code in order to 1. have a more focused code structure and 
2. make it available as a possible tool for later usage within the project.
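
As an illustration of the kind of general-purpose retry helper described above, here is a minimal Java sketch. It is not the code from the PR: the class name `Retry`, the method `withRetries`, and the fixed-delay policy are all assumptions made for this example.

```java
import java.util.function.Supplier;

// Hypothetical retry helper, kept independent of any database code so it
// could be reused elsewhere. Not the implementation from the PR.
final class Retry {
    private Retry() {
    }

    // Runs the action up to maxAttempts times, waiting delayMillis between
    // attempts, and rethrows the last failure if every attempt fails.
    static <T> T withRetries(int maxAttempts, long delayMillis, Supplier<T> action) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again if allowed
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt status and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;
    }
}
```

A caller might wrap a single write as `Retry.withRetries(3, 100L, () -> writeBatch(batch))`, where `writeBatch` is again a hypothetical method. Keeping the helper free of database types is what lets it sit outside the QuestDB-related code.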

The default selection:
- This topic has already been touched on in the PR and I am completely okay 
with moving that part out of the commit. In the long term I would see benefits 
in making it the default (the mail thread started with this and I still hold 
the same position), but of course let's be sure about its safety before moving 
on.

I hope I could give some useful information and clear some of the fog around 
the PR. If you still have conflicted feelings about the approach (and I 
deliberately do not use the phrase "final result", as I see some possible 
future steps), please share them.



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-06 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793786#comment-17793786
 ] 

Joe Witt commented on NIFI-12236:
-

I want to keep the different threads of concern clear.

The idea:
* I think we all agree there is good value in having these data points 
persisted across restarts.  The example Pierre gives is a perfect example of 
why.
* The reality though is our user interface is not designed for data to be held 
for long.  It isn't clear to me from the properties mentioned in the PR: what 
is the plan for helping the user configure how long this data is 
retained/queryable?  I see batch size and frequency but am not quite sure what 
those would mean relative to retention.  Perhaps that is part of the earlier 
implementation.

The implementation options:
* Offer an embedded state storage mechanism.  QuestDB is one example. A 
database per nifi instance or a database per cluster.  This embedded/batteries 
included mode is quite convenient for the user but we then of course have to do 
it quite thoroughly and consider upgrade scenarios.  I think our pucker factor 
here is far higher given the challenges we've had to work through from H2 in 
the recent year or two.
* Offer the ability to connect to/use a database of the user's choosing.  Defer 
installation/durability/security of that database to the user as part of their 
normal database operations, etc.  This I find is more in line with deployment 
styles we see in the Cloud, or automated with Ansible, or how one would deploy 
in K8S.

This implementation:
* It is a QuestDB per NiFi node.  It does not address the other mode should a 
user want that.  Of note, when we offered the embedded ZooKeeper mode we also 
offered to connect to a real ZooKeeper install.  That was to reflect the likely 
non-prod vs prod usage and I think that holds here as well.
* It has a change to the nifi-api.  We should strongly avoid any such API 
changes as part of this activity unless that change has value to various other 
components and the purpose/meaning of that change is very clear.
* It includes a Retry mechanism that the PR suggests might be addressed in a 
later commit.  I don't quite follow what that really means but I recommend 
making this implementation as simple/straightforward as possible.

The default selection:
* We should not be changing the default until this model is proven to be stable 
and recoverable in the same manner as the current in-memory implementation.





[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-06 Thread Pierre Villard (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793687#comment-17793687
 ] 

Pierre Villard commented on NIFI-12236:
---

I'd like to add my 2ct here:

This feature is already used by some NiFi users and has proved to be 
extremely useful when it comes to debugging problematic situations. It's really 
not about providing a long-term monitoring solution but more about being able 
to troubleshoot things when something bad has happened. In particular, when 
something bad happens and a node is restarted, all of the data is lost with the 
current default implementation, and this makes debugging/troubleshooting 
really complicated.

Regarding Datadog/Prometheus, as someone passionate about NiFi's monitoring, I 
definitely agree that anyone using NiFi in production should be deploying NiFi 
with those solutions in place for monitoring the NiFi service. However, what 
we're talking about here is very different (in my opinion). We all know that 
NiFi users are most of the time using NiFi in a very "multi tenant" approach 
(i.e. many different use cases / process groups running in the same NiFi 
environment). While the Prometheus endpoint and the reporting tasks are great 
for reporting high-level monitoring metrics to tools with very advanced 
dashboarding capabilities, it's a completely different story to have "per use 
case" monitoring. Even if you're sending everything into something like 
Prometheus, building the monitoring dashboards per use case will be quite some 
work. And I think this would be the same if we said: please provide your 
database, we'll push data there, and then it's up to you to build dashboards 
on top of that.

As far as I'm concerned and just to give a single example, when trying to look 
at performance optimizations in a flow, I may want to look at the graph for a 
specific processor showing the metrics average task duration / average lineage 
duration. This level of detail is something that would be quite hard to get if 
not offered by NiFi. Having a capability persisting all of this data across 
restarts is really useful.

While I can definitely accept the concerns around making it the default for 
NiFi 2.0, I think the work in the corresponding PR is a tremendous amount of 
work that would be valuable for the NiFi users.



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-06 Thread Simon Bence (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793614#comment-17793614
 ] 

Simon Bence commented on NIFI-12236:


Hi [~joewitt]!

Thanks for your attention on the topic!

My understanding at the time was that your main concerns were about the 
implementation's possible implications for the core functionality of NiFi, such 
as the handling of different kinds of errors like a corrupted database or 
insufficient disk space. So the main focus of the change was to rule out 
situations where an unexpected problem with QuestDB could prevent core 
functionality from working properly. I think this part is reasonably targeted 
and is covered with tests. 
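
One way to picture the goal of isolating core functionality from storage problems is a wrapper that contains failures and, after repeated errors, turns metrics storage off rather than letting exceptions propagate. The Java sketch below is only an illustration under that assumption; `MetricsStore`, `FailSafeMetricsStore`, and the failure-threshold policy are hypothetical names and choices, not the approach taken in the PR.

```java
// Hypothetical minimal storage abstraction used only for this sketch.
interface MetricsStore {
    void store(String metric, long value);
}

// Wraps another store and never lets its failures reach the caller.
// After failureThreshold consecutive errors it disables itself, so the
// result is a metrics outage instead of a failure in core services.
final class FailSafeMetricsStore implements MetricsStore {
    private final MetricsStore delegate;
    private final int failureThreshold;
    private int consecutiveFailures;
    private boolean disabled;

    FailSafeMetricsStore(MetricsStore delegate, int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be at least 1");
        }
        this.delegate = delegate;
        this.failureThreshold = failureThreshold;
    }

    @Override
    public synchronized void store(String metric, long value) {
        if (disabled) {
            return; // metrics are dropped, the caller keeps running
        }
        try {
            delegate.store(metric, value);
            consecutiveFailures = 0; // a success resets the counter
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                disabled = true;
            }
        }
    }

    synchronized boolean isDisabled() {
        return disabled;
    }
}
```

A real implementation would presumably also log the failure and attempt recovery of the database before giving up; the sketch leaves that out to keep the containment idea visible.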

As for visualisation, I did not aim to replace that part; however, it sounds 
like a good idea for it to follow the changes on the backend. From my 
perspective the main benefit of having persistent storage behind it is being 
able to access statistics after NiFi has stopped (or shut down abruptly).

I think in the long run the opportunity to "plug in" different kinds of 
statistics repositories, just like with processors, could (and maybe should) be 
opened, but I did not aim for that this time. I consider this a step forward 
which might be used as a stepping stone for future developments, including 
non-embedded solutions for persistence and other things. In many environments 
having a Datadog or even MySQL backend might be more satisfying (again: with 
this story I did not aim for that, which does not rule out future steps towards 
it, either from me or from other members of the community), but I think having 
an embedded solution comes with the merit of adding the capability of 
persistent statistical information out of the box.

> Improving fault tolerancy of the QuestDB backed metrics repository
> --
>
> Key: NIFI-12236
> URL: https://issues.apache.org/jira/browse/NIFI-12236
> Project: Apache NiFi
>  Issue Type: Improvement
>  Components: Core Framework
>Reporter: Simon Bence
>Assignee: Simon Bence
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Based on the related discussion on the dev email list, the QuestDB handling 
> of the metrics repository needs to be improved to have better fault tolerance 
> in order to be usable as a viable option for the default metrics data 
> store. This should primarily focus on handling unexpected database events like 
> a corrupted database or loss of space on the disk. Any issues should be handled 
> with an attempt to keep the database service healthy, but in case that is 
> impossible, the priority is to keep NiFi and the core services running, even 
> at the price of a metrics collection / presentation outage.





[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-05 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793412#comment-17793412
 ] 

Joe Witt commented on NIFI-12236:
-

I should clarify, as part of that was, I think, confusing...

Ultimately we should just allow them to have a database instance that they run 
as per normal.  All such DBs have installers/tools to set up a healthy 
instance.  Then the NiFi side would be configured to talk to it for 
sending/querying.  



[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository

2023-12-05 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793409#comment-17793409
 ] 

Joe Witt commented on NIFI-12236:
-

I went back to quickly review the presumably referenced mailing list thread 
titled 'NiFi 2.0 - QuestDB'.

The concerns I noted then remain for me.  It would certainly be nice for the 
users to have access to persistent and longer duration metric data.  That said, 
I'm not sure our user interface for this is very good when it comes to holding 
this information and making interesting visualizations of it for longer 
durations.  For this type of usage the in-memory durations we can reasonably 
sustain today seem about right.

What I am seeing more and more is that users of NiFi want this type of 
information available in their favorite monitoring or observability tool 
whether that is Datadog or Prometheus or something else.  

If our goal is a short-term but durable store then perhaps we ought to give 
them a simple QuestDB-based service/process they can run on some node.  Then 
their NiFi nodes are configured to send/query metrics from that service rather 
than it having to live on every node.  This also means it would be better 
externalized such that maybe they don't even use QuestDB but instead MySQL or 
PostgreSQL or whatever they prefer.  In a k8s-based deployment I can certainly 
see such a model working well.

For users that are looking for more robust data retention and query and 
analysis we're better off focusing on getting the data to their preferred tools.

