[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17815864#comment-17815864 ]

ASF subversion and git services commented on NIFI-12236:
--------------------------------------------------------

Commit a519585b02f37c654638ece4681928fd3abbe355 in nifi's branch refs/heads/main from Simon Bence
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a519585b02 ]

NIFI-12236 Improved Fault Tolerance in QuestDB Status Repository

- Moved QuestDB components to nifi-questdb-bundle

This closes #8152

Signed-off-by: David Handermann

> Improving fault tolerancy of the QuestDB backed metrics repository
> ------------------------------------------------------------------
>
>                 Key: NIFI-12236
>                 URL: https://issues.apache.org/jira/browse/NIFI-12236
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Simon Bence
>            Assignee: Simon Bence
>            Priority: Major
>          Time Spent: 9h
>  Remaining Estimate: 0h
>
> Based on the related discussion on the dev email list, the QuestDB handling
> of the metrics repository needs to be improved to have better fault tolerance
> so that it can be used as a viable option for the default metrics data
> store. This should primarily focus on handling unexpected database events like
> a corrupted database or loss of space on the disk. Any issues should be handled
> with an attempt to keep the database service healthy, but in case that is
> impossible, the priority is to keep NiFi and the core services running, even
> at the price of a metrics collection / presentation outage.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794395#comment-17794395 ]

Pierre Villard commented on NIFI-12236:
---------------------------------------

Thanks David, this is definitely a nice improvement and will make things much better.
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794392#comment-17794392 ]

David Handermann commented on NIFI-12236:
-----------------------------------------

To help move things forward, I created a separate Jira issue NIFI-12492 for moving QuestDB to its own NAR and submitted a pull request for review.

I realize that moving the components around would require rebasing the changes, but the goal is to address one of the concerns about continued maintenance of the QuestDB implementation. With the StatusHistoryRepository as an existing extension point, having a separate NAR makes it much easier to iterate and improve the implementation without changes to the framework itself.

I was able to incorporate upgrading from QuestDB 7.2 to 7.3.7 for NIFI-12435, as the differences were minimal. The move to a separate NAR did not require any code changes, other than the minor adjustments for updating to QuestDB 7.3.7.

Although not wanting to slow progress on the more fundamental improvements, moving the implementation to a separate NAR should address some general concerns and provide a better path forward.
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794320#comment-17794320 ]

David Handermann commented on NIFI-12236:
-----------------------------------------

Regarding the particular point of moving QuestDB support to a separate NAR, it is important to note that the StatusHistoryRepository is already an extension point, so it is open to custom implementations. That highlights the importance of avoiding API changes for this specific implementation. That also means moving the QuestDB implementation to a separate NAR should be straightforward.

On that point in particular, I would be willing to move the existing implementation to a separate NAR, which would help keep these changes more focused.
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794312#comment-17794312 ]

Joe Witt commented on NIFI-12236:
---------------------------------

Regarding defaults - thank you.

API Changes: These should get the highest degree of scrutiny by reviewers and the most effort and thought by authors. At this stage I do not understand the intention for the change. It appears that concern is shared. If there is a non-API option specific to this implementation, that should be pursued instead.

Suggestion about the nar: I am not aware of other cases where the optional implementation of a thing is the lighter path in terms of dependencies. In this case the optional implementation (QuestDB backed stats) is the heavier one in terms of dependencies/operations. To be clear, in suggesting it be in its own nar I am not of the view that it is a lot more work. I could be wrong. But the 'more work' consideration is also not a strong one. The 'it is already in there...' argument is better since, well - that is true. I'm just asking that the effort to do that be strongly considered, because then those who would not use this option don't have to carry the component into deployment, nor the potential vulnerabilities the libraries might bring. But those that want it are good to go and can use it. If it is baked into the framework nar this is not an option.

As evidence of my concern regarding vulnerabilities, it does give pause that the version of QuestDB was not considered in this PR to this point. Keeping dependencies up to date is *everyone's* responsibility. The NiFi 2.0 line is, by the way, super wildly close to being entirely free of static coding practice and dependency vulnerabilities (excluding hadoop related components), which is in itself a miracle. Please ensure the dependencies in this PR are up to date. On that note, do we know what QuestDB version changes mean in terms of breaking changes, changes that might require users to do something to keep their state, etc.? I ask because we learned our lessons the hard way here with H2. H2 served us extremely well over the years, but eventually it created some very complicated compelling events.

Anyway - I don't want to come off like this thing can't happen. The root idea of this I think we all agree is useful. There is some debate on whether this implementation approach is desirable vs another, but even that does not matter. He who does the work for a given implementation ultimately wins. What I am asking, though, is please make that implementation as narrow and specific as possible, to give others a choice on whether they are exposed to it. That is the heart of the replies I am making.

Thanks
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794244#comment-17794244 ]

Simon Bence commented on NIFI-12236:
------------------------------------

Hi [~pvillard]!

Thanks for mentioning the admin guide! I indeed forgot to add the newly exposed properties. I will add those during the review.

1. I think we agreed on the defaulting; it will be changed back with the review changes as well.

2. As this looks to be a more nuanced question and I already have a discussion about it on the PR with David Handermann, I suggest continuing this topic there. I am open to reverting those changes, but before just doing that, I would like to be sure that the intentions for the change are clear.

3. I am not sure what Joe meant by moving it into a nar. If it is the already mentioned pluggability (which I am totally in line with in the long run), I think it is not something to add to this PR, which is more like a glorified refactor story. If it is about something else, could you please specify it, [~joewitt]? (Note: the actual QuestDB related code is in a separate jar. All the code that remained in `nifi-framework-core` is related to the actual repository implementation (counterpart of the Volatile implementation
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794214#comment-17794214 ]

Pierre Villard commented on NIFI-12236:
---------------------------------------

I definitely agree on the documentation aspect, and it'd be nice to have the PR update the admin-guide with as much documentation as possible regarding this implementation (its value, its configuration, etc). There is already a bit here: [https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#persistent-repository]

Regarding your points:

{quote}1. Avoid making this the default. It can be opted into.{quote}

Definitely agree here. Let's not change the default at this point.

{quote}2. Avoid making nifi-api changes.{quote}

I believe [~simonbence] gave the reasons why the changes make sense, and since we're still working towards NiFi 2.0, I believe this is worth discussing. But I believe the current work can also be done without an API change if that's the real blocker.

{quote}3. Move this into its own nar instead of the core framework nar. It can certainly still be included as it is today.{quote}

Yeah, ideally every repository in NiFi could be a pluggable endpoint where a NAR can be provided with a given implementation. I'd see a lot of value there and I believe it's been mentioned a few times in the community. However, making the repositories a pluggable endpoint would likely be a significant effort and, in my opinion, it should be a follow-up effort, not a prerequisite for the current work to be included (which is what you said, if I understood you correctly).
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793842#comment-17793842 ]

Joe Witt commented on NIFI-12236:
---------------------------------

I think it would help to better explain what the properties/tunables are. Maybe that is made available elsewhere, but I'm not sure what batch size means for this, and I am also curious whether the user can control how much data is retained and what the default is for that.

A path forward that I think keeps things moving but doesn't create new risks that are currently not well enough explored:

1. Avoid making this the default. It can be opted into.
2. Avoid making nifi-api changes.
3. Move this into its own nar instead of the core framework nar. It can certainly still be included as it is today.

Remaining concerns can be addressed later, based on needs and findings from actual usage.

Thanks
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793816#comment-17793816 ]

Simon Bence commented on NIFI-12236:
------------------------------------

Thanks for the clarification! Let me give some context for the different questions.

The idea:
- The functionality this provides is almost identical to the original implementation, apart from error management. The same is true of the possible parametrization, which however was not fully exposed as properties before. Now some of these settings (like the mentioned batch size) are exposed so they can be tuned if needed.

The implementation options:
- I completely agree with you that in the long run it would be beneficial to have a variety of possible storage mechanisms. I also think opening up the Status History Registry as a pluggable service would be a good step forward. That was not, however, the aim of this PR. The PR focused mainly on "bulletproofing" the existing implementation, not on extending its capabilities.

The implementation:
- I think this is the same case as the previous one. The PR did not aim to extend the feature but to make it safer for production use. Personally I think the changes that were added can be used as a good basis for future improvements, but those should be the subject of further PRs.
- In the PR I provided a short reasoning for the API changes to [~exceptionfactory] and I consider it an ongoing discussion. The same result might be achieved without API changes, but please find my thinking process there. I do not insist on this approach, however I see some merits in it.
- The Retry mechanism is actively used by the QuestDB implementation as part of error management. It is possible that I was not clear about that in the PR summary, but it is not something for a later commit. I separated it from the QuestDB related code in order to 1. have a more focused code structure and 2. make it a possible tool for later usage within the project.

The default selection:
- This topic has already been touched on in the PR, and I am completely okay with moving that part out of the commit. In the long term I would see benefits in making it the default (the mail thread started with this and I still hold the same position), but of course let's be sure about its safety before moving on.

I hope I could give some useful information and clear some fog around the PR. If you still have conflicted feelings about the approach (and I deliberately do not use the phrase "final result", as I see some possible future steps), please share them.
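As an illustration of the kind of retry helper described in the comment above, here is a minimal Java sketch. It is not the code from the PR: the class name `RetrySketch`, the method `callWithRetry`, and the fixed delay between attempts are all assumptions made for this example. It only shows the general idea of retrying a failing storage call a bounded number of times and then returning a fallback instead of propagating the error.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not NiFi's actual Retry implementation.
final class RetrySketch {

    /**
     * Calls the action up to maxAttempts times, sleeping delayMillis between
     * failed attempts. Returns the fallback value if every attempt fails or
     * the thread is interrupted while waiting, so the caller never sees the
     * underlying exception.
     */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long delayMillis, T fallback) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException interrupted) {
                        // Preserve the interrupt flag and give up early.
                        Thread.currentThread().interrupt();
                        return fallback;
                    }
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated write that fails twice before succeeding.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated database error");
            }
            return "stored";
        }, 5, 10L, "skipped");
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Keeping such a helper separate from the database code, as the comment suggests, lets the same bounded-retry policy be reused by other components.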
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793786#comment-17793786 ]

Joe Witt commented on NIFI-12236:
---------------------------------

I want to keep the different threads of concern clear.

The idea:
* I think we all agree there is good value in having these data points persisted across restarts. The example Pierre gives is a perfect example of why.
* The reality though is that our user interface is not designed for data to be held for long. It isn't clear to me from the properties mentioned in the PR: what is the plan for helping the user configure how long this data is retained/queryable? I see batch size and frequency but am not quite sure what those would mean relative to retention. Perhaps that is part of the earlier implementation.

The implementation options:
* Offer an embedded state storage mechanism. QuestDB is one example. A database per nifi instance or a database per cluster. This embedded/batteries included mode is quite convenient for the user, but we then of course have to do it quite thoroughly and consider upgrade scenarios. I think our pucker factor here is far higher given the challenges we've had to work through with H2 in the recent year or two.
* Offer the ability to connect to/use a database of the user's choosing. Defer installation/durability/security of that database to the user as part of their normal database operations, etc. This I find is more in line with deployment styles we see in the Cloud, or automated with Ansible, or how one would deploy in K8S.

This implementation:
* It is a QuestDB per nifi node. It does not address the other mode should a user want that. Of note, when we offered the embedded ZooKeeper mode we also offered to connect to a real ZooKeeper install. That was to reflect the likely non-prod vs prod usage, and I think that holds here as well.
* It has a change to the nifi-api. We should strongly avoid any such API changes as part of this activity unless that change has value to various other components and the purpose/meaning of that change is very clear.
* It includes a Retry mechanism that the PR suggests might be addressed in a later commit. I don't quite follow what that really means, but I recommend making this implementation as simple/straightforward as possible.

The default selection:
* We should not be changing the default until this model is proven to be stable and recoverable in the same manner as the current in-memory implementation.
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793687#comment-17793687 ]

Pierre Villard commented on NIFI-12236:
---------------------------------------

I'd like to add my 2ct here:

This feature is already used by some of the NiFi users and has proved to be extremely useful when it comes to debugging problematic situations. It's really not about providing a long term monitoring solution but more about being able to troubleshoot things in case something bad happened. In particular, when something bad happens and a node is restarted, all of the data is lost with the current default implementation, and this makes debugging/troubleshooting really complicated.

Regarding Datadog/Prometheus, as someone passionate about NiFi's monitoring, I definitely agree that anyone using NiFi in production should deploy NiFi alongside those solutions to monitor the NiFi service. However, what we're talking about here is very different (in my opinion). We all know that NiFi users are most of the time using NiFi in a very "multi tenant" approach (i.e. many different use cases / process groups running in the same NiFi environment). While the Prometheus endpoint and the reporting tasks are great for reporting high level monitoring metrics to tools where you'll have very advanced dashboarding capabilities, it's a completely different story to have "per use case" monitoring. And even if you're sending everything into something like Prometheus, building the monitoring dashboards per use case will be quite some work. And I think this would be the same if we say: please provide your database and we'll push data there, and then it's up to you to build dashboards on top of that.

As far as I'm concerned, and just to give a single example, when trying to look at performance optimizations in a flow, I may want to look at the graph for a specific processor showing the metrics average task duration / average lineage duration. This level of detail is something that would be quite hard to get if not offered by NiFi.

Having a capability to persist all of this data across restarts is really useful. While I can definitely accept the concerns around making it the default for NiFi 2.0, I think the work in the corresponding PR is a tremendous amount of work that would be valuable for the NiFi users.
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793614#comment-17793614 ]

Simon Bence commented on NIFI-12236:
------------------------------------

Hi [~joewitt]!

Thanks for your attention on the topic! My understanding at the time was that your main concerns were about the implementation's possible implications for the core functionality of NiFi, such as handling different kinds of errors like a corrupted database or insufficient disk space. So the main focus of the change was to rule out possible situations where an unexpected situation with QuestDB could prevent core functionality from working properly. I think this part is reasonably targeted and covered with tests.

As for visualisation, I did not aim to replace that part, however it sounds like a good idea to follow up on the backend changes there. From my perspective the main benefit of having some persistent storage behind it is being able to access statistics after NiFi has stopped (or shut down abruptly).

I think in the long run the opportunity to "plug in" different kinds of statistics repositories, just like with processors, could (and maybe should) be opened, but I did not aim for that this time. I consider this a step forward which might be used as a stepping stone for future developments, including non-embedded solutions for persistence and other things. In many environments having a Datadog or even MySQL backend might be more satisfying (again: with this story I did not aim for that, which does not rule out future steps towards it, either from me or from other members of the community), but I think having an embedded solution comes with the merit of adding the capability of persistent statistical information out of the box.
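The goal stated above, that a failing metrics database must never prevent core functionality from working, can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the approach taken in the PR: `MetricsStore` and `FailSafeMetricsStore` are hypothetical names, and the policy of disabling the store after a fixed number of consecutive failures is chosen only for the example.

```java
import java.io.IOException;

// Hypothetical stand-in for a persistent metrics backend; not a NiFi interface.
interface MetricsStore {
    void store(String sample) throws Exception;
}

// Hypothetical wrapper that trades metrics availability for caller safety.
final class FailSafeMetricsStore {
    private final MetricsStore delegate;
    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;
    private boolean disabled = false;

    FailSafeMetricsStore(MetricsStore delegate, int maxConsecutiveFailures) {
        this.delegate = delegate;
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /**
     * Tries to store the sample. Returns true on success and false when the
     * sample was dropped. Never throws, so a broken database (for example a
     * corrupted table or a full disk) cannot take the caller down with it.
     */
    synchronized boolean store(String sample) {
        if (disabled) {
            return false; // metrics outage accepted; the backend is no longer called
        }
        try {
            delegate.store(sample);
            consecutiveFailures = 0;
            return true;
        } catch (Exception e) {
            consecutiveFailures++;
            if (consecutiveFailures >= maxConsecutiveFailures) {
                disabled = true; // stop calling the failing backend
            }
            return false;
        }
    }

    synchronized boolean isDisabled() {
        return disabled;
    }

    public static void main(String[] args) {
        FailSafeMetricsStore store = new FailSafeMetricsStore(sample -> {
            throw new IOException("simulated: no space left on device");
        }, 3);
        for (int i = 0; i < 5; i++) {
            store.store("sample-" + i);
        }
        System.out.println("disabled=" + store.isDisabled());
    }
}
```

A real implementation would presumably also log the failure and attempt recovery before giving up; those steps are omitted here to keep the sketch short.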
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793412#comment-17793412 ]

Joe Witt commented on NIFI-12236:
---------------------------------

I should clarify, as part of that was, I think, confusing...

Ultimately we should just allow them to have a database instance that they run as per normal. All such DBs have installers/tools to set up a healthy instance. Then the nifi side would be configured to talk to it for sending/querying.
[jira] [Commented] (NIFI-12236) Improving fault tolerancy of the QuestDB backed metrics repository
[ https://issues.apache.org/jira/browse/NIFI-12236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17793409#comment-17793409 ]

Joe Witt commented on NIFI-12236:
---------------------------------

I went back to quickly review the presumably referenced mailing list thread titled 'NiFi 2.0 - QuestDB'. The concerns I noted then remain for me.

It would certainly be nice for the users to have access to persistent and longer duration metric data. That said, I'm not sure our user interface for this is very good when it comes to holding this information and making interesting visualizations of it for longer durations. For this type of usage, the in-memory durations we can reasonably sustain today seem about right.

What I am seeing more and more is that users of NiFi want this type of information available in their favorite monitoring or observability tool, whether that is Datadog or Prometheus or something else.

If our goal is a short term but durable store, then perhaps we ought to give them a simple QuestDB based service/process they can run on some node. Then their NiFi nodes are configured to send/query metrics from that service rather than it having to live on every node. This also means it would be better externalized, such that maybe they don't even use Quest but instead MySQL or Postgres or whatever they prefer. In a k8s based deployment I can certainly see such a model working well.

For users that are looking for more robust data retention, query, and analysis, we're better off focusing on getting the data to their preferred tools.