Thanks, Mike, for the suggestion. I'm looking for a solution that doesn't involve additional components such as Beats, forwarders, Elasticsearch, etc.
Boris, thanks for the link to the Monitoring introduction; I've checked it out multiple times. What I want to avoid is having to place anything on the canvas, and instead collect the metrics via the REST API. I think the API in the original question may be the way to go, but I'm unsure without a little more information on the data model and how the data is collected and aggregated (such as what the returned data actually represents).

I may just dig into the source if this email goes stale.

-Ryan

On Wed, Jul 25, 2018 at 9:17 AM, Boris Tyukin <[email protected]> wrote:

> Ryan, if you have not seen these posts from Pierre, I suggest
> starting there. He does a good job explaining different options
> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
>
> I do agree that 5 minute thing is super confusing and pretty useless and
> you cannot change that interval. I think it is only useful to check quickly
> on your real-time pipelines at the moment.
>
> I wish NiFi provided nicer out of the box logging/monitoring capabilities
> but on a bright side, it seems to me that you can build your own and
> customize it as you want.
>
>
> On Tue, Jul 24, 2018 at 10:55 PM Ryan H <[email protected]>
> wrote:
>
>> Hi All,
>>
>> I am looking for a way to obtain the total amount of data that has been
>> processed by a running cluster for a period of time, ideally via the rest
>> api.
>>
>> Example of my use case:
>> I have say 50 different process groups, each that have a connection to
>> some data source. Each one is continuously pulling data in, doing something
>> to it, then sending it out to some other external place. I'd like to
>> programmatically gather some metrics about the amount of data flowing thru
>> the cluster as a whole (everything that is running across the cluster).
>>
>> It looks like the following api may be the solution, but I am curious
>> about some of the properties:
>> "nifi-api/flow/process-groups/root/status?recursive=true".
>>
>> Looking at the data model (as defined in the rest api documentation) and
>> the actual data that is returned, my questions are:
>> 1. Would this be the correct way to obtain this information?
>> 2. And if so, I'm not sure which properties to look at as it isn't
>> immediately clear to me the difference between some of them. Example being
>> "bytesSent" vs "bytesOut".
>> 3. How is this data updated? It looks like a lot of these metrics are
>> supposed to be updated every 5 minutes. So would it be that the info I would
>> get now is what was collected from the last 5 minute interval and would
>> stay the same until the next 5 minute interval? And does the data aggregate
>> or is it only representative of a single 5 minute period? Something else?
>>
>>
>> {
>>   "processGroupStatus": {
>>     ...
>>     "aggregateSnapshot": {
>>       ...
>>       "flowFilesIn": 0,
>>       "bytesIn": 0,
>>       "input": "value",
>>       "flowFilesQueued": 0,
>>       "bytesQueued": 0,
>>       "queued": "value",
>>       "queuedCount": "value",
>>       "queuedSize": "value",
>>       "bytesRead": 0,
>>       "read": "value",
>>       "bytesWritten": 0,
>>       "written": "value",
>>       "flowFilesOut": 0,
>>       "bytesOut": 0,
>>       "output": "value",
>>       "flowFilesTransferred": 0,
>>       "bytesTransferred": 0,
>>       "transferred": "value",
>>       "bytesReceived": 0, // I think this is the one, but not sure
>>       "flowFilesReceived": 0,
>>       "received": "value",
>>       "bytesSent": 0, // I think this is the other one, but not sure
>>       "flowFilesSent": 0,
>>       "sent": "value",
>>       "activeThreadCount": 0,
>>       "terminatedThreadCount": 0
>>     },
>>     "nodeSnapshots": [{…}]
>>   },
>>   "canRead": true
>> }
>>
>>
>> Any help or insight is always appreciated!
>>
>>
>> Cheers,
>>
>> Ryan H.
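
For reference, a minimal sketch of the REST-only polling approach discussed in this thread. It only reads the byte counters from the "aggregateSnapshot" object in the response shape quoted above; it does not settle what those counters mean or how NiFi aggregates them. The base URL, the function names, and the absence of authentication are assumptions for illustration, not something the thread confirms.

```python
import json
from urllib.request import urlopen

# Endpoint path as quoted in the thread. The host and port passed to
# fetch_status() are placeholders (assumption).
STATUS_PATH = "/nifi-api/flow/process-groups/root/status?recursive=true"

# Byte counters that appear in the quoted "aggregateSnapshot" object.
FIELDS = (
    "bytesReceived",
    "bytesSent",
    "bytesIn",
    "bytesOut",
    "bytesRead",
    "bytesWritten",
)


def extract_byte_counts(payload):
    """Return the byte counters from a parsed status response.

    Counters missing from the snapshot are reported as 0.
    """
    snapshot = payload["processGroupStatus"]["aggregateSnapshot"]
    return {name: snapshot.get(name, 0) for name in FIELDS}


def fetch_status(base_url):
    """GET the root process group status.

    Assumes an unsecured instance with no authentication (hypothetical setup).
    """
    with urlopen(base_url + STATUS_PATH) as resp:
        return json.load(resp)


# Hypothetical usage against a local, unsecured instance:
#   counts = extract_byte_counts(fetch_status("http://localhost:8080"))
#   print(counts["bytesReceived"], counts["bytesSent"])
```

Whether successive readings should be summed or compared depends on how these counters are collected and aggregated, which is the open question raised above.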
