Ryan, understandable. We haven't found a need for Beats or Forwarders here either, because S2S gives you everything you need to reliably ship the data.
FWIW, if your need changes, I would recommend stripping down the provenance data. We cut out about 66-75% of the fields and dropped the intermediate records in favor of keeping DROP events for our simple dashboarding needs, because we figured that if a record never made it there, something very bad had happened.

On Wed, Jul 25, 2018 at 8:54 PM Ryan H <[email protected]> wrote:

> Thanks, Mike, for the suggestion. I'm looking for a solution that doesn't
> involve additional components such as Beats/Forwarders/Elasticsearch/etc.
>
> Boris, thanks for the link to the Monitoring introduction--I've checked it
> out multiple times. What I want to avoid is needing anything to be set up
> on the canvas; I want the metrics collection to happen via the REST API.
> I'm thinking that the API in the original question may be the way to go,
> but I'm unsure without a little more information on the data model and how
> that data is collected/aggregated (i.e., what the returned data actually
> represents). I may just dig into the source if this email goes stale.
>
> -Ryan
>
>
> On Wed, Jul 25, 2018 at 9:17 AM, Boris Tyukin <[email protected]>
> wrote:
>
>> Ryan, if you have not seen these posts from Pierre, I suggest starting
>> there. He does a good job explaining the different options:
>> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
>>
>> I do agree that the 5-minute thing is super confusing and pretty
>> useless, and you cannot change that interval. I think it is only useful
>> for a quick check on your real-time pipelines at the moment.
>>
>> I wish NiFi provided nicer out-of-the-box logging/monitoring
>> capabilities, but on the bright side, it seems you can build your own
>> and customize it as you want.
>>
>>
>> On Tue, Jul 24, 2018 at 10:55 PM Ryan H <
>> [email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I am looking for a way to obtain the total amount of data that has been
>>> processed by a running cluster over a period of time, ideally via the
>>> REST API.
>>>
>>> Example of my use case:
>>> I have, say, 50 different process groups, each with a connection to
>>> some data source. Each one is continuously pulling data in, doing
>>> something to it, then sending it out to some other external place. I'd
>>> like to programmatically gather metrics about the amount of data
>>> flowing through the cluster as a whole (everything that is running
>>> across the cluster).
>>>
>>> It looks like the following API may be the solution, but I am curious
>>> about some of the properties:
>>> "nifi-api/flow/process-groups/root/status?recursive=true"
>>>
>>> Looking at the data model (as defined in the REST API documentation)
>>> and the actual data that is returned, my questions are:
>>> 1. Would this be the correct way to obtain this information?
>>> 2. If so, which properties should I look at? It isn't immediately clear
>>> to me what the difference is between some of them, e.g. "bytesSent" vs.
>>> "bytesOut".
>>> 3. How is this data updated? It looks like a lot of these metrics are
>>> supposed to be updated every 5 minutes. Would the info I get now be
>>> what was collected over the last 5-minute interval, staying the same
>>> until the next interval? Does the data aggregate, or is it only
>>> representative of a single 5-minute period? Something else?
>>>
>>>
>>> {
>>>   "processGroupStatus": {
>>>     ...
>>>     "aggregateSnapshot": {
>>>       ...
>>>       "flowFilesIn": 0,
>>>       "bytesIn": 0,
>>>       "input": "value",
>>>       "flowFilesQueued": 0,
>>>       "bytesQueued": 0,
>>>       "queued": "value",
>>>       "queuedCount": "value",
>>>       "queuedSize": "value",
>>>       "bytesRead": 0,
>>>       "read": "value",
>>>       "bytesWritten": 0,
>>>       "written": "value",
>>>       "flowFilesOut": 0,
>>>       "bytesOut": 0,
>>>       "output": "value",
>>>       "flowFilesTransferred": 0,
>>>       "bytesTransferred": 0,
>>>       "transferred": "value",
>>>       "bytesReceived": 0,       // I think this is the one, but not sure
>>>       "flowFilesReceived": 0,
>>>       "received": "value",
>>>       "bytesSent": 0,           // I think this is the other one, but not sure
>>>       "flowFilesSent": 0,
>>>       "sent": "value",
>>>       "activeThreadCount": 0,
>>>       "terminatedThreadCount": 0
>>>     },
>>>     "nodeSnapshots": [{…}]
>>>   },
>>>   "canRead": true
>>> }
>>>
>>>
>>> Any help or insight is always appreciated!
>>>
>>>
>>> Cheers,
>>>
>>> Ryan H.
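To make the question concrete, here is a minimal sketch of pulling those counters with nothing but the Python standard library. The base URL, the lack of authentication, and the `fetch_root_status`/`extract_totals` names are all assumptions for illustration, not anything from the thread; the field semantics in the comments should be verified against your NiFi version's docs.

```python
import json
import urllib.request

# Hypothetical base URL; adjust host/port, and add auth for a secured cluster.
NIFI_BASE = "http://localhost:8080/nifi-api"


def fetch_root_status(base=NIFI_BASE):
    """GET the recursive status of the root process group (the whole canvas)."""
    url = base + "/flow/process-groups/root/status?recursive=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def extract_totals(status):
    """Pull the cluster-wide byte counters out of the aggregate snapshot.

    My (unverified) reading: bytesIn/bytesOut count data crossing the
    group's ports, while bytesReceived/bytesSent count Site-to-Site
    transfers -- worth confirming against the docs for your version.
    """
    snap = status["processGroupStatus"]["aggregateSnapshot"]
    return {
        "bytesReceived": snap["bytesReceived"],
        "bytesSent": snap["bytesSent"],
        "bytesIn": snap["bytesIn"],
        "bytesOut": snap["bytesOut"],
    }
```

A caller would just do `extract_totals(fetch_root_status())` on whatever schedule suits the dashboard.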

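On question 3: if, as the thread suggests, the snapshot counters are rolling five-minute windows rather than cumulative totals, then a longer-term total has to be accumulated client-side by polling roughly once per window. A minimal sketch of that idea, assuming a `poll` callable that returns a dict with a `"bytesReceived"` key (the function name and parameters are hypothetical):

```python
import time


def accumulate(poll, interval_s=300, polls=3):
    """Poll repeatedly and sum the per-window byte counts.

    If each poll returns a rolling five-minute total, sampling once per
    window only approximates a running total: overlapping samples will
    double-count, and gaps between samples will under-count.
    """
    total = 0
    for i in range(polls):
        snapshot = poll()
        total += snapshot["bytesReceived"]
        if i < polls - 1:
            time.sleep(interval_s)  # wait for the next window
    return total
```

An alternative worth considering is a Reporting Task or Pierre's approaches from the linked posts, which push metrics out instead of polling.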