Mike, I would love to see a post about the solution you mentioned. I'm thinking about doing something similar for my company, using Elastic/Kibana or Grafana.
On Wed, Jul 25, 2018 at 9:17 PM Mike Thomsen <[email protected]> wrote:

> Ryan,
>
> Understandable. We haven't found a need for Beats or Forwarders here
> either because S2S gives everything you need to reliably ship the data.
>
> FWIW, if your need changes, I would recommend stripping down the
> provenance data. We cut out about 66-75% of the fields and dropped the
> intermediate records in favor of keeping DROP events for our simple
> dashboarding needs because we figured if a record never made it there
> something very bad happened.
>
> On Wed, Jul 25, 2018 at 8:54 PM Ryan H <[email protected]>
> wrote:
>
>> Thanks Mike for the suggestion on it. I'm looking for a solution that
>> doesn't involve additional components such as any
>> Beats/Forwarders/Elasticsearch/etc.
>>
>> Boris, thanks for the link for the Monitoring introduction--I've checked
>> it out multiple times. What I want to avoid is needing anything
>> to be set on the Canvas; I'd rather collect the metrics via the rest api.
>> I'm thinking that the api in the original question may be the way to go,
>> but I'm unsure of it without a little more information on the data model and
>> how that data is collected/aggregated (such as what the data returned
>> actually represents). I may just dig into the source if this email goes
>> stale.
>>
>> -Ryan
>>
>>
>> On Wed, Jul 25, 2018 at 9:17 AM, Boris Tyukin <[email protected]>
>> wrote:
>>
>>> Ryan, if you have not seen these posts from Pierre, I suggest
>>> starting there. He does a good job explaining different options
>>> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
>>>
>>> I do agree that the 5 minute thing is super confusing and pretty useless and
>>> you cannot change that interval. I think it is only useful to check quickly
>>> on your real-time pipelines at the moment.
>>>
>>> I wish NiFi provided nicer out of the box logging/monitoring
>>> capabilities but on the bright side, it seems to me that you can build your
>>> own and customize it as you want.
>>>
>>>
>>> On Tue, Jul 24, 2018 at 10:55 PM Ryan H <
>>> [email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am looking for a way to obtain the total amount of data that has been
>>>> processed by a running cluster for a period of time, ideally via the rest
>>>> api.
>>>>
>>>> Example of my use case:
>>>> I have say 50 different process groups, each of which has a connection to
>>>> some data source. Each one is continuously pulling data in, doing something
>>>> to it, then sending it out to some other external place. I'd like to
>>>> programmatically gather some metrics about the amount of data flowing thru
>>>> the cluster as a whole (everything that is running across the cluster).
>>>>
>>>> It looks like the following api may be the solution, but I am curious
>>>> about some of the properties:
>>>> "nifi-api/flow/process-groups/root/status?recursive=true".
>>>>
>>>> Looking at the data model (as defined in the rest api documentation)
>>>> and the actual data that is returned, my questions are:
>>>> 1. Would this be the correct way to obtain this information?
>>>> 2. And if so, I'm not sure which properties to look at as it isn't
>>>> immediately clear to me the difference between some of them. Example being
>>>> "bytesSent" vs "bytesOut".
>>>> 3. How is this data updated? It looks like a lot of these metrics are
>>>> supposed to be updated every 5 minutes. So would it be that the info I would
>>>> get now is what was collected from the last 5 minute interval and would
>>>> stay the same until the next 5 minute interval? And does the data aggregate
>>>> or is it only representative of a single 5 minute period? Something else?
>>>>
>>>>
>>>> {
>>>>   "processGroupStatus": {
>>>>     ...
>>>>     "aggregateSnapshot": {
>>>>       ...
>>>>       "flowFilesIn": 0,
>>>>       "bytesIn": 0,
>>>>       "input": "value",
>>>>       "flowFilesQueued": 0,
>>>>       "bytesQueued": 0,
>>>>       "queued": "value",
>>>>       "queuedCount": "value",
>>>>       "queuedSize": "value",
>>>>       "bytesRead": 0,
>>>>       "read": "value",
>>>>       "bytesWritten": 0,
>>>>       "written": "value",
>>>>       "flowFilesOut": 0,
>>>>       "bytesOut": 0,
>>>>       "output": "value",
>>>>       "flowFilesTransferred": 0,
>>>>       "bytesTransferred": 0,
>>>>       "transferred": "value",
>>>>       "bytesReceived": 0, // I think this is the one, but not sure
>>>>       "flowFilesReceived": 0,
>>>>       "received": "value",
>>>>       "bytesSent": 0, // I think this is the other one, but not sure
>>>>       "flowFilesSent": 0,
>>>>       "sent": "value",
>>>>       "activeThreadCount": 0,
>>>>       "terminatedThreadCount": 0
>>>>     },
>>>>     "nodeSnapshots": [{…}]
>>>>   },
>>>>   "canRead": true
>>>> }
>>>>
>>>>
>>>> Any help or insight is always appreciated!
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Ryan H.
>>>>
>>
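
For anyone following the thread, here is a minimal sketch of polling the endpoint from the original question and pulling the byte counters out of "aggregateSnapshot". It assumes an unsecured NiFi node reachable at a base URL you supply; the helper names (extract_byte_counters, fetch_root_status) and the localhost address in the comment are hypothetical, made up for illustration. It does not settle which pair ("bytesReceived"/"bytesSent" vs "bytesIn"/"bytesOut") is the right one; it just returns all of the counters shown in the quoted response so they can be compared.

```python
import json
import urllib.request

# Byte counters copied from the aggregateSnapshot shown in the thread.
BYTE_FIELDS = (
    "bytesIn", "bytesOut", "bytesReceived", "bytesSent",
    "bytesRead", "bytesWritten", "bytesTransferred", "bytesQueued",
)


def extract_byte_counters(status):
    """Return the byte counters from an already-parsed status response.

    Counters missing from the response default to 0.
    """
    snapshot = status["processGroupStatus"]["aggregateSnapshot"]
    return {name: snapshot.get(name, 0) for name in BYTE_FIELDS}


def fetch_root_status(base_url):
    """GET the recursive status of the root process group.

    Assumes no authentication is required; a secured cluster would
    need a token or client certificate, which this sketch omits.
    """
    url = (base_url.rstrip("/")
           + "/nifi-api/flow/process-groups/root/status?recursive=true")
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


# Example call (hypothetical host and port):
#   counters = extract_byte_counters(fetch_root_status("http://localhost:8080"))
#   print(counters["bytesReceived"], counters["bytesSent"])
```

Since the thread suggests these values cover a 5 minute window rather than a running total, a caller that wants totals over a longer period would presumably have to poll on a schedule and accumulate the readings itself.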
