Mike, would love to see a post about that solution you've mentioned.
Thinking about doing something similar for my company and using
Elastic/Kibana or Grafana

On Wed, Jul 25, 2018 at 9:17 PM Mike Thomsen <[email protected]> wrote:

> Ryan,
>
> Understandable. We haven't found a need for Beats or Forwarders here
> either because S2S gives everything you need to reliably ship the data.
>
> FWIW, if your need changes, I would recommend stripping down the
> provenance data. We cut out about 66-75% of the fields and dropped the
> intermediate records in favor of keeping only DROP events for our simple
> dashboarding needs, because we figured that if a record never made it
> there, something very bad had happened.
>
> On Wed, Jul 25, 2018 at 8:54 PM Ryan H <[email protected]>
> wrote:
>
>> Thanks, Mike, for the suggestion. I'm looking for a solution that
>> doesn't involve additional components such as
>> Beats/Forwarders/Elasticsearch/etc.
>>
>> Boris, thanks for the link for the Monitoring introduction--I've checked
>> it out multiple times. What I want to avoid is needing anything set up
>> on the canvas; I'd like to collect the metrics via the REST API. I'm
>> thinking that the API in the original question may be the way to go, but
>> I'm unsure without a little more information on the data model and how
>> that data is collected/aggregated (such as what the returned data
>> actually represents). I may just dig into the source if this email goes
>> stale.
>>
>> -Ryan
>>
>>
>> On Wed, Jul 25, 2018 at 9:17 AM, Boris Tyukin <[email protected]>
>> wrote:
>>
>>> Ryan, if you have not seen these posts from Pierre, I suggest
>>> starting there. He does a good job explaining different options
>>> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
>>>
>>> I do agree that the 5-minute window is super confusing and pretty
>>> useless, and you cannot change that interval. I think it is only useful
>>> for quickly checking on your real-time pipelines at the moment.
>>>
>>> I wish NiFi provided nicer out-of-the-box logging/monitoring
>>> capabilities, but on the bright side, it seems to me that you can build
>>> your own and customize it as you want.
>>>
>>>
>>> On Tue, Jul 24, 2018 at 10:55 PM Ryan H <
>>> [email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am looking for a way to obtain the total amount of data that has been
>>>> processed by a running cluster for a period of time, ideally via the rest
>>>> api.
>>>>
>>>> Example of my use case:
>>>> I have, say, 50 different process groups, each with a connection to
>>>> some data source. Each one is continuously pulling data in, doing
>>>> something to it, then sending it out to some other external place. I'd
>>>> like to programmatically gather some metrics about the amount of data
>>>> flowing through the cluster as a whole (everything that is running
>>>> across the cluster).
>>>>
>>>> It looks like the following api may be the solution, but I am curious
>>>> about some of the properties:
>>>> "nifi-api/flow/process-groups/root/status?recursive=true".
>>>>
>>>> Looking at the data model (as defined in the rest api documentation)
>>>> and the actual data that is returned, my questions are:
>>>> 1. Would this be the correct way to obtain this information?
>>>> 2. And if so, I'm not sure which properties to look at, as the
>>>> difference between some of them isn't immediately clear to me (e.g.
>>>> "bytesSent" vs "bytesOut").
>>>> 3. How is this data updated? It looks like a lot of these metrics are
>>>> supposed to be updated every 5 minutes. So would the info I get now be
>>>> what was collected over the last 5-minute interval, staying the same
>>>> until the next interval? And does the data aggregate, or is it only
>>>> representative of a single 5-minute period? Something else?
>>>>
>>>>
>>>>
>>>> {
>>>>     "processGroupStatus": {
>>>>             ...
>>>>     "aggregateSnapshot": {
>>>>     ...
>>>>     "flowFilesIn": 0,
>>>>     "bytesIn": 0,
>>>>     "input": "value",
>>>>     "flowFilesQueued": 0,
>>>>     "bytesQueued": 0,
>>>>     "queued": "value",
>>>>     "queuedCount": "value",
>>>>     "queuedSize": "value",
>>>>     "bytesRead": 0,
>>>>     "read": "value",
>>>>     "bytesWritten": 0,
>>>>     "written": "value",
>>>>     "flowFilesOut": 0,
>>>>     "bytesOut": 0,
>>>>     "output": "value",
>>>>     "flowFilesTransferred": 0,
>>>>     "bytesTransferred": 0,
>>>>     "transferred": "value",
>>>>     "bytesReceived": 0,    // I think this is the one, but not sure
>>>>     "flowFilesReceived": 0,
>>>>     "received": "value",
>>>>     "bytesSent": 0,   // I think this is the other one, but not sure
>>>>     "flowFilesSent": 0,
>>>>     "sent": "value",
>>>>     "activeThreadCount": 0,
>>>>     "terminatedThreadCount": 0
>>>> },
>>>>     "nodeSnapshots": [{…}]
>>>> },
>>>>     "canRead": true
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> Any help or insight is always appreciated!
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Ryan H.
>>>>
>>>>
>>>>
>>
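For what it's worth, a minimal sketch of pulling these totals out of the status endpoint. This assumes an unsecured NiFi at localhost:8080, and the bytesReceived/bytesSent vs. bytesIn/bytesOut distinction in the comments is my reading of the docs, not something confirmed in this thread:

```python
# Hypothetical sketch, not a confirmed answer: summarize the rolling
# 5-minute totals returned by
#   GET /nifi-api/flow/process-groups/root/status?recursive=true
# Assumes an unsecured NiFi at http://localhost:8080.
import json
import urllib.request

STATUS_URL = ("http://localhost:8080/nifi-api/flow/"
              "process-groups/root/status?recursive=true")

def summarize(status):
    """Extract throughput totals from a ProcessGroupStatusEntity dict.

    As I read the docs: bytesReceived/bytesSent count data exchanged
    with *external* systems (RECEIVE/SEND provenance events), while
    bytesIn/bytesOut count FlowFiles crossing the group's own ports.
    All values appear to cover the last 5 minutes, not a running total.
    """
    snap = status["processGroupStatus"]["aggregateSnapshot"]
    return {k: snap[k] for k in
            ("bytesReceived", "bytesSent", "bytesIn", "bytesOut")}

def fetch_summary(url=STATUS_URL):
    # One HTTP call; poll on a schedule and accumulate externally if
    # you need totals over windows longer than 5 minutes.
    with urllib.request.urlopen(url) as resp:
        return summarize(json.load(resp))
```

Polling this on a cron-like schedule and storing the snapshots would give the longer-term totals asked about above, since the endpoint itself only exposes the rolling window.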
