Ryan,

Understandable. We haven't found a need for Beats or Forwarders here either,
because S2S (Site-to-Site) gives you everything you need to reliably ship the
data.

FWIW, if your needs change, I would recommend stripping down the provenance
data. We cut out about 66-75% of the fields and dropped the intermediate
records in favor of keeping only DROP events for our simple dashboarding
needs; we figured that if a record never made it to a DROP event, something
very bad had happened.
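On the REST question further down the thread, here is a minimal sketch of how I'd poll that status endpoint and pull out the cluster-wide counters. It assumes an unsecured NiFi at localhost:8080, and the comments on field semantics are my reading of the docs, not gospel -- verify against your own flow.

```python
# Minimal sketch: poll the process-group status endpoint and pull the
# cluster-wide byte counters out of the aggregate snapshot.
# Assumes an unsecured NiFi at http://localhost:8080 -- adjust the host
# (and add auth) for a real cluster.
import json
import urllib.request

STATUS_URL = ("http://localhost:8080/nifi-api/flow/"
              "process-groups/root/status?recursive=true")


def extract_totals(status):
    """Extract the rolling 5-minute counters from a status response dict."""
    snap = status["processGroupStatus"]["aggregateSnapshot"]
    return {
        # My reading: bytesReceived/bytesSent count data exchanged with
        # external systems, while bytesIn/bytesOut count data crossing the
        # group's own ports -- worth double-checking on your flow.
        "received": snap["bytesReceived"],
        "sent": snap["bytesSent"],
        "in": snap["bytesIn"],
        "out": snap["bytesOut"],
    }


def fetch_totals(url=STATUS_URL):
    """Hit the status endpoint and return the extracted counters."""
    with urllib.request.urlopen(url) as resp:
        return extract_totals(json.load(resp))
```

Since the counters appear to cover a rolling window rather than accumulate forever, you would poll on an interval and sum the readings yourself to build a running total.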

On Wed, Jul 25, 2018 at 8:54 PM Ryan H <[email protected]>
wrote:

> Thanks, Mike, for the suggestion. I'm looking for a solution that doesn't
> involve additional components such as Beats/Forwarders/Elasticsearch/etc.
>
> Boris, thanks for the link to the Monitoring introduction--I've checked it
> out multiple times. What I want to avoid is needing anything to be set up
> on the canvas; I'd like the metrics collection to happen via the REST API.
> I'm thinking that the API in the original question may be the way to go,
> but I'm unsure of it without a little more information on the data model
> and how that data is collected/aggregated (such as what the returned data
> actually represents). I may just dig into the source if this email goes
> stale.
>
> -Ryan
>
>
> On Wed, Jul 25, 2018 at 9:17 AM, Boris Tyukin <[email protected]>
> wrote:
>
>> Ryan, if you have not seen these posts from Pierre, I suggest starting
>> there. He does a good job explaining the different options:
>> https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/
>>
>> I do agree that the 5-minute thing is super confusing and pretty useless,
>> and you cannot change that interval. I think it is only useful for quickly
>> checking on your real-time pipelines at the moment.
>>
>> I wish NiFi provided nicer out-of-the-box logging/monitoring capabilities,
>> but on the bright side, it seems to me that you can build your own and
>> customize it as you want.
>>
>>
>> On Tue, Jul 24, 2018 at 10:55 PM Ryan H <
>> [email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I am looking for a way to obtain the total amount of data that has been
>>> processed by a running cluster over a period of time, ideally via the
>>> REST API.
>>>
>>> Example of my use case:
>>> I have, say, 50 different process groups, each with a connection to some
>>> data source. Each one is continuously pulling data in, doing something to
>>> it, then sending it out to some other external place. I'd like to
>>> programmatically gather some metrics about the amount of data flowing
>>> through the cluster as a whole (everything that is running across the
>>> cluster).
>>>
>>> It looks like the following API may be the solution, but I am curious
>>> about some of the properties:
>>> "nifi-api/flow/process-groups/root/status?recursive=true".
>>>
>>> Looking at the data model (as defined in the REST API documentation) and
>>> the actual data that is returned, my questions are:
>>> 1. Would this be the correct way to obtain this information?
>>> 2. If so, which properties should I look at? The difference between some
>>> of them isn't immediately clear to me--for example, "bytesSent" vs.
>>> "bytesOut".
>>> 3. How is this data updated? It looks like a lot of these metrics are
>>> supposed to be updated every 5 minutes. So would the info I get now be
>>> what was collected during the last 5-minute interval, staying the same
>>> until the next interval? And does the data aggregate, or is it only
>>> representative of a single 5-minute period? Something else?
>>>
>>>
>>>
>>> {
>>>     "processGroupStatus": {
>>>             ...
>>>     "aggregateSnapshot": {
>>>     ...
>>>     "flowFilesIn": 0,
>>>     "bytesIn": 0,
>>>     "input": "value",
>>>     "flowFilesQueued": 0,
>>>     "bytesQueued": 0,
>>>     "queued": "value",
>>>     "queuedCount": "value",
>>>     "queuedSize": "value",
>>>     "bytesRead": 0,
>>>     "read": "value",
>>>     "bytesWritten": 0,
>>>     "written": "value",
>>>     "flowFilesOut": 0,
>>>     "bytesOut": 0,
>>>     "output": "value",
>>>     "flowFilesTransferred": 0,
>>>     "bytesTransferred": 0,
>>>     "transferred": "value",
>>>     "bytesReceived": 0,    // I think this is the one, but not sure
>>>     "flowFilesReceived": 0,
>>>     "received": "value",
>>>     "bytesSent": 0,   // I think this is the other one, but not sure
>>>     "flowFilesSent": 0,
>>>     "sent": "value",
>>>     "activeThreadCount": 0,
>>>     "terminatedThreadCount": 0
>>> },
>>>     "nodeSnapshots": [{…}]
>>> },
>>>     "canRead": true
>>> }
>>>
>>>
>>>
>>>
>>> Any help or insight is always appreciated!
>>>
>>>
>>> Cheers,
>>>
>>> Ryan H.
>>>
>>>
>>>
>
