Re: NiFi Data Usage via Rest API

2018-07-26 Thread Mark Payne
Ryan, That is correct. Would just clarify that when you say "SEND events are when they are leaving the system" -- the data is being sent to an external system, but it is not being dropped from NiFi. So you could send the data to 10 different places. A "DROP" event indicates that NiFi is now
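To make the SEND vs. DROP distinction concrete, here is a minimal Python sketch that tallies bytes per provenance event type. The `ProvEvent` record and its fields are hypothetical simplifications, not NiFi's actual provenance schema. One FlowFile that is received once and sent to three destinations produces three SEND events but only one DROP, so summing SEND bytes can exceed the volume that actually entered the flow.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProvEvent:
    # Hypothetical, simplified record; real NiFi provenance events carry many more fields.
    event_type: str   # e.g. "RECEIVE", "SEND", "DROP"
    flowfile_id: str
    size_bytes: int


def tally(events):
    """Sum bytes per event type and count distinct FlowFiles that were dropped."""
    bytes_by_type = defaultdict(int)
    dropped = set()
    for e in events:
        bytes_by_type[e.event_type] += e.size_bytes
        if e.event_type == "DROP":
            dropped.add(e.flowfile_id)
    return dict(bytes_by_type), len(dropped)


# One 100-byte FlowFile: received once, sent to three destinations, then dropped once.
events = [ProvEvent("RECEIVE", "ff-1", 100)]
events += [ProvEvent("SEND", "ff-1", 100) for _ in range(3)]
events.append(ProvEvent("DROP", "ff-1", 100))

totals, dropped = tally(events)
print(totals)   # {'RECEIVE': 100, 'SEND': 300, 'DROP': 100}
print(dropped)  # 1
```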

Re: NiFi Data Usage via Rest API

2018-07-26 Thread Ryan H
Hi Mark, Thanks for the explanation on this; this is what I was looking for. So it sounds like Provenance info is the way to go (as mentioned by Mike [thanks Mike]). I will have to do a little more research on the Provenance events, but it sounds like RECEIVE events are for when something is

Re: NiFi Data Usage via Rest API

2018-07-26 Thread Mark Payne
Hey Ryan, The stats that you are seeing here are a rolling 5-minute window. The "bytesReceived" indicates the number of bytes that were received from external systems (i.e., the number of bytes reported as Provenance RECEIVE events). The "bytesSent" indicates the number of bytes that were sent
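A minimal sketch of reading those two counters from the REST API using only the Python standard library. It assumes an unsecured NiFi 1.x instance, that the `root` alias is accepted by `/nifi-api/flow/process-groups/{id}/status` (otherwise substitute the root group's id), and that the counters sit under `processGroupStatus.aggregateSnapshot`; check the REST API documentation for your version. The `sample` payload is made-up data, not real output.

```python
import json
import urllib.request


def fetch_root_status(base_url: str) -> dict:
    """GET the root process group's status (assumes an unsecured NiFi 1.x instance)."""
    url = f"{base_url}/nifi-api/flow/process-groups/root/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def window_bytes(status: dict) -> tuple[int, int]:
    """Return (bytesReceived, bytesSent); both cover only the rolling 5-minute window."""
    snap = status["processGroupStatus"]["aggregateSnapshot"]
    return snap["bytesReceived"], snap["bytesSent"]


# Made-up payload trimmed to the assumed fields, so the parser can run offline.
sample = {
    "processGroupStatus": {
        "aggregateSnapshot": {"bytesReceived": 1048576, "bytesSent": 524288}
    }
}
print(window_bytes(sample))  # (1048576, 524288)
```

Against a live instance you would call `window_bytes(fetch_root_status("http://localhost:8080"))`, where the host and port are placeholders.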

Re: NiFi Data Usage via Rest API

2018-07-26 Thread Ryan H
Hi Matt, The use case that I am investigating is fairly simplistic (and I may be naive about it). I am only looking for the amount of data that has come in to the cluster (across all PGs) and out of the cluster for a given time period (or a way to derive based on a time period). I do not want to
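Because each status reading only covers the last five minutes, one rough way to derive a total for an arbitrary period is to poll once per window and add up the readings. This is an illustrative assumption, not an approach proposed in the thread. The samples are hypothetical `(timestamp_seconds, bytes)` pairs, and the sum is only meaningful when consecutive samples are exactly 300 s apart. Real polling jitters, which is one reason provenance data tends to give a more precise answer.

```python
WINDOW_SECONDS = 300  # status counters cover a rolling 5-minute window


def total_bytes(samples, start, end):
    """Sum per-window byte counts for samples taken at times in (start, end].

    Each sample is (timestamp_seconds, bytes_in_previous_5_minutes). Closer
    spacing would double-count overlapping windows; wider spacing would miss
    data in the gaps, so irregular spacing raises ValueError.
    """
    picked = [(t, b) for t, b in samples if start < t <= end]
    for (t0, _), (t1, _) in zip(picked, picked[1:]):
        if t1 - t0 != WINDOW_SECONDS:
            raise ValueError(f"samples at {t0} and {t1} are not {WINDOW_SECONDS}s apart")
    return sum(b for _, b in picked)


samples = [(300, 1_000), (600, 2_500), (900, 0), (1200, 4_000)]
print(total_bytes(samples, 0, 900))  # 3500
```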

Re: NiFi Data Usage via Rest API

2018-07-26 Thread Mike Thomsen
Matt, Our main use, which provenance data handles well, is figuring out **what** data was handled. We drop every provenance event type except DROP, out of convenience, because we have no known scenarios where data will be removed before it reaches the end of the flow. FWIW, this is what inspired the record stats

Re: NiFi Data Usage via Rest API

2018-07-25 Thread Matt Burgess
Mike, Ryan, Boris et al, I'd like to wrap my head around the kinds of use cases y'all have for provenance data in NiFi: what's good, what's bad, what we need to do to make things better. Are there questions you want to ask of provenance that you can't today? Do the DROP events give you what you

Re: NiFi Data Usage via Rest API

2018-07-25 Thread Boris Tyukin
Mike, would love to see a post about that solution you've mentioned. Thinking about doing something similar for my company and using Elastic/Kibana or Grafana. On Wed, Jul 25, 2018 at 9:17 PM Mike Thomsen wrote: > Ryan, Understandable. We haven't found a need for Beats or Forwarders here

Re: NiFi Data Usage via Rest API

2018-07-25 Thread Ryan H
Thanks Mike for the suggestion on it. I'm looking for a solution that doesn't involve the additional components such as any Beats/Forwarders/Elasticsearch/etc. Boris, thanks for the link for the Monitoring introduction--I've checked it out multiple times. What I want to avoid is having the need

Re: NiFi Data Usage via Rest API

2018-07-25 Thread Boris Tyukin
Ryan, if you have not seen these posts from Pierre, I suggest starting there. He does a good job explaining different options https://pierrevillard.com/2017/05/11/monitoring-nifi-introduction/ I do agree that the 5-minute thing is super confusing and pretty useless, and you cannot change that

Re: NiFi Data Usage via Rest API

2018-07-25 Thread Mike Thomsen
I have a client with a similar use case. They wanted to be able to figure out when they processed a particular data set (they're using batch processing with NiFi). The solution I gave them was based on using Metrics and Provenance reporting to the ELK stack. I know that doesn't directly answer

NiFi Data Usage via Rest API

2018-07-24 Thread Ryan H
Hi All, I am looking for a way to obtain the total amount of data that has been processed by a running cluster for a period of time, ideally via the REST API. Example of my use case: I have, say, 50 different process groups, each of which has a connection to some data source. Each one is continuously