Re: How can we validate a data flow.?

2016-08-15 Thread Matt Burgess
I've used a handful of techniques/scripts to get the data into an analysis tool:

1) NiFi REST API to get the Provenance data
2) Groovy script to transform it into Apache TinkerPop 3.x format (can
provide if desired)
3) Gremlin script to load the TinkerPop file into Neo4j (can't find it,
but it was probably a couple of lines of Gremlin code)
4) Cypher queries against Neo4j for analysis

The first two should be portable to Apache NiFi proper, using
the SiteToSiteProvenanceReportingTask and ExecuteScript. The third
step might be a sticky wicket, but it can be done with ExecuteScript by
adding Module Directories that point to a Gremlin client install.
Depending on the target graph DB chosen (e.g., Neo4j, Titan, OrientDB),
step 4 becomes the analysis task at the reporting engine. If graph
traversal/analysis of provenance data is desired, please feel free to
open feature Jiras to cover this (e.g., PutGraphSON, ExecuteGremlin),
or we can discuss and I'll help with Jiras, etc. as needed.
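
To give a feel for the lineage side without a graph DB, here is a minimal
Python sketch (not the Groovy/Gremlin scripts mentioned above). It assumes
the provenance events have already been fetched and parsed into dicts, and
that each event carries "parentUuids" and "childUuids" lists; those field
names and the sample data are assumptions to check against the REST API
docs for your NiFi version.

```python
from collections import defaultdict, deque


def build_lineage_graph(events):
    """Map each parent flowfile UUID to the set of child UUIDs derived from it."""
    children = defaultdict(set)
    for event in events:
        # Assumed field names; events without them contribute no edges.
        for parent in event.get("parentUuids", []):
            for child in event.get("childUuids", []):
                children[parent].add(child)
    return children


def descendants(children, root):
    """Breadth-first walk returning every flowfile UUID derived from `root`."""
    seen = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical sample: "a" is split into "b" and "c", which are merged into "d".
sample_events = [
    {"eventType": "FORK", "parentUuids": ["a"], "childUuids": ["b", "c"]},
    {"eventType": "JOIN", "parentUuids": ["b", "c"], "childUuids": ["d"]},
]
graph = build_lineage_graph(sample_events)
derived_from_a = descendants(graph, "a")
```

A graph DB becomes worthwhile once the traversals get more involved than
this, which is where the Gremlin/Cypher steps come in.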

If you are less interested in lineage and more interested in the
events themselves (e.g., for OLAP), you can use the REST API to get the
same tabular information that is shown on the initial provenance page,
and perhaps normalize that into a star schema for querying later
using SQL / Data Warehouse / OLAP techniques.
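
As a rough illustration of the star schema idea, the Python sketch below
loads already-fetched provenance events into SQLite with one fact table
and two dimension tables. The table and column names are made up for this
example, and the event field names ("eventId", "componentId",
"componentName", "eventType", "flowFileUuid", "fileSizeBytes") are
assumptions about the REST API response, not something confirmed here.

```python
import sqlite3


def load_star_schema(conn, events):
    """Normalize provenance events into a small star schema (hypothetical layout)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS dim_component (
            component_id   TEXT PRIMARY KEY,
            component_name TEXT
        );
        CREATE TABLE IF NOT EXISTS dim_event_type (
            event_type_id INTEGER PRIMARY KEY,
            event_type    TEXT UNIQUE
        );
        CREATE TABLE IF NOT EXISTS fact_event (
            event_id        INTEGER PRIMARY KEY,
            component_id    TEXT REFERENCES dim_component(component_id),
            event_type_id   INTEGER REFERENCES dim_event_type(event_type_id),
            flowfile_uuid   TEXT,
            file_size_bytes INTEGER
        );
    """)
    for e in events:
        conn.execute("INSERT OR IGNORE INTO dim_component VALUES (?, ?)",
                     (e["componentId"], e["componentName"]))
        conn.execute("INSERT OR IGNORE INTO dim_event_type (event_type) VALUES (?)",
                     (e["eventType"],))
        type_id = conn.execute(
            "SELECT event_type_id FROM dim_event_type WHERE event_type = ?",
            (e["eventType"],)).fetchone()[0]
        conn.execute("INSERT OR REPLACE INTO fact_event VALUES (?, ?, ?, ?, ?)",
                     (e["eventId"], e["componentId"], type_id,
                      e["flowFileUuid"], e["fileSizeBytes"]))
    conn.commit()


# Hypothetical sample events, shaped like the assumed REST API fields.
sample_events = [
    {"eventId": 1, "componentId": "c1", "componentName": "GetFile",
     "eventType": "RECEIVE", "flowFileUuid": "a", "fileSizeBytes": 100},
    {"eventId": 2, "componentId": "c2", "componentName": "PutFile",
     "eventType": "SEND", "flowFileUuid": "a", "fileSizeBytes": 100},
    {"eventId": 3, "componentId": "c1", "componentName": "GetFile",
     "eventType": "RECEIVE", "flowFileUuid": "b", "fileSizeBytes": 50},
]
conn = sqlite3.connect(":memory:")
load_star_schema(conn, sample_events)

# Events and bytes per component and event type.
rows = conn.execute("""
    SELECT c.component_name, t.event_type, COUNT(*), SUM(f.file_size_bytes)
    FROM fact_event f
    JOIN dim_component c ON c.component_id = f.component_id
    JOIN dim_event_type t ON t.event_type_id = f.event_type_id
    GROUP BY c.component_name, t.event_type
    ORDER BY c.component_name, t.event_type
""").fetchall()
```

A time dimension keyed on the event timestamp would be the natural next
addition for "how did my flow run over the last week" style questions.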

Regards,
Matt

On Mon, Aug 15, 2016 at 3:22 PM, saikrishnat wrote:
> Hi,
> But it would still be very helpful to have a reporting app or log on top
> of it. It is very hard to go into provenance and click on each activity to
> see the details. Also, if I want to check how my flow ran over the
> last week, that would be very difficult with hundreds of files coming in.
> I was hoping someone has gone through a similar situation and could share
> how they did the tests.
>
>
>
>
> --
> View this message in context: 
> http://apache-nifi-developer-list.39713.n7.nabble.com/How-can-we-validate-a-data-flow-tp13036p13046.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.


Re: How can we validate a data flow.?

2016-08-15 Thread saikrishnat
Hi,
But it would still be very helpful to have a reporting app or log on top
of it. It is very hard to go into provenance and click on each activity to
see the details. Also, if I want to check how my flow ran over the
last week, that would be very difficult with hundreds of files coming in.
I was hoping someone has gone through a similar situation and could share
how they did the tests.






Re: How can we validate a data flow.?

2016-08-12 Thread James Wing
NiFi has several features to provide monitoring and verification of your
data flow:

1.) Provenance Data (
https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#data-provenance)
NiFi's Provenance subsystem tracks every flowfile, how it was modified, and
how it exited NiFi.  It is the "log of all activities" you mentioned.  The
NiFi UI includes a tool for browsing the provenance data to find and
examine individual flowfiles.  However, I am not aware of a tool to compile
aggregate statistics from provenance data in the form you describe.  But
there is an API to query provenance records, and a Reporting Task to export
them to another NiFi for further processing.

REST API - https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
SiteToSiteProvenanceReportingTask -
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.reporting.SiteToSiteProvenanceReportingTask/index.html
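
As one possible way to get aggregate statistics out of provenance records,
here is a minimal Python sketch. It assumes the records were already
retrieved (via the REST API or the Reporting Task) and parsed into dicts,
and that each record has "componentName", "eventType", and "fileSizeBytes"
fields; those names and the sample data are assumptions, not confirmed
output of either tool.

```python
from collections import defaultdict


def summarize_events(events):
    """Count events and total bytes per (component name, event type) pair."""
    summary = defaultdict(lambda: {"count": 0, "bytes": 0})
    for event in events:
        key = (event["componentName"], event["eventType"])
        summary[key]["count"] += 1
        # Treat a missing size as zero rather than failing the whole report.
        summary[key]["bytes"] += event.get("fileSizeBytes", 0)
    return dict(summary)


# Hypothetical sample records.
sample_events = [
    {"componentName": "GetFile", "eventType": "RECEIVE", "fileSizeBytes": 100},
    {"componentName": "GetFile", "eventType": "RECEIVE", "fileSizeBytes": 50},
    {"componentName": "PutFile", "eventType": "SEND", "fileSizeBytes": 100},
    {"componentName": "PutFile", "eventType": "DROP"},
]
summary = summarize_events(sample_events)
for (component, event_type), stats in sorted(summary.items()):
    print(f"{component:10} {event_type:8} {stats['count']:3d} events "
          f"{stats['bytes']:6d} bytes")
```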

2.) Controller Status Reporting
The 5-minute rolling window status that you see in the NiFi UI can also be
logged by the ControllerStatusReportingTask.  It is very easy to set up,
and the ongoing series of these log entries can provide basic monitoring
for the health of your flow.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.controller.ControllerStatusReportingTask/index.html

3.) Flow Design
It is also possible to design some monitoring and validation into the flow
itself.  For example, there is a MonitorActivity processor to detect an
absence of flowfiles.  Also, you mentioned MergeContent, which can output
both the merged and the original flowfiles, so you could compile a list of
IDs from the original files to double-check later.

MonitorActivity -
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MonitorActivity/index.html
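
The "double-check later" step could look something like the Python sketch
below. It assumes you have somehow collected two lists of identifiers
(filenames here, purely as an example): one from the source and one from
what actually arrived at the destination. How those lists are produced is
up to the flow design; the function and variable names are hypothetical.

```python
def reconcile(source_ids, delivered_ids):
    """Compare the identifiers seen at the source with those seen at the destination."""
    source = set(source_ids)
    delivered = set(delivered_ids)
    return {
        "matched": len(source & delivered),
        "missing": sorted(source - delivered),      # at the source, never delivered
        "unexpected": sorted(delivered - source),   # delivered, but not in the source list
    }


# Hypothetical example lists.
source_files = ["a.csv", "b.csv", "c.csv"]
delivered_files = ["a.csv", "c.csv", "d.csv"]
report = reconcile(source_files, delivered_files)
```

Using sets ignores duplicates, so if the same file can legitimately be
delivered more than once, count occurrences instead.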


Thanks,


James

On Fri, Aug 12, 2016 at 10:48 AM, saikrishnat wrote:

> Hi,
> I need to find out whether there is a way to validate a data flow
> end-to-end, and what the best practices for it are.
> Let's say I have thousands of files that I want to move to different
> destinations based on some attributes and/or contents. After the job is
> finished, I want a complete record of all files: how many files were
> processed successfully, how many failed, and which ones those were. If I
> use a Merge processor, which files were merged into each bin, etc., so
> that it will be easy to compare the source to the destination.
> Can anything along these lines be done in NiFi? A kind of log of all
> activities?
>
> Regards,
>
>
>
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/How-can-we-validate-a-data-flow-tp13036.html
> Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.
>


How can we validate a data flow.?

2016-08-12 Thread saikrishnat
Hi,
I need to find out whether there is a way to validate a data flow
end-to-end, and what the best practices for it are.
Let's say I have thousands of files that I want to move to different
destinations based on some attributes and/or contents. After the job is
finished, I want a complete record of all files: how many files were
processed successfully, how many failed, and which ones those were. If I
use a Merge processor, which files were merged into each bin, etc., so
that it will be easy to compare the source to the destination.
Can anything along these lines be done in NiFi? A kind of log of all
activities?

Regards,



