Have you looked at the Provenance Querying API? Note that it wasn't designed
for bulk data dumps; it is meant more for interactive poking around.
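
To make the query workflow concrete, here is a minimal Python sketch of the submit / poll / delete cycle. It assumes the NiFi 1.x REST endpoints under /nifi-api/provenance and a request body of the shape shown; the "ProcessorID" search term and the plain-string term format are assumptions to check against the REST docs for your version. `send` is a hypothetical callback that performs the HTTP call and returns the decoded JSON, so the sketch does not depend on any particular HTTP client.

```python
import time


def build_query(processor_id, max_results=100):
    """Build the body for POST /nifi-api/provenance (shape is an assumption)."""
    return {
        "provenance": {
            "request": {
                "maxResults": max_results,
                # Assumption: search terms are plain strings here; other
                # NiFi releases may expect a nested object per term.
                "searchTerms": {"ProcessorID": processor_id},
            }
        }
    }


def run_query(send, body, poll_seconds=0.5, max_polls=20):
    """Submit a query, poll until it finishes, delete it, return the events.

    `send(method, path, body)` is a hypothetical transport callback that
    performs the HTTP request and returns the decoded JSON response.
    If the query has not finished after `max_polls` polls, whatever
    results are available at that point are returned.
    """
    query = send("POST", "/nifi-api/provenance", body)["provenance"]
    path = "/nifi-api/provenance/" + query["id"]
    try:
        for _ in range(max_polls):
            if query.get("finished"):
                break
            time.sleep(poll_seconds)
            query = send("GET", path, None)["provenance"]
        return query.get("results", {}).get("provenanceEvents", [])
    finally:
        # Queries are held server-side, so clean up even on failure.
        send("DELETE", path, None)
```

Because each query is capped by maxResults and has to be polled, this fits spot checks on one processor or flowfile better than exporting everything.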

If you want to proactively store everything in an external system, the S2S
provenance reporting task is the way to go, but it is then up to you to
filter the events and assemble them into lineage. Maybe peek into how NiFi
visualizes the lineage graph for ideas?
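
As a rough idea of what "making sense of events as lineage" could look like on the receiving side, here is a small Python sketch. It assumes each event has already been decoded into a dict with the fields timestampMillis, entityId, componentName, eventType, parentIds and childIds; those names are my assumption about the reporting task's JSON output, so verify them against what your NiFi version actually sends. The function names are made up for illustration.

```python
from collections import defaultdict


def build_lineage(events):
    """Group events per flowfile and record parent -> child links.

    Returns (steps, children):
      steps:    flowfile id -> [(component name, event type), ...] in time order
      children: flowfile id -> sorted list of flowfile ids derived from it
    """
    steps = defaultdict(list)
    children = defaultdict(set)
    for ev in sorted(events, key=lambda e: e["timestampMillis"]):
        steps[ev["entityId"]].append((ev["componentName"], ev["eventType"]))
        # Events such as FORK/CLONE/JOIN relate parents to children.
        for parent in ev.get("parentIds") or []:
            for child in ev.get("childIds") or []:
                children[parent].add(child)
    return dict(steps), {p: sorted(c) for p, c in children.items()}


def trace(flowfile_id, steps, children):
    """List every step of a flowfile, then of its descendants (depth-first)."""
    seen, stack, out = set(), [flowfile_id], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        out.extend((current, name, kind) for name, kind in steps.get(current, []))
        # Reverse so that children are visited in sorted order.
        stack.extend(reversed(children.get(current, [])))
    return out
```

Calling trace() with the id of the incoming file would then give one ordered list of (flowfile, component, event type) rows, which is straightforward to write to a database table.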

Andrew

On Wed, Sep 28, 2016, 2:23 AM <[email protected]> wrote:

> Hello Manish,
> Thanks for the very helpful answer, but I was thinking that this functional
> scope (i.e. logging, storing transformations of data, data lineage) was
> built into NiFi and available through the REST API ...
> or internal calls ...
> The point is that I am not ready to hook dedicated logging processors onto
> every processor of my DF, or of DFs developed by others:
> - firstly, it is intrusive in the DF
> - secondly, it cannot easily be applied with a template approach,
> because it depends heavily on the processors chosen in the DF
>
> Ideally (as a very simple/naive requirement) I would like to run my DF,
> taking my example again:
> (File1 (in) --> Processor1 --> flow1 --> Processor2 --> flow2 --> File2
> (out))
> and then store everything in a database and say:
>
> getTrace(Processor1, beforeProcessing) -> returning (attributes,
> flowfile)
> getTrace(Processor2, afterProcessing) ........................
>
> phil
> best regards
>
>
> -----Original Message-----
> From: Manish Gupta 8 [mailto:[email protected]]
> Sent: Tuesday, 27 September 2016 16:46
> To: [email protected]
> Subject: RE: logging all transformed flowfiles
>
> Hi Phil,
>
> We are doing something similar, but we do not keep all the content after
> each transformation externally. We only send the flow file attributes to
> external storage (file / Event Hub / database / NoSQL): we convert them
> with the AttributesToJSON processor and send the result for logging after
> every logical step we want to log (after adding a couple of additional
> details such as step name, number of rows in the file, hash code, etc.).
>
> For your scenario, I think you can simply clone the output relationship
> from each of your processors and send it to one or more logging/sink
> processors. For keeping the lineage, you have a couple of options:
> 1. Use a different sink/folder/table for each step (with a corresponding
> name)
> 2. Keep the file name consistent to track the lineage
> 3. Modify the flow file content so that you can track the lineage from the
> metadata content.
>
>
> Regards,
> Manish
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> Sent: Tuesday, September 27, 2016 7:33 PM
> To: [email protected]
> Subject: logging all transformed flowfiles
>
> Hello,
> My software context: standalone NiFi 1.0.0
>
> My problem: I would like to log all the different transformations
> applied to an initial file (input) until it exits the DF (output).
> Imagine this simple DF:
> File1 (in) --> Processor1 --> flow1 --> Processor2 --> flow2 --> File2
> (out)
> I would like to store outside of NiFi (in my own external DB) ->
>  File1, flow1, flow2, File2
> Is there a simple REST API to help accomplish this? (I looked at
> Data Provenance and SiteToSiteProvenanceReportingTask but did not clearly
> find the right way to implement this.) Any ideas?
>
> Phil
> Best regards
