Re: Proposal: standard record metadata attributes for data sources

2018-05-24 Thread Otto Fowler
I commented on the PR, but I’ll add this to the thread here. Wouldn’t something like this lend itself to a ReportingTask? If not the current structure, then a similar structure for records? That would allow the destination to do time series analysis, etc. That is not saying there isn’t a case to have it
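As a rough illustration of the time-series angle, here is a minimal sketch of the kind of aggregation a reporting-style component might do before handing data to a destination. It is plain Python, not the NiFi ReportingTask API; it assumes the input is a stream of (timestamp, record count) pairs, e.g. taken from per-flowfile record stats, and the one-minute bucket size is an arbitrary choice.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def per_minute_counts(events: Iterable[Tuple[datetime, int]]) -> Dict[datetime, int]:
    """Sum record counts into one-minute buckets keyed by the bucket start.

    Each event is an assumed (timestamp, record_count) pair; the timestamp is
    truncated to the start of its minute and the counts in each minute are added.
    """
    buckets: Dict[datetime, int] = defaultdict(int)
    for timestamp, record_count in events:
        bucket = timestamp.replace(second=0, microsecond=0)
        buckets[bucket] += record_count
    # Return the buckets in chronological order, ready for a time-series store.
    return dict(sorted(buckets.items()))
```

Feeding it events at 12:00:05 (10 records), 12:00:50 (5) and 12:01:10 (7) yields 15 for the 12:00 bucket and 7 for the 12:01 bucket.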

2018-05-24 Thread Mike Thomsen
I wrote a processor that's inspired by one of the Groovy scripts we use at that client. PR is here if anyone wants to take a look: https://github.com/apache/nifi/pull/2737 It's called "RecordStats" and provides both a general record count attribute and lets you specify record path operations to
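For readers who have not opened the PR, a minimal sketch of the general idea (a total record count plus per-value counts for selected fields) might look like the following. It is plain Python, not a NiFi processor, and it is not the PR's code: the attribute names (`record.count`, `<name>.<value>`) are hypothetical, and lists of keys stand in for real RecordPath expressions such as `/address/state`.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Mapping


def record_stats(records: Iterable[Mapping[str, Any]],
                 paths: Mapping[str, List[str]]) -> Dict[str, str]:
    """Count records and, for each named path, how often each value occurs.

    `paths` maps a user-chosen name to a list of keys to walk into each
    record (a simplified, hypothetical stand-in for a RecordPath).
    """
    total = 0
    per_path: Dict[str, Counter] = {name: Counter() for name in paths}
    for record in records:
        total += 1
        for name, keys in paths.items():
            value: Any = record
            # Walk the nested keys; a missing key means the path has no value.
            for key in keys:
                if not isinstance(value, Mapping) or key not in value:
                    value = None
                    break
                value = value[key]
            if value is not None:
                per_path[name][str(value)] += 1

    # Flowfile attributes are strings, so every count is stringified.
    attributes = {"record.count": str(total)}
    for name, counts in per_path.items():
        for value, count in counts.items():
            attributes[f"{name}.{value}"] = str(count)
    return attributes
```

For example, `record_stats([{"sport": "soccer"}, {"sport": "football"}, {"sport": "soccer"}, {"name": "x"}], {"sport": ["sport"]})` returns `{"record.count": "4", "sport.soccer": "2", "sport.football": "1"}`; the last record has no `sport` field, so it counts toward the total only.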

2018-05-15 Thread Koji Kawamura
Hi Mike, I agree with the approach of enriching provenance events. To do so, there are several places where we can embed metadata:
- FlowFile attributes: automatically mapped to a provenance event, but as Andy mentioned, we need to be careful not to put sensitive data in them.
- Transit URI: when I

2018-05-14 Thread Andy LoPresto
Maybe an ADDINFO event or FORK event could be used and a new flowfile with the relevant attributes/content could be created. The flowfiles would be linked, but the “sensitive” information wouldn’t travel with the original.
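The linked-flowfile idea can be sketched as a simple data model: split the sensitive attributes off into a new child record that points back at the original, so lineage is preserved while the original no longer carries those values. This is plain Python, not NiFi's FORK event or ProcessSession API; `FlowRecord`, `fork_sensitive`, and the attribute names in the example are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class FlowRecord:
    """Hypothetical stand-in for a flowfile: an id, attributes, and a parent link."""
    attributes: Dict[str, str]
    parent_id: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def fork_sensitive(original: FlowRecord,
                   sensitive_keys: Set[str]) -> Tuple[FlowRecord, FlowRecord]:
    """Move sensitive attributes onto a new child record linked to the original.

    Returns (cleaned original, metadata child). The cleaned record keeps the
    original id; the child stores that id as its parent, so the two stay
    linked, but the sensitive values no longer travel with the original.
    """
    public = {k: v for k, v in original.attributes.items() if k not in sensitive_keys}
    private = {k: v for k, v in original.attributes.items() if k in sensitive_keys}
    cleaned = FlowRecord(attributes=public, parent_id=original.parent_id, id=original.id)
    child = FlowRecord(attributes=private, parent_id=original.id)
    return cleaned, child
```

A downstream step that only needs the data would receive the cleaned record, while the child (for example one holding a hypothetical `source.query` attribute) could be routed to a more restricted destination.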

2018-05-14 Thread Mike Thomsen
Does the provenance system have the ability to add user-defined key/value pairs to a flowfile's provenance record at a particular processor? On Mon, May 14, 2018 at 6:11 PM Andy LoPresto wrote: > I would actually propose that this is added to the provenance but not >

2018-05-14 Thread Andy LoPresto
I would actually propose that this is added to the provenance but not always put into the flowfile attributes. There are many scenarios in which the data retrieval should be separated from the analysis/follow-on, for reasons of visibility, responsibility, and security. While I understand a

2018-05-13 Thread Mike Thomsen
@Joe @Matt This is kinda related to the point that Joe made in the graph DB thread about provenance. My thought here was that we need some standards for enriching the metadata about what was fetched so that, no matter how you store the provenance, you can find some way to query it for questions