I commented on the PR, but I’ll add this to the thread here.
Wouldn’t something like this lend itself to a ReportingTask? If not the
current structure, then a similar structure for records?
That would allow the destination to do time-series analysis, etc.
That is not saying there isn’t a case to have it
I wrote a processor that's inspired by one of the Groovy scripts we use at
that client. PR is here if anyone wants to take a look:
https://github.com/apache/nifi/pull/2737
It's called "RecordStats" and provides both a general record count
attribute and lets you specify record path operations to
Hi Mike,
I agree with the approach of enriching provenance events. To do so, we
can embed metadata in several places:
- FlowFile attributes: automatically mapped to a provenance event, but
as Andy mentioned, we need to be careful not to put sensitive data there.
- Transit URI: when I
Maybe an ADDINFO event or FORK event could be used and a new flowfile with the
relevant attributes/content could be created. The flowfiles would be linked,
but the “sensitive” information wouldn’t travel with the original.
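A rough sketch of the FORK idea above, written as a plain-Python toy model rather than against the NiFi API. The names FlowFile, ProvenanceEvent, and fork_with_metadata are hypothetical and exist only for illustration. It assumes the extra metadata lives on a linked child while the original's attributes stay untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple
import uuid


@dataclass
class FlowFile:
    """Toy stand-in for a flowfile: string attributes plus a generated id."""
    attributes: Dict[str, str] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class ProvenanceEvent:
    """Toy provenance record linking a parent flowfile to its children."""
    event_type: str
    parent_id: str
    child_ids: List[str]


def fork_with_metadata(
    parent: FlowFile, metadata: Dict[str, str]
) -> Tuple[FlowFile, ProvenanceEvent]:
    # The child carries the extra metadata; the parent is not modified.
    child = FlowFile(attributes=dict(metadata))
    # The event is the only link between the two flowfiles.
    event = ProvenanceEvent("FORK", parent.id, [child.id])
    return child, event


parent = FlowFile({"filename": "a.csv"})
child, event = fork_with_metadata(parent, {"query.user": "alice"})
```

In this model the parent can continue downstream without the metadata, and the child can be routed somewhere with tighter access controls; the lineage is still recoverable through the event.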
Andy LoPresto
Does the provenance system have the ability to add user-defined key/value
pairs to a flowfile's provenance record at a particular processor?
On Mon, May 14, 2018 at 6:11 PM Andy LoPresto wrote:
> I would actually propose that this is added to the provenance but not
>
I would actually propose that this is added to the provenance but not always
put into the flowfile attributes. There are many scenarios in which the data
retrieval should be separated from the analysis/follow-on, for visibility,
responsibility, and security reasons. While I understand a
@Joe @Matt
This is kinda related to the point that Joe made in the graph DB thread
about provenance. My thought here was that we need some standards on
enriching the metadata about what was fetched so that no matter how you
store the provenance, you can find some way to query it for questions