Andrew, Mark, etc.
A new contributor alerted me on Jira that he did his own take on this
processor. I encouraged him to join the dev list so we can discuss the use
case in more depth and sort out the best way forward.
See https://issues.apache.org/jira/browse/NIFI-6047
I'll give him a
PR if anyone is interested:
https://github.com/apache/nifi/pull/3298
On Fri, Feb 8, 2019 at 5:34 PM Mike Thomsen wrote:
With Redis and HBase you can set a TTL on the data itself in the lookup
table. Were you thinking something more than that?
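For anyone following along, here is a rough sketch of what "TTL on the data itself" buys you for duplicate detection. It is only an illustration: the in-memory dict stands in for the Redis or HBase lookup table, and the names `TtlSeenCache` and `is_duplicate` are made up for this example, not anything in NiFi.

```python
import time
from typing import Callable, Dict, Hashable


class TtlSeenCache:
    """In-memory stand-in (hypothetical) for a lookup table whose entries expire."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self._expires_at: Dict[Hashable, float] = {}

    def is_duplicate(self, key: Hashable) -> bool:
        """Return True if key was stored within the TTL; otherwise store it."""
        now = self.clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and now < expiry:
            # Still live: report a duplicate and leave the original expiry alone.
            return True
        # First sighting, or the old entry has expired: (re)store the key.
        self._expires_at[key] = now + self.ttl_seconds
        return False
```

With a 10 second TTL, the same key seen at t=0 and t=5 counts as a duplicate, while a sighting at t=10 is treated as new again. Note that this sketch never purges expired keys on its own, which a real store with TTL support would do for you.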
On Fri, Feb 8, 2019 at 4:42 PM Andrew Grande wrote:
Can I suggest a time-based option for specifying the window? I think we
only mentioned the number of records.
Andrew
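To make the contrast with a time-based window concrete, below is a minimal sketch of the count-based variant, assuming the window is "the last N distinct keys seen". The class name `CountWindowDeduper` and its method are hypothetical and exist only for this example; this is not how any existing processor is implemented.

```python
from collections import OrderedDict
from typing import Hashable


class CountWindowDeduper:
    """Remembers at most max_keys distinct keys (hypothetical helper)."""

    def __init__(self, max_keys: int):
        if max_keys < 1:
            raise ValueError("max_keys must be at least 1")
        self.max_keys = max_keys
        # Insertion-ordered, so the oldest remembered key is always first.
        self._seen: "OrderedDict[Hashable, None]" = OrderedDict()

    def is_duplicate(self, key: Hashable) -> bool:
        """Return True if key is still inside the window; otherwise remember it."""
        if key in self._seen:
            return True
        self._seen[key] = None
        if len(self._seen) > self.max_keys:
            # Window is full: forget the oldest key.
            self._seen.popitem(last=False)
        return False
```

The difference from a time-based window is that a key here is forgotten only when enough newer keys push it out, so on a slow stream a duplicate could still be flagged hours later, and on a fast one it could be missed after seconds.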
On Fri, Feb 8, 2019, 8:22 AM Mike Thomsen wrote:
Thanks. That answers it succinctly for me. I'll build out a
DetectDuplicateRecord processor to handle this.
On Fri, Feb 8, 2019 at 11:17 AM Mark Payne wrote:
Matt,
That would work if you want to select distinct records in a given FlowFile but
not across FlowFiles.
PartitionRecord -> UpdateAttribute (optionally to combine multiple attributes
into one) -> DetectDuplicate
would work, but given that you expect the records to be unique generally, this
We do not. I've thought about it, but I have not had a chance to put any work
towards it. My vision of how it would work would be to
allow the user to specify N RecordPath values as user-defined properties.
Then have those values extracted out and another
Record would be considered a
Mike,
I don't think so, but you could try a SELECT DISTINCT in QueryRecord,
though that might be a bit of a pain if you want to select all columns
and there are lots of them.
Alternatively you could try PartitionRecord -> QueryRecord (select *
limit 1). Neither PartitionRecord nor QueryRecord keeps state so
Do we have anything like DetectDuplicate for the Record API already? Didn't
see anything, but wanted to ask before reinventing the wheel.
Thanks,
Mike