Re: Record-oriented DetectDuplicate?

2019-02-16 Thread Mike Thomsen
Andrew, Mark, etc. A new contributor alerted me on Jira that he did his own take on this processor. I encouraged him to join the dev list so we can discuss the use case in more depth and sort out what is the best way forward. See https://issues.apache.org/jira/browse/NIFI-6047 I'll give him a

Re: Record-oriented DetectDuplicate?

2019-02-09 Thread Mike Thomsen
PR if anyone is interested: https://github.com/apache/nifi/pull/3298 On Fri, Feb 8, 2019 at 5:34 PM Mike Thomsen wrote: > With Redis and HBase you can set a TTL on the data itself in the lookup > table. Were you thinking something more than that? > > On Fri, Feb 8, 2019 at 4:42 PM Andrew

Re: Record-oriented DetectDuplicate?

2019-02-08 Thread Mike Thomsen
With Redis and HBase you can set a TTL on the data itself in the lookup table. Were you thinking something more than that? On Fri, Feb 8, 2019 at 4:42 PM Andrew Grande wrote: > Can I suggest a time-based option for specifying the window? I think we > only mentioned the number of records. > >
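The TTL idea above can be pictured with a small in-memory stand-in. This is only a sketch, not the Redis or HBase client API: the class name `TtlSeenCache`, its method `check_and_record`, and the injectable `clock` parameter are all hypothetical names chosen for illustration. It assumes a key counts as a duplicate only while its entry has not yet expired, and that a duplicate hit does not refresh the expiry.

```python
import time
from typing import Callable, Dict, Hashable


class TtlSeenCache:
    """Hypothetical in-memory stand-in for a lookup table with per-entry TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the window can be tested without sleeping
        self._expires_at: Dict[Hashable, float] = {}

    def check_and_record(self, key: Hashable) -> bool:
        """Return True if key was seen inside the TTL window, else record it."""
        now = self._clock()
        expires = self._expires_at.get(key)
        if expires is not None and expires > now:
            # Still inside the window: a duplicate. The expiry is not refreshed.
            return True
        # First sighting, or the earlier entry has expired: start a new window.
        self._expires_at[key] = now + self._ttl
        return False
```

With a 10-second TTL, the same key is flagged as a duplicate for 10 seconds after it is first recorded and is treated as new again afterwards, which gives a time-based window without a separate record counter.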

Re: Record-oriented DetectDuplicate?

2019-02-08 Thread Andrew Grande
Can I suggest a time-based option for specifying the window? I think we only mentioned the number of records. Andrew On Fri, Feb 8, 2019, 8:22 AM Mike Thomsen wrote: > Thanks. That answers it succinctly for me. I'll build out a > DetectDuplicateRecord processor to handle this. > > On Fri, Feb

Re: Record-oriented DetectDuplicate?

2019-02-08 Thread Mike Thomsen
Thanks. That answers it succinctly for me. I'll build out a DetectDuplicateRecord processor to handle this. On Fri, Feb 8, 2019 at 11:17 AM Mark Payne wrote: > Matt, > > That would work if you want to select distinct records in a given FlowFile > but not across FlowFiles. > PartitionRecord ->

Re: Record-oriented DetectDuplicate?

2019-02-08 Thread Mark Payne
Matt, That would work if you want to select distinct records in a given FlowFile but not across FlowFiles. PartitionRecord -> UpdateAttribute (optionally to combine multiple attributes into one) -> DetectDuplicate would work, but given that you expect the records to be unique generally, this
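The within-FlowFile versus across-FlowFiles distinction can be seen in a few lines. This is a rough sketch in plain Python, not NiFi code: `distinct_within` is a hypothetical helper, each list stands in for the records of one FlowFile, and records are assumed to be flat dicts with string keys and hashable values.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def distinct_within(batch: List[Record]) -> List[Record]:
    """Drop repeated records inside one batch; no state survives the call."""
    seen = set()  # local to this call, so nothing carries over to the next batch
    unique: List[Record] = []
    for record in batch:
        key = tuple(sorted(record.items()))  # whole-record identity
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


first = distinct_within([{"id": 1}, {"id": 1}, {"id": 2}])
second = distinct_within([{"id": 1}])
```

Here `first` keeps one copy of `{"id": 1}`, but `second` lets `{"id": 1}` through again because the second call knows nothing about the first. Catching that case needs state shared between calls, such as an external lookup table.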

Re: Record-oriented DetectDuplicate?

2019-02-08 Thread Mark Payne
We do not. I've thought about it, but I have not had a chance to put any work towards it. My vision of how it would work would be to allow the user to specify any number of RecordPath values as user-defined properties. Then have those values extracted out and another Record would be considered a
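The idea of keying duplicates on a set of user-chosen paths can be sketched as follows. This is an illustration under stated assumptions, not the eventual processor: the slash-separated paths are a simplified stand-in for RecordPath, the names `extract_key` and `split_duplicates` are hypothetical, the `seen` set stands in for whatever shared cache would hold state across FlowFiles, and the extracted values are assumed to be hashable scalars.

```python
from typing import Any, Dict, Iterable, List, Sequence, Set, Tuple

Record = Dict[str, Any]
Key = Tuple[Any, ...]


def extract_key(record: Record, paths: Sequence[str]) -> Key:
    """Resolve each simplified path (e.g. "/user/id") against a nested dict."""
    values = []
    for path in paths:
        node: Any = record
        for part in path.strip("/").split("/"):
            # A missing field or a non-dict parent resolves to None.
            node = node.get(part) if isinstance(node, dict) else None
        values.append(node)
    return tuple(values)


def split_duplicates(records: Iterable[Record], paths: Sequence[str],
                     seen: Set[Key]) -> Tuple[List[Record], List[Record]]:
    """Split records into (unique, duplicate) by their extracted key values."""
    unique: List[Record] = []
    duplicates: List[Record] = []
    for record in records:
        key = extract_key(record, paths)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)  # caller owns `seen`, so state persists across batches
            unique.append(record)
    return unique, duplicates
```

Because the caller owns `seen`, passing the same set to successive calls flags a record whose key values already appeared in an earlier batch, which is the cross-FlowFile case discussed in this thread.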

Re: Record-oriented DetectDuplicate?

2019-02-08 Thread Matt Burgess
Mike, I don't think so, but you could try a SELECT DISTINCT in QueryRecord, though it might be a bit of a pain if you want to select all columns and there are lots of them. Alternatively you could try PartitionRecord -> QueryRecord (select * limit 1). Neither PartitionRecord nor QueryRecord keeps state so

Record-oriented DetectDuplicate?

2019-02-08 Thread Mike Thomsen
Do we have anything like DetectDuplicate for the Record API already? Didn't see anything, but wanted to ask before reinventing the wheel. Thanks, Mike