Andrew, Mark, etc.

A new contributor alerted me on Jira that he did his own take on this processor. I encouraged him to join the dev list so we can discuss the use case in more depth and sort out what is the best way forward.
See https://issues.apache.org/jira/browse/NIFI-6047

I'll give him a little while to join and announce he's ready to go over it before I move forward with a discussion on this.

On Sat, Feb 9, 2019 at 12:34 PM Mike Thomsen <mikerthom...@gmail.com> wrote:

> PR if anyone is interested:
>
> https://github.com/apache/nifi/pull/3298
>
> On Fri, Feb 8, 2019 at 5:34 PM Mike Thomsen <mikerthom...@gmail.com>
> wrote:
>
>> With Redis and HBase you can set a TTL on the data itself in the lookup
>> table. Were you thinking something more than that?
>>
>> On Fri, Feb 8, 2019 at 4:42 PM Andrew Grande <apere...@gmail.com> wrote:
>>
>>> Can I suggest a time-based option for specifying the window? I think we
>>> only mentioned the number of records.
>>>
>>> Andrew
>>>
>>> On Fri, Feb 8, 2019, 8:22 AM Mike Thomsen <mikerthom...@gmail.com>
>>> wrote:
>>>
>>>> Thanks. That answers it succinctly for me. I'll build out a
>>>> DetectDuplicateRecord processor to handle this.
>>>>
>>>> On Fri, Feb 8, 2019 at 11:17 AM Mark Payne <marka...@hotmail.com>
>>>> wrote:
>>>>
>>>>> Matt,
>>>>>
>>>>> That would work if you want to select distinct records in a given
>>>>> FlowFile but not across FlowFiles.
>>>>> PartitionRecord -> UpdateAttribute (optionally to combine multiple
>>>>> attributes into one) -> DetectDuplicate
>>>>> would work, but given that you expect the records to be unique
>>>>> generally, this would have the effect of
>>>>> splitting each FlowFile into Record-per-FlowFile, which is certainly
>>>>> not ideal.
>>>>>
>>>>> Thanks
>>>>> -Mark
>>>>>
>>>>>
>>>>> > On Feb 8, 2019, at 11:14 AM, Matt Burgess <mattyb...@apache.org> wrote:
>>>>> >
>>>>> > Mike,
>>>>> >
>>>>> > I don't think so, but you could try a SELECT DISTINCT in QueryRecord,
>>>>> > might be a bit of a pain if you want to select all columns and there
>>>>> > are lots of them.
>>>>> >
>>>>> > Alternatively you could try PartitionRecord -> QueryRecord (select *
>>>>> > limit 1). Neither PartitionRecord nor QueryRecord keeps state so you'd
>>>>> > likely need to use distributed cache or UpdateAttribute.
>>>>> >
>>>>> > Regards,
>>>>> > Matt
>>>>> >
>>>>> > On Fri, Feb 8, 2019 at 11:08 AM Mike Thomsen <mikerthom...@gmail.com> wrote:
>>>>> >>
>>>>> >> Do we have anything like DetectDuplicate for the Record API already?
>>>>> >> Didn't see anything, but wanted to ask before reinventing the wheel.
>>>>> >>
>>>>> >> Thanks,
>>>>> >>
>>>>> >> Mike
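
For anyone skimming the thread, here is a rough sketch of the idea being discussed: flag duplicate records across batches (FlowFiles) rather than within a single one, bounded by both a record-count window and a time-based window. This is not the code in the PR and does not use any NiFi API. The class name, the parameters (key_fields, max_entries, ttl_seconds, clock) and the in-memory dictionary standing in for a distributed cache are all assumptions made for illustration.

```python
import time
from collections import OrderedDict


class RecordDeduplicator:
    """Hypothetical helper: flags records whose key was seen within a window."""

    def __init__(self, key_fields, max_entries=10000, ttl_seconds=None,
                 clock=time.monotonic):
        self.key_fields = key_fields      # fields that identify a record
        self.max_entries = max_entries    # count-based window
        self.ttl_seconds = ttl_seconds    # time-based window (None disables it)
        self.clock = clock                # injectable so the window is testable
        self._seen = OrderedDict()        # key -> first-seen time, oldest first

    def _evict(self, now):
        # Drop keys older than the time window.
        if self.ttl_seconds is not None:
            while self._seen:
                _, seen_at = next(iter(self._seen.items()))
                if now - seen_at < self.ttl_seconds:
                    break
                self._seen.popitem(last=False)
        # Drop the oldest keys beyond the count window.
        while len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)

    def is_duplicate(self, record):
        now = self.clock()
        self._evict(now)
        key = tuple(record.get(field) for field in self.key_fields)
        if key in self._seen:
            return True
        self._seen[key] = now
        self._evict(now)
        return False


def split_duplicates(records, dedup):
    """Route each record of one batch to a unique or duplicate list."""
    unique, duplicates = [], []
    for record in records:
        (duplicates if dedup.is_duplicate(record) else unique).append(record)
    return unique, duplicates
```

Because the deduplicator object outlives any single call to split_duplicates, state carries over from one batch to the next, which is the part SELECT DISTINCT inside one FlowFile cannot provide. With Redis or HBase as the lookup table, the TTL eviction would presumably be handled by the store itself rather than by code like _evict.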