I think the SQL processors other than PutDatabaseRecord also support "upsert" functionality, so that might also help.
On Wed, Feb 20, 2019 at 8:33 AM Mike Thomsen <[email protected]> wrote: > The easiest way to do this would be to create a UNIQUE constraint on the > project name and just send one insert at a time. Then each individual > failed insert will get routed to failure. > > For the sake of safety here, if you have multiple flows that feed into a > common SQL ingest point, you might want to put this functional on a > separate flow so that you can assume "failure = duplicate according to the > UNIQUE constraint." > > On Wed, Feb 20, 2019 at 7:56 AM Adam Fisher <[email protected]> wrote: > >> Maybe you could use something like SplitRecord, DetectDuplicate, >> MergeRecord to get the file how you want it. This would split it into >> smaller FlowFiles, check if the record has been seen before and keep only >> unique ones and then merge them back into one file. >> >> I'm actually collaborating with someone on a DetectDuplicateRecord >> processor so eventually you won't need to break your input into separate >> files. This is expected to come out in a later release of NiFi. >> >> >> Adam >> >> On Wed, Feb 20, 2019, 1:08 AM Beutel, Maximilian < >> [email protected] wrote: >> >>> Hello! >>> >>> >>> >>> Since I’m new to Nifi I’m still trying to wrap my head around certain >>> best practices. >>> >>> >>> >>> My use case is as follows: I need to ingest a list of projects via http >>> API. The API returns a list in a format like: >>> >>> >>> >>> [ >>> >>> { id: "1", name: "Project A"}, >>> >>> { id: "2",name: "Project B"} >>> >>> ] >>> >>> >>> >>> Now, the problem with this API is, it always returns the _*full*_ list >>> of projects, no delta. So if since the last run some more projects are >>> added, it will included project id 1, 2 also, like so: >>> >>> >>> >>> [ >>> >>> { id: "1", name: "Project A"}, >>> >>> { id: "2",name: "Project B"}, >>> >>> { id: "3",name: "Project C"} >>> >>> ] >>> >>> >>> >>> >>> >>> I need to insert this list in a database table, and I thought I use >>> nifi. My initial flow looked like: >>> >>> >>> >>> InvokeHTTP (fetch JSON HTTP response) >>> into JoltTransformJSON (make the returned response a bit nicer) >>> >>> into ConvertJSONToSQL (insert the whole lot into db) >>> >>> >>> >>> The issue I have is the insert statement in ConvertJSONToSQL will fail, >>> since the database table has a unique key on the project name field and the >>> payload I want to insert will always include all projects, including some >>> which are already there. >>> >>> >>> >>> My question is how to people usually handle such a use case in Nifi? I >>> can either think of filtering the API response against the list of already >>> existing project names (not sure how), but I would rather do an insert >>> ignore or something that just ignores duplicate record errors - which is >>> not supported by ConvertJSONToSQL as far as I’m aware. >>> >>> >>> >>> Maybe I’m also approaching this problem from the wrong side, so I would >>> be grateful to receive feedback/recommendations. >>> >>> >>> >>> Thanks! >>> >>> Max >>> >>> >>> >>> ============================================================================== >>> Please access the attached hyperlink for an important electronic >>> communications disclaimer: >>> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html >>> >>> ============================================================================== >>> >>
