The easiest way to do this would be to create a UNIQUE constraint on the
project name and just send one insert at a time. Then each individual
failed insert will get routed to failure.

For the sake of safety here, if you have multiple flows that feed into a
common SQL ingest point, you might want to put this functional on a
separate flow so that you can assume "failure = duplicate according to the
UNIQUE constraint."

On Wed, Feb 20, 2019 at 7:56 AM Adam Fisher <[email protected]> wrote:

> Maybe you could use something like SplitRecord, DetectDuplicate,
> MergeRecord to get the file how you want it. This would split it into
> smaller FlowFiles, check if the record has been seen before and keep only
> unique ones and then merge them back into one file.
>
> I'm actually collaborating with someone on a DetectDuplicateRecord
> processor so eventually you won't need to break your input into separate
> files. This is expected to come out in a later release of NiFi.
>
>
> Adam
>
> On Wed, Feb 20, 2019, 1:08 AM Beutel, Maximilian <
> [email protected] wrote:
>
>> Hello!
>>
>>
>>
>> Since I’m new to Nifi I’m still trying to wrap my head around certain
>> best practices.
>>
>>
>>
>> My use case is as follows: I need to ingest a list of projects via http
>> API. The API returns a list in a format like:
>>
>>
>>
>> [
>>
>> { id: "1", name: "Project A"},
>>
>> { id: "2",name: "Project B"}
>>
>> ]
>>
>>
>>
>> Now, the problem with this API is, it always returns the _*full*_ list
>> of projects, no delta. So if since the last run some more projects are
>> added, it will included project id 1, 2 also, like so:
>>
>>
>>
>> [
>>
>> { id: "1", name: "Project A"},
>>
>> { id: "2",name: "Project B"},
>>
>> { id: "3",name: "Project C"}
>>
>> ]
>>
>>
>>
>>
>>
>> I need to insert this list in a database table, and I thought I use nifi.
>> My initial flow looked like:
>>
>>
>>
>> InvokeHTTP (fetch JSON HTTP response)
>> into JoltTransformJSON (make the returned response a bit nicer)
>>
>> into ConvertJSONToSQL (insert the whole lot into db)
>>
>>
>>
>> The issue I have is the insert statement in ConvertJSONToSQL will fail,
>> since the database table has a unique key on the project name field and the
>> payload I want to insert will always include all projects, including some
>> which are already there.
>>
>>
>>
>> My question is how to people usually handle such a use case in Nifi? I
>> can either think of filtering the API response against the list of already
>> existing project names (not sure how), but I would rather do an insert
>> ignore or something that just ignores duplicate record errors  - which is
>> not supported by ConvertJSONToSQL as far as I’m aware.
>>
>>
>>
>> Maybe I’m also approaching this problem from the wrong side, so I would
>> be grateful to receive feedback/recommendations.
>>
>>
>>
>> Thanks!
>>
>> Max
>>
>>
>>
>> ==============================================================================
>> Please access the attached hyperlink for an important electronic
>> communications disclaimer:
>> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
>>
>> ==============================================================================
>>
>

Reply via email to