I think PutDatabaseRecord and some of the other SQL processors also support
"upsert" functionality, so that might also help.
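For illustration, here is roughly what upsert semantics buy you, sketched with SQLite in Python (the `projects` table and its columns are assumptions modeled on the example payload in this thread, not anything NiFi generates):

```python
import sqlite3

# Assumed table shape, modeled on the API payload discussed below.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id TEXT, name TEXT UNIQUE)")

rows = [("1", "Project A"), ("2", "Project B")]

# An upsert inserts new rows and updates rows whose unique key already
# exists, so replaying the full payload never fails on duplicates.
for _ in range(2):  # simulate two ingest runs of the same full list
    conn.executemany(
        "INSERT INTO projects (id, name) VALUES (?, ?) "
        "ON CONFLICT(name) DO UPDATE SET id = excluded.id",
        rows,
    )

count = conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
# count is 2: the second run updated rows in place instead of failing
```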

On Wed, Feb 20, 2019 at 8:33 AM Mike Thomsen <[email protected]> wrote:

> The easiest way to do this would be to create a UNIQUE constraint on the
> project name and just send one insert at a time. Then each individual
> failed insert will get routed to failure.
>
> For the sake of safety, if you have multiple flows that feed into a
> common SQL ingest point, you might want to put this functionality in a
> separate flow so that you can safely assume "failure = duplicate according
> to the UNIQUE constraint."
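A minimal sketch of that behaviour outside NiFi, using SQLite from Python (table and column names are assumed for illustration): with the UNIQUE constraint in place, each duplicate single-row insert raises an error, which is the per-record failure that would be routed to the failure relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE constraint on the project name, as suggested above.
conn.execute("CREATE TABLE projects (id TEXT, name TEXT UNIQUE)")
conn.execute("INSERT INTO projects VALUES ('1', 'Project A')")

failed = []
# One insert per record; the second row duplicates an existing name.
for row in [("2", "Project B"), ("1", "Project A")]:
    try:
        with conn:  # commit on success, roll back the failed statement
            conn.execute("INSERT INTO projects VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        failed.append(row)  # analogue of routing the record to "failure"

# failed now holds only the duplicate ("1", "Project A")
```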
>
> On Wed, Feb 20, 2019 at 7:56 AM Adam Fisher <[email protected]> wrote:
>
>> Maybe you could use something like SplitRecord, DetectDuplicate, and
>> MergeRecord to get the file into the shape you want. This would split the
>> input into smaller FlowFiles, check whether each record has been seen
>> before, keep only the unique ones, and then merge them back into one file.
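As a rough Python analogue of that split / detect-duplicate / merge pipeline (treating "name" as the duplicate-detection key, which is an assumption for illustration):

```python
# Rough analogue of SplitRecord -> DetectDuplicate -> MergeRecord.
records = [
    {"id": "1", "name": "Project A"},
    {"id": "2", "name": "Project B"},
    {"id": "1", "name": "Project A"},  # a duplicate record
]

seen = set()
merged = []
for record in records:         # SplitRecord: handle one record at a time
    key = record["name"]       # DetectDuplicate: has this key been seen?
    if key not in seen:
        seen.add(key)
        merged.append(record)  # MergeRecord: recombine the unique records

# merged contains only the two distinct projects
```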
>>
>> I'm actually collaborating with someone on a DetectDuplicateRecord
>> processor, so eventually you won't need to break your input into separate
>> files. It is expected to arrive in a later NiFi release.
>>
>>
>> Adam
>>
>> On Wed, Feb 20, 2019, 1:08 AM Beutel, Maximilian <
>> [email protected] wrote:
>>
>>> Hello!
>>>
>>>
>>>
>>> Since I’m new to NiFi, I’m still trying to wrap my head around certain
>>> best practices.
>>>
>>>
>>>
>>> My use case is as follows: I need to ingest a list of projects via an
>>> HTTP API. The API returns a list in a format like:
>>>
>>>
>>>
>>> [
>>>   { "id": "1", "name": "Project A" },
>>>   { "id": "2", "name": "Project B" }
>>> ]
>>>
>>>
>>>
>>> Now, the problem with this API is that it always returns the _*full*_
>>> list of projects, never a delta. So if more projects have been added
>>> since the last run, the response will still include projects 1 and 2 as
>>> well, like so:
>>>
>>>
>>>
>>> [
>>>   { "id": "1", "name": "Project A" },
>>>   { "id": "2", "name": "Project B" },
>>>   { "id": "3", "name": "Project C" }
>>> ]
>>>
>>>
>>>
>>>
>>>
>>> I need to insert this list into a database table, and I thought I’d use
>>> NiFi. My initial flow looked like:
>>>
>>>
>>>
>>> InvokeHTTP (fetch the JSON HTTP response)
>>> into JoltTransformJSON (make the returned response a bit nicer)
>>> into ConvertJSONToSQL (insert the whole lot into the db)
>>>
>>>
>>>
>>> The issue I have is that the INSERT statements generated by
>>> ConvertJSONToSQL will fail, because the database table has a unique key
>>> on the project name field and the payload I want to insert always
>>> includes all projects, including some which are already there.
>>>
>>>
>>>
>>> My question is: how do people usually handle such a use case in NiFi? I
>>> could filter the API response against the list of already existing
>>> project names (though I’m not sure how), but I would rather do an
>>> "insert ignore" or something similar that simply ignores duplicate
>>> record errors - which is not supported by ConvertJSONToSQL as far as
>>> I’m aware.
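For reference, the "insert ignore" behaviour being asked about looks like this when sketched with SQLite in Python (SQLite spells it INSERT OR IGNORE; MySQL has INSERT IGNORE, and PostgreSQL has ON CONFLICT DO NOTHING; the table and columns here are assumed from the example payload):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id TEXT, name TEXT UNIQUE)")

# First run: the API returned only two projects.
first_run = [("1", "Project A"), ("2", "Project B")]
conn.executemany("INSERT OR IGNORE INTO projects VALUES (?, ?)", first_run)

# Second run: the API returns the FULL list again, plus one new project.
second_run = [("1", "Project A"), ("2", "Project B"), ("3", "Project C")]
conn.executemany("INSERT OR IGNORE INTO projects VALUES (?, ?)", second_run)

count = conn.execute("SELECT COUNT(*) FROM projects").fetchone()[0]
# count is 3: duplicates in the replayed payload were silently skipped
```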
>>>
>>>
>>>
>>> Maybe I’m also approaching this problem from the wrong side, so I would
>>> be grateful for any feedback/recommendations.
>>>
>>>
>>>
>>> Thanks!
>>>
>>> Max
>>>
>>>
>>>
>>>
>>
