Re: Upsert

Matt Burgess Tue, 22 Aug 2017 14:48:25 -0700

If you expect more updates for a given row than its insert, you can do the 
logic backwards, the first time the row is seen it will fail but all updates 
should go right through :)



> On Aug 22, 2017, at 3:56 PM, Austin Duncan <[email protected]> wrote:
> 
> I think its working? i havent had a file fail yet to check so im just letting 
> it run but it should work. I just am updating any files that fail and routing 
> it back to the putdatabaserecord processor. In my head it seems like that 
> should work.
> 
>> On Tue, Aug 22, 2017 at 3:12 PM, Matt Burgess <[email protected]> wrote:
>> It's not ideal, that's for sure :) However "upsert" is far from a standard 
>> operation (no matter what the DB docs tell you lol). One thing that might be 
>> a viable improvement is to implement upsert as a "try insert, then try 
>> update" as the workaround does. Depending on the actual reason for flowfile 
>> failure, this might not be any better, and then the improvement would be to 
>> choose a dialect and implement upsert where possible, falling back to either 
>> failure if unsupported, or the try-insert-then-update behavior. It adds lots 
>> of complexity and still might not solve all cases for ACID DBs.
>> 
>> Regards,
>> Matt
>> 
>>> On Aug 22, 2017, at 2:06 PM, Austin Duncan <[email protected]> wrote:
>>> 
>>> That's what we had originally but we felt it was kind of hacky. 
>>> 
>>>> On Tue, Aug 22, 2017 at 2:05 PM, Matt Burgess <[email protected]> wrote:
>>>> If the order of operations doesn't matter then you could route flow files 
>>>> that failed because the record existed back to an UpdateAttribute that 
>>>> sets the statement.type to update and try them again. Not sure what 
>>>> additional logic is involved (I'm not at my computer ATM) but that's where 
>>>> PutDatabaseRecord offers an advantage.
>>>> 
>>>> Regards,
>>>> Matt
>>>> 
>>>> 
>>>>> On Aug 22, 2017, at 1:48 PM, Austin Duncan <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>> All of the records have the same schema though I think. It's all the same 
>>>>> kind of data and the same format. We were originally using a PutSql 
>>>>> processor and were having trouble with it. The current setup actually 
>>>>> runs pretty well im just trying to figure out how to do an insert and 
>>>>> then if the data already exists in the table via the rfidnumber do an 
>>>>> update instead.
>>>>> 
>>>>>> On Tue, Aug 22, 2017 at 12:05 PM, Matt Burgess <[email protected]> 
>>>>>> wrote:
>>>>>> Your use case will be more complex, because you need to form SQL from 
>>>>>> your JSON. You can try ConvertJSONToSQL and then ReplaceText to add the 
>>>>>> ON CONFLICT to the end.
>>>>>> 
>>>>>> Also if you are using SplitJson then PutDatabaseRecord won't be as 
>>>>>> efficient as it could be; it is meant to work on a number of records in 
>>>>>> a single flow file with the same schema. In your case you might be 
>>>>>> better off with PutSQL as it won't require schemas or statement types. 
>>>>>> The Split -> convert to SQL -> PutSQL pattern was the original way to do 
>>>>>> things, but for multiple records with the same schema, PutDatabaseRecord 
>>>>>> was added.
>>>>>> 
>>>>>> Regards,
>>>>>> Matt
>>>>>> 
>>>>>>> On Aug 22, 2017, at 11:43 AM, Austin Duncan <[email protected]> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> The example you listed sounds like a solution I am just having trouble 
>>>>>>> fully understanding. I want to make my own sql so that i can do the 
>>>>>>> "insert into on conflict"
>>>>>>>  I am just having trouble really understanding what it is that I have 
>>>>>>> to do so that the query and the data will be understood by the 
>>>>>>> processor. Do I need different schemas? One like I sent and one for the 
>>>>>>> query? 
>>>>>>> 
>>>>>>>> On Tue, Aug 22, 2017 at 11:38 AM, Matt Burgess <[email protected]> 
>>>>>>>> wrote:
>>>>>>>> If your incoming data is already in fields (JSON, e.g.) and not a SQL
>>>>>>>> statement, then the statement.type should be "insert" rather than
>>>>>>>> "sql". The "sql" type is for passing in explicit SQL statements to
>>>>>>>> execute rather than taking a record and generating the appropriate SQL
>>>>>>>> statement from the fields and the statement.type.
>>>>>>>> 
>>>>>>>> If you are trying to generate your own SQL and execute that, then
>>>>>>>> you'd want to try something like the solution I outlined before, where
>>>>>>>> you put the SQL in a field such as "statement", set statement.type to
>>>>>>>> "sql" and Field Containing SQL to "statement".
>>>>>>>> 
>>>>>>>> Does that make sense? Or am I misunderstanding what you are trying to 
>>>>>>>> do?
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Matt
>>>>>>>> 
>>>>>>>> On Tue, Aug 22, 2017 at 11:32 AM, Austin Duncan
>>>>>>>> <[email protected]> wrote:
>>>>>>>> > Matt,
>>>>>>>> >
>>>>>>>> > I am using JsonPathReader and using the 'Schema Text' Property with 
>>>>>>>> > a schema
>>>>>>>> > defined in there. I could never figure out how to use any of the 
>>>>>>>> > other
>>>>>>>> > Access Strategies. It's an inventory system so we are extracting the 
>>>>>>>> > Json
>>>>>>>> > and splitting it into a json file for each row. Then using this 
>>>>>>>> > schema: {
>>>>>>>> >  "name": "insertSql",
>>>>>>>> >  "type": "record",
>>>>>>>> >  "fields": [
>>>>>>>> >   {
>>>>>>>> >    "name": "RfidNumber",
>>>>>>>> >    "type": "string"
>>>>>>>> >   },
>>>>>>>> >   {
>>>>>>>> >    "name": "CabinetName",
>>>>>>>> >    "type": "string"
>>>>>>>> >   },
>>>>>>>> >   {
>>>>>>>> >    "name": "ItemNumber",
>>>>>>>> >    "type": "string"
>>>>>>>> >   },
>>>>>>>> >   {
>>>>>>>> >    "name": "LotNumber",
>>>>>>>> >    "type": "string"
>>>>>>>> >   },
>>>>>>>> >   {
>>>>>>>> >    "name": "PurchaseOrderNumber",
>>>>>>>> >    "type": "string"
>>>>>>>> >   },
>>>>>>>> >   {
>>>>>>>> >    "name": "PurchaseOrderPrice",
>>>>>>>> >    "type": "float"
>>>>>>>> >   },
>>>>>>>> >   {
>>>>>>>> >    "name": "SupplierId",
>>>>>>>> >    "type": "string"
>>>>>>>> >   },
>>>>>>>> >   {
>>>>>>>> >    "name": "SupplierName",
>>>>>>>> >    "type": "string"
>>>>>>>> >   },
>>>>>>>> >   {
>>>>>>>> >     "name": "updatedate",
>>>>>>>> >     "type": "string"
>>>>>>>> >     }
>>>>>>>> >  ]
>>>>>>>> > }
>>>>>>>> > I inserted the data into the table. I am not 100% on the uses of 
>>>>>>>> > schemas so
>>>>>>>> > I am not quite sure what you mean by using a schema to define the 
>>>>>>>> > query.
>>>>>>>> >
>>>>>>>> > On Tue, Aug 22, 2017 at 11:23 AM, Matt Burgess 
>>>>>>>> > <[email protected]> wrote:
>>>>>>>> >>
>>>>>>>> >> Austin,
>>>>>>>> >>
>>>>>>>> >> What are you using for a record reader and schema for
>>>>>>>> >> PutDatabaseRecord?  In order to execute SQL using PutDatabaseRecord,
>>>>>>>> >> you have to specify a "Field containing SQL", and the incoming
>>>>>>>> >> record(s) must have a field with that name. The value of that field
>>>>>>>> >> (for each record) will be executed.
>>>>>>>> >>
>>>>>>>> >> What I've done in the past is to put the whole SQL statement in a 
>>>>>>>> >> JSON
>>>>>>>> >> doc: {"query": "INSERT INTO table (column1 ,column2, column3, 
>>>>>>>> >> column4)
>>>>>>>> >> VALUES() ON CONFLICT (rfidnumber) DO UPDATE"} then I set Field
>>>>>>>> >> Containing SQL to "query", and use a JsonPathReader specifying a
>>>>>>>> >> "query" field with a path of $.query, or just a JsonTreeReader, 
>>>>>>>> >> either
>>>>>>>> >> reader using a schema with a single string field called "query".
>>>>>>>> >>
>>>>>>>> >> IMO an improvement would be nice to add a SQLReader that could split
>>>>>>>> >> on newlines or semicolons or whatever, and each "record" would 
>>>>>>>> >> contain
>>>>>>>> >> a SQL statement with the specified field name. That would save the
>>>>>>>> >> trouble of having to temporarily convert your SQL statement(s) into 
>>>>>>>> >> a
>>>>>>>> >> format that an existing Reader can recognize.
>>>>>>>> >>
>>>>>>>> >> Regards,
>>>>>>>> >> Matt
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Tue, Aug 22, 2017 at 11:12 AM, Austin Duncan
>>>>>>>> >> <[email protected]> wrote:
>>>>>>>> >> > In my flow I am pulling data from a Json, splitting the Json and 
>>>>>>>> >> > then
>>>>>>>> >> > inserting that into a postgres table using the putdatabaserecord
>>>>>>>> >> > processor.
>>>>>>>> >> > I have been using the insert statement option and it has been 
>>>>>>>> >> > working
>>>>>>>> >> > fine
>>>>>>>> >> > but now I am trying to figure out how to do a INSERT INTO table ON
>>>>>>>> >> > CONFLICT
>>>>>>>> >> > UPDATE statement. I have the statement.type attribute set to SQL 
>>>>>>>> >> > and am
>>>>>>>> >> > trying to do the query:
>>>>>>>> >> >
>>>>>>>> >> > INSERT INTO table (column1 ,column2, column3, column4)
>>>>>>>> >> >
>>>>>>>> >> > VALUES()
>>>>>>>> >> >
>>>>>>>> >> > ON CONFLICT (rfidnumber) DO UPDATE;
>>>>>>>> >> >
>>>>>>>> >> > I am getting the error 'Record schema does not contain filed 
>>>>>>>> >> > containing
>>>>>>>> >> > SQL'. So two th   Any help would be appreciated.
>>>>>>>> >> >
>>>>>>>> >> > Thanks,
>>>>>>>> >> >
>>>>>>>> >> > Austin
>>>>>>>> >
>>>>>>>> >
>>>>>>> 
>>>>> 
>>> 
>

Re: Upsert

Reply via email to