Yes, that's what I had in mind, using a CQD as the syntax: UPSERT handling aligned format non-aligned format ------------------------------ -------------------- ------------------------------- default behavior replace row replace row (create all values) Phoenix behavior (via CQD): transform to MERGE insert only specified cols (*)
(*) One issue here is with "default current". In that case we may also need to transform the statement into a MERGE. >From a performance point of view, the "default behavior" would work better for aligned format, the Phoenix behavior would work better for non-aligned format. In some cases it won't matter. Selva's code will detect many of these and automatically choose the faster implementation. Thanks, Hans On Tue, Mar 15, 2016 at 3:41 PM, Dave Birdsall <[email protected]> wrote: > <Cringe> Not sure we want the logical semantics of an operation to depend > on the physical layout of the row. > > > > Would be better to have different syntax for each. With an explanation > that one works faster on one format, and the other faster on the other > format. > > > > *From:* Eric Owhadi [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 3:38 PM > > *To:* [email protected] > *Subject:* RE: Upsert semantics > > > > Would there be a problem if we implemented the phoenix semantic for non > align format, and the upsert semantic proposed by Hans in align format? > > This would allow speed optimization without having the user to know about > subtle differences? > > Eric > > > > > > *From:* Anoop Sharma [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 5:14 PM > *To:* [email protected] > *Subject:* RE: Upsert semantics > > > > Phoenix has upsert command and from what can tell, they originally came up > with upsert syntax. > > Their semantic is to insert if not present and update the specified > columns with the specified values if present. > > We did do an experiment and upsert only updates the specified columns. > > Maybe we can add a cqd so full row update vs. specified column update > behavior could be chosen. > > > > Here is their specification. > > Inserts if not present and updates otherwise the value in the table. The > list of columns is optional and if not present, the values will map to the > column in the order they are declared in the schema. The values must > evaluate to constants. > > Example: > > UPSERT INTO TEST VALUES('foo','bar',3); > UPSERT INTO TEST(NAME,ID) VALUES('foo',123); > > > > > > *From:* Dave Birdsall [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 2:55 PM > *To:* [email protected] > *Subject:* RE: Upsert semantics > > > > Hi, > > > > It seems that when ANSI first added MERGE to the standard, it was > portrayed as “upsert” (see https://en.wikipedia.org/wiki/Merge_(SQL)). > > > > I agree though that we are free to define our UPSERT to mean anything we > want. > > > > I like what you suggest. Since our UPSERT syntax already specifies values > for all the columns, it makes sense for it to have “replace” semantics. > That is, if the row exists, replace it with all the new stuff (with > defaults for columns omitted). If the row doesn’t exist, it’s just a > straight insert (with defaults for omitted columns). > > > > And if one really wants UPDATE semantics as opposed to “replace” > semantics, then the ANSI MERGE statement (which Trafodion also supports) is > the way to go. > > > > There is an analogy to this in linguistic theory. Whenever a language has > two words that at a point in time mean the same thing, there is a tendency > for the meanings to change over time so they diverge. For example, English > “shirt” and “skirt”, originally from the same root, but one via Anglo-Saxon > the other via Old Norse. > > > > Dave > > > > > > *From:* Hans Zeller [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 2:40 PM > *To:* [email protected] > *Subject:* Upsert semantics > > > > Hi, > > > > Here is a question on how we should define the meaning of an UPSERT > statement. UPSERT is not part of the ISO/ANSI SQL standard, so we have some > leeway to define it. > > > > My personal feeling is that UPSERT should either insert a brand-new row or > it should completely replace an existing row, but it should never combine > columns from a new and an existing row. If users want the latter then they > should use the MERGE command. > > > > We should probably follow what other DBMSs do. I could not yet find a DBMS > that had an UPSERT command, although there probably is one. > > > > - PostgreSQL: Has an insert with a conflict clause, similar to our > MERGE: http://www.postgresql.org/docs/current/static/sql-insert.html > > > - MySQL: Has an insert with ON DUPLICATE KEY clause, similar to our > MERGE: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > > > - Teradata: Has an update ... else insert ... command, similar to > MERGE: > https://forums.teradata.com/forum/enterprise/problem-using-upsert > > > - Oracle just seems to have the MERGE statement and various methods to > do upsert in PL/SQL: > > http://stackoverflow.com/questions/237327/oracle-how-to-upsert-update-or-insert-into-a-table > > This seems to support indirectly what I'm proposing. If we want to merge > old and new row then we should use syntax specifying how to merge, which is > what the other DBMSs have done. > > > > See also the discussion in > https://github.com/apache/incubator-trafodion/pull/380. I wanted to see > whether the user group has any input on this. > > > > Thanks, > > > Hans >
