Re: Upsert semantics

Hans Zeller Tue, 15 Mar 2016 16:01:48 -0700

Yes, that's what I had in mind, using a CQD as the syntax:

UPSERT handling                 aligned format        non-aligned format
------------------------------  --------------------
 -------------------------------
default behavior                replace row           replace row (create
all values)
Phoenix behavior (via CQD):     transform to MERGE    insert only specified
cols (*)


(*) One issue here is with "default current". In that case we may also need
to transform the statement into a MERGE.

>From a performance point of view, the "default behavior" would work better
for aligned format, the Phoenix behavior would work better for non-aligned
format.

In some cases it won't matter. Selva's code will detect many of these and
automatically choose the faster implementation.

Thanks,

Hans

On Tue, Mar 15, 2016 at 3:41 PM, Dave Birdsall <[email protected]>
wrote:

> <Cringe> Not sure we want the logical semantics of an operation to depend
> on the physical layout of the row.
>
>
>
> Would be better to have different syntax for each. With an explanation
> that one works faster on one format, and the other faster on the other
> format.
>
>
>
> *From:* Eric Owhadi [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 3:38 PM
>
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Would there be a problem if we implemented the phoenix semantic for non
> align format, and the  upsert semantic proposed by Hans in align format?
>
> This would allow speed optimization without having the user to know about
> subtle differences?
>
> Eric
>
>
>
>
>
> *From:* Anoop Sharma [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 5:14 PM
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Phoenix has upsert command and from what can tell, they originally came up
> with upsert syntax.
>
> Their semantic is to insert if not present and update the specified
> columns with the specified values if present.
>
> We did do an experiment and upsert only updates the specified columns.
>
> Maybe we can add a cqd so full row update vs. specified column update
> behavior could be chosen.
>
>
>
> Here is their specification.
>
> Inserts if not present and updates otherwise the value in the table. The
> list of columns is optional and if not present, the values will map to the
> column in the order they are declared in the schema. The values must
> evaluate to constants.
>
> Example:
>
> UPSERT INTO TEST VALUES('foo','bar',3);
> UPSERT INTO TEST(NAME,ID) VALUES('foo',123);
>
>
>
>
>
> *From:* Dave Birdsall [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 2:55 PM
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Hi,
>
>
>
> It seems that when ANSI first added MERGE to the standard, it was
> portrayed as “upsert” (see https://en.wikipedia.org/wiki/Merge_(SQL)).
>
>
>
> I agree though that we are free to define our UPSERT to mean anything we
> want.
>
>
>
> I like what you suggest. Since our UPSERT syntax already specifies values
> for all the columns, it makes sense for it to have “replace” semantics.
> That is, if the row exists, replace it with all the new stuff (with
> defaults for columns omitted). If the row doesn’t exist, it’s just a
> straight insert (with defaults for omitted columns).
>
>
>
> And if one really wants UPDATE semantics as opposed to “replace”
> semantics, then the ANSI MERGE statement (which Trafodion also supports) is
> the way to go.
>
>
>
> There is an analogy to this in linguistic theory. Whenever a language has
> two words that at a point in time mean the same thing, there is a tendency
> for the meanings to change over time so they diverge. For example, English
> “shirt” and “skirt”, originally from the same root, but one via Anglo-Saxon
> the other via Old Norse.
>
>
>
> Dave
>
>
>
>
>
> *From:* Hans Zeller [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 2:40 PM
> *To:* [email protected]
> *Subject:* Upsert semantics
>
>
>
> Hi,
>
>
>
> Here is a question on how we should define the meaning of an UPSERT
> statement. UPSERT is not part of the ISO/ANSI SQL standard, so we have some
> leeway to define it.
>
>
>
> My personal feeling is that UPSERT should either insert a brand-new row or
> it should completely replace an existing row, but it should never combine
> columns from a new and an existing row. If users want the latter then they
> should use the MERGE command.
>
>
>
> We should probably follow what other DBMSs do. I could not yet find a DBMS
> that had an UPSERT command, although there probably is one.
>
>
>
>    - PostgreSQL: Has an insert with a conflict clause, similar to our
>    MERGE: http://www.postgresql.org/docs/current/static/sql-insert.html
>
>
>    - MySQL: Has an insert with ON DUPLICATE KEY clause, similar to our
>    MERGE: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
>
>
>    - Teradata: Has an update ... else insert ... command, similar to
>    MERGE:
>    https://forums.teradata.com/forum/enterprise/problem-using-upsert
>
>
>    - Oracle just seems to have the MERGE statement and various methods to
>    do upsert in PL/SQL:
>    
> http://stackoverflow.com/questions/237327/oracle-how-to-upsert-update-or-insert-into-a-table
>
> This seems to support indirectly what I'm proposing. If we want to merge
> old and new row then we should use syntax specifying how to merge, which is
> what the other DBMSs have done.
>
>
>
> See also the discussion in
> https://github.com/apache/incubator-trafodion/pull/380. I wanted to see
> whether the user group has any input on this.
>
>
>
> Thanks,
>
>
> Hans
>

Re: Upsert semantics

Reply via email to