Re: Upsert semantics

Hans Zeller Tue, 15 Mar 2016 17:58:57 -0700

Again, IMHO that's the wrong way to go, but I hope others will chime in.
Dave gave the best reason, it's a bad idea to make the semantics of UPSERT
depend on the internal format. Here is what I would suggest, using Selva's
table (proposed changes in red - hope Apache won't mangle them):


                                 Aligned Format                    Aligned
format with            Non-Aligned with                     Non-Aligned with

                                   With no omitted
                   omitted columns               with no omitted
                      omitted current default

                                    columns
                                                        / omitted
non-current columns



CQD off                          Replaces row
MERGE                         Replace the given columns
MERGE

CQD on (default)             Replaces row                          Replaces
row                Replace all columns                  Replace all columns


Hans

On Tue, Mar 15, 2016 at 5:36 PM, Selva Govindarajan <
[email protected]> wrote:

> I believe phoenix doesn’t support insert semantics or the non-null default
> value columns.  Trafodion supports insert, upsert, non-null default value
> columns as well as current default values like current timestamp and
> current user.
>
>
>
> Upsert handling in Trafodion is same as phoenix for non-aligned format.
> For aligned format it can be controlled via CQD.
>
>
>
>                                  Aligned Format                    Aligned
> format with            Non-Aligned with                     Non-Aligned with
>
>                                    With no omitted
>                    omitted columns               with no omitted
>                       omitted current default
>
>                                     columns
>                                                         / omitted
> non-current columns
>
>
>
> Default behavior             Replaces row
> MERGE                         Replace the given columns
> MERGE
>
> With the CQD                 Replaces row
> Replaces row                Replace the given columns                 MERGE
>
>
>          set to on
>
>
>
> The CQD to be used is TRAF_UPSERT_WITH_INSERT_DEFAULT_SEMANTICS (Default
> is off). In short, this CQD is a no-op for non-aligned format.
>
>
>
> The behavior of the non-aligned format can’t be controlled by the CQD
> because we don’t store values for the omitted columns in hbase and hence
> when the user switches the CQD settings for upserts with different sets of
> omitted columns, we could end up with non-deterministic values for these
> columns.
>
> For eq. upsert with the cqd set to ‘on’ with a set of omitted columns
>
> Upsert with the cqd set to ‘off’ with a different set of omitted columns
>
> If we switch to insert all column values all the time for non-aligned
> format, then we can let user to control what value needs to be put in for
> the omitted column.
>
>
>
> Selva
>
>
>
> *From:* Hans Zeller [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 4:01 PM
> *To:* [email protected]
> *Subject:* Re: Upsert semantics
>
>
>
> Yes, that's what I had in mind, using a CQD as the syntax:
>
>
>
> UPSERT handling                 aligned format        non-aligned format
>
> ------------------------------  --------------------
>  -------------------------------
>
> default behavior                replace row           replace row (create
> all values)
>
> Phoenix behavior (via CQD):     transform to MERGE    insert only
> specified cols (*)
>
>
>
> (*) One issue here is with "default current". In that case we may also
> need to transform the statement into a MERGE.
>
>
>
> From a performance point of view, the "default behavior" would work better
> for aligned format, the Phoenix behavior would work better for non-aligned
> format.
>
>
>
> In some cases it won't matter. Selva's code will detect many of these and
> automatically choose the faster implementation.
>
>
>
> Thanks,
>
>
> Hans
>
>
>
> On Tue, Mar 15, 2016 at 3:41 PM, Dave Birdsall <[email protected]>
> wrote:
>
> <Cringe> Not sure we want the logical semantics of an operation to depend
> on the physical layout of the row.
>
>
>
> Would be better to have different syntax for each. With an explanation
> that one works faster on one format, and the other faster on the other
> format.
>
>
>
> *From:* Eric Owhadi [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 3:38 PM
>
>
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Would there be a problem if we implemented the phoenix semantic for non
> align format, and the  upsert semantic proposed by Hans in align format?
>
> This would allow speed optimization without having the user to know about
> subtle differences?
>
> Eric
>
>
>
>
>
> *From:* Anoop Sharma [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 5:14 PM
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Phoenix has upsert command and from what can tell, they originally came up
> with upsert syntax.
>
> Their semantic is to insert if not present and update the specified
> columns with the specified values if present.
>
> We did do an experiment and upsert only updates the specified columns.
>
> Maybe we can add a cqd so full row update vs. specified column update
> behavior could be chosen.
>
>
>
> Here is their specification.
>
> Inserts if not present and updates otherwise the value in the table. The
> list of columns is optional and if not present, the values will map to the
> column in the order they are declared in the schema. The values must
> evaluate to constants.
>
> Example:
>
> UPSERT INTO TEST VALUES('foo','bar',3);
> UPSERT INTO TEST(NAME,ID) VALUES('foo',123);
>
>
>
>
>
> *From:* Dave Birdsall [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 2:55 PM
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Hi,
>
>
>
> It seems that when ANSI first added MERGE to the standard, it was
> portrayed as “upsert” (see https://en.wikipedia.org/wiki/Merge_(SQL)).
>
>
>
> I agree though that we are free to define our UPSERT to mean anything we
> want.
>
>
>
> I like what you suggest. Since our UPSERT syntax already specifies values
> for all the columns, it makes sense for it to have “replace” semantics.
> That is, if the row exists, replace it with all the new stuff (with
> defaults for columns omitted). If the row doesn’t exist, it’s just a
> straight insert (with defaults for omitted columns).
>
>
>
> And if one really wants UPDATE semantics as opposed to “replace”
> semantics, then the ANSI MERGE statement (which Trafodion also supports) is
> the way to go.
>
>
>
> There is an analogy to this in linguistic theory. Whenever a language has
> two words that at a point in time mean the same thing, there is a tendency
> for the meanings to change over time so they diverge. For example, English
> “shirt” and “skirt”, originally from the same root, but one via Anglo-Saxon
> the other via Old Norse.
>
>
>
> Dave
>
>
>
>
>
> *From:* Hans Zeller [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 2:40 PM
> *To:* [email protected]
> *Subject:* Upsert semantics
>
>
>
> Hi,
>
>
>
> Here is a question on how we should define the meaning of an UPSERT
> statement. UPSERT is not part of the ISO/ANSI SQL standard, so we have some
> leeway to define it.
>
>
>
> My personal feeling is that UPSERT should either insert a brand-new row or
> it should completely replace an existing row, but it should never combine
> columns from a new and an existing row. If users want the latter then they
> should use the MERGE command.
>
>
>
> We should probably follow what other DBMSs do. I could not yet find a DBMS
> that had an UPSERT command, although there probably is one.
>
>
>
>    - PostgreSQL: Has an insert with a conflict clause, similar to our
>    MERGE: http://www.postgresql.org/docs/current/static/sql-insert.html
>
>
>    - MySQL: Has an insert with ON DUPLICATE KEY clause, similar to our
>    MERGE: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
>
>
>    - Teradata: Has an update ... else insert ... command, similar to
>    MERGE:
>    https://forums.teradata.com/forum/enterprise/problem-using-upsert
>
>
>    - Oracle just seems to have the MERGE statement and various methods to
>    do upsert in PL/SQL:
>    
> http://stackoverflow.com/questions/237327/oracle-how-to-upsert-update-or-insert-into-a-table
>
> This seems to support indirectly what I'm proposing. If we want to merge
> old and new row then we should use syntax specifying how to merge, which is
> what the other DBMSs have done.
>
>
>
> See also the discussion in
> https://github.com/apache/incubator-trafodion/pull/380. I wanted to see
> whether the user group has any input on this.
>
>
>
> Thanks,
>
>
> Hans
>
>
>

Re: Upsert semantics

Reply via email to