Re: Upsert semantics

Hans Zeller Tue, 15 Mar 2016 18:37:01 -0700

Thank you, Selva. The JIRA is
https://issues.apache.org/jira/browse/TRAFODION-1896.


Hans

On Tue, Mar 15, 2016 at 6:15 PM, Selva Govindarajan <
[email protected]> wrote:

> Hans,
>
>
>
> It didn’t occur to me your proposed change would work. I was always
> thinking we shouldn’t be adding the omitted columns in non-aligned format.
>   You can file a JIRA and I will fix it.
>
>
>
> Selva
>
>
>
> *From:* Anoop Sharma [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 6:03 PM
>
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> yes, one cqd to switch between one or the other behavior in all formats is
> the right way to go.
>
>
>
> Doing the other way based on the row format would cause more issues when we
>
> support hybrid format rows where some columns are in aligned format and
> others
>
> are not.
>
>
>
> anoop
>
>
>
> *From:* Hans Zeller [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 5:58 PM
> *To:* [email protected]
> *Subject:* Re: Upsert semantics
>
>
>
> Again, IMHO that's the wrong way to go, but I hope others will chime in.
> Dave gave the best reason, it's a bad idea to make the semantics of UPSERT
> depend on the internal format. Here is what I would suggest, using Selva's
> table (proposed changes in red - hope Apache won't mangle them):
>
>
>
>                                  Aligned Format                    Aligned
> format with            Non-Aligned with                     Non-Aligned with
>
>                                    With no omitted
>                    omitted columns               with no omitted
>                       omitted current default
>
>                                     columns
>                                                         / omitted
> non-current columns
>
>
>
> CQD off                          Replaces row
> MERGE                         Replace the given columns
> MERGE
>
> CQD on (default)             Replaces row
> Replaces row                Replace all columns                  Replace all 
> columns
>
>
>
> Hans
>
>
>
> On Tue, Mar 15, 2016 at 5:36 PM, Selva Govindarajan <
> [email protected]> wrote:
>
> I believe phoenix doesn’t support insert semantics or the non-null default
> value columns.  Trafodion supports insert, upsert, non-null default value
> columns as well as current default values like current timestamp and
> current user.
>
>
>
> Upsert handling in Trafodion is same as phoenix for non-aligned format.
> For aligned format it can be controlled via CQD.
>
>
>
>                                  Aligned Format                    Aligned
> format with            Non-Aligned with                     Non-Aligned with
>
>                                    With no omitted
>                    omitted columns               with no omitted
>                       omitted current default
>
>                                     columns
>                                                         / omitted
> non-current columns
>
>
>
> Default behavior             Replaces row
> MERGE                         Replace the given columns
> MERGE
>
> With the CQD                 Replaces row
> Replaces row                Replace the given columns                 MERGE
>
>
>          set to on
>
>
>
> The CQD to be used is TRAF_UPSERT_WITH_INSERT_DEFAULT_SEMANTICS (Default
> is off). In short, this CQD is a no-op for non-aligned format.
>
>
>
> The behavior of the non-aligned format can’t be controlled by the CQD
> because we don’t store values for the omitted columns in hbase and hence
> when the user switches the CQD settings for upserts with different sets of
> omitted columns, we could end up with non-deterministic values for these
> columns.
>
> For eq. upsert with the cqd set to ‘on’ with a set of omitted columns
>
> Upsert with the cqd set to ‘off’ with a different set of omitted columns
>
> If we switch to insert all column values all the time for non-aligned
> format, then we can let user to control what value needs to be put in for
> the omitted column.
>
>
>
> Selva
>
>
>
> *From:* Hans Zeller [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 4:01 PM
> *To:* [email protected]
> *Subject:* Re: Upsert semantics
>
>
>
> Yes, that's what I had in mind, using a CQD as the syntax:
>
>
>
> UPSERT handling                 aligned format        non-aligned format
>
> ------------------------------  --------------------
>  -------------------------------
>
> default behavior                replace row           replace row (create
> all values)
>
> Phoenix behavior (via CQD):     transform to MERGE    insert only
> specified cols (*)
>
>
>
> (*) One issue here is with "default current". In that case we may also
> need to transform the statement into a MERGE.
>
>
>
> From a performance point of view, the "default behavior" would work better
> for aligned format, the Phoenix behavior would work better for non-aligned
> format.
>
>
>
> In some cases it won't matter. Selva's code will detect many of these and
> automatically choose the faster implementation.
>
>
>
> Thanks,
>
>
> Hans
>
>
>
> On Tue, Mar 15, 2016 at 3:41 PM, Dave Birdsall <[email protected]>
> wrote:
>
> <Cringe> Not sure we want the logical semantics of an operation to depend
> on the physical layout of the row.
>
>
>
> Would be better to have different syntax for each. With an explanation
> that one works faster on one format, and the other faster on the other
> format.
>
>
>
> *From:* Eric Owhadi [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 3:38 PM
>
>
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Would there be a problem if we implemented the phoenix semantic for non
> align format, and the  upsert semantic proposed by Hans in align format?
>
> This would allow speed optimization without having the user to know about
> subtle differences?
>
> Eric
>
>
>
>
>
> *From:* Anoop Sharma [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 5:14 PM
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Phoenix has upsert command and from what can tell, they originally came up
> with upsert syntax.
>
> Their semantic is to insert if not present and update the specified
> columns with the specified values if present.
>
> We did do an experiment and upsert only updates the specified columns.
>
> Maybe we can add a cqd so full row update vs. specified column update
> behavior could be chosen.
>
>
>
> Here is their specification.
>
> Inserts if not present and updates otherwise the value in the table. The
> list of columns is optional and if not present, the values will map to the
> column in the order they are declared in the schema. The values must
> evaluate to constants.
>
> Example:
>
> UPSERT INTO TEST VALUES('foo','bar',3);
> UPSERT INTO TEST(NAME,ID) VALUES('foo',123);
>
>
>
>
>
> *From:* Dave Birdsall [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 2:55 PM
> *To:* [email protected]
> *Subject:* RE: Upsert semantics
>
>
>
> Hi,
>
>
>
> It seems that when ANSI first added MERGE to the standard, it was
> portrayed as “upsert” (see https://en.wikipedia.org/wiki/Merge_(SQL)).
>
>
>
> I agree though that we are free to define our UPSERT to mean anything we
> want.
>
>
>
> I like what you suggest. Since our UPSERT syntax already specifies values
> for all the columns, it makes sense for it to have “replace” semantics.
> That is, if the row exists, replace it with all the new stuff (with
> defaults for columns omitted). If the row doesn’t exist, it’s just a
> straight insert (with defaults for omitted columns).
>
>
>
> And if one really wants UPDATE semantics as opposed to “replace”
> semantics, then the ANSI MERGE statement (which Trafodion also supports) is
> the way to go.
>
>
>
> There is an analogy to this in linguistic theory. Whenever a language has
> two words that at a point in time mean the same thing, there is a tendency
> for the meanings to change over time so they diverge. For example, English
> “shirt” and “skirt”, originally from the same root, but one via Anglo-Saxon
> the other via Old Norse.
>
>
>
> Dave
>
>
>
>
>
> *From:* Hans Zeller [mailto:[email protected]]
> *Sent:* Tuesday, March 15, 2016 2:40 PM
> *To:* [email protected]
> *Subject:* Upsert semantics
>
>
>
> Hi,
>
>
>
> Here is a question on how we should define the meaning of an UPSERT
> statement. UPSERT is not part of the ISO/ANSI SQL standard, so we have some
> leeway to define it.
>
>
>
> My personal feeling is that UPSERT should either insert a brand-new row or
> it should completely replace an existing row, but it should never combine
> columns from a new and an existing row. If users want the latter then they
> should use the MERGE command.
>
>
>
> We should probably follow what other DBMSs do. I could not yet find a DBMS
> that had an UPSERT command, although there probably is one.
>
>
>
>    - PostgreSQL: Has an insert with a conflict clause, similar to our
>    MERGE: http://www.postgresql.org/docs/current/static/sql-insert.html
>
>
>    - MySQL: Has an insert with ON DUPLICATE KEY clause, similar to our
>    MERGE: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
>
>
>    - Teradata: Has an update ... else insert ... command, similar to
>    MERGE:
>    https://forums.teradata.com/forum/enterprise/problem-using-upsert
>
>
>    - Oracle just seems to have the MERGE statement and various methods to
>    do upsert in PL/SQL:
>    
> http://stackoverflow.com/questions/237327/oracle-how-to-upsert-update-or-insert-into-a-table
>
> This seems to support indirectly what I'm proposing. If we want to merge
> old and new row then we should use syntax specifying how to merge, which is
> what the other DBMSs have done.
>
>
>
> See also the discussion in
> https://github.com/apache/incubator-trafodion/pull/380. I wanted to see
> whether the user group has any input on this.
>
>
>
> Thanks,
>
>
> Hans
>
>
>
>
>

Re: Upsert semantics

Reply via email to