Thank you, Selva. The JIRA is https://issues.apache.org/jira/browse/TRAFODION-1896.
Hans On Tue, Mar 15, 2016 at 6:15 PM, Selva Govindarajan < [email protected]> wrote: > Hans, > > > > It didn’t occur to me your proposed change would work. I was always > thinking we shouldn’t be adding the omitted columns in non-aligned format. > You can file a JIRA and I will fix it. > > > > Selva > > > > *From:* Anoop Sharma [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 6:03 PM > > *To:* [email protected] > *Subject:* RE: Upsert semantics > > > > yes, one cqd to switch between one or the other behavior in all formats is > the right way to go. > > > > Doing the other way based on the row format would cause more issues when we > > support hybrid format rows where some columns are in aligned format and > others > > are not. > > > > anoop > > > > *From:* Hans Zeller [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 5:58 PM > *To:* [email protected] > *Subject:* Re: Upsert semantics > > > > Again, IMHO that's the wrong way to go, but I hope others will chime in. > Dave gave the best reason, it's a bad idea to make the semantics of UPSERT > depend on the internal format. Here is what I would suggest, using Selva's > table (proposed changes in red - hope Apache won't mangle them): > > > > Aligned Format Aligned > format with Non-Aligned with Non-Aligned with > > With no omitted > omitted columns with no omitted > omitted current default > > columns > / omitted > non-current columns > > > > CQD off Replaces row > MERGE Replace the given columns > MERGE > > CQD on (default) Replaces row > Replaces row Replace all columns Replace all > columns > > > > Hans > > > > On Tue, Mar 15, 2016 at 5:36 PM, Selva Govindarajan < > [email protected]> wrote: > > I believe phoenix doesn’t support insert semantics or the non-null default > value columns. Trafodion supports insert, upsert, non-null default value > columns as well as current default values like current timestamp and > current user. > > > > Upsert handling in Trafodion is same as phoenix for non-aligned format. > For aligned format it can be controlled via CQD. > > > > Aligned Format Aligned > format with Non-Aligned with Non-Aligned with > > With no omitted > omitted columns with no omitted > omitted current default > > columns > / omitted > non-current columns > > > > Default behavior Replaces row > MERGE Replace the given columns > MERGE > > With the CQD Replaces row > Replaces row Replace the given columns MERGE > > > set to on > > > > The CQD to be used is TRAF_UPSERT_WITH_INSERT_DEFAULT_SEMANTICS (Default > is off). In short, this CQD is a no-op for non-aligned format. > > > > The behavior of the non-aligned format can’t be controlled by the CQD > because we don’t store values for the omitted columns in hbase and hence > when the user switches the CQD settings for upserts with different sets of > omitted columns, we could end up with non-deterministic values for these > columns. > > For eq. upsert with the cqd set to ‘on’ with a set of omitted columns > > Upsert with the cqd set to ‘off’ with a different set of omitted columns > > If we switch to insert all column values all the time for non-aligned > format, then we can let user to control what value needs to be put in for > the omitted column. > > > > Selva > > > > *From:* Hans Zeller [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 4:01 PM > *To:* [email protected] > *Subject:* Re: Upsert semantics > > > > Yes, that's what I had in mind, using a CQD as the syntax: > > > > UPSERT handling aligned format non-aligned format > > ------------------------------ -------------------- > ------------------------------- > > default behavior replace row replace row (create > all values) > > Phoenix behavior (via CQD): transform to MERGE insert only > specified cols (*) > > > > (*) One issue here is with "default current". In that case we may also > need to transform the statement into a MERGE. > > > > From a performance point of view, the "default behavior" would work better > for aligned format, the Phoenix behavior would work better for non-aligned > format. > > > > In some cases it won't matter. Selva's code will detect many of these and > automatically choose the faster implementation. > > > > Thanks, > > > Hans > > > > On Tue, Mar 15, 2016 at 3:41 PM, Dave Birdsall <[email protected]> > wrote: > > <Cringe> Not sure we want the logical semantics of an operation to depend > on the physical layout of the row. > > > > Would be better to have different syntax for each. With an explanation > that one works faster on one format, and the other faster on the other > format. > > > > *From:* Eric Owhadi [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 3:38 PM > > > *To:* [email protected] > *Subject:* RE: Upsert semantics > > > > Would there be a problem if we implemented the phoenix semantic for non > align format, and the upsert semantic proposed by Hans in align format? > > This would allow speed optimization without having the user to know about > subtle differences? > > Eric > > > > > > *From:* Anoop Sharma [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 5:14 PM > *To:* [email protected] > *Subject:* RE: Upsert semantics > > > > Phoenix has upsert command and from what can tell, they originally came up > with upsert syntax. > > Their semantic is to insert if not present and update the specified > columns with the specified values if present. > > We did do an experiment and upsert only updates the specified columns. > > Maybe we can add a cqd so full row update vs. specified column update > behavior could be chosen. > > > > Here is their specification. > > Inserts if not present and updates otherwise the value in the table. The > list of columns is optional and if not present, the values will map to the > column in the order they are declared in the schema. The values must > evaluate to constants. > > Example: > > UPSERT INTO TEST VALUES('foo','bar',3); > UPSERT INTO TEST(NAME,ID) VALUES('foo',123); > > > > > > *From:* Dave Birdsall [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 2:55 PM > *To:* [email protected] > *Subject:* RE: Upsert semantics > > > > Hi, > > > > It seems that when ANSI first added MERGE to the standard, it was > portrayed as “upsert” (see https://en.wikipedia.org/wiki/Merge_(SQL)). > > > > I agree though that we are free to define our UPSERT to mean anything we > want. > > > > I like what you suggest. Since our UPSERT syntax already specifies values > for all the columns, it makes sense for it to have “replace” semantics. > That is, if the row exists, replace it with all the new stuff (with > defaults for columns omitted). If the row doesn’t exist, it’s just a > straight insert (with defaults for omitted columns). > > > > And if one really wants UPDATE semantics as opposed to “replace” > semantics, then the ANSI MERGE statement (which Trafodion also supports) is > the way to go. > > > > There is an analogy to this in linguistic theory. Whenever a language has > two words that at a point in time mean the same thing, there is a tendency > for the meanings to change over time so they diverge. For example, English > “shirt” and “skirt”, originally from the same root, but one via Anglo-Saxon > the other via Old Norse. > > > > Dave > > > > > > *From:* Hans Zeller [mailto:[email protected]] > *Sent:* Tuesday, March 15, 2016 2:40 PM > *To:* [email protected] > *Subject:* Upsert semantics > > > > Hi, > > > > Here is a question on how we should define the meaning of an UPSERT > statement. UPSERT is not part of the ISO/ANSI SQL standard, so we have some > leeway to define it. > > > > My personal feeling is that UPSERT should either insert a brand-new row or > it should completely replace an existing row, but it should never combine > columns from a new and an existing row. If users want the latter then they > should use the MERGE command. > > > > We should probably follow what other DBMSs do. I could not yet find a DBMS > that had an UPSERT command, although there probably is one. > > > > - PostgreSQL: Has an insert with a conflict clause, similar to our > MERGE: http://www.postgresql.org/docs/current/static/sql-insert.html > > > - MySQL: Has an insert with ON DUPLICATE KEY clause, similar to our > MERGE: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html > > > - Teradata: Has an update ... else insert ... command, similar to > MERGE: > https://forums.teradata.com/forum/enterprise/problem-using-upsert > > > - Oracle just seems to have the MERGE statement and various methods to > do upsert in PL/SQL: > > http://stackoverflow.com/questions/237327/oracle-how-to-upsert-update-or-insert-into-a-table > > This seems to support indirectly what I'm proposing. If we want to merge > old and new row then we should use syntax specifying how to merge, which is > what the other DBMSs have done. > > > > See also the discussion in > https://github.com/apache/incubator-trafodion/pull/380. I wanted to see > whether the user group has any input on this. > > > > Thanks, > > > Hans > > > > >
