[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874431#comment-16874431 ]

Vihang Karajgaonkar commented on HIVE-21115:

I don't think this is being worked on anymore. Resolving this as Won't Fix.

> Add support for object versions in metastore
>
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
> Issue Type: Improvement
> Reporter: Vihang Karajgaonkar
> Assignee: Bharathkrishna Guruvayoor Murali
> Priority: Major
> Attachments: HIVE-21115.1.patch, HIVE-21115.2.patch
>
>
> Currently, metastore objects are identified uniquely by their names (e.g.
> the combination of catName, dbName and tblName is unique for a table). Once
> a table or partition is created, it can be altered in many ways. There is
> currently no good way to identify the version of the object once it is
> altered. For example, suppose there are two clients (Hive and Impala) using
> the same metastore. Once some alter operations are performed by one client,
> another client which wants to do an alter operation has no good way to know
> whether the object it has is the same as the one stored in the metastore.
> Metastore updates the {{transient_lastDdlTime}} every time there is a DDL
> operation on the object. However, this value cannot be relied on by all the
> clients, since after HIVE-1768 metastore updates the value only when it is
> not set in the parameters. It is possible that a client which alters the
> object state does not remove the {{transient_lastDdlTime}}, in which case
> metastore will not update it. Secondly, if there is clock skew between
> multiple HMS instances when HMS-HA is configured, time values cannot be
> relied on to find out the sequence of alter operations on a given object.
> This JIRA proposes to use the JDO versioning support in Datanucleus
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to
> generate an incrementing sequence number every time an object is altered. The
> value of this object can be set as one of the values in the parameters. The
> advantage of using Datanucleus is that the versioning can be done across HMS
> instances as part of the database transaction, and it should work for all the
> supported databases.
> In theory such a version can be used to detect whether the client is
> presenting an object which is "stale" when issuing an alter request.
> Metastore can choose to reject such an alter request, since the client may be
> caching an old version of the object and any alter operation on such a stale
> object can potentially overwrite previous operations. However, this can be
> done in a separate JIRA.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
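The stale-object check described at the end of the issue can be illustrated with a minimal in-memory sketch. This is not the attached patch and not the Datanucleus mechanism; the class `VersionedStoreSketch`, its methods and `StaleObjectException` are hypothetical names used only to show the idea of rejecting an alter request whose version does not match the stored one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a metastore that keeps one version per table. */
class VersionedStoreSketch {

    /** Thrown when the caller's copy of the object is older than the stored one. */
    static class StaleObjectException extends RuntimeException {
        StaleObjectException(String message) {
            super(message);
        }
    }

    // Table name -> current version. A real metastore would persist this in its backing database.
    private final Map<String, Long> versions = new HashMap<>();

    /** Registers a table and returns its initial version (1). */
    synchronized long create(String tableName) {
        versions.put(tableName, 1L);
        return 1L;
    }

    /** Applies an alter only if the caller saw the current version; returns the new version. */
    synchronized long alter(String tableName, long versionSeenByClient) {
        Long current = versions.get(tableName);
        if (current == null) {
            throw new IllegalArgumentException("No such table: " + tableName);
        }
        if (current.longValue() != versionSeenByClient) {
            throw new StaleObjectException("Client has version " + versionSeenByClient
                + " of " + tableName + " but the stored version is " + current);
        }
        long next = current + 1;
        versions.put(tableName, next);
        return next;
    }
}
```

In this sketch a second client that still holds the old version gets a `StaleObjectException` instead of silently overwriting the first client's change. The `synchronized` methods stand in for the database transaction that would provide the same guarantee across HMS instances.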
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750428#comment-16750428 ]

Hive QA commented on HIVE-21115:

| (x) -1 overall |

|| Vote || Subsystem || Runtime || Comment ||
|| Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| master Compile Tests ||
| 0 | mvndep | 1m 26s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 26s | master passed |
| +1 | compile | 1m 4s | master passed |
| +1 | checkstyle | 0m 21s | master passed |
| 0 | findbugs | 2m 11s | standalone-metastore/metastore-common in master has 29 extant Findbugs warnings. |
| 0 | findbugs | 1m 1s | standalone-metastore/metastore-server in master has 184 extant Findbugs warnings. |
| +1 | javadoc | 1m 19s | master passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 29s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 13s | the patch passed |
| +1 | compile | 1m 5s | the patch passed |
| +1 | javac | 1m 5s | the patch passed |
| +1 | checkstyle | 0m 22s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -1 | findbugs | 1m 14s | standalone-metastore/metastore-server generated 1 new + 184 unchanged - 0 fixed = 185 total (was 184) |
| +1 | javadoc | 1m 20s | the patch passed |
|| Other Tests ||
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 23m 51s | |

|| Reason || Tests ||
| FindBugs | module:standalone-metastore/metastore-server |
| | Nullcheck of newt at line 4172 of value previously dereferenced in org.apache.hadoop.hive.metastore.ObjectStore.alterTable(String, String, String, Table, String) At ObjectStore.java:4172 of value previously dereferenced in org.apache.hadoop.hive.metastore.ObjectStore.alterTable(String, String, String, Table, String) At ObjectStore.java:[line 4172] |

|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-15764/dev-support/hive-personality.sh |
| git revision | master / a7e704c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| whitespace | http://104.198.109.242/logs//PreCommit-HIVE-Build-15764/yetus/whitespace-eol.txt |
| findbugs | http://104.198.109.242/logs//PreCommit-HIVE-Build-15764/yetus/new-findbugs-standalone-metastore_metastore-server.html |
| modules | C: standalone-metastore/metastore-common metastore standalone-metastore/metastore-server U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-15764/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750280#comment-16750280 ]

Bharathkrishna Guruvayoor Murali commented on HIVE-21115:

Hi Alan, sure, we can proceed after discussing and reaching consensus. The main idea of posting this patch is to give everyone an idea of how we are currently thinking of implementing this, and to see whether any unexpected tests fail so that we can detect problems with the approach early.

Updating the versions via Datanucleus has a problem: the updated version number is not reflected in the MetaStoreConf notifications for transactional listeners, because the notifications are issued before commitTransaction(). Hence, the logic in the attached prototype patch uses select..for update on the respective table/partition row to update the version.

As I mentioned, let's go ahead and put more thought into this; the patch just serves to give more clarity on the idea.
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750269#comment-16750269 ]

Alan Gates commented on HIVE-21115:

I'm surprised to see a patch here already, as we don't have consensus yet on how to proceed. Given this is a major change, we need to get consensus.
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749653#comment-16749653 ]

Hive QA commented on HIVE-21115:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12955902/HIVE-21115.1.patch

ERROR: -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/15748/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15748/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15748/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-01-23 08:23:17.264
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-15748/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-01-23 08:23:17.267
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at dfd63d9 HIVE-20776 : Run HMS filterHooks on server-side in addition to client-side (Na Li reviewed by Karthik, Sergio, Morio, Adam and Vihang Karajgaonkar)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at dfd63d9 HIVE-20776 : Run HMS filterHooks on server-side in addition to client-side (Na Li reviewed by Karthik, Sergio, Morio, Adam and Vihang Karajgaonkar)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-01-23 08:23:17.917
+ rm -rf ../yetus_PreCommit-HIVE-Build-15748
+ mkdir ../yetus_PreCommit-HIVE-Build-15748
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-15748
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-15748/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch
error: patch failed: standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql:8
Falling back to three-way merge...
Applied patch to 'standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql' with conflicts.
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:81: trailing whitespace.
tmpMap.put(_Fields.VERSION, new org.apache.thrift.meta_data.FieldMetaData("version", org.apache.thrift.TFieldRequirementType.OPTIONAL,
/data/hiveptest/working/scratch/build.patch:241: trailing whitespace.
} else {
/data/hiveptest/working/scratch/build.patch:355: trailing whitespace.
tmpMap.put(_Fields.VERSION, new org.apache.thrift.meta_data.FieldMetaData("version", org.apache.thrift.TFieldRequirementType.OPTIONAL,
/data/hiveptest/working/scratch/build.patch:515: trailing whitespace.
} else {
error: patch failed: standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql:8
Falling back to three-way merge...
Applied patch to 'standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql' with conflicts.
U standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql
warning: 4 lines add whitespace errors.
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-15748
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12955902 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749551#comment-16749551 ]

Bharathkrishna Guruvayoor Murali commented on HIVE-21115:

Attaching a prototype (HIVE-21115.1.patch) for adding "version" to Table and Partition objects, and also to see the test failures.
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743500#comment-16743500 ]

Vihang Karajgaonkar commented on HIVE-21115:

Thanks [~odraese]. You make a fair point. I took a look at the code you pointed to above, and here are the lines that are interesting with respect to generating the version numbers using the transaction handler.

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L589

{code}
String s = sqlGenerator.addForUpdateClause("select ntxn_next from NEXT_TXN_ID");
LOG.debug("Going to execute query <" + s + ">");
rs = stmt.executeQuery(s);
if (!rs.next()) {
  throw new MetaException("Transaction database not properly " +
      "configured, can't find next transaction id.");
}
long first = rs.getLong(1);
s = "update NEXT_TXN_ID set ntxn_next = " + (first + numTxns);
LOG.debug("Going to execute update <" + s + ">");
stmt.executeUpdate(s);
{code}

Although in theory we could use the same logic to generate object versions, the use case here is different and doing so could cause severe performance degradation. Note that transaction ids are unique across all objects and all metastore instances. This means all transactions are serialized to generate these ids in the metastore. While this could be used to generate a globally incrementing version number, it may have adverse performance implications. Having a version at a per-object level instead reduces the scope of the lock, so that only the transactions operating on the same object are serialized.

I think as a middle ground we can implement the versioning for non-ACID tables. If the table is ACID enabled, it should use the transaction ids for its versioning. I am hesitant to use the logic above for all objects, since that would indirectly serialize all the DML transactions in the metastore. Perhaps someone more familiar with this code can comment on my thoughts above. [~ekoifman] [~alangates]?
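The difference in lock scope between a global counter and per-object versions can be sketched in memory as follows. This is only an illustration under stated assumptions: `VersionScopeSketch`, `nextGlobal` and `nextFor` are hypothetical names, and the atomic counters stand in for database rows that would be locked with select..for update. The single `globalCounter` plays the role of the one NEXT_TXN_ID row that every caller contends on, while the map holds one counter per object, so callers only contend when they touch the same object.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch contrasting one global counter with one counter per object. */
class VersionScopeSketch {

    // One shared counter: every caller, whatever object it alters, updates this single value.
    private final AtomicLong globalCounter = new AtomicLong();

    // One counter per object name: callers only touch the counter of the object they alter.
    private final ConcurrentHashMap<String, AtomicLong> perObject = new ConcurrentHashMap<>();

    /** Returns the next globally increasing number (serializes all callers on one counter). */
    long nextGlobal() {
        return globalCounter.incrementAndGet();
    }

    /** Returns the next version of the named object (serializes only callers of that object). */
    long nextFor(String objectName) {
        return perObject.computeIfAbsent(objectName, name -> new AtomicLong()).incrementAndGet();
    }
}
```

With per-object counters, versions of unrelated tables advance independently, at the cost of storing one value per object; the global counter gives a total order across all objects but makes every alter wait on the same value.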
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742629#comment-16742629 ]

Oliver Draese commented on HIVE-21115:

Hi [~vihangk1]. Just to clarify my comment from the 11th: my concern is that we come up with a new version field for each of the objects that you listed above while, at the same time, we would need versioning for the objects for DDL execution (at a later stage). I agree with you that there are problems right now with WriterID, as not everyone is using transaction support. But whatever we come up with here, we should avoid having multiple version fields (one for optimistic locking and another one for transaction ID) when we introduce DDL transactions.

To your question regarding current TX ID generation: I believe it is in TxnHandler#openTxns, where the code updates NEXT_TXN_ID, but I would need to verify that to be certain.
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742472#comment-16742472 ] Vihang Karajgaonkar commented on HIVE-21115: While working on the POC of this patch, we realized that there is fundamental issue with going with a datanucleus based approach. The issue is that the any version numbers generated are available post-commit which is acceptable for certain cases. However, if a client wishes to sync to metastore using {{NotificationEvent}} API, the version number in the before and after objects in the alter events will not have the updated versions. This happens because the event is generated before actual commit using the data of before and after thrift objects. So we could either fetch the object from the database again to get the after object or generate the version number in the metastore instead of relying on datanucleus so that we know what the new updated version number would be during event creation time. In general there are following advantages of having metastore generate the version number: 1. Metastore has complete control on the version generation logic. If we rely on datanucleus we don't really control the code which generates the version numbers. Hence any anomalies or bugs in that code would cause a problem. 2. It is consistent since all the other fields of a thrift object are generated by the metastore itself like createTime, lastDDL time etc. We don't rely on datanucleus to generate any application data elsewhere except for the unique ids used to identify each M*Objects (MDatabase, MTable etc) which should be seen as internal mechanism of datanclues. On the flip side, it complicates the logic to generate the version numbers. At the very least we need to store one value (for version) for each table/database/partition in the database and we need to make sure the version increment logic works when HMS-HA is enabled without any race conditions. 
> Add support for object versions in metastore
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
> Issue Type: Improvement
> Reporter: Vihang Karajgaonkar
> Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (e.g.
> catName, dbName and tblName together are unique for a table). Once a table
> or partition is created, it can be altered in many ways. There is currently
> no good way to identify the version of the object once it is altered. For
> example, suppose there are two clients (Hive and Impala) using the same
> metastore. Once a client has performed some alter operations, another client
> which wants to do an alter operation has no good way to know whether the
> object it holds is the same as the one stored in the metastore. The
> metastore updates the {{transient_lastDdlTime}} every time there is a DDL
> operation on the object. However, this value cannot be relied on for all
> clients, since after HIVE-1768 the metastore updates the value only when it
> is not set in the parameters. It is possible that a client which alters the
> object state does not remove the {{transient_lastDdlTime}}, and the
> metastore will then not update it. Secondly, if there is clock skew between
> multiple HMS instances when HMS-HA is configured, time values cannot be
> relied on to determine the sequence of alter operations on a given object.
>
> This JIRA proposes to use the JDO versioning support in DataNucleus
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html
> to generate an incrementing sequence number every time an object is altered.
> The value of this version can be set as one of the values in the parameters.
> The advantage of using DataNucleus is that the versioning can be done across
> HMS instances as part of the database transaction, and it should work for
> all the supported databases.
>
> In theory, such a version can be used to detect whether the client is
> presenting an object which is "stale" when issuing an alter request. The
> metastore can choose to reject such an alter request, since the client may
> be caching an old version of the object and any alter operation on such a
> stale object can potentially overwrite previous operations. However, this
> can be done in a separate JIRA.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
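The stale-object check mentioned at the end of the issue description could look roughly like the Java sketch below. It is an illustration under assumptions, not the proposed patch: `StaleAlterSketch`, `StaleObjectException` and the in-memory map are hypothetical, and `synchronized` stands in for the database transaction that would make the check and the update atomic.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: hypothetical names, not Hive metastore classes. */
final class StaleAlterSketch {

    /** Raised when the caller's cached copy is not the version currently stored. */
    static final class StaleObjectException extends RuntimeException {
        StaleObjectException(String message) {
            super(message);
        }
    }

    // Stand-in for the backing database: table name -> current version.
    private final Map<String, Long> versions = new HashMap<>();

    synchronized long create(String table) {
        versions.put(table, 1L);
        return 1L;
    }

    synchronized long getVersion(String table) {
        return versions.get(table);
    }

    /**
     * Applies an alter only if the client's version matches the stored one,
     * and returns the new version. A mismatch means the client is working
     * from a stale copy, so the request is rejected.
     */
    synchronized long alter(String table, long clientVersion) {
        long stored = versions.get(table);
        if (clientVersion != stored) {
            throw new StaleObjectException(
                table + ": client has version " + clientVersion + ", stored is " + stored);
        }
        versions.put(table, stored + 1);
        return stored + 1;
    }

    public static void main(String[] args) {
        StaleAlterSketch hms = new StaleAlterSketch();
        long hiveCopy = hms.create("db1.tbl1");
        long impalaCopy = hiveCopy;                     // both clients cache version 1

        hiveCopy = hms.alter("db1.tbl1", hiveCopy);     // first client succeeds
        try {
            hms.alter("db1.tbl1", impalaCopy);          // second client is now stale
        } catch (StaleObjectException e) {
            System.out.println("rejected: " + e.getMessage());
            impalaCopy = hms.getVersion("db1.tbl1");    // refresh, then retry
            impalaCopy = hms.alter("db1.tbl1", impalaCopy);
        }
        System.out.println("final version: " + impalaCopy);
    }
}
```

A rejected client refreshes its copy and retries, which is the usual optimistic-locking pattern. Unlike a timestamp, the counter does not depend on clocks agreeing across HMS instances.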
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740880#comment-16740880 ]

Vihang Karajgaonkar commented on HIVE-21115:

Hi [~odraese], are you suggesting HiveQL transactions? There are a couple of issues with using the writerID: it assumes that all the clients support transactions and that they are enabled. That said, I am not super familiar with this part of the code. Can you point me to the code location of the writerId? Specifically, how does it get generated and when does it get updated?
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740861#comment-16740861 ]

Oliver Draese commented on HIVE-21115:

One thing that might be worth considering here is the requirement to include DDL in general transaction management. If we add support for multiple statements within a single transaction and allow DDL statements to be part of these transactions, then we probably already have the WriterID of the transaction as a version identifier. The WriterID could also be used to detect optimistic locking conflicts, and the WriterID column could then also be used as the JDO version field.
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740857#comment-16740857 ]

Vihang Karajgaonkar commented on HIVE-21115:

Thanks [~ekoifman] for the suggestion. The only direct SQL path that I know of which modifies HMS objects is dropTable, which probably doesn't apply in this case since the object is deleted. All the alter calls use JDO, to the best of my knowledge. An on-update trigger is an interesting idea. How do you envision transferring the updated version value from the trigger to the thrift object? I think the trigger will execute as part of the transaction, so the updated value will only be visible once the transaction commits.
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740793#comment-16740793 ]

Eugene Koifman commented on HIVE-21115:

Isn't there a direct SQL path somewhere that modifies HMS objects without using DataNucleus? Could this be expressed via some on-update trigger instead?
[jira] [Commented] (HIVE-21115) Add support for object versions in metastore
[ https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740045#comment-16740045 ]

Vihang Karajgaonkar commented on HIVE-21115:

[~thejas] [~alangates] Do you have any thoughts or concerns about the above approach? If there are easier ways to do this, I would be happy to use them instead of the DataNucleus-based approach.