[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-06-27 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874431#comment-16874431
 ] 

Vihang Karajgaonkar commented on HIVE-21115:


Don't think this is being worked on anymore. Resolving this as wont fix.

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-21115.1.patch, HIVE-21115.2.patch
>
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-23 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750428#comment-16750428
 ] 

Hive QA commented on HIVE-21115:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
26s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  2m 
11s{color} | {color:blue} standalone-metastore/metastore-common in master has 
29 extant Findbugs warnings. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  1m  
1s{color} | {color:blue} standalone-metastore/metastore-server in master has 
184 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
19s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 4 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
14s{color} | {color:red} standalone-metastore/metastore-server generated 1 new 
+ 184 unchanged - 0 fixed = 185 total (was 184) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:standalone-metastore/metastore-server |
|  |  Nullcheck of newt at line 4172 of value previously dereferenced in 
org.apache.hadoop.hive.metastore.ObjectStore.alterTable(String, String, String, 
Table, String)  At ObjectStore.java:4172 of value previously dereferenced in 
org.apache.hadoop.hive.metastore.ObjectStore.alterTable(String, String, String, 
Table, String)  At ObjectStore.java:[line 4172] |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-15764/dev-support/hive-personality.sh
 |
| git revision | master / a7e704c |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-15764/yetus/whitespace-eol.txt
 |
| findbugs | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-15764/yetus/new-findbugs-standalone-metastore_metastore-server.html
 |
| modules | C: standalone-metastore/metastore-common metastore 
standalone-metastore/metastore-server U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-15764/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>

[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-23 Thread Bharathkrishna Guruvayoor Murali (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750280#comment-16750280
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-21115:
-

Hi Alan, sure we can proceed after discussing and getting the consensus. The 
main idea of putting this patch is for everyone to get an idea of how we are 
currently thinking of implementing this and to see if any unexpected tests fail 
to detect early if there are any problems with the approach. 
Updating the versions via datanucleus has a problem that the updated version 
number is not reflected in the MetaStoreConf notifications for transactional 
listeners because the notifications are issued before commitTransaction(). 
Hence, the logic in the prototype patch attached involves select..for update on 
the respective table/partition row to update version. As I mentioned, let's go 
ahead and put more thoughts into this, and the patch just serves to give more 
clarity on the idea.

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-21115.1.patch, HIVE-21115.2.patch
>
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-23 Thread Alan Gates (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750269#comment-16750269
 ] 

Alan Gates commented on HIVE-21115:
---

I'm surprised to see a patch here already, as we don't have consensus yet on 
how to proceed.  Given this is a major change, we need to get consensus.

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-21115.1.patch, HIVE-21115.2.patch
>
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-23 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749653#comment-16749653
 ] 

Hive QA commented on HIVE-21115:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12955902/HIVE-21115.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/15748/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/15748/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-15748/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2019-01-23 08:23:17.264
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-15748/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2019-01-23 08:23:17.267
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at dfd63d9 HIVE-20776 : Run HMS filterHooks on server-side in 
addition to client-side (Na Li reviewed by Karthik, Sergio, Morio, Adam and 
Vihang Karajgaonkar)
+ git clean -f -d
Removing standalone-metastore/metastore-server/src/gen/
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at dfd63d9 HIVE-20776 : Run HMS filterHooks on server-side in 
addition to client-side (Na Li reviewed by Karthik, Sergio, Morio, Adam and 
Vihang Karajgaonkar)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2019-01-23 08:23:17.917
+ rm -rf ../yetus_PreCommit-HIVE-Build-15748
+ mkdir ../yetus_PreCommit-HIVE-Build-15748
+ git gc
+ cp -R . ../yetus_PreCommit-HIVE-Build-15748
+ mkdir /data/hiveptest/logs/PreCommit-HIVE-Build-15748/yetus
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql:8
Falling back to three-way merge...
Applied patch to 
'standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql'
 with conflicts.
Going to apply patch with: git apply -p0
/data/hiveptest/working/scratch/build.patch:81: trailing whitespace.
tmpMap.put(_Fields.VERSION, new 
org.apache.thrift.meta_data.FieldMetaData("version", 
org.apache.thrift.TFieldRequirementType.OPTIONAL, 
/data/hiveptest/working/scratch/build.patch:241: trailing whitespace.
} else { 
/data/hiveptest/working/scratch/build.patch:355: trailing whitespace.
tmpMap.put(_Fields.VERSION, new 
org.apache.thrift.meta_data.FieldMetaData("version", 
org.apache.thrift.TFieldRequirementType.OPTIONAL, 
/data/hiveptest/working/scratch/build.patch:515: trailing whitespace.
} else { 
error: patch failed: 
standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql:8
Falling back to three-way merge...
Applied patch to 
'standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql'
 with conflicts.
U 
standalone-metastore/metastore-server/src/main/sql/derby/upgrade-3.2.0-to-4.0.0.derby.sql
warning: 4 lines add whitespace errors.
+ result=1
+ '[' 1 -ne 0 ']'
+ rm -rf yetus_PreCommit-HIVE-Build-15748
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12955902 - PreCommit-HIVE-Build

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: 

[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-22 Thread Bharathkrishna Guruvayoor Murali (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749551#comment-16749551
 ] 

Bharathkrishna Guruvayoor Murali commented on HIVE-21115:
-

Attaching a prototype for adding "version" to Table and Partition objects.
Also to see the test failures.

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Bharathkrishna Guruvayoor Murali
>Priority: Major
> Attachments: HIVE-21115.1.patch
>
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-15 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743500#comment-16743500
 ] 

Vihang Karajgaonkar commented on HIVE-21115:


Thanks [~odraese]. You may a fair point. I took a look the you pointed above 
and here are lines which are interesting with respect to generating the version 
numbers using transaction handler.

https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java#L589

{code}
 String s = sqlGenerator.addForUpdateClause("select ntxn_next from 
NEXT_TXN_ID");
  LOG.debug("Going to execute query <" + s + ">");
  rs = stmt.executeQuery(s);
  if (!rs.next()) {
throw new MetaException("Transaction database not properly " +
"configured, can't find next transaction id.");
  }
  long first = rs.getLong(1);
  s = "update NEXT_TXN_ID set ntxn_next = " + (first + numTxns);
  LOG.debug("Going to execute update <" + s + ">");
  stmt.executeUpdate(s);
{code}

Although in theory we can use this same logic to generate object versions but 
this is different which can have potentially severe performance degradation. 
Note that the transaction_ids are unique across all the objects and instances 
of metastores. This means all the transactions are serialized to generate these 
ids in metastore. While this could be used to generate a globally incrementing 
version number it may have some adverse performance implications. Having a 
version at a per-object level instead reduces the scope of the lock and only 
the transactions which are operating on the same object will be serialized. I 
think as a mid-way we can implement the versioning for non-acid enabled tables. 
If the table is ACID enabled, it should use the transaction ids for its 
versioning. I am hesitant to use this logic above for all the objects which 
indirectly is going to serialize all the DML transactions in metastore. Perhaps 
someone more familiar with this code can comment on my thoughts above. 
[~ekoifman] [~alangates]?

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-14 Thread Oliver Draese (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742629#comment-16742629
 ] 

Oliver Draese commented on HIVE-21115:
--

Hi [~vihangk1]. Just to clarify my comment from the 11th:

My concerns are that we come up with a new version field for each of the 
objects that you listed above while at the same time, we would need versioning 
for the objects for DDL execution (at a later stage). I agree with you that 
there are right now problems with WriterID as not everyone is using transaction 
support. But whatever we come up here with, we should avoid having multiple 
version fields (one for optimistic locking and another one for transaction ID) 
when we introduce DDL transactions.

To your question regarding current TX ID generation. I believe it is in 
TxhHandler#openTxns, where the code updates NEXT_TXN_ID but I would need to 
verify that to be certain 

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-14 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742472#comment-16742472
 ] 

Vihang Karajgaonkar commented on HIVE-21115:


While working on the POC of this patch, we realized that there is fundamental 
issue with going with a datanucleus based approach. The issue is that the any 
version numbers generated are available post-commit which is acceptable for 
certain cases. However, if a client wishes to sync to metastore using 
{{NotificationEvent}} API, the version number in the before and after objects 
in the alter events will not have the updated versions. This happens because 
the event is generated before actual commit using the data of before and after 
thrift objects. So we could either fetch the object from the database again to 
get the after object or generate the version number in the metastore instead of 
relying on datanucleus so that we know what the new updated version number 
would be during event creation time.

In general there are following advantages of having metastore generate the 
version number:
1. Metastore has complete control on the version generation logic. If we rely 
on datanucleus we don't really control the code which generates the version 
numbers. Hence any anomalies or bugs in that code would cause a problem.
2. It is consistent since all the other fields of a thrift object are generated 
by the metastore itself like createTime, lastDDL time etc. We don't rely on 
datanucleus to generate any application data elsewhere except for the unique 
ids used to identify each M*Objects (MDatabase, MTable etc) which should be 
seen as internal mechanism of datanclues.

On the flip side, it complicates the logic to generate the version numbers. At 
the very least we need to store one value (for version) for each 
table/database/partition in the database and we need to make sure the version 
increment logic works when HMS-HA is enabled without any race conditions.

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-11 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740880#comment-16740880
 ] 

Vihang Karajgaonkar commented on HIVE-21115:


Hi [~odraese] are you suggesting the HiveQL transactions. There a couple of 
issues with using the writerID. It assumes that all the clients support 
transactions and it is enabled. That said I am not super familiar with this 
part of the code. Can you point me to the code location of the writerId? 
Specifically how does it get generated and when does it get updated?

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-11 Thread Oliver Draese (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740861#comment-16740861
 ] 

Oliver Draese commented on HIVE-21115:
--

One thing that might be worth considering here is the requirement to include 
DDL in general transaction management. If we add support for multiple 
statements within a single transaction and allow DDL statements to be part of 
these transactions, then we probably have the WriterID of the transaction 
already as a version identifier. The WriterID could also be used to detect 
optimistic locking conflicts and the WriterID column could then also be used as 
JDO version field.

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-11 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740857#comment-16740857
 ] 

Vihang Karajgaonkar commented on HIVE-21115:


Thanks [~ekoifman] for the suggestion. The only direct SQL path which I know 
which modifies the HMS objects is dropTable which probably doesn't apply in 
this case since the object is deleted. All the alter calls use JDO to the best 
of my knowledge. On-update trigger is interesting idea. How do you envision the 
transfer of the updated version value from the trigger to translate to the 
thrift object. I think the trigger will execute as part of the transaction so 
the updated value will only visible once the transaction commits.

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-11 Thread Eugene Koifman (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740793#comment-16740793
 ] 

Eugene Koifman commented on HIVE-21115:
---

Isn't there a direct SQL path somewhere that modifies HMS objects w/o using 
DataNucleus?
Could this be expressed via some on-update trigger instead?


> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21115) Add support for object versions in metastore

2019-01-10 Thread Vihang Karajgaonkar (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740045#comment-16740045
 ] 

Vihang Karajgaonkar commented on HIVE-21115:


[~thejas] [~alangates] Do you have any thoughts or concerns with the above 
approach? If there are easier ways to do this I would be happy to use them 
instead of datanucleus based approach.

> Add support for object versions in metastore
> 
>
> Key: HIVE-21115
> URL: https://issues.apache.org/jira/browse/HIVE-21115
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> Currently, metastore objects are identified uniquely by their names (eg. 
> catName, dbName and tblName for a table is unique). Once a table or partition 
> is created it could be altered in many ways. There is no good way currently 
> to identify the version of the object once it is altered. For example, 
> suppose there are two clients (Hive and Impala) using the same metastore. 
> Once some alter operations are performed by a client, another client which 
> wants to do a alter operation has no good way to know if the object which it 
> has is the same as the one stored in metastore. Metastore updates the 
> {{transient_lastDdlTime}} every time there is a DDL operation on the object. 
> However, this value cannot be relied for all the clients since after 
> HIVE-1768 metastore updates the value only when it is not set in the 
> parameters. It is possible that a client which alters the object state, does 
> not remove the {{transient_lastDdlTime}} and metastore will not update it. 
> Secondly, if there is a clock skew between multiple HMS instances when HMS-HA 
> is configured, time values cannot be relied on to find out the sequence of 
> alter operations on a given object.
> This JIRA propose to use JDO versioning support by Datanucleus  
> http://www.datanucleus.org/products/accessplatform_4_2/jdo/versioning.html to 
> generate a incrementing sequence number every time a object is altered. The 
> value of this object can be set as one of the values in the parameters. The 
> advantage of using Datanucleus the versioning can be done across HMS 
> instances as part of the database transaction and it should work for all the 
> supported databases.
> In theory such a version can be used to detect if the client is presenting a 
> object which is "stale" when issuing a alter request. Metastore can choose to 
> reject such a alter request since the client may be caching a old version of 
> the object and any alter operation on such stale object can potentially 
> overwrite previous operations. However, this is can be done in a separate 
> JIRA.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)