[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache

2020-06-30 Thread GitBox


zabetak commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447999634



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
##
@@ -1566,160 +1643,6 @@ private Table createUnpartitionedTableObject(Database 
db) {
 return tbl;
   }
 
-  private TableAndColStats createUnpartitionedTableObjectWithColStats(Database 
db) {

Review comment:
   This is not a test case it is just a private unused method.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache

2020-06-30 Thread GitBox


zabetak commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447999118



##
File path: 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
##
@@ -136,17 +129,19 @@
 MetaStoreTestUtils.setConfForStandloneMode(conf);
 ObjectStore objectStore = new ObjectStore();
 objectStore.setConf(conf);
-objectStore.dropTable(DEFAULT_CATALOG_NAME, db1Utbl1.getDbName(), 
db1Utbl1.getTableName());
-Deadline.startTimer("");
-objectStore.dropPartitions(DEFAULT_CATALOG_NAME, db1Ptbl1.getDbName(), 
db1Ptbl1.getTableName(), db1Ptbl1PtnNames);
-objectStore.dropTable(DEFAULT_CATALOG_NAME, db1Ptbl1.getDbName(), 
db1Ptbl1.getTableName());
-objectStore.dropTable(DEFAULT_CATALOG_NAME, db2Utbl1.getDbName(), 
db2Utbl1.getTableName());
-Deadline.startTimer("");
-objectStore.dropPartitions(DEFAULT_CATALOG_NAME, db2Ptbl1.getDbName(), 
db2Ptbl1.getTableName(), db2Ptbl1PtnNames);
-objectStore.dropTable(DEFAULT_CATALOG_NAME, db2Ptbl1.getDbName(), 
db2Ptbl1.getTableName());
-objectStore.dropDatabase(DEFAULT_CATALOG_NAME, db1.getName());
-objectStore.dropDatabase(DEFAULT_CATALOG_NAME, db2.getName());
+for (String clg : objectStore.getCatalogs()) {

Review comment:
   The `setUp` method adds things in the Metastore that are not necessarily 
needed by every test and this was the case even before my changes. In the new 
test, I added some more tables/columns cause I think it made things easier to 
understand at least for me. If I was to follow the current pattern I would have 
to add the tables, cols, partitions, etc., in the `setUp` method and then drop 
them in the `teardown` method but then they would be visible to all other tests 
as well. 
   
   I like the philosophy where unit tests are minimal (least possible state) 
thus I would prefer if every test starts with a clean Metastore and ends with a 
clean Metastore. I didn't change the `setUp` in order not to introduce too many 
unrelated changes but I made one step forward in the `teardown`.  
   
   If every test in this class was using the same state then I would definitely 
agree that the `setUp` and `teardown` methods should be the way they are right 
now. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache

2020-06-30 Thread GitBox


zabetak commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447985504



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -900,13 +898,6 @@ private void updateTablePartitionColStats(RawStore 
rawStore, String catName, Str
   rawStore.getPartitionColumnStatistics(catName, dbName, tblName, 
partNames, colNames, CacheUtils.HIVE_ENGINE);
   Deadline.stopTimer();
   sharedCache.refreshPartitionColStatsInCache(catName, dbName, 
tblName, partitionColStats);
-  Deadline.startTimer("getPartitionsByNames");
-  List parts = rawStore.getPartitionsByNames(catName, 
dbName, tblName, partNames);
-  Deadline.stopTimer();
-  // Also save partitions for consistency as they have the stats state.
-  for (Partition part : parts) {
-sharedCache.alterPartitionInCache(catName, dbName, tblName, 
part.getValues(), part);

Review comment:
   Indeed some times its confusing :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache

2020-06-30 Thread GitBox


zabetak commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447985240



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -2264,6 +2255,8 @@ private MergedColumnStatsForPartitions 
mergeColStatsForPartitions(String catName
   if (colStatsMap.size() < 1) {
 LOG.debug("No stats data found for: dbName={} tblName= {} partNames= 
{} colNames= ", dbName, tblName, partNames,
 colNames);
+// TODO: If we don't find any stats then most likely we should return 
null. Returning an empty object will not
+// trigger the lookup in the raw store and we will end up with missing 
stats.

Review comment:
   Yeap, check https://issues.apache.org/jira/browse/HIVE-23781. I was 
planning to push a new commit now to update the comment.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache

2020-06-30 Thread GitBox


zabetak commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447983144



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -900,13 +898,6 @@ private void updateTablePartitionColStats(RawStore 
rawStore, String catName, Str
   rawStore.getPartitionColumnStatistics(catName, dbName, tblName, 
partNames, colNames, CacheUtils.HIVE_ENGINE);
   Deadline.stopTimer();
   sharedCache.refreshPartitionColStatsInCache(catName, dbName, 
tblName, partitionColStats);
-  Deadline.startTimer("getPartitionsByNames");
-  List parts = rawStore.getPartitionsByNames(catName, 
dbName, tblName, partNames);
-  Deadline.stopTimer();
-  // Also save partitions for consistency as they have the stats state.
-  for (Partition part : parts) {
-sharedCache.alterPartitionInCache(catName, dbName, tblName, 
part.getValues(), part);

Review comment:
   They are already removed aren't they? Maybe, I didn't understand what 
you mean.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache

2020-06-30 Thread GitBox


zabetak commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447550132



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -851,8 +851,6 @@ private void updateTableColStats(RawStore rawStore, String 
catName, String dbNam
 
sharedCache.refreshTableColStatsInCache(StringUtils.normalizeIdentifier(catName),
 StringUtils.normalizeIdentifier(dbName), 
StringUtils.normalizeIdentifier(tblName),
 tableColStats.getStatsObj());
-// Update the table to get consistent stats state.
-sharedCache.alterTableInCache(catName, dbName, tblName, table);

Review comment:
   If my understanding is correct the cache is already updated by the time 
we arrive here. Check my comment regarding `alterPartitionsInCache` and if 
something is not clear I can elaborate more.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org



[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache

2020-06-30 Thread GitBox


zabetak commented on a change in pull request #1186:
URL: https://github.com/apache/hive/pull/1186#discussion_r447547869



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -900,13 +898,6 @@ private void updateTablePartitionColStats(RawStore 
rawStore, String catName, Str
   rawStore.getPartitionColumnStatistics(catName, dbName, tblName, 
partNames, colNames, CacheUtils.HIVE_ENGINE);
   Deadline.stopTimer();
   sharedCache.refreshPartitionColStatsInCache(catName, dbName, 
tblName, partitionColStats);
-  Deadline.startTimer("getPartitionsByNames");
-  List parts = rawStore.getPartitionsByNames(catName, 
dbName, tblName, partNames);
-  Deadline.stopTimer();
-  // Also save partitions for consistency as they have the stats state.
-  for (Partition part : parts) {
-sharedCache.alterPartitionInCache(catName, dbName, tblName, 
part.getValues(), part);

Review comment:
   The explanation is hidden in the commit message :)
   
   Using alterPartitionInCache is problematic:
   1. It empties all relevant stats about partitions even those updated just 
now.
   2. It is not necessary since partitions were updated/refreshed just before 
calling the method updateTablePartitionColStats via 
[updateTablePartitions](https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L870).
   3. It is not meant to be used for syncing the cache but rather handle 
updates in partitions.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org