[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache
zabetak commented on a change in pull request #1186: URL: https://github.com/apache/hive/pull/1186#discussion_r447999634 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ## @@ -1566,160 +1643,6 @@ private Table createUnpartitionedTableObject(Database db) { return tbl; } - private TableAndColStats createUnpartitionedTableObjectWithColStats(Database db) { Review comment: This is not a test case it is just a private unused method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache
zabetak commented on a change in pull request #1186: URL: https://github.com/apache/hive/pull/1186#discussion_r447999118 ## File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java ## @@ -136,17 +129,19 @@ MetaStoreTestUtils.setConfForStandloneMode(conf); ObjectStore objectStore = new ObjectStore(); objectStore.setConf(conf); -objectStore.dropTable(DEFAULT_CATALOG_NAME, db1Utbl1.getDbName(), db1Utbl1.getTableName()); -Deadline.startTimer(""); -objectStore.dropPartitions(DEFAULT_CATALOG_NAME, db1Ptbl1.getDbName(), db1Ptbl1.getTableName(), db1Ptbl1PtnNames); -objectStore.dropTable(DEFAULT_CATALOG_NAME, db1Ptbl1.getDbName(), db1Ptbl1.getTableName()); -objectStore.dropTable(DEFAULT_CATALOG_NAME, db2Utbl1.getDbName(), db2Utbl1.getTableName()); -Deadline.startTimer(""); -objectStore.dropPartitions(DEFAULT_CATALOG_NAME, db2Ptbl1.getDbName(), db2Ptbl1.getTableName(), db2Ptbl1PtnNames); -objectStore.dropTable(DEFAULT_CATALOG_NAME, db2Ptbl1.getDbName(), db2Ptbl1.getTableName()); -objectStore.dropDatabase(DEFAULT_CATALOG_NAME, db1.getName()); -objectStore.dropDatabase(DEFAULT_CATALOG_NAME, db2.getName()); +for (String clg : objectStore.getCatalogs()) { Review comment: The `setUp` method adds things in the Metastore that are not necessarily needed by every test and this was the case even before my changes. In the new test, I added some more tables/columns cause I think it made things easier to understand at least for me. If I was to follow the current pattern I would have to add the tables, cols, partitions, etc., in the `setUp` method and then drop them in the `teardown` method but then they would be visible to all other tests as well. I like the philosophy where unit tests are minimal (least possible state) thus I would prefer if every test starts with a clean Metastore and ends with a clean Metastore. I didn't change the `setUp` in order not to introduce too many unrelated changes but I made one step forward in the `teardown`. If every test in this class was using the same state then I would definitely agree that the `setUp` and `teardown` methods should be the way they are right now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache
zabetak commented on a change in pull request #1186: URL: https://github.com/apache/hive/pull/1186#discussion_r447985504 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -900,13 +898,6 @@ private void updateTablePartitionColStats(RawStore rawStore, String catName, Str rawStore.getPartitionColumnStatistics(catName, dbName, tblName, partNames, colNames, CacheUtils.HIVE_ENGINE); Deadline.stopTimer(); sharedCache.refreshPartitionColStatsInCache(catName, dbName, tblName, partitionColStats); - Deadline.startTimer("getPartitionsByNames"); - List parts = rawStore.getPartitionsByNames(catName, dbName, tblName, partNames); - Deadline.stopTimer(); - // Also save partitions for consistency as they have the stats state. - for (Partition part : parts) { -sharedCache.alterPartitionInCache(catName, dbName, tblName, part.getValues(), part); Review comment: Indeed some times its confusing :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache
zabetak commented on a change in pull request #1186: URL: https://github.com/apache/hive/pull/1186#discussion_r447985240 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -2264,6 +2255,8 @@ private MergedColumnStatsForPartitions mergeColStatsForPartitions(String catName if (colStatsMap.size() < 1) { LOG.debug("No stats data found for: dbName={} tblName= {} partNames= {} colNames= ", dbName, tblName, partNames, colNames); +// TODO: If we don't find any stats then most likely we should return null. Returning an empty object will not +// trigger the lookup in the raw store and we will end up with missing stats. Review comment: Yeap, check https://issues.apache.org/jira/browse/HIVE-23781. I was planning to push a new commit now to update the comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache
zabetak commented on a change in pull request #1186: URL: https://github.com/apache/hive/pull/1186#discussion_r447983144 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -900,13 +898,6 @@ private void updateTablePartitionColStats(RawStore rawStore, String catName, Str rawStore.getPartitionColumnStatistics(catName, dbName, tblName, partNames, colNames, CacheUtils.HIVE_ENGINE); Deadline.stopTimer(); sharedCache.refreshPartitionColStatsInCache(catName, dbName, tblName, partitionColStats); - Deadline.startTimer("getPartitionsByNames"); - List parts = rawStore.getPartitionsByNames(catName, dbName, tblName, partNames); - Deadline.stopTimer(); - // Also save partitions for consistency as they have the stats state. - for (Partition part : parts) { -sharedCache.alterPartitionInCache(catName, dbName, tblName, part.getValues(), part); Review comment: They are already removed aren't they? Maybe, I didn't understand what you mean. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache
zabetak commented on a change in pull request #1186: URL: https://github.com/apache/hive/pull/1186#discussion_r447550132 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -851,8 +851,6 @@ private void updateTableColStats(RawStore rawStore, String catName, String dbNam sharedCache.refreshTableColStatsInCache(StringUtils.normalizeIdentifier(catName), StringUtils.normalizeIdentifier(dbName), StringUtils.normalizeIdentifier(tblName), tableColStats.getStatsObj()); -// Update the table to get consistent stats state. -sharedCache.alterTableInCache(catName, dbName, tblName, table); Review comment: If my understanding is correct the cache is already updated by the time we arrive here. Check my comment regarding `alterPartitionsInCache` and if something is not clear I can elaborate more. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org
[GitHub] [hive] zabetak commented on a change in pull request #1186: Hive 23768: Metastore's update service wrongly strips partition column stats from the cache
zabetak commented on a change in pull request #1186: URL: https://github.com/apache/hive/pull/1186#discussion_r447547869 ## File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java ## @@ -900,13 +898,6 @@ private void updateTablePartitionColStats(RawStore rawStore, String catName, Str rawStore.getPartitionColumnStatistics(catName, dbName, tblName, partNames, colNames, CacheUtils.HIVE_ENGINE); Deadline.stopTimer(); sharedCache.refreshPartitionColStatsInCache(catName, dbName, tblName, partitionColStats); - Deadline.startTimer("getPartitionsByNames"); - List parts = rawStore.getPartitionsByNames(catName, dbName, tblName, partNames); - Deadline.stopTimer(); - // Also save partitions for consistency as they have the stats state. - for (Partition part : parts) { -sharedCache.alterPartitionInCache(catName, dbName, tblName, part.getValues(), part); Review comment: The explanation is hidden in the commit message :) Using alterPartitionInCache is problematic: 1. It empties all relevant stats about partitions even those updated just now. 2. It is not necessary since partitions were updated/refreshed just before calling the method updateTablePartitionColStats via [updateTablePartitions](https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L870). 3. It is not meant to be used for syncing the cache but rather handle updates in partitions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org