[ https://issues.apache.org/jira/browse/SPARK-45373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770220#comment-17770220 ]
Asif commented on SPARK-45373: ------------------------------ Will be generating a PR for this. > Minimizing calls to HiveMetaStore layer for getting partitions, when tables > are repeated > ----------------------------------------------------------------------------------------- > > Key: SPARK-45373 > URL: https://issues.apache.org/jira/browse/SPARK-45373 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.5.1 > Reporter: Asif > Priority: Minor > Fix For: 3.5.1 > > > In the rule PruneFileSourcePartitions where the CatalogFileIndex gets > converted to InMemoryFileIndex, the HMS calls can get very expensive if : > 1) The translated filter string for push down to HMS layer becomes empty , > resulting in fetching of all partitions and same table is referenced multiple > times in the query. > 2) Or just in case same table is referenced multiple times in the query with > different partition filters. > In such cases current code would result in multiple calls to HMS layer. > This can be avoided by grouping the tables based on CatalogFileIndex and > passing a common minimum filter ( filter1 || filter2) and getting a base > PrunedInmemoryFileIndex which can become a basis for each of the specific > table. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org