sunchao commented on pull request #33382:
URL: https://github.com/apache/spark/pull/33382#issuecomment-883134666
I feel that making the fallback depend on one more config on top of the
existing `tryDirectSql` may make it too complex. What if we introduce a new
config and use just that to decide whether Spark should fall back to calling
`getAllPartitionsMethod`? We could default it to true, with a helpful message
telling users that they can switch it off to fail the query instead if they
really want to.
This should only affect queries in the few scenarios where they used to fail
but can now succeed, which IMO is a better outcome.
For instance:
```scala
val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
val shouldFallback = SQLConf.get.metastorePartitionPruningFallbackOnException
try {
  getPartitionsByFilterMethod.invoke(hive, table, filter)
    .asInstanceOf[JArrayList[Partition]]
} catch {
  case ex: InvocationTargetException
      if ex.getCause.isInstanceOf[MetaException] && shouldFallback =>
    logWarning("Caught Hive MetaException attempting to get partition metadata by " +
      "filter from Hive. Falling back to fetching all partition metadata, which will " +
      "degrade performance. Modifying your Hive metastore configuration to set " +
      s"${tryDirectSqlConfVar.varname} to true (if it is not true already) may resolve " +
      "this problem. Otherwise, you can set " +
      s"${SQLConf.HIVE_METASTORE_PARTITION_PRUNING_FALLBACK_ON_EXCEPTION.key} " +
      " to false and let the query fail instead.", ex)
    getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
}
```
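To make the two outcomes concrete, here is a minimal standalone sketch of the same control flow that runs without Spark or Hive on the classpath. Everything in it is a stand-in I made up for illustration: `FallbackSketch`, `partitionsWithFallback`, `failingFilter`, and the local `MetaException` class are hypothetical, and partitions are modeled as plain strings. It is not the code in this PR.

```scala
import java.lang.reflect.InvocationTargetException

object FallbackSketch {
  // Hypothetical stand-in for Hive's MetaException, so the sketch has no Hive dependency.
  final class MetaException(message: String) extends Exception(message)

  // Returns the filtered partitions. If the filtered call fails with a
  // MetaException wrapped in an InvocationTargetException and fallback is
  // enabled, returns every partition instead. Callers must then tolerate a
  // superset of what they asked for. Any other failure propagates.
  def partitionsWithFallback(
      shouldFallback: Boolean,
      byFilter: () => Seq[String],
      all: () => Seq[String]): Seq[String] = {
    try {
      byFilter()
    } catch {
      case ex: InvocationTargetException
          if ex.getCause.isInstanceOf[MetaException] && shouldFallback =>
        all()
    }
  }

  // Simulates a reflective metastore call that fails on the Hive side.
  def failingFilter(): Seq[String] =
    throw new InvocationTargetException(new MetaException("filter pushdown failed"))
}
```

With `shouldFallback = true` a failing filter call yields the full partition list; with `shouldFallback = false` the `InvocationTargetException` reaches the caller and the query fails, which is the opt-out behavior described above.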
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]