sunchao commented on a change in pull request #33382:
URL: https://github.com/apache/spark/pull/33382#discussion_r676795099
##########
File path:
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
##########
@@ -876,29 +876,24 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
} else {
logDebug(s"Hive metastore filter is '$filter'.")
val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
- // We should get this config value from the metaStore. otherwise hit SPARK-18681.
- // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
- // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
- val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
- tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
try {
// Hive may throw an exception when calling this method in some circumstances, such as
// when filtering on a non-string partition column when the hive config key
- // hive.metastore.try.direct.sql is false
+ // hive.metastore.try.direct.sql is false. In some cases the remote metastore will throw
+ // exceptions even if the config is true, due to various reasons including the
+ // underlying RDBMS, Hive bugs when generating the filter, etc. For this reason we
+ // always fallback to use `Hive.getAllPartitionsOf` here when the exception happens.
getPartitionsByFilterMethod.invoke(hive, table, filter)
.asInstanceOf[JArrayList[Partition]]
} catch {
- case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
- !tryDirectSql =>
+ case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
logWarning("Caught Hive MetaException attempting to get partition metadata by " +
"filter from Hive. Falling back to fetching all partition metadata, which will " +
Review comment:
@cloud-fan thanks for your input. What do you think of the proposal
[here](https://github.com/apache/spark/pull/33382#issuecomment-883134666)? We
can introduce a flag `SQLConf.get.metastorePartitionPruningFallbackOnException`
which defaults to false. As a result, the code will no longer depend on the
directSQL flag (currently, when that flag is turned off on the remote HMS, Spark
will always fall back to listing all partitions).
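
To make the proposal concrete, here is a rough sketch of the control flow I have in mind. It is only an illustration: `MetaException` is a local stand-in so the snippet runs without Hive on the classpath, `getPartitions`, `byFilter` and `listAll` are hypothetical names, and `fallbackOnException` plays the role of the proposed `metastorePartitionPruningFallbackOnException` flag. Rethrowing when the flag is off is my assumption about the desired behavior, not something settled in this PR.

```scala
import java.lang.reflect.InvocationTargetException

// Stand-in for Hive's MetaException so the sketch has no Hive dependency.
final class MetaException(msg: String) extends Exception(msg)

object PartitionPruningFallbackSketch {

  /**
   * Hypothetical helper, not part of HiveShim.
   *
   * @param fallbackOnException stands in for the proposed SQLConf flag
   * @param byFilter            stands in for the reflective getPartitionsByFilter call
   * @param listAll             stands in for fetching all partitions of the table
   */
  def getPartitions[P](
      fallbackOnException: Boolean,
      byFilter: () => Seq[P],
      listAll: () => Seq[P]): Seq[P] = {
    try {
      byFilter()
    } catch {
      // Flag on: fall back to listing every partition, regardless of the
      // metastore's direct SQL setting.
      case ex: InvocationTargetException
          if ex.getCause.isInstanceOf[MetaException] && fallbackOnException =>
        listAll()
      // Flag off (the proposed default): surface the failure instead of
      // silently listing all partitions. (Assumed behavior.)
      case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
        throw new RuntimeException(
          "Caught Hive MetaException attempting to get partition metadata by filter", ex)
    }
  }

  def main(args: Array[String]): Unit = {
    // Simulates the metastore rejecting the filter.
    val failing: () => Seq[String] =
      () => throw new InvocationTargetException(new MetaException("filter not supported"))

    // With the flag on, the caller gets the full partition list.
    println(getPartitions[String](fallbackOnException = true, failing, () => Seq("p=1", "p=2")))
  }
}
```

The point of the sketch is that the decision to fall back comes only from the Spark-side flag, so Spark no longer needs to read `hive.metastore.try.direct.sql` from the remote metastore at all.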
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]