Github user kmanamcheri commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r222359233
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
        getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
      } else {
        logDebug(s"Hive metastore filter is '$filter'.")
-       val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-       // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-       // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-       // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-       val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-         tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
        try {
          // Hive may throw an exception when calling this method in some circumstances, such as
-         // when filtering on a non-string partition column when the hive config key
-         // hive.metastore.try.direct.sql is false
+         // when filtering on a non-string partition column.
          getPartitionsByFilterMethod.invoke(hive, table, filter)
            .asInstanceOf[JArrayList[Partition]]
        } catch {
-         case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-             !tryDirectSql =>
+         case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
            logWarning("Caught Hive MetaException attempting to get partition metadata by " +
              "filter from Hive. Falling back to fetching all partition metadata, which will " +
-             "degrade performance. Modifying your Hive metastore configuration to set " +
-             s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+             "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+             "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+             "optimization and does not guarantee improved performance.")
--- End diff --
@mallman I haven't tried using that config option. If I am understanding [the documentation for HIVE_MANAGE_FILESOURCE_PARTITIONS](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L547) correctly, setting that value to false means partitions will not be stored in the HMS at all. That sounds like it addresses a different issue, no?
If that's the suggested way to deal with unsupported partition filters, then this code should always fail when getPartitionsByFilter fails, no? Why even have a fallback (as we do currently)? SPARK-17992 seems to say that Spark should handle certain cases of partition pushdown failure (such as the HMS ORM mode). My argument is that those cases should be expanded to cover failures that occur even when hive.metastore.try.direct.sql is set to true.
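To make the fallback shape I'm arguing for concrete, here is a minimal sketch. This is not the actual HiveShim code: `getPartitionsWithFallback` and its function parameters are hypothetical stand-ins for `getPartitionsByFilterMethod` and `getAllPartitionsMethod`.

```scala
// Hypothetical sketch: try metastore filter pushdown first, and fall back
// to fetching all partitions whenever the pushdown call fails, regardless
// of the hive.metastore.try.direct.sql setting.
def getPartitionsWithFallback(
    partitionsByFilter: () => Seq[String],
    allPartitions: () => Seq[String]): Seq[String] = {
  try {
    partitionsByFilter()
  } catch {
    case _: RuntimeException =>
      // Fallback path: correct but slower, since every partition's
      // metadata is fetched and filtering happens client-side.
      allPartitions()
  }
}
```

The point of the sketch is that the catch clause has no `!tryDirectSql` guard: the fallback applies even when direct SQL is enabled, which is exactly the expansion of SPARK-17992 I'm proposing.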
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]