Github user kmanamcheri commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r222348679
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
             logWarning("Caught Hive MetaException attempting to get partition metadata by " +
               "filter from Hive. Falling back to fetching all partition metadata, which will " +
-              "degrade performance. Modifying your Hive metastore configuration to set " +
-              s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+              "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+              "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+              "optimization and does not guarantee improved performance.")
--- End diff --
@mallman The key point to note here is that setting direct SQL on the HMS "may" resolve the problem; it is not guaranteed. The HMS treats direct SQL only as an optimistic optimization: if a direct SQL query fails, it falls back on ORM and can then fail again. Spark's behavior should not be inconsistent depending on an HMS config value. My suggested fix would still call getPartitionsByFilter first, and only call getAllPartitions if that fails. We won't be calling getAllPartitions in all cases; it is a fallback mechanism, as sketched below.
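
For concreteness, here is a rough sketch of the fallback shape I have in mind. This is simplified and not the actual HiveShim code: fetchByFilter and fetchAll are placeholder names standing in for the reflective getPartitionsByFilterMethod and getAllPartitionsMethod invocations, and the real change would also keep the check that the exception's cause is a Hive MetaException.

import java.lang.reflect.InvocationTargetException

// Simplified sketch of the suggested fallback; placeholder names, not HiveShim itself.
def getPartitionsWithFallback[T](
    fetchByFilter: () => Seq[T],
    fetchAll: () => Seq[T],
    logWarning: (String, Throwable) => Unit): Seq[T] = {
  try {
    // Always attempt the pruned (pushed-down) lookup first, regardless of the
    // metastore's direct SQL setting.
    fetchByFilter()
  } catch {
    case ex: InvocationTargetException =>
      // Only when push-down actually fails do we pay for fetching everything.
      logWarning("Partition predicate push-down failed; falling back to fetching " +
        "all partition metadata, which may be slow.", ex)
      fetchAll()
  }
}

The point is that the decision to fall back is driven by the actual failure rather than by reading the HMS config, so Spark behaves consistently regardless of how the metastore is configured.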
@gatorsmile hmm.. why not? We know it might be slow, hence the warning. Maybe the warning message should say that this could be slow depending on the number of partitions, since partition push-down to the HMS failed. A possible wording is sketched below.
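
Something along these lines, for example (illustrative wording only, not the exact text proposed in the PR):

// Illustrative wording for the fallback warning; not the text currently in the PR.
val fallbackWarning: String =
  "Caught Hive MetaException attempting to get partition metadata by filter from Hive. " +
    "Partition predicate push-down to the Hive metastore failed, so Spark is falling back " +
    "to fetching all partition metadata and pruning client-side. This may be slow " +
    "depending on the number of partitions in the table."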
---