Github user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/22614#discussion_r222373627
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         getAllPartitionsMethod.invoke(hive, table).asInstanceOf[JSet[Partition]]
       } else {
         logDebug(s"Hive metastore filter is '$filter'.")
-        val tryDirectSqlConfVar = HiveConf.ConfVars.METASTORE_TRY_DIRECT_SQL
-        // We should get this config value from the metaStore. otherwise hit SPARK-18681.
-        // To be compatible with hive-0.12 and hive-0.13, In the future we can achieve this by:
-        // val tryDirectSql = hive.getMetaConf(tryDirectSqlConfVar.varname).toBoolean
-        val tryDirectSql = hive.getMSC.getConfigValue(tryDirectSqlConfVar.varname,
-          tryDirectSqlConfVar.defaultBoolVal.toString).toBoolean
         try {
           // Hive may throw an exception when calling this method in some circumstances, such as
-          // when filtering on a non-string partition column when the hive config key
-          // hive.metastore.try.direct.sql is false
+          // when filtering on a non-string partition column.
           getPartitionsByFilterMethod.invoke(hive, table, filter)
             .asInstanceOf[JArrayList[Partition]]
         } catch {
-          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] &&
-              !tryDirectSql =>
+          case ex: InvocationTargetException if ex.getCause.isInstanceOf[MetaException] =>
            logWarning("Caught Hive MetaException attempting to get partition metadata by " +
              "filter from Hive. Falling back to fetching all partition metadata, which will " +
-             "degrade performance. Modifying your Hive metastore configuration to set " +
-             s"${tryDirectSqlConfVar.varname} to true may resolve this problem.", ex)
+             "degrade performance. Enable direct SQL mode in hive metastore to attempt " +
+             "to improve performance. However, Hive's direct SQL mode is an optimistic " +
+             "optimization and does not guarantee improved performance.")
--- End diff --
One option, if we want to get all fancy, is to add a configurable timeout in
the fallback case - assuming it's possible to cancel an ongoing call (run it in
a separate thread + interrupt, maybe?). A rough sketch of that idea is below.
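For illustration only, a minimal sketch of that timeout-plus-interrupt idea,
assuming a hypothetical config value `fallbackTimeoutMs` and helper
`fetchWithTimeout` (neither exists in HiveShim today), and assuming the
underlying metastore client actually responds to thread interruption:

    import java.util.concurrent.{Callable, Executors, TimeUnit, TimeoutException}

    object FallbackWithTimeout {
      // Run the fallback fetch off the caller's thread so we can time it out.
      private val executor = Executors.newSingleThreadExecutor()

      // `fallbackTimeoutMs` is a hypothetical config value, not an existing
      // Spark or Hive setting.
      def fetchWithTimeout[T](fallbackTimeoutMs: Long)(fetchAllPartitions: => T): T = {
        val future = executor.submit(new Callable[T] {
          override def call(): T = fetchAllPartitions
        })
        try {
          future.get(fallbackTimeoutMs, TimeUnit.MILLISECONDS)
        } catch {
          case _: TimeoutException =>
            // Attempt to interrupt the in-flight metastore call; this only helps
            // if the client honors interruption.
            future.cancel(true)
            throw new RuntimeException(
              s"Fetching all partitions did not finish within ${fallbackTimeoutMs}ms")
        }
      }
    }

The fallback path in getPartitionsByFilter could then wrap the
getAllPartitions call in something like this and fail fast instead of
hanging on a huge partition fetch.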
My main concern with the fallback, really, isn't the slowness, but that in
the case where it would be slow (= too many partitions), the HMS might just
run itself out of memory trying to serve the request.
Reza mentions the Hive config, which I think is the right thing for the HMS
admin to do, since it avoids apps DoS'ing the server. Not sure what the
behavior is there, but I hope it fails the call if there are too many
partitions (instead of returning a subset). IMO that config seems to cover
all the concerns here, assuming the call will fail when you have too many
partitions, no?
---