Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21122#discussion_r186256056
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1354,7 +1354,8 @@ class HiveDDLSuite
     val indexName = tabName + "_index"
     withTable(tabName) {
       // Spark SQL does not support creating index. Thus, we have to use Hive client.
-      val client = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+      val client =
+        spark.sharedState.externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client
--- End diff --
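For context on the change above, here is a minimal, self-contained sketch of why the cast now has to go through `unwrapped`. All types below are toy stand-ins, not Spark's real classes: the assumption is that the shared state wraps the external catalog in a delegating class, so casting the wrapper itself to `HiveExternalCatalog` would throw a `ClassCastException` at runtime.

```scala
// Toy types only; Spark's real classes have more members and may differ.
trait ExternalCatalog {
  def tableExists(db: String, table: String): Boolean
}

class HiveExternalCatalog extends ExternalCatalog {
  val client: AnyRef = new AnyRef // stand-in for the real Hive client
  override def tableExists(db: String, table: String): Boolean = true
}

// Hypothetical delegating wrapper with an escape hatch to the delegate.
class ExternalCatalogWithListener(delegate: ExternalCatalog) extends ExternalCatalog {
  def unwrapped: ExternalCatalog = delegate
  override def tableExists(db: String, table: String): Boolean =
    delegate.tableExists(db, table)
}

object UnwrappedDemo extends App {
  val catalog = new ExternalCatalogWithListener(new HiveExternalCatalog)
  // catalog.asInstanceOf[HiveExternalCatalog].client  // would throw ClassCastException
  val client = catalog.unwrapped.asInstanceOf[HiveExternalCatalog].client // cast the delegate
  println(client != null) // true
}
```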
Ideally, Hive should not be a first-class citizen in Spark: it's just a data
source and a catalog, nothing more. We want to narrow the scope of Hive usage
in Spark, and keeping it confined to `HiveExternalCatalog` is a good choice.
This is still an ongoing effort. As @rdblue pointed out, there are still 2
places where Spark uses Hive directly:
1. `HiveSessionStateBuilder`. It's mostly for ADD JAR; once we move the
ADD JAR functionality to `ExternalCatalog`, we can fix it.
2. `SaveAsHiveFile`. It's the data source part, so it should be allowed to
use Hive there. One thing we can improve is Hive client reuse: we cast to
`HiveExternalCatalog` to get the existing Hive client, but maybe there is a
better way to do it without the ugly casting (see the sketch below).
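On that last point, reusing the toy types from the sketch above, one hedged possibility is to confine the cast to a single named helper so callers never write `asInstanceOf` themselves. The helper name and shape here are hypothetical, not an existing Spark API.

```scala
// Hypothetical helper, sketched against the toy types above; not Spark's API.
object HiveClientAccess {
  // Returns the shared Hive client if the underlying catalog is Hive-backed,
  // keeping the one remaining cast in a single place.
  def hiveClient(catalog: ExternalCatalogWithListener): Option[AnyRef] =
    catalog.unwrapped match {
      case hive: HiveExternalCatalog => Some(hive.client)
      case _ => None // e.g. an in-memory catalog: no Hive client to reuse
    }
}
```

Callers would then match on the `Option` (or `getOrElse` a freshly created client) instead of casting inline, which keeps the Hive dependency visible at exactly one seam.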
---