Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12935#discussion_r62910625
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -71,66 +71,7 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
           override def load(in: QualifiedTableName): LogicalPlan = {
             logDebug(s"Creating new cached data source for $in")
             val table = client.getTable(in.database, in.name)
    -
    -        def schemaStringFromParts: Option[String] = {
    -          table.properties.get("spark.sql.sources.schema.numParts").map { numParts =>
    -            val parts = (0 until numParts.toInt).map { index =>
    -              val part = table.properties.get(s"spark.sql.sources.schema.part.$index").orNull
    -              if (part == null) {
    -                throw new AnalysisException(
    -                  "Could not read schema from the metastore because it is corrupted " +
    -                    s"(missing part $index of the schema, $numParts parts are expected).")
    -              }
    -
    -              part
    -            }
    -            // Stick all parts back to a single schema string.
    -            parts.mkString
    -          }
    -        }
    -
    -        def getColumnNames(colType: String): Seq[String] = {
    -          table.properties.get(s"spark.sql.sources.schema.num${colType.capitalize}Cols").map {
    -            numCols => (0 until numCols.toInt).map { index =>
    -              table.properties.getOrElse(s"spark.sql.sources.schema.${colType}Col.$index",
    -                throw new AnalysisException(
    -                  s"Could not read $colType columns from the metastore because it is corrupted " +
    -                    s"(missing part $index of it, $numCols parts are expected)."))
    -            }
    -          }.getOrElse(Nil)
    -        }
    -
    -        // Originally, we used spark.sql.sources.schema to store the schema of a data source table.
    -        // After SPARK-6024, we removed this flag.
    -        // Although we are not using spark.sql.sources.schema any more, we still need to support it.
    -        val schemaString =
    -          table.properties.get("spark.sql.sources.schema").orElse(schemaStringFromParts)
    -
    -        val userSpecifiedSchema =
    -          schemaString.map(s => DataType.fromJson(s).asInstanceOf[StructType])
    -
    -        // We only need names here since the userSpecifiedSchema we loaded from the metastore
    -        // contains partition columns. We can always get the data types of partitioning columns
    -        // from userSpecifiedSchema.
    -        val partitionColumns = getColumnNames("part")
    -
    -        val bucketSpec = table.properties.get("spark.sql.sources.schema.numBuckets").map { n =>
    -          BucketSpec(n.toInt, getColumnNames("bucket"), getColumnNames("sort"))
    -        }
    -
    -        val options = table.storage.serdeProperties
    -        val dataSource =
    -          DataSource(
    -            sparkSession,
    -            userSpecifiedSchema = userSpecifiedSchema,
    -            partitionColumns = partitionColumns,
    -            bucketSpec = bucketSpec,
    -            className = table.properties("spark.sql.sources.provider"),
    -            options = options)
    -
    -        LogicalRelation(
    -          dataSource.resolveRelation(),
    -          metastoreTableIdentifier = Some(TableIdentifier(in.name, Some(in.database))))
    --- End diff ---
    
    I am very inclined to revert the changes in this file. This part of the code
    is on the critical path of resolving a table, while the code we are adding is
    for a new code path and a command. It would be great if we could keep these
    two parts isolated for now. (We will remove HiveMetastoreCatalog later and can
    clean up this part then; we can leave a comment here.)
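
    For reference, the part-based storage that schemaStringFromParts reads back
    can be illustrated with a minimal Scala sketch. This is an illustration only:
    the SchemaPartsSketch object, the splitSchema helper, and the 4000-character
    chunk size are assumptions, not Spark's actual writer code. Partition and
    bucket column names are stored under analogous numbered keys read by
    getColumnNames above.

        import org.apache.spark.sql.types.{StringType, StructField, StructType}

        object SchemaPartsSketch {
          // Hive metastore table properties cap the length of a single value,
          // so the schema JSON is split into fixed-size chunks stored under
          // numbered keys, plus a numParts key recording how many chunks exist.
          val partSize = 4000 // hypothetical chunk size

          def splitSchema(schema: StructType): Map[String, String] = {
            // Chunk the schema's JSON representation into numbered parts.
            val parts = schema.json.grouped(partSize).toSeq
            val indexed = parts.zipWithIndex.map { case (part, index) =>
              s"spark.sql.sources.schema.part.$index" -> part
            }
            (indexed :+ ("spark.sql.sources.schema.numParts" -> parts.size.toString)).toMap
          }

          def main(args: Array[String]): Unit = {
            val schema = StructType(Seq(
              StructField("name", StringType),
              StructField("city", StringType)))
            splitSchema(schema).foreach { case (k, v) =>
              println(s"$k -> ${v.take(60)}")
            }
          }
        }

    Reading these properties back is exactly what the removed
    schemaStringFromParts does: look up numParts, fetch each
    spark.sql.sources.schema.part.$index, and concatenate the chunks before
    parsing the result with DataType.fromJson.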

