[GitHub] spark pull request: [SPARK-6923][SPARK-7550][SQL] Hive MetaStore A...

chenghao-intel Tue, 28 Jul 2015 21:47:56 -0700

Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5733#discussion_r35727609
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -40,10 +41,72 @@ import org.apache.spark.sql.execution.datasources
     import org.apache.spark.sql.execution.datasources.{CreateTableUsingAsSelect, LogicalRelation, Partition => ParquetPartition, PartitionSpec, ResolvedDataSource}
     import org.apache.spark.sql.hive.client._
     import org.apache.spark.sql.parquet.ParquetRelation
    +import org.apache.spark.sql.sources._
     import org.apache.spark.sql.types._
     import org.apache.spark.sql.{AnalysisException, SQLContext, SaveMode}
     
     
    +private[hive] case class HiveSerDe(
    +    inputFormat: Option[String] = None,
    +    outputFormat: Option[String] = None,
    +    serde: Option[String] = None)
    +
    +private[hive] object HiveSerDe {
    +  /**
    +   * Get the Hive SerDe information from the data source abbreviation string or classname.
    +   *
    +   * @param source Currently the source abbreviation can be one of the following:
    +   *               SequenceFile, RCFile, ORC, PARQUET, and case insensitive.
    +   * @param hiveConf Hive Conf
    +   * @param returnDefaultFormat if true, when no matched source found,
    +   *                            the default format will be retrieved.
    +   *         Default input/output format are
    +   *         InputFormat:  org.apache.hadoop.mapred.TextInputFormat
    +   *         OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
    +   * @return HiveSerDe associated with the specified source
    +   */
    +  def sourceToSerDe(source: String, hiveConf: HiveConf, returnDefaultFormat: Boolean)
    +  : Option[HiveSerDe] = {
    +    val serde = if ("SequenceFile".equalsIgnoreCase(source)) {
    +      HiveSerDe(
    +        inputFormat = Option("org.apache.hadoop.mapred.SequenceFileInputFormat"),
    +        outputFormat = Option("org.apache.hadoop.mapred.SequenceFileOutputFormat"))
    +    } else if ("RCFile".equalsIgnoreCase(source)) {
    +      HiveSerDe(
    +        inputFormat = Option("org.apache.hadoop.hive.ql.io.RCFileInputFormat"),
    +        outputFormat = Option("org.apache.hadoop.hive.ql.io.RCFileOutputFormat"),
    +        serde = Option(hiveConf.getVar(HiveConf.ConfVars.HIVEDEFAULTRCFILESERDE)))
    +    } else if ("ORC".equalsIgnoreCase(source) ||
    +               "org.apache.spark.sql.hive.orc.DefaultSource" == source) {
    +      HiveSerDe(
    +        inputFormat = Option("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"),
    +        outputFormat = Option("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"),
    +        serde = Option("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
    +    } else if ("PARQUET".equalsIgnoreCase(source) ||
    +               "org.apache.spark.sql.parquet.DefaultSource" == source) {
    +      HiveSerDe(
    +        inputFormat =
    +          Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"),
    +        outputFormat =
    +          Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"),
    +        serde =
    +          Option("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"))
    +    } else if (returnDefaultFormat) {
    +      // return default file format
    +      HiveSerDe(
    +        inputFormat =
    +          Option("org.apache.hadoop.mapred.TextInputFormat"),
    +        outputFormat =
    +          Option("org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"))
    +    } else {
    +      // TODO we probably need to provide SerDe for the built-in format, like json.
    +      null
    +    }
    --- End diff --
    
    Yes, but this function will also be called by `HiveQl`, so I made it a shared function.
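
For illustration only (this is not code from the pull request): a minimal standalone sketch of how a lookup like `sourceToSerDe` could be shared by several callers through a companion-style object. The names `SerDeInfo` and `SerDeLookup` are hypothetical, and the sketch has no Spark or Hive dependency. It assumes only the abbreviations and class names quoted in the diff, leaves out RCFile because its SerDe is read from `HiveConf` there, and leaves out the exact-match data source class names. It returns `None` instead of `null` when nothing matches.

```scala
// Standalone sketch; SerDeInfo and SerDeLookup are hypothetical names,
// not classes from Spark or Hive.
final case class SerDeInfo(
    inputFormat: Option[String] = None,
    outputFormat: Option[String] = None,
    serde: Option[String] = None)

object SerDeLookup {
  // Keys are lower-cased abbreviations so that matching is case-insensitive.
  private val byAbbreviation: Map[String, SerDeInfo] = Map(
    "sequencefile" -> SerDeInfo(
      inputFormat = Some("org.apache.hadoop.mapred.SequenceFileInputFormat"),
      outputFormat = Some("org.apache.hadoop.mapred.SequenceFileOutputFormat")),
    "orc" -> SerDeInfo(
      inputFormat = Some("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"),
      outputFormat = Some("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"),
      serde = Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde")),
    "parquet" -> SerDeInfo(
      inputFormat = Some("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"),
      outputFormat = Some("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"),
      serde = Some("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe")))

  // Text format used as the fallback, as described in the Scaladoc of the diff.
  private val defaultFormat = SerDeInfo(
    inputFormat = Some("org.apache.hadoop.mapred.TextInputFormat"),
    outputFormat = Some("org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"))

  // Returns the matching entry, the default format if requested, or None.
  def sourceToSerDe(source: String, returnDefaultFormat: Boolean): Option[SerDeInfo] =
    byAbbreviation
      .get(source.toLowerCase(java.util.Locale.ROOT))
      .orElse(if (returnDefaultFormat) Some(defaultFormat) else None)
}
```

Under these assumptions, any caller (a catalog or a parser) would use the same entry point, for example `SerDeLookup.sourceToSerDe("ORC", returnDefaultFormat = false)`.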


