Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20087#discussion_r162802793
  
    --- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ---
    @@ -55,18 +55,28 @@ private[hive] trait SaveAsHiveFile extends 
DataWritingCommand {
           customPartitionLocations: Map[TablePartitionSpec, String] = 
Map.empty,
           partitionAttributes: Seq[Attribute] = Nil): Set[String] = {
     
    -    val isCompressed = hadoopConf.get("hive.exec.compress.output", 
"false").toBoolean
    +    val isCompressed =
    +      
fileSinkConf.getTableInfo.getOutputFileFormatClassName.toLowerCase(Locale.ROOT) 
match {
    +        case formatName if formatName.endsWith("orcoutputformat") =>
    +          // For ORC, "mapreduce.output.fileoutputformat.compress",
    +          // "mapreduce.output.fileoutputformat.compress.codec", and
    +          // "mapreduce.output.fileoutputformat.compress.type"
    +          // have no impact because it uses table properties to store 
compression information.
    --- End diff --
    
    Although this is the existing behavior, could you investigate how Hive
behaves when `Parquet.Compress` is set?
https://issues.apache.org/jira/browse/HIVE-7858 Is it the same as ORC?
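    The distinction under discussion can be sketched as follows. This is a hypothetical helper, not Spark's actual code: the object name `CompressionSource`, the return strings, and the Parquet branch (based on the `parquet.compression` table property introduced by HIVE-7858) are assumptions for illustration; only the ORC branch mirrors the match on the output-format class name shown in the diff above.

```scala
import java.util.Locale

// Hypothetical helper (names are illustrative, not Spark's API): decide
// where the effective compression setting comes from, keyed on the Hive
// output format class name, as in the diff's match expression.
object CompressionSource {
  def forOutputFormat(className: String): String =
    className.toLowerCase(Locale.ROOT) match {
      // ORC ignores "mapreduce.output.fileoutputformat.compress*"; it
      // stores compression in table properties ("orc.compress").
      case f if f.endsWith("orcoutputformat") =>
        "table property: orc.compress"
      // Assumption per HIVE-7858: Parquet similarly honors the
      // "parquet.compression" table property.
      case f if f.endsWith("parquetoutputformat") =>
        "table property: parquet.compression"
      // Other formats fall back to the Hadoop conf flag
      // "hive.exec.compress.output".
      case _ =>
        "hadoop conf: hive.exec.compress.output"
    }
}
```

If Parquet behaves like ORC here, the reviewer's question amounts to asking whether the Parquet case should get the same comment/treatment as the ORC branch in the diff.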


---
