[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

dmateusp Mon, 30 Jul 2018 01:08:35 -0700

Github user dmateusp commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21909#discussion_r206045407
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
             input => rawParser.parse(input, createParser, UTF8String.fromString),
             parsedOptions.parseMode,
             schema,
    -        parsedOptions.columnNameOfCorruptRecord)
    +        parsedOptions.columnNameOfCorruptRecord,
    +        optimizeEmptySchema = true)
    --- End diff --
    
    No, I was just wondering: since you made it a parameter that can be turned on and off, in what case would someone turn it off?
    
    If there is no such case, shouldn't we just get rid of the parameter altogether?



