Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/21909#discussion_r205977956
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -450,7 +450,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
         input => rawParser.parse(input, createParser, UTF8String.fromString),
         parsedOptions.parseMode,
         schema,
-        parsedOptions.columnNameOfCorruptRecord)
+        parsedOptions.columnNameOfCorruptRecord,
+        optimizeEmptySchema = true)
--- End diff ---
There can be only one JSON object of struct type per input string here, so I don't see any reason to turn the optimization off. Do you have examples where the optimization doesn't work correctly?
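
For context, a minimal sketch of the per-line parsing this code path serves (the schema, column names, and sample data below are illustrative, not taken from the diff):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("json-per-line").getOrCreate()
import spark.implicits._

// Each input string holds at most one JSON object matching the struct schema;
// records that don't fit the schema land in the corrupt-record column.
val input = Seq(
  """{"id": 1, "name": "a"}""",  // valid object
  """{"id": "oops"}"""           // corrupt record for an integer id
).toDS()

val df = spark.read
  .schema("id INT, name STRING, _corrupt_record STRING")
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json(input)

df.show(truncate = false)
```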
---