[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

gatorsmile Mon, 06 Aug 2018 03:54:42 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21909#discussion_r207850329
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2225,19 +2225,21 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
     
     
       test("SPARK-23723: specified encoding is not matched to actual encoding") {
    -    val fileName = "test-data/utf16LE.json"
    -    val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
    -    val exception = intercept[SparkException] {
    -      spark.read.schema(schema)
    -        .option("mode", "FAILFAST")
    -        .option("multiline", "true")
    -        .options(Map("encoding" -> "UTF-16BE"))
    -        .json(testFile(fileName))
    -        .count()
    -    }
    -    val errMsg = exception.getMessage
    +    withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> "false") {
    --- End diff --
    
    How about CSV? Could you add the same one too?
    
    Also, we need to add verification logic for the case where the conf is true.
    ```
    Seq(true, false).foreach { optimizeEmptySchema =>
      withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> optimizeEmptySchema.toString) {
        ...
      }
    }
    ```


