[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

gatorsmile Thu, 16 Aug 2018 16:07:09 -0700

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21909#discussion_r210767018
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2223,21 +2223,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         checkAnswer(jsonDF, Seq(Row("Chris", "Baird")))
       }
     
    -
       test("SPARK-23723: specified encoding is not matched to actual encoding") {
    -    val fileName = "test-data/utf16LE.json"
    -    val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
    -    val exception = intercept[SparkException] {
    -      spark.read.schema(schema)
    -        .option("mode", "FAILFAST")
    -        .option("multiline", "true")
    -        .options(Map("encoding" -> "UTF-16BE"))
    -        .json(testFile(fileName))
    -        .count()
    +    def doCount(bypassParser: Boolean, multiLine: Boolean): Long = {
    +      var result: Long = -1
    +      withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> bypassParser.toString) {
    +        val fileName = "test-data/utf16LE.json"
    +        val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
    +        result = spark.read.schema(schema)
    +          .option("mode", "FAILFAST")
    --- End diff --
    
    This sounds good! Let us enable it only when the mode is PERMISSIVE. PERMISSIVE is our default mode, so this should still benefit most users.
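
    The gating suggested above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the pull request: `Mode`, `Permissive`, `DropMalformed`, `FailFast`, `BypassSketch` and `canBypassParser` are hypothetical names, and the sketch assumes the bypass applies only when the flag behind `SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA` is on and no columns are required (as in a plain `count()`).

```scala
// Hypothetical stand-ins for the parse modes; these are NOT Spark's own classes.
sealed trait Mode
case object Permissive extends Mode
case object DropMalformed extends Mode
case object FailFast extends Mode

object BypassSketch {
  // Decide whether the parser may be skipped for a query.
  //   bypassEnabled      - assumed to mirror the BYPASS_PARSER_FOR_EMPTY_SCHEMA flag
  //   requiredFieldCount - number of columns the query actually reads (0 for count())
  //   mode               - the parse mode requested by the user
  // Skipping is allowed only in PERMISSIVE mode, so FAILFAST and DROPMALFORMED
  // still parse every record and keep their error-handling semantics.
  def canBypassParser(bypassEnabled: Boolean, requiredFieldCount: Int, mode: Mode): Boolean =
    bypassEnabled && requiredFieldCount == 0 && mode == Permissive
}
```

    Under these assumptions, a FAILFAST read such as the one in the test above would still go through the parser and could still raise on malformed input, while default (PERMISSIVE) reads keep the faster count() path.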



