[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...

MaxGekk Sun, 29 Jul 2018 08:36:26 -0700

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21909#discussion_r205978291
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2233,7 +2233,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
             .option("multiline", "true")
             .options(Map("encoding" -> "UTF-16BE"))
             .json(testFile(fileName))
    -        .count()
    +        .collect()
    --- End diff --
    
    The test has to actually read the JSON content for the wrong encoding to be
    detected. With this optimization, the `jackson` parser is not called at all
    for `count()`, so the mismatch would go unnoticed. `collect()` guarantees
    that the JSON parser is invoked with the wrong `encoding`.
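
    To make the difference concrete, here is a toy model in plain Scala. It is
    only a sketch and not Spark's actual code path: `CountVsCollectSketch`,
    `parse`, `count`, `collect` and the sample record are all hypothetical
    stand-ins. It assumes a parser that decodes bytes with the declared charset
    and rejects anything that does not look like a JSON object.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

object CountVsCollectSketch {
  // Hypothetical stand-in for a JSON parser: decode the bytes with the declared
  // charset and fail fast if the result does not look like a JSON object.
  def parse(record: Array[Byte], declared: Charset): String = {
    val text = new String(record, declared).trim
    if (!(text.startsWith("{") && text.endsWith("}")))
      throw new IllegalArgumentException("Malformed record")
    text
  }

  // count(): no field is needed, so records are counted without being parsed.
  // The declared charset is accepted but never used, which is the point.
  def count(records: Seq[Array[Byte]], declared: Charset): Long =
    records.size.toLong

  // collect(): every record must be materialized, so the parser always runs.
  def collect(records: Seq[Array[Byte]], declared: Charset): Seq[String] =
    records.map(parse(_, declared))

  def main(args: Array[String]): Unit = {
    // Made-up sample record, written as UTF-16LE but declared as UTF-16BE.
    val records = Seq("""{"a":1}""".getBytes(StandardCharsets.UTF_16LE))
    val wrong = StandardCharsets.UTF_16BE

    println(count(records, wrong))                 // prints 1: mismatch not seen
    println(Try(collect(records, wrong)).isFailure) // prints true: parser fails
  }
}
```

    In this model `count()` succeeds even though the declared charset is wrong,
    while `collect()` fails, which is why the test needs `collect()`.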


