[GitHub] spark pull request #22152: [SPARK-25159][SQL] json schema inference should o...

mgaido91 Tue, 21 Aug 2018 07:33:14 -0700

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22152#discussion_r211626457
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -2528,4 +2529,27 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           checkAnswer(aggPlusFilter1, aggPlusFilter2.collect())
         }
       }
    +
    +  test("SPARK-25159: json schema inference should only trigger one job") {
    +    withTempPath { path =>
    +      // This test is to prove that the `JsonInferSchema` does not use `RDD#toLocalIterator` which
    +      // triggers one Spark job per RDD partition.
    +      Seq(1 -> "a", 2 -> "b").toDF("i", "p")
    +        // The data set has 2 partitions, so Spark will write at least 2 json files.
    +        // Use a non-splittable compression (gzip), to make sure the json scan RDD has at lease 2
    --- End diff --
    
    nit: `at lease` should be `at least`



---

