[GitHub] spark pull request #21296: [SPARK-24244][SQL] Passing only required columns ...

MaxGekk Sun, 13 May 2018 12:10:14 -0700

Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21296#discussion_r187810542
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -1322,4 +1322,31 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
         val sampled = spark.read.option("inferSchema", true).option("samplingRatio", 1.0).csv(ds)
         assert(sampled.count() == ds.count())
       }
    +
    +  test("SPARK-24244: Select a little of many columns") {
    --- End diff --
    
    I added the test to check that requesting only a subset of the columns
    works correctly, and to cover the case where the columns are selected in an
    order different from the data schema. Previously I was concerned that if I
    select columns in a different order, like `select('f15, 'f10, 'f5)`, the
    required schema would follow the order of the `select`. It seems the
    required schema has the same order as the data schema. That's why I removed
https://github.com/apache/spark/pull/21296/files/a4a0a549156a15011c33c7877a35f244d75b7a4f#diff-d19881aceddcaa5c60620fdcda99b4c4L214
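
To make the ordering point concrete, here is a minimal Spark-free sketch of the behaviour described above. It is only a model, not the parser code from this PR: the object `RequiredColumnsSketch`, its helper names, and the 20-column schema `f0..f19` are all hypothetical, and it assumes (as observed above) that the required schema keeps data-schema order while the `select` order is applied afterwards by a projection.

```scala
// Hypothetical, Spark-free model of column pruning. None of these names
// come from Spark or from the PR; they exist only for illustration.
object RequiredColumnsSketch {
  // Assumed data schema: 20 columns named f0..f19, in file order.
  val dataSchema: Vector[String] = Vector.tabulate(20)(i => s"f$i")

  // Assumption: the required schema keeps data-schema order, regardless of
  // the order in which the columns were listed in select(...).
  def requiredSchema(selected: Seq[String]): Vector[String] =
    dataSchema.filter(selected.toSet)

  // Pick only the required tokens out of one tokenized CSV line.
  def parseRequired(tokens: Array[String],
                    required: Vector[String]): Map[String, String] =
    required.map(name => name -> tokens(dataSchema.indexOf(name))).toMap

  // Reorder the pruned row into the order the user asked for.
  def project(row: Map[String, String], selected: Seq[String]): Seq[String] =
    selected.map(row)
}
```

Under these assumptions, `requiredSchema(Seq("f15", "f10", "f5"))` yields the fields in data-schema order (`f5`, `f10`, `f15`), and `project` restores the `f15, f10, f5` order of the `select`, so the parser itself never needs to reorder fields.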


