Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20280#discussion_r182569705
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2306,18 +2306,20 @@ def test_toDF_with_schema_string(self):
             self.assertEqual(df.schema.simpleString(), 
"struct<key:string,value:string>")
             self.assertEqual(df.collect(), [Row(key=str(i), value=str(i)) for 
i in range(100)])
     
    -        # field names can differ.
    -        df = rdd.toDF(" a: int, b: string ")
    --- End diff --
    
    I still think this test is invalid because it only works because the `Row`s get 
serialized and lose the `__from_dict__` flag.  This should be the same thing, 
but would fail:
    
    ```python
    df = spark.createDataFrame([Row(key=i, value=str(i)) for i in range(100)], 
schema=" a: int, b: string ")
    ```
    And it would also fail if the named args of the `Row` objects were not in 
the same alphabetical order as the schema:
    
    ```python
    data = [Row(z=i, y=str(i)) for i in range(100)]
    rdd = self.sc.parallelize(data, 5)
    
    df = rdd.toDF(" a: int, b: string ")
    ```
    This fails because the `Row` fields would be sorted alphabetically (`y` before 
`z`), swapping the order of the int and string values relative to the schema.
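    To make the sorting behavior concrete, here is a minimal plain-Python sketch 
(a simplified mock of how `Row(**kwargs)` ordered its fields in Spark 2.x, not 
the real class):
    
    ```python
    def make_row(**kwargs):
        # Mimics Spark 2.x Row(**kwargs): field names are sorted
        # alphabetically and the values are reordered to match.
        names = sorted(kwargs)
        return names, tuple(kwargs[n] for n in names)
    
    fields, values = make_row(z=1, y="1")
    # fields == ['y', 'z'] and values == ('1', 1): the string now sits in
    # the first slot, so a schema like " a: int, b: string " would see the
    # types in the wrong positional order.
    ```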


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to