Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/20280#discussion_r182569705
--- Diff: python/pyspark/sql/tests.py ---
@@ -2306,18 +2306,20 @@ def test_toDF_with_schema_string(self):
self.assertEqual(df.schema.simpleString(),
"struct<key:string,value:string>")
self.assertEqual(df.collect(), [Row(key=str(i), value=str(i)) for
i in range(100)])
- # field names can differ.
- df = rdd.toDF(" a: int, b: string ")
--- End diff --
I still think this test is invalid because it only passes because the `Row`s get
serialized and lose the `__from_dict__` flag. The following should be equivalent,
but would fail:
```
df = spark.createDataFrame([Row(key=i, value=str(i)) for i in range(100)],
schema=" a: int, b: string ")
```
And it would also fail if the named args of the `Row` objects were not in
the same alphabetical order as the schema:
```
data = [Row(z=i, y=str(i)) for i in range(100)]
rdd = self.sc.parallelize(data, 5)
df = rdd.toDF(" a: int, b: string ")
```
This fails because `Row` sorts its named arguments alphabetically, so the fields
become `(y, z)` with types `(string, int)`, which no longer line up positionally
with the schema's `(int, string)`.
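To make the sorting behavior concrete without a running Spark session, here is a
minimal stand-in sketch (a hypothetical `MiniRow`, not PySpark's actual class)
that mimics how `Row` orders keyword arguments alphabetically:

```python
# Hypothetical minimal mimic of how PySpark's Row handles keyword
# arguments (an illustration only; the real class is pyspark.sql.Row).
class MiniRow(tuple):
    def __new__(cls, **kwargs):
        # Named arguments are sorted alphabetically, like PySpark's Row.
        names = sorted(kwargs.keys())
        row = tuple.__new__(cls, (kwargs[n] for n in names))
        row.__fields__ = names
        return row

r = MiniRow(z=0, y="0")
print(r.__fields__)  # ['y', 'z'] -- sorted, not the call order (z, y)
print(tuple(r))      # ('0', 0)  -- values follow the sorted field names
```

So pairing these rows positionally with a schema like `" a: int, b: string "`
hands the string value to the `int` field and vice versa.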
---