Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20280#discussion_r182569705

    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2306,18 +2306,20 @@ def test_toDF_with_schema_string(self):
             self.assertEqual(df.schema.simpleString(), "struct<key:string,value:string>")
             self.assertEqual(df.collect(), [Row(key=str(i), value=str(i)) for i in range(100)])

    -        # field names can differ.
    -        df = rdd.toDF(" a: int, b: string ")
    --- End diff --

    I still think this test is invalid because it only works because the `Row`s get serialized and lose the `__from_dict__` flag. This should be the same thing, but it would fail:
    ```
    df = spark.createDataFrame([Row(key=i, value=str(i)) for i in range(100)],
                               schema=" a: int, b: string ")
    ```
    And it would also fail if the named args of the `Row` objects were not in the same alphabetical order as the schema:
    ```
    data = [Row(z=i, y=str(i)) for i in range(100)]
    rdd = self.sc.parallelize(data, 5)
    df = rdd.toDF(" a: int, b: string ")
    ```
    This fails because the `Row` fields would be sorted in a different order, switching the type order.
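To make the failure mode concrete: legacy `pyspark.sql.Row` (Spark < 3.0) sorts keyword-argument field names alphabetically rather than preserving call order, so pairing the fields positionally with a schema string can silently swap column types. The following is a minimal, Spark-free sketch of that sorting behavior; `make_row` is a hypothetical stand-in for what `Row.__new__` did, not the actual pyspark implementation.

```python
# Sketch (no Spark required) of how legacy pyspark.sql.Row (Spark < 3.0)
# handled keyword arguments: field names were sorted alphabetically,
# not kept in the order they were passed.
def make_row(**kwargs):
    # Hypothetical stand-in for legacy Row.__new__: sort the kwarg
    # names, then store the values in that sorted order.
    names = sorted(kwargs.keys())
    return names, tuple(kwargs[n] for n in names)

# Row(z=1, y="1") ends up ordered as (y, z), i.e. ("1", 1).
fields, values = make_row(z=1, y="1")
# Matching these positionally against a schema like " a: int, b: string "
# would map 'y' (a string) to the int column 'a' -- the type mismatch
# the review comment describes.
```

This is why the second example in the comment fails: the declared schema expects `(int, string)` but the sorted `Row` yields `(string, int)`.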