Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20280#discussion_r182569705
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -2306,18 +2306,20 @@ def test_toDF_with_schema_string(self):
             self.assertEqual(df.schema.simpleString(), 
"struct<key:string,value:string>")
             self.assertEqual(df.collect(), [Row(key=str(i), value=str(i)) for 
i in range(100)])
     
    -        # field names can differ.
    -        df = rdd.toDF(" a: int, b: string ")
    --- End diff --
    
    I still think this test is invalid because it only works because the `Row`s get 
serialized and lose the `__from_dict__` flag.  This should be the same thing, 
but would fail:
    
    ```python
    df = spark.createDataFrame([Row(key=i, value=str(i)) for i in range(100)], 
schema=" a: int, b: string ")
    ```
    And it would also fail if the named args of the `Row` objects were not in 
the same alphabetical order as the schema:
    
    ```python
    data = [Row(z=i, y=str(i)) for i in range(100)]
    rdd = self.sc.parallelize(data, 5)
    
    df = rdd.toDF(" a: int, b: string ")
    ```
    This fails because the `Row` fields would be sorted alphabetically (`y` before 
`z`), swapping the order of the int and string values relative to the schema.
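    To make the sorting behavior concrete, here is a minimal plain-Python sketch 
(a simplified mock of how `Row(**kwargs)` ordered its fields in Spark 2.x, not 
the real class):
    
    ```python
    def make_row(**kwargs):
        # Mimics Spark 2.x Row(**kwargs): field names are sorted
        # alphabetically and the values are reordered to match.
        names = sorted(kwargs)
        return names, tuple(kwargs[n] for n in names)
    
    fields, values = make_row(z=1, y="1")
    # fields == ['y', 'z'] and values == ('1', 1): the string now sits in
    # the first slot, so a schema like " a: int, b: string " would see the
    # types in the wrong positional order.
    ```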


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to