GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/20280

    [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include __from_dict__ flag

    ## What changes were proposed in this pull request?
    
    When a `Row` object is created using kwargs, the order of the keywords cannot
    be relied upon (except on Python 3.6 and later, which preserve keyword-argument
    order). The fields are therefore sorted by name in the constructor, and a
    `__from_dict__` flag is set to indicate that the object was created from kwargs,
    so that other areas in Spark can access row data by field name instead of by
    position. This change includes the `__from_dict__` flag when pickling a `Row`
    that was made from kwargs, so that this behavior is preserved after the `Row`
    is pickled and unpickled.
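
The idea can be sketched in isolation as follows. This is a minimal, simplified stand-in and not the actual PySpark code: `SketchRow` and the three-argument `_create_row` helper are hypothetical names chosen for illustration, and the real `Row` class has more behavior than is shown here. The sketch only demonstrates how a flag set in the constructor can be carried through `__reduce__` so that it survives a pickle round trip.

```python
import pickle


def _create_row(fields, values, from_dict=False):
    """Rebuild a SketchRow during unpickling (hypothetical helper)."""
    row = tuple.__new__(SketchRow, values)
    row.__fields__ = fields
    if from_dict:
        # Restore the marker only for rows that were built from kwargs.
        row.__from_dict__ = True
    return row


class SketchRow(tuple):
    """Simplified stand-in for a Row; not the PySpark class."""

    def __new__(cls, *args, **kwargs):
        if args and kwargs:
            raise ValueError("use positional or keyword arguments, not both")
        if kwargs:
            # Keyword order is not relied upon, so sort the field names.
            names = sorted(kwargs)
            row = tuple.__new__(cls, [kwargs[n] for n in names])
            row.__fields__ = names
            row.__from_dict__ = True
            return row
        # Positional rows keep the caller's order and carry no flag.
        return tuple.__new__(cls, args)

    def __reduce__(self):
        # Pass the flag along so the unpickled row keeps it.
        fields = getattr(self, "__fields__", None)
        from_dict = getattr(self, "__from_dict__", False)
        return (_create_row, (fields, tuple(self), from_dict))
```

With this sketch, `pickle.loads(pickle.dumps(SketchRow(name="Alice", age=5)))` yields a row whose `__from_dict__` attribute is still set, while a row built from positional arguments round-trips without the flag. If `__reduce__` dropped the third argument, the receiving side could no longer tell that the values were sorted by field name.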
    
    ## How was this patch tested?
    
    Fixed existing tests that relied on the fields and the schema being in the
    same alphabetical order. Added a new test that creates a `Row` from positional
    arguments, where order matters.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark pyspark-Row-serialize-SPARK-22232

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20280
    