GitHub user BryanCutler opened a pull request:
https://github.com/apache/spark/pull/20280
[SPARK-22232][PYTHON][SQL] Fixed Row pickling to include __from_dict__ flag
## What changes were proposed in this pull request?
When a `Row` object is created using kwargs, the order of the keywords cannot
be relied upon (except in Python 3.6 and later, where keyword-argument order is
preserved per PEP 468). The fields are therefore sorted in the constructor, and
a flag `__from_dict__` is set to indicate that the object was created from
kwargs, so that other areas in Spark know to access row data by field name
instead of by position. This change includes the `__from_dict__` flag only when
pickling a `Row` that was made from kwargs, so that this behavior is preserved
after the `Row` is pickled and unpickled.
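The pickling behavior described above can be sketched without Spark as a
tuple subclass whose `__reduce__` carries the flag through a round trip.
`MiniRow` and `_make_row` are hypothetical names, and the sketch assumes field
names are kept on the instance in a `__fields__` attribute (as in PySpark 2.x).
It is a simplified illustration, not the code from this patch.

```python
import pickle


class MiniRow(tuple):
    """Hypothetical, simplified stand-in for pyspark.sql.Row (not Spark's code)."""

    def __new__(cls, *args, **kwargs):
        if args and kwargs:
            raise ValueError("cannot mix positional and keyword arguments")
        if kwargs:
            # kwargs order is not reliable on older Pythons, so sort the names
            names = sorted(kwargs)
            row = tuple.__new__(cls, [kwargs[n] for n in names])
            row.__fields__ = names
            row.__from_dict__ = True  # values are ordered by sorted field name
            return row
        return tuple.__new__(cls, args)

    def __reduce__(self):
        fields = getattr(self, "__fields__", None)
        if fields is None:
            # plain positional row: rebuild from the values alone
            return (MiniRow, tuple(self))
        if getattr(self, "__from_dict__", False):
            # carry the flag through pickling for rows built from kwargs
            return (_make_row, (fields, tuple(self), True))
        return (_make_row, (fields, tuple(self)))


def _make_row(fields, values, from_dict=False):
    """Hypothetical module-level factory that pickle calls to rebuild a row."""
    row = MiniRow(*values)
    row.__fields__ = list(fields)
    if from_dict:
        row.__from_dict__ = True
    return row


# kwargs row: names sorted to ['a', 'b'], values reordered to (2, 1)
kw_row = MiniRow(b=1, a=2)
restored = pickle.loads(pickle.dumps(kw_row))
print(restored.__fields__, tuple(restored), restored.__from_dict__)

# row with caller-chosen field order: no flag, order must be kept as given
pos_row = _make_row(["b", "a"], (1, 2))
restored_pos = pickle.loads(pickle.dumps(pos_row))
print(restored_pos.__fields__, tuple(restored_pos),
      hasattr(restored_pos, "__from_dict__"))
```

Only rows built from kwargs carry the flag after unpickling, so a consumer can
still tell sorted-by-name rows apart from rows whose positional order matters.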
## How was this patch tested?
Fixed existing tests that relied on fields and schema being in the same
alphabetical order. Added a new test that creates a `Row` from positional
arguments, where field order matters.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BryanCutler/spark pyspark-Row-serialize-SPARK-22232
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20280.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20280