HyukjinKwon commented on a change in pull request #26496:
[WIP][SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark
URL: https://github.com/apache/spark/pull/26496#discussion_r352393084
##########
File path: python/pyspark/sql/types.py
##########
@@ -1463,32 +1474,43 @@ class Row(tuple):
Row(name='Alice', age=11)
This form can also be used to create rows as tuple values, i.e. with unnamed
- fields. Beware that such Row objects have different equality semantics:
+ fields. Row objects are evaluated for equality by data values in each
+ position, field names are not compared:
>>> row1 = Row("Alice", 11)
>>> row2 = Row(name="Alice", age=11)
>>> row1 == row2
- False
- >>> row3 = Row(a="Alice", b=11)
- >>> row1 == row3
True
+ >>> row3 = Row(age=11, name="Alice")
+ >>> row2 == row3
+ False
"""
- def __new__(self, *args, **kwargs):
+ def __new__(cls, *args, **kwargs):
+ if _legacy_row_enabled:
+ return _LegacyRow(args, kwargs)
if args and kwargs:
raise ValueError("Can not use both args "
"and kwargs to create Row")
+ if sys.version_info[:2] < (3, 6):
+ # Remove after Python < 3.6 dropped
+ from collections import OrderedDict
+ if kwargs:
+ raise ValueError("Named arguments are not allowed for Python version < 3.6, "
+ "use a collections.OrderedDict instead. To enable Spark 2.x "
+ "compatible Rows, set the environment variable "
+ "'PYSPARK_LEGACY_ROW_ENABLED' to 'true'.")
+ elif len(args) == 1 and isinstance(args[0], OrderedDict):
+ kwargs = args[0]
+
if kwargs:
# create row objects
- names = sorted(kwargs.keys())
Review comment:
Actually, on second thought, why don't we just add an environment variable to
switch the sorting on and off, and disable it in Spark 3.1? I suspect that
would require fewer changes.
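To make the suggestion concrete, here is a minimal sketch of what such a switch could look like. It is not code from this PR: the helper `make_row_fields`, the variable name `PYSPARK_ROW_FIELD_SORTING_ENABLED`, and the "off unless set to true" default are all assumptions chosen for illustration, and it relies on keyword arguments preserving insertion order (Python 3.6+).

```python
import os

# Hypothetical variable name, used only for this sketch.
_SORTING_ENV_VAR = "PYSPARK_ROW_FIELD_SORTING_ENABLED"


def make_row_fields(kwargs, env=os.environ):
    """Return (names, values) for a Row built from keyword arguments.

    When the switch is set to 'true', field names are sorted alphabetically
    (the Spark 2.x behavior). Otherwise the order in which the keyword
    arguments were passed is kept. The default of 'false' is an assumption.
    """
    sort_fields = env.get(_SORTING_ENV_VAR, "false").lower() == "true"
    names = sorted(kwargs) if sort_fields else list(kwargs)
    values = tuple(kwargs[name] for name in names)
    return names, values


if __name__ == "__main__":
    fields = {"name": "Alice", "age": 11}
    # Sorting off: insertion order is kept.
    print(make_row_fields(fields, env={}))
    # Sorting on: names are ordered alphabetically.
    print(make_row_fields(fields, env={_SORTING_ENV_VAR: "true"}))
```

With a switch like this, the only change inside `__new__` would be the line that computes `names`, which is why it might touch less code than introducing a separate legacy class.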