HyukjinKwon commented on a change in pull request #27853:
[SPARK-30941][PySpark] Add a note to asDict to document its behavior when there
are duplicate fields
URL: https://github.com/apache/spark/pull/27853#discussion_r389506003
##########
File path: python/pyspark/sql/types.py
##########
@@ -1528,6 +1528,12 @@ def asDict(self, recursive=False):
:param recursive: turns the nested Rows to dict (default: False).
+ NOTE: If a row contains duplicate field names, e.g., the rows of a join
+ between two :class:`DataFrame` that both have the fields of same names,
+ ``asDict`` will return the rightmost value among the duplicate fields. In
+ contrast, ``__getitem__`` will return the leftmost value among the duplicate
Review comment:
@viirya, this is a nice description, but what about saying only that one of
the duplicate fields will be selected? I was wondering whether we should fix
this rightmost-versus-leftmost inconsistency between `asDict` and
`__getitem__` in the future, and whether it would therefore be better to
avoid mentioning rightmost and leftmost in the docs.
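
For context, the asymmetry being discussed can be illustrated with a small pure-Python stand-in. This is only a sketch: `MiniRow`, `fields`, and `as_dict` are hypothetical names, and the sketch assumes that the dict conversion is built by zipping field names with values while name lookup uses the first matching index. It is not PySpark's `Row` implementation.

```python
# Hypothetical stand-in for a row with duplicate field names.
# This is NOT pyspark.sql.Row; it only mimics the behavior under discussion.
class MiniRow(tuple):
    def __new__(cls, fields, values):
        row = super().__new__(cls, values)
        row.fields = list(fields)
        return row

    def as_dict(self):
        # dict() keeps the last value seen for a repeated key,
        # so the rightmost duplicate wins.
        return dict(zip(self.fields, self))

    def __getitem__(self, item):
        if isinstance(item, (int, slice)):
            return super().__getitem__(item)
        # list.index returns the first match,
        # so the leftmost duplicate wins.
        return super().__getitem__(self.fields.index(item))


# A row shaped like the result of a join where both sides have an "id" field.
joined = MiniRow(["id", "name", "id"], [1, "x", 2])
print(joined.as_dict())  # {'id': 2, 'name': 'x'}
print(joined["id"])      # 1
```

Because each result falls out of a different lookup mechanism rather than a stated contract, documenting only that one of the duplicate values is selected would leave room to align the two later.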