[GitHub] [spark] HyukjinKwon commented on a change in pull request #27853: [SPARK-30941][PySpark] Add a note to asDict to document its behavior when there are duplicate fields

GitBox Mon, 09 Mar 2020 00:59:47 -0700

HyukjinKwon commented on a change in pull request #27853: 
[SPARK-30941][PySpark] Add a note to asDict to document its behavior when there 
are duplicate fields
URL: https://github.com/apache/spark/pull/27853#discussion_r389506003


 ##########
 File path: python/pyspark/sql/types.py
 ##########
 @@ -1528,6 +1528,12 @@ def asDict(self, recursive=False):
 
         :param recursive: turns the nested Rows to dict (default: False).
 
+        NOTE: If a row contains duplicate field names, e.g., the rows of a join
+        between two :class:`DataFrame` that both have the fields of same names,
+        ``asDict`` will return the rightmost value among the duplicate fields. In
+        contrast, ``__getitem__`` will return the leftmost value among the duplicate
 
 Review comment:
   @viirya, this is a nice description, but what about simply saying that one 
of the duplicate fields will be selected? I was wondering whether we should fix 
the rightmost/leftmost inconsistency between `asDict` and `__getitem__` in the 
future, so it may be better to avoid mentioning rightmost and leftmost here.
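
   For reference, the behavior described in the note can be reproduced without 
Spark. The following is only a minimal sketch: `MiniRow` is a hypothetical 
stand-in, not the actual `Row` code, and it assumes that `asDict` builds its 
result with `dict(zip(...))` while lookup by name uses `list.index`.

```python
class MiniRow(tuple):
    """Hypothetical stand-in for pyspark.sql.Row, for illustration only."""

    def __new__(cls, fields, values):
        row = super().__new__(cls, values)
        # Field names are kept in order and may contain duplicates,
        # as in the rows of a join between two DataFrames.
        row.__fields__ = list(fields)
        return row

    def asDict(self):
        # dict() keeps the last value seen for a repeated key,
        # so the rightmost duplicate wins.
        return dict(zip(self.__fields__, self))

    def __getitem__(self, item):
        if isinstance(item, (int, slice)):
            return super().__getitem__(item)
        # list.index returns the first match,
        # so the leftmost duplicate wins.
        return super().__getitem__(self.__fields__.index(item))


# A row shaped like the output of a join where both sides have an "id" field.
joined = MiniRow(["id", "name", "id"], [1, "a", 2])
as_dict = joined.asDict()
by_name = joined["id"]
```

   In this sketch `as_dict["id"]` and `by_name` disagree, which is the 
inconsistency in question. Positional access such as `joined[0]` and 
`joined[2]` remains unambiguous.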

