[GitHub] [spark] itholic commented on a change in pull request #33634: [SPARK-36369] Fix Index.union to follow pandas 1.3

GitBox Wed, 04 Aug 2021 16:55:53 -0700


itholic commented on a change in pull request #33634:
URL: https://github.com/apache/spark/pull/33634#discussion_r683033391




##########
File path: python/pyspark/pandas/indexes/base.py
##########
@@ -2235,6 +2235,24 @@ def union(
         """
         Form the union of two Index objects.
 
+        .. note:: For duplicated values, pandas chooses the number of duplicates of self or other
+            with more duplicates. But counting all duplicates is very expensive for large data,
+            so pandas-on-Spark always chooses the number of duplicates in self.

Review comment:
      They describe it as a bug fix in their release notes at https://pandas.pydata.org/pandas-docs/dev/whatsnew/v1.3.0.html#indexing.

   Let me try the solution @ueshin suggested in https://github.com/apache/spark/pull/33634#discussion_r682859453
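To make the two duplicate-handling rules in the note concrete, here is a minimal sketch in plain Python using `collections.Counter` on lists, not on actual pandas or pandas-on-Spark Index objects. `union_max_count` models the rule the note attributes to pandas: each value keeps the larger of its counts in self or other. `union_self_count` is a hypothetical helper for the pandas-on-Spark rule: values in self keep self's count. How values that appear only in other are counted is an assumption made for illustration and is not taken from the PR.

```python
from collections import Counter


def union_max_count(left, right):
    """Each value appears max(count in left, count in right) times.

    This models the duplicate handling the note attributes to pandas.
    """
    # Counter's | operator keeps the maximum count of each element.
    merged = Counter(left) | Counter(right)
    return sorted(merged.elements())


def union_self_count(left, right):
    """Hypothetical model of the "number of duplicates in self" rule.

    Values present in `left` keep left's count. Values that appear only in
    `right` keep right's count (an assumption, not taken from the PR).
    """
    left_counts = Counter(left)
    merged = Counter(left_counts)
    for value, n in Counter(right).items():
        if value not in left_counts:
            merged[value] = n
    return sorted(merged.elements())


left = [1, 1, 2]
right = [2, 2, 3]
print(union_max_count(left, right))   # [1, 1, 2, 2, 3]
print(union_self_count(left, right))  # [1, 1, 2, 3]
```

The max-count rule needs a full per-value count on both sides before any value can be emitted. The self-count rule only needs to know which values of other are missing from self. This is the cost difference the note points to for large data.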




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


