[GitHub] [spark] xinrong-databricks commented on a change in pull request #35782: [SPARK-38479][PYTHON] Add `Series.duplicated` to indicate duplicate Series values

GitBox Mon, 28 Mar 2022 10:23:27 -0700


xinrong-databricks commented on a change in pull request #35782:
URL: https://github.com/apache/spark/pull/35782#discussion_r836670472




##########
File path: python/pyspark/pandas/series.py
##########
@@ -1647,6 +1647,83 @@ def to_list(self) -> List:
 
     tolist = to_list
 
+    def duplicated(self, keep: Union[bool, str] = "first") -> "Series":
+        """
+        Indicate duplicate Series values.
+
+        Duplicated values are indicated as ``True`` values in the resulting
+        Series. Either all duplicates, all except the first or all except the
+        last occurrence of duplicates can be indicated.
+
+        Parameters
+        ----------
+        keep : {'first', 'last', False}, default 'first'
+            Method to handle marking duplicates:
+            - 'first' : Mark duplicates as ``True`` except for the first
+              occurrence.
+            - 'last' : Mark duplicates as ``True`` except for the last
+              occurrence.
+            - ``False`` : Mark all duplicates as ``True``.
+
+        Returns
+        -------
+        Series
+            Series indicating whether each value has occurred in the
+            preceding values
+
+        See Also
+        --------
+        Index.duplicated : Equivalent method on pandas.Index.

Review comment:
       Good catch! Modified.
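
       For reference, the `keep` semantics described in the docstring above can be sketched in plain Python. This is only an illustration over an ordinary list, not the implementation in this PR; the helper name `duplicated_flags` is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Union


def duplicated_flags(
    values: Sequence[Hashable], keep: Union[bool, str] = "first"
) -> List[bool]:
    """Hypothetical helper: flag duplicates in a plain sequence."""
    if keep is False:
        # Every value that occurs more than once is marked.
        counts = Counter(values)
        return [counts[v] > 1 for v in values]
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first', 'last' or False")

    # For 'last', scan from the end so the final occurrence is seen first.
    ordered = list(values) if keep == "first" else list(reversed(values))
    seen = set()
    flags = []
    for v in ordered:
        flags.append(v in seen)  # True only if v was already encountered
        seen.add(v)
    return flags if keep == "first" else flags[::-1]


values = ["a", "b", "a", "c", "b", "a"]
print(duplicated_flags(values, keep="first"))
print(duplicated_flags(values, keep="last"))
print(duplicated_flags(values, keep=False))
```

       Note that `keep=True` is rejected here; only 'first', 'last' and `False` are accepted, matching the parameter list in the docstring.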




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



