ueshin commented on a change in pull request #33929:
URL: https://github.com/apache/spark/pull/33929#discussion_r709674753
##########
File path: python/pyspark/pandas/frame.py
##########
@@ -6631,23 +6631,25 @@ def droplevel(
def drop(
self,
labels: Optional[Union[Name, List[Name]]] = None,
- axis: Axis = 1,
+ axis: Optional[Axis] = 0,
Review comment:
Actually this is a behavior change between 3.2 and 3.3.
Shall we add `versionchanged` directive with explanation in the docstring,
and document this in a migration guide?
##########
File path: python/pyspark/pandas/frame.py
##########
@@ -6695,53 +6706,96 @@ def drop(
x y z w
0 1 3 5 7
1 2 4 6 8
- >>> df.drop('a') # doctest: +NORMALIZE_WHITESPACE
+ >>> df.drop(labels='a', axis=1) # doctest: +NORMALIZE_WHITESPACE
b
z w
0 5 7
1 6 8
Notes
-----
- Currently only axis = 1 is supported in this function,
- axis = 0 is yet to be implemented.
+        Currently, dropping rows of a MultiIndex DataFrame is not supported yet.
"""
if labels is not None:
+ if index is not None or columns is not None:
+                raise ValueError("Cannot specify both 'labels' and 'index'/'columns'")
axis = validate_axis(axis)
if axis == 1:
- return self.drop(columns=labels)
- raise NotImplementedError("Drop currently only works for axis=1")
- elif columns is not None:
- if is_name_like_tuple(columns):
- columns = [columns]
- elif is_name_like_value(columns):
- columns = [(columns,)]
+ return self.drop(index=index, columns=labels)
else:
-            columns = [col if is_name_like_tuple(col) else (col,) for col in columns]
- drop_column_labels = set(
- label
- for label in self._internal.column_labels
- for col in columns
- if label[: len(col)] == col
- )
- if len(drop_column_labels) == 0:
- raise KeyError(columns)
+ return self.drop(index=labels, columns=columns)
+ else:
+ if index is None and columns is None:
+                raise ValueError("Need to specify at least one of 'labels' or 'columns' or 'index'")
+
+ internal = self._internal
+ if index is not None:
+ if is_name_like_tuple(index) or is_name_like_value(index):
+ index = [index]
+
+ if len(index) > 0:
+ if internal.index_level == 1:
+ internal = internal.resolved_copy
+
+ if len(index) <= ps.get_option("compute.isin_limit"):
+                        self_index_type = self.index.to_series().spark.data_type
Review comment:
nit: `self.index.spark.data_type` should be fine?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]