HyukjinKwon commented on code in PR #37335:
URL: https://github.com/apache/spark/pull/37335#discussion_r932808436


##########
python/pyspark/sql/dataframe.py:
##########
@@ -3244,10 +3244,14 @@ def drop(self, *cols: "ColumnOrName") -> "DataFrame":  # type: ignore[misc]
             else:
                 raise TypeError("col should be a string or a Column")
         else:
-            for col in cols:
-                if not isinstance(col, str):
-                    raise TypeError("each col in the param list should be a string")
-            jdf = self._jdf.drop(self._jseq(cols))
+            if all(isinstance(col, str) for col in cols):
+                jdf = self._jdf.drop(self._jseq(cols))
+            elif all(isinstance(col, Column) for col in cols):
+                jdf = self._jdf
+                for col in cols:
+                    jdf = jdf.drop(col._jc)  # type: ignore[union-attr]

Review Comment:
   Can we avoid looping here? Each `drop` call adds another node to the logical plan, which is super expensive in the Spark SQL optimizer. Ideally we should first add a `def drop(colNames: Column*)` overload on the Scala side, and have the PySpark side invoke it directly with a single call.
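   To illustrate the cost the reviewer is pointing at, here is a toy sketch (not Spark code; the `PlanNode`/`ToyFrame` classes are invented for illustration): looping over columns stacks one plan node per column, while a single varargs `drop` adds exactly one node regardless of how many columns are dropped.

```python
class PlanNode:
    """Minimal stand-in for a logical-plan node."""
    def __init__(self, op, child=None):
        self.op = op
        self.child = child

    def depth(self):
        # Number of nodes from this node down to the leaf scan.
        return 1 + (self.child.depth() if self.child else 0)


class ToyFrame:
    """Minimal stand-in for a DataFrame whose API builds a plan tree."""
    def __init__(self, plan=None):
        self.plan = plan or PlanNode("scan")

    def drop(self, *cols):
        # One new plan node no matter how many columns are dropped.
        return ToyFrame(PlanNode(("drop", cols), self.plan))


cols = ["a", "b", "c", "d"]

# Looping: one plan node per column (what the diff does today).
looped = ToyFrame()
for c in cols:
    looped = looped.drop(c)

# Single varargs call: one plan node total (what the review suggests).
single = ToyFrame().drop(*cols)

print(looped.plan.depth())  # 5: scan + 4 drop nodes
print(single.plan.depth())  # 2: scan + 1 drop node
```

   In the real optimizer each extra node means more tree traversals and rule applications, which is why a single Scala-side varargs signature is preferable to a Python-side loop.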



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
