Re: [PR] [SPARK-48482][PYTHON][FOLLOWUP] dropDuplicates and dropDuplicatesWIthinWatermark should accept named parameter [spark]

via GitHub Wed, 21 Aug 2024 15:47:13 -0700


WweiL commented on code in PR #47835:
URL: https://github.com/apache/spark/pull/47835#discussion_r1725935837



##########
python/pyspark/sql/tests/test_dataframe.py:
##########
@@ -279,15 +282,6 @@ def test_drop_duplicates(self):
             messageParameters={"arg_name": "subset", "arg_type": "NoneType"},
         )
 
-        with self.assertRaises(PySparkTypeError) as pe:
-            df.dropDuplicates(None).show()
-
-        self.check_error(
-            exception=pe.exception,
-            errorClass="NOT_STR",
-            messageParameters={"arg_name": "subset", "arg_type": "NoneType"},
-        )
-

Review Comment:
   @itholic Please let me know if deleting this test case sounds good to you... 
   
   The way I made the named parameter work is to redefine the signature as:
   ```
   def dropDuplicates(
           self, subset: Optional[Union[str, List[str]]] = None, *subset_varargs: str
       ) -> ParentDataFrame:
   ```
   
   But this means that when "subset" is None, it can mean two things:
   1. dropDuplicates(None)
   2. dropDuplicates()
   
   With my change, it looks like it's not possible to distinguish these two...
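
To make the ambiguity concrete, here is a minimal standalone sketch. It is not the PySpark implementation: the function names and the `_NOT_GIVEN` sentinel are hypothetical, and the functions only resolve the column list instead of touching a DataFrame. The first function mirrors the `= None` default above. The second shows one common way the two calls could be told apart, assuming a private sentinel default were acceptable in the public signature.

```python
from typing import Any, List, Optional

# Hypothetical sentinel object; nothing with this name exists in PySpark.
_NOT_GIVEN: Any = object()


def resolve_subset_none_default(
    subset: Any = None, *subset_varargs: str
) -> Optional[List[str]]:
    """Mirrors the `= None` default: None doubles as "argument omitted"."""
    if subset is None:
        # Reached by both f() and f(None); the two calls look identical here.
        return None  # None stands for "deduplicate on all columns"
    if isinstance(subset, str):
        return [subset, *subset_varargs]
    return list(subset)


def resolve_subset_sentinel(
    subset: Any = _NOT_GIVEN, *subset_varargs: str
) -> Optional[List[str]]:
    """Assumed alternative: a sentinel default keeps an explicit None visible."""
    if subset is _NOT_GIVEN:
        return None  # argument omitted: deduplicate on all columns
    if subset is None:
        # An explicit None can still be rejected, as the deleted test expected.
        raise TypeError("subset should be a str or list, got NoneType")
    if isinstance(subset, str):
        return [subset, *subset_varargs]
    return list(subset)
```

With the sentinel variant, `resolve_subset_sentinel()` still means "all columns" while `resolve_subset_sentinel(None)` raises, and `subset="a"` keeps working as a named parameter. The trade-off is that the default no longer shows up as `None` in the signature and docs, so this may or may not be worth it for the PR.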



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


