zero323 commented on code in PR #38168:
URL: https://github.com/apache/spark/pull/38168#discussion_r1032911052


##########
python/pyspark/sql/dataframe.py:
##########
@@ -2044,7 +2044,7 @@ def crossJoin(self, other: "DataFrame") -> "DataFrame":
     def join(
         self,
         other: "DataFrame",
-        on: Optional[Union[str, List[str], Column, List[Column]]] = None,
+        on: Optional[Union[str, List[str], bool, List[bool], Column, List[Column]]] = None,

Review Comment:
   @HyukjinKwon  As far as I can tell, there is no real difference here. The difference between the Pandas and Spark checks is most likely due to the missing stubs for the former. If I use an environment without `pandas-stubs`, things type-check in PyCharm
   
   ![without pandas stubs](https://user-images.githubusercontent.com/1554276/204131753-eb818c1b-9c35-4069-9fd4-abca28f0d57b.gif)
   
   If I use one with `pandas-stubs` installed, I get
   
   ![with pandas stubs](https://user-images.githubusercontent.com/1554276/204131767-c3f103c4-b8a1-4554-a1a7-723d034abd1b.gif)
   
   which is the same category of failure as for PySpark code.
   
   Given mypy as a reference, this is an expected false positive (see https://github.com/python/mypy/issues/2783).
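   A minimal sketch of this category of false positive, assuming a helper whose annotation merely mirrors the shape of the `on` parameter from the diff (the helper itself is hypothetical, not PySpark code):

   ```python
   from typing import List, Union

   # Hypothetical helper with the same Union shape as the diff's `on` parameter.
   def describe_on(on: Union[str, List[str], bool, List[bool]]) -> str:
       return type(on).__name__

   # All of these are valid both at runtime and under the annotation:
   describe_on("name")       # a single column name
   describe_on(["a", "b"])   # a list of column names
   describe_on(True)         # a boolean

   # A mixed list, however, is inferred by the checker as list[object],
   # which matches no single member of the Union, so the call is flagged
   # even though it is fine at runtime -- a false positive of the kind
   # discussed in python/mypy#2783.
   mixed = ["a", True]
   describe_on(mixed)  # works at runtime; rejected by static checkers
   ```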
   
   On a side note ‒ the Pandas and PySpark joins shown in the screenshots are not even remotely equivalent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

