[GitHub] [spark] HyukjinKwon commented on a change in pull request #33998: [SPARK-36769][PYTHON] Improve `filter` of single-indexed DataFrame

GitBox Thu, 16 Sep 2021 17:30:52 -0700


HyukjinKwon commented on a change in pull request #33998:
URL: https://github.com/apache/spark/pull/33998#discussion_r710607862




##########
File path: python/pyspark/pandas/frame.py
##########
@@ -9974,13 +9974,25 @@ def filter(
                 raise ValueError("items should be a list-like object.")
             if axis == 0:
                 if len(index_scols) == 1:
-                    col = None
-                    for item in items:
-                        if col is None:
-                            col = index_scols[0] == SF.lit(item)
-                        else:
-                            col = col | (index_scols[0] == SF.lit(item))
-                elif len(index_scols) > 1:
+                    if len(items) <= ps.get_option("compute.isin_limit"):
+                        col = index_scols[0].isin([SF.lit(item) for item in items])
+                        return DataFrame(self._internal.with_filter(col))
+                    else:
+                        item_sdf_col = verify_temp_column_name(

Review comment:
       Hmmm... why should this be broadcast? Can't we just fall back to the original logic?
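The sketch below is a pure-Python model, not Spark code. It shows the two predicates in this hunk: the original chain `(idx == i0) | (idx == i1) | ...` and the single `isin` membership test. It also shows the dispatch the comment above proposes, which uses `isin` up to `compute.isin_limit` items and falls back to the original OR chain beyond that, instead of a broadcast join. The helper names and the limit value of 80 are hypothetical placeholders. In pandas-on-Spark the real limit is read with `ps.get_option("compute.isin_limit")`.

```python
from typing import Hashable, List, Sequence

# Assumed placeholder for ps.get_option("compute.isin_limit"); not read from Spark.
ISIN_LIMIT = 80


def filter_or_chain(index: Sequence[Hashable], items: Sequence[Hashable]) -> List[int]:
    """Model of the original logic: row i survives if index[i] equals any item,
    i.e. (idx == i0) | (idx == i1) | ... evaluated left to right."""
    return [i for i, value in enumerate(index) if any(value == item for item in items)]


def filter_isin(index: Sequence[Hashable], items: Sequence[Hashable]) -> List[int]:
    """Model of the `isin` path: one membership test per row."""
    wanted = set(items)
    return [i for i, value in enumerate(index) if value in wanted]


def filter_single_index(
    index: Sequence[Hashable],
    items: Sequence[Hashable],
    limit: int = ISIN_LIMIT,
) -> List[int]:
    """Hypothetical dispatch: `isin` for short lists, otherwise fall back to the
    original OR chain (the reviewer's suggestion) instead of a broadcast join."""
    if len(items) <= limit:
        return filter_isin(index, items)
    return filter_or_chain(index, items)


if __name__ == "__main__":
    idx = ["a", "b", "c", "a"]
    print(filter_single_index(idx, ["a", "c"]))           # isin path: [0, 2, 3]
    print(filter_single_index(idx, ["a", "c"], limit=1))  # OR-chain fallback: [0, 2, 3]
```

Both paths keep the same rows. The fallback therefore changes only how the predicate is built, not what it selects. What differs in Spark is the plan: a nested chain of `Or` expressions versus a single `In` predicate. This model does not capture that difference.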




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

