Re: [PR] [SPARK-46306][PS] Fix `LocIndexer` to work properly when the key is missing [spark]

via GitHub Thu, 07 Dec 2023 20:36:29 -0800


itholic commented on code in PR #44236:
URL: https://github.com/apache/spark/pull/44236#discussion_r1419917002



##########
python/pyspark/pandas/indexing.py:
##########
@@ -563,6 +563,16 @@ def __getitem__(self, key: Any) -> Union["Series", "DataFrame"]:
         else:
             psdf_or_psser = psdf
 
+        if isinstance(key, list):
+            result_index = psdf_or_psser.index
+            if len(key) != len(result_index):
+                # Since the result Index size is expected to be small,
+                # we can collect data for checking missing key to follow the behavior of Pandas.
+                result_index_list = result_index.index.tolist()
+                for item in key:
+                    if item not in result_index_list:
+                        raise KeyError(f"{item} not in index")

Review Comment:
   I used the built-in `KeyError` here since we don't apply error classes to 
   the `pyspark.pandas` module yet. But maybe we should start migrating 
   `pyspark.pandas` as well from now on?
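
   For illustration only, the missing-key check in the diff above can be 
   sketched in plain Python, without Spark. The helper names 
   `check_missing_keys` and `describe_lookup` are hypothetical, and a plain 
   list stands in for the pandas-on-Spark result index. It raises the 
   built-in `KeyError`, as the diff does.

```python
from typing import Any, List


def check_missing_keys(keys: List[Any], result_labels: List[Any]) -> None:
    """Raise the built-in KeyError for the first requested key that is absent.

    Assumption: result_labels is a small, already-collected list of the
    labels that the lookup actually returned (a stand-in for the result index).
    """
    # Same guard as in the diff: only scan when the sizes differ,
    # which suggests at least one requested key was not found.
    if len(keys) != len(result_labels):
        for item in keys:
            if item not in result_labels:
                raise KeyError(f"{item} not in index")


def describe_lookup(keys: List[Any], result_labels: List[Any]) -> str:
    """Hypothetical wrapper returning the error message, or 'ok' if none."""
    try:
        check_missing_keys(keys, result_labels)
    except KeyError as exc:
        # exc.args[0] is the raw message; str(exc) would add quotes around it.
        return exc.args[0]
    return "ok"
```

   With `keys=["a", "z"]` and `result_labels=["a"]`, the sketch reports 
   `z` as missing. When the sizes match, the scan is skipped entirely.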



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
