ueshin commented on a change in pull request #35706:
URL: https://github.com/apache/spark/pull/35706#discussion_r818958752



##########
File path: python/pyspark/pandas/series.py
##########
@@ -1045,8 +1048,18 @@ def map(self, arg: Union[Dict, Callable]) -> "Series":
         2      I am a None
         3    I am a rabbit
         dtype: object
+
+        To avoid applying the function to missing values (and keep them as NaN),
+        ``na_action='ignore'`` can be used:
+
+        >>> s.map('I am a {}'.format, na_action='ignore')
+        0       I am a cat
+        1       I am a dog
+        2             None
+        3    I am a rabbit
+        dtype: object

Review comment:
       We might also want to have an example taking `pd.Series` as `arg`.
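For reference, in pandas a `pd.Series` passed as `arg` acts as a lookup table keyed by its index; a minimal sketch of what such a doctest-style example might cover (the values here are illustrative, not from the PR):

```python
import pandas as pd

s = pd.Series(["cat", "dog", None, "rabbit"])
# When arg is a pd.Series, values are looked up through its index;
# keys that are missing (including None) map to NaN.
mapping = pd.Series(["kitten", "puppy"], index=["cat", "dog"])
print(s.map(mapping))
```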

##########
File path: python/pyspark/pandas/tests/test_series.py
##########
@@ -1161,13 +1161,29 @@ def test_append(self):
     def test_map(self):
         pser = pd.Series(["cat", "dog", None, "rabbit"])
         psser = ps.from_pandas(pser)
-        # Currently Koalas doesn't return NaN as pandas does.
+
+        # dict correspondence
+        # Currently pandas API on Spark doesn't return NaN as pandas does.
         self.assert_eq(psser.map({}), pser.map({}).replace({pd.np.nan: None}))

Review comment:
       nit: shall we use `np.nan` instead of `pd.np.nan` while we are here?
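For context, the `pd.np` alias was deprecated (and later removed from pandas), so the assertion would become something like the following sketch (not the exact PR change):

```python
import numpy as np
import pandas as pd

pser = pd.Series(["cat", "dog", None, "rabbit"])
# Use np.nan directly instead of the deprecated pd.np.nan alias.
# map({}) maps every value to NaN, which is then replaced with None.
result = pser.map({}).replace({np.nan: None})
print(result.isna().all())
```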

##########
File path: python/pyspark/pandas/series.py
##########
@@ -992,8 +993,10 @@ def map(self, arg: Union[Dict, Callable]) -> "Series":
 
         Parameters
         ----------
-        arg : function or dict
+        arg : function, dict or pd.Series

Review comment:
       If we accept `pd.Series`, we might also want to accept `ps.Series`, which could be future work.

##########
File path: python/pyspark/pandas/tests/test_series.py
##########
@@ -1161,13 +1161,29 @@ def test_append(self):
     def test_map(self):
         pser = pd.Series(["cat", "dog", None, "rabbit"])
         psser = ps.from_pandas(pser)
-        # Currently Koalas doesn't return NaN as pandas does.
+
+        # dict correspondence
+        # Currently pandas API on Spark doesn't return NaN as pandas does.
         self.assert_eq(psser.map({}), pser.map({}).replace({pd.np.nan: None}))
 
         d = defaultdict(lambda: "abc")
         self.assertTrue("abc" in repr(psser.map(d)))
         self.assert_eq(psser.map(d), pser.map(d))
 
+        # series correspondence
+        pser_to_apply = pd.Series(["one", "two", "four"], index=["cat", "dog", "rabbit"])
+        self.assert_eq(psser.map(pser_to_apply), pser.map(pser_to_apply))
+        self.assert_eq(
+            psser.map(pser_to_apply, na_action="ignore"),
+            pser.map(pser_to_apply, na_action="ignore"),

Review comment:
       When the correspondence is a `pd.Series`, it seems like `na_action` doesn't have any effect?
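A quick check in plain pandas (not the Spark-backed implementation) suggesting why: with a `pd.Series` correspondence, a missing or None key already maps to NaN, so `na_action="ignore"` makes no observable difference for this data — a sketch only:

```python
import pandas as pd

pser = pd.Series(["cat", "dog", None, "rabbit"])
pser_to_apply = pd.Series(["one", "two", "four"], index=["cat", "dog", "rabbit"])

# A lookup miss (here: None) already yields NaN, so skipping NaN inputs
# via na_action="ignore" produces the same result for this input.
default = pser.map(pser_to_apply)
ignored = pser.map(pser_to_apply, na_action="ignore")
print(default.equals(ignored))
```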




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


