[GitHub] [spark] dgd-contributor commented on a change in pull request #33858: [SPARK-36402][PYTHON] Implement Series.combine

GitBox Tue, 07 Sep 2021 00:15:46 -0700


dgd-contributor commented on a change in pull request #33858:
URL: https://github.com/apache/spark/pull/33858#discussion_r703242599




##########
File path: python/pyspark/pandas/series.py
##########
@@ -4475,6 +4477,146 @@ def replace(
 
         return self._with_new_scol(current)  # TODO: dtype?
 
+    def combine(
+        self,
+        other: "Series",
+        func: Callable,
+        fill_value: Optional[Any] = None,
+        return_type: Union[Union[AtomicType, str], ArrayType] = "string",
+    ) -> "Series":
+        """
+        Combine the Series with a Series or scalar according to `func`.
+        Combine the Series and `other` using `func` to perform elementwise
+        selection for combined Series.
+        `fill_value` is assumed when value is missing at some index
+        from one of the two objects being combined.
+
+        .. versionadded:: 3.3.0
+
+        Parameters
+        ----------
+        other : Series or scalar
+            The value(s) to be combined with the `Series`.
+        func : function
+            Function that takes two scalars as inputs and returns an element.
+        fill_value : scalar, optional
+            The value to assume when an index is missing from
+            one Series or the other. The default specifies to use the
+            appropriate NaN value for the underlying dtype of the Series.
+        return_type : :class:`pyspark.sql.types.DataType` or str
+            the return type of the output Series. The value can be either a
+            :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.

Review comment:
       Thanks for your comments, I have learned a lot. If I have
   
   ``` python
   >>> pdf = pd.DataFrame({"s1": [1, 2, 3], "s2": [4, 5, 6]})
   >>> psdf = ps.from_pandas(pdf)
   
   >>> def true_div(s1: int, s2: int) -> float:
   ...     return s1 / s2
   
   >>> pdf["s1"].combine(pdf["s2"], true_div),
   0    0.25
   1    0.40
   2    0.50
   dtype: float64
   float64
   >>> psdf["s1"].combine(psdf["s2"], true_div),
   0    0
   1    0
   2    0
   dtype: int64
   int64
   ```
   I think the output's dtype should not be taken from self's dtype. I am stuck on this.
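
   One possible way to decouple the result dtype from self's dtype is to read it from the return annotation of `func`. The following is only a minimal sketch under stated assumptions: `infer_return_ddl` and the `_DDL_BY_PYTHON_TYPE` mapping are hypothetical names for illustration and are not part of this PR or of pyspark.

   ``` python
   from typing import Any, Callable, Optional, get_type_hints

   # Hypothetical mapping from Python annotations to Spark DDL type strings.
   _DDL_BY_PYTHON_TYPE = {
       int: "long",
       float: "double",
       str: "string",
       bool: "boolean",
   }


   def infer_return_ddl(
       func: Callable[..., Any], default: Optional[str] = None
   ) -> Optional[str]:
       """Return a DDL type string from func's return annotation, else `default`."""
       try:
           hints = get_type_hints(func)
       except (TypeError, NameError):
           # Unsupported callables or unresolvable annotations fall back.
           return default
       return _DDL_BY_PYTHON_TYPE.get(hints.get("return"), default)


   def true_div(s1: int, s2: int) -> float:
       return s1 / s2


   print(infer_return_ddl(true_div))                     # double
   print(infer_return_ddl(lambda a, b: a, "string"))     # string
   ```

   With something like this, `true_div` would map to a floating-point result type, and callables without a return annotation would fall back to an explicit default such as the `return_type` argument.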
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



