ExpandingGroupby

GitBox Tue, 13 Sep 2022 21:12:52 -0700


itholic commented on code in PR #37836:
URL: https://github.com/apache/spark/pull/37836#discussion_r970207129



##########
python/pyspark/pandas/window.py:
##########
@@ -561,6 +573,101 @@ def mean(self) -> FrameLike:
         """
         return super().mean()
 
+    def quantile(self, quantile: float, accuracy: int = 10000) -> FrameLike:
+        """
+        Calculate the rolling quantile of the values.
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        quantile : float
+            Value between 0 and 1 providing the quantile to compute.
+        accuracy : int, optional
+            Default accuracy of approximation. Larger value means better accuracy.
+            The relative error can be deduced by 1.0 / accuracy.
+            This is a panda-on-Spark specific parameter.
+
+        Returns
+        -------
+        Series or DataFrame
+            Returned object type is determined by the caller of the rolling
+            calculation.
+
+        Notes
+        -----
+        `quantile` in pandas-on-Spark are using distributed percentile approximation
+        algorithm unlike pandas, the result might different with pandas, also `interpolation`
+        parameters are not supported yet.

Review Comment:
   nit: "parameters are not supported yet" -> "parameter is not supported yet" ?
   
   Also, can we add a "TODO" comment for it above the function definition?
   
   e.g.
   
   ```python
       # TODO: support `interpolation` parameter.
       def quantile(self, quantile: float, accuracy: int = 10000) -> FrameLike:
           ...
   ```
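
For context on why the docstring says the result may differ from pandas, here is a minimal stdlib-only sketch. It is not Spark's approximation algorithm and not the code in this PR: both helper functions are hypothetical, and the sketch assumes `min_periods` equals the window size. It contrasts a quantile computed with linear interpolation (the pandas default) against one that always returns an element of the window, and it spells out the `1.0 / accuracy` relation quoted in the docstring.

```python
import math
from typing import List, Optional


def rolling_quantile_linear(
    values: List[float], window: int, q: float
) -> List[Optional[float]]:
    """Exact rolling quantile using linear interpolation between neighbours."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # window not full yet (assumes min_periods == window)
            continue
        w = sorted(values[i + 1 - window : i + 1])
        pos = q * (len(w) - 1)  # fractional position in the sorted window
        lo, hi = math.floor(pos), math.ceil(pos)
        out.append(w[lo] + (w[hi] - w[lo]) * (pos - lo))
    return out


def rolling_quantile_no_interp(
    values: List[float], window: int, q: float
) -> List[Optional[float]]:
    """Rolling quantile that never interpolates: returns a value from the window."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        w = sorted(values[i + 1 - window : i + 1])
        rank = max(math.ceil(q * len(w)), 1)  # smallest 1-based rank covering q
        out.append(w[rank - 1])
    return out


def relative_error(accuracy: int) -> float:
    """Relative error as described in the docstring: 1.0 / accuracy."""
    return 1.0 / accuracy


data = [1.0, 2.0, 3.0, 4.0]
linear = rolling_quantile_linear(data, window=2, q=0.5)
no_interp = rolling_quantile_no_interp(data, window=2, q=0.5)
default_error = relative_error(10000)
```

With a window of 2 and `q=0.5`, the interpolating version lands halfway between the two values in each window, while the non-interpolating version picks one of them. That gap is the kind of difference the Notes section warns about.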



##########
python/pyspark/pandas/window.py:
##########
@@ -561,6 +573,101 @@ def mean(self) -> FrameLike:
         """
         return super().mean()
 
+    def quantile(self, quantile: float, accuracy: int = 10000) -> FrameLike:
+        """
+        Calculate the rolling quantile of the values.
+
+        .. versionadded:: 3.4.0
+
+        Parameters
+        ----------
+        quantile : float
+            Value between 0 and 1 providing the quantile to compute.
+        accuracy : int, optional
+            Default accuracy of approximation. Larger value means better accuracy.
+            The relative error can be deduced by 1.0 / accuracy.
+            This is a panda-on-Spark specific parameter.
+
+        Returns
+        -------
+        Series or DataFrame
+            Returned object type is determined by the caller of the rolling
+            calculation.
+
+        Notes
+        -----
+        `quantile` in pandas-on-Spark are using distributed percentile approximation
+        algorithm unlike pandas, the result might different with pandas, also `interpolation`
+        parameters are not supported yet.
+
+        the current implementation of this API uses Spark's Window without
+        specifying partition specification. This leads to move all data into
+        single partition in single machine and could cause serious
+        performance degradation. Avoid this method against very large dataset.
+
+        See Also
+        --------
+        Series.rolling : Calling object with Series data.
+        DataFrame.rolling : Calling object with DataFrames.
+        Series.quantile : Equivalent method for Series.
+        DataFrame.quantile : Equivalent method for DataFrame.

Review Comment:
   Can we follow the descriptions from pandas here?
   
   <img width="548" alt="Screen Shot 2022-09-14 at 1 08 58 PM" src="https://user-images.githubusercontent.com/44108233/190057481-0c0c239c-674d-4189-8625-41af2664ae2e.png">
   
   




[GitHub] [spark] itholic commented on a diff in pull request #37836: [SPARK-40339][SPARK-40342][SPARK-40345][SPARK-40348][PS] Implement quantile in Rolling/RollingGroupby/Expanding/ExpandingGroupby
