[GitHub] [spark] zhengruifeng commented on pull request #36554: [SPARK-39186][PYTHON][FOLLOWUP] Improve the numerical stability of skewness

GitBox Sat, 14 May 2022 19:38:05 -0700


zhengruifeng commented on PR #36554:
URL: https://github.com/apache/spark/pull/36554#issuecomment-1126847198


   Before this PR:
   
   ```
   In [2]:         pdf = pd.DataFrame(
      ...:             {
      ...:                 "A": [1, 1, 1, 1, 1],
      ...:                 "B": [1.0, np.nan, 4, 2, 5],
      ...:                 "C": [-6.0, -7, np.nan, np.nan, 10],
      ...:                 "D": [1.2, np.nan, np.nan, 9.8, np.nan],
      ...:                 "E": [1, np.nan, np.nan, np.nan, np.nan],
      ...:                 "F": [np.nan, np.nan, np.nan, np.nan, np.nan],
      ...:             }
      ...:         )
      ...:         psdf = ps.from_pandas(pdf)
   
   In [3]: 
   
   In [3]: psdf.skew()
   22/05/15 10:19:28 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
   Out[3]:                                                                      
   
   A             NaN
   B   -1.945901e-16
   C    1.710663e+00
   D             NaN
   E             NaN
   F             NaN
   dtype: float64
   
   In [4]: pdf.skew()
   Out[4]: 
   A    0.000000
   B    0.000000
   C    1.710663
   D         NaN
   E         NaN
   F         NaN
   dtype: float64
   
   ```
   
   
   After this PR:
   
   ```
   In [3]: psdf.skew()
   Out[3]:                                                                      
   
   A    0.000000
   B    0.000000
   C    1.710663
   D         NaN
   E         NaN
   F         NaN
   dtype: float64
   
   In [4]: pdf.skew()
   Out[4]: 
   A    0.000000
   B    0.000000
   C    1.710663
   D         NaN
   E         NaN
   F         NaN
   dtype: float64
   ```
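For reference, the pandas values above come from the bias-corrected sample skewness. The sketch below is a minimal pure-Python version of that computation. It assumes pandas' internal handling, to the best of our understanding: second and third central moments smaller than `1e-14` are treated as floating-point noise and set to 0, a zero variance yields a skewness of 0, and fewer than 3 valid values yield NaN. The helper names `_zero_out_fperr` and `sample_skew` are illustrative. This is not the code changed in this PR or Spark's actual implementation.

```python
import math


def _zero_out_fperr(x, tol=1e-14):
    # Assumed pandas-style cleanup: treat tiny moments as rounding noise.
    return 0.0 if abs(x) < tol else x


def sample_skew(values):
    # Skip missing values (None or NaN), like pandas' skipna=True.
    xs = [v for v in values if v is not None and not math.isnan(v)]
    n = len(xs)
    if n < 3:
        # Bias-corrected skewness is undefined for fewer than 3 values.
        return math.nan
    mean = sum(xs) / n
    # Central moments computed from deviations (two-pass), not raw moments.
    m2 = _zero_out_fperr(sum((x - mean) ** 2 for x in xs))
    m3 = _zero_out_fperr(sum((x - mean) ** 3 for x in xs))
    if m2 == 0:
        # Constant column: report 0 instead of 0/0 = NaN.
        return 0.0
    # Adjusted Fisher-Pearson coefficient: n * sqrt(n - 1) / (n - 2) * m3 / m2^1.5
    return (n * math.sqrt(n - 1) / (n - 2)) * (m3 / m2 ** 1.5)
```

Applied to columns A through F of `pdf`, this sketch gives 0.0, 0.0, about 1.710663, and NaN for D, E and F, which is consistent with the `pdf.skew()` output shown above.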


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: reviews-h...@spark.apache.org