[GitHub] [spark] zhengruifeng commented on pull request #37845: [SPARK-40399][PS] Make `pearson` correlation in `DataFrame.corr` support missing values and `min_periods `

GitBox Sun, 11 Sep 2022 19:47:44 -0700


zhengruifeng commented on PR #37845:
URL: https://github.com/apache/spark/pull/37845#issuecomment-1243151940


   > Where data is missing, the correlation is really just undefined. Why not return NaN?
   
   `Correlation.corr` behaves this way: when a column contains `NaN`, its correlation with every other column is `NaN`.
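A minimal pure-Python sketch of why a textbook Pearson computation yields `NaN` once any input is missing: the `NaN` flows through the means and sums, so the whole result becomes `NaN`. The helper `naive_pearson` is hypothetical and only mimics the behavior described for `Correlation.corr`. It is not Spark's implementation.

```python
import math

def naive_pearson(xs, ys):
    """Textbook Pearson correlation with no missing-value handling.

    Hypothetical helper: a NaN in either input propagates through the
    means and sums, so the result is NaN (mirroring the behavior
    described for `Correlation.corr`, not Spark's actual code).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
c = [1.0, math.nan, 3.0, 4.0]

print(naive_pearson(a, b))  # 1.0, perfectly correlated
print(naive_pearson(a, c))  # nan, one missing value poisons the result
```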
   
   But the pandas API on Spark should follow the behavior of pandas, which ignores missing values and computes the correlation from the remaining data.
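The pandas semantics can be sketched in pure Python as Pearson correlation over pairwise-complete observations: a row is dropped when either value is missing, and the result is `NaN` when fewer than `min_periods` complete pairs remain. The helper `pairwise_pearson` is hypothetical and illustrates the idea only. It is not the code in this PR or in pandas.

```python
import math

def pairwise_pearson(xs, ys, min_periods=1):
    """Pearson correlation over pairwise-complete observations.

    Hypothetical helper: rows where either value is NaN are dropped;
    NaN is returned if fewer than `min_periods` pairs remain.
    """
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    # A correlation needs at least two points, whatever min_periods says.
    if len(pairs) < max(min_periods, 2):
        return math.nan
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    # A constant column has an undefined correlation.
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)

a = [1.0, 2.0, 3.0, 4.0]
c = [1.0, math.nan, 3.0, 4.0]

print(round(pairwise_pearson(a, c), 6))       # 1.0, computed from the 3 complete rows
print(pairwise_pearson(a, c, min_periods=4))  # nan, only 3 complete rows remain
```

In pandas itself these semantics are exposed through `DataFrame.corr(method="pearson", min_periods=...)`, which is the behavior the pandas API on Spark aims to match here.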
   

