zhengruifeng commented on PR #37845: URL: https://github.com/apache/spark/pull/37845#issuecomment-1243151940
> Where data is missing, the correlation is really just undefined. Why not return NaN?

`Correlation.corr` already behaves like this: when a column contains `NaN`, its correlation with every other column is `NaN`. But Pandas-API-on-Spark should follow the behavior of pandas, which ignores missing values and computes the correlation from the remaining data.
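To make the two behaviors concrete, here is a minimal pure-Python sketch. It is not the code of `Correlation.corr` or of pandas; the helper names `pearson` and `pearson_pairwise_complete` are hypothetical, and the second one assumes that "ignore the missing values" means dropping every row where either value is `NaN` before correlating.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation; a NaN in either input propagates to the result."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def pearson_pairwise_complete(xs, ys):
    """Hypothetical helper: drop rows with a NaN in either column, then correlate."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    if len(pairs) < 2:
        # Too few complete rows to compute a correlation.
        return float("nan")
    kept_x, kept_y = zip(*pairs)
    return pearson(list(kept_x), list(kept_y))

a = [1.0, 2.0, float("nan"), 4.0]
b = [2.0, 4.0, 6.0, 8.0]

nan_propagating = pearson(a, b)               # NaN, because `a` contains a NaN
nan_ignoring = pearson_pairwise_complete(a, b)  # computed from the 3 complete rows
```

Here `nan_propagating` is `NaN`, while `nan_ignoring` is approximately 1.0, since the three complete rows are perfectly linearly related.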
