zhengruifeng commented on issue #26832: [SPARK-30202][ML][PYSPARK] impl QuantileTransform
URL: https://github.com/apache/spark/pull/26832#issuecomment-564963081

@srowen Sorry for not discussing this algorithm before opening this PR.

Distribution transformation is widely used in traditional statistical analysis, and it is implemented in many tools such as sklearn, R, SAS, SPSS, and MATLAB.

Recently my team encountered some cases in which a Gaussian distribution is a strict prerequisite. (In fact, many algorithms implicitly assume Gaussian-distributed data.) There is a well-known method for this, the [Box-Cox](https://en.wikipedia.org/wiki/Power_transform#Box%E2%80%93Cox_transformation) transformation. However, it seems hard to implement in a distributed system (I am still working on it), so I switched to this non-parametric method. I think it will be helpful, since no such method is currently provided.

> I like some of the functionality we're adding to MLlib even though it's kind of in maintenance mode, but don't want it to get too sprawling.

Yes, I think we could discuss a roadmap in JIRA. However, I notice that the bandwidth for MLlib is quite limited.
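
For readers unfamiliar with the non-parametric approach mentioned above, here is a minimal single-machine sketch of a rank-based quantile transform to a standard Gaussian. It is not the code in this PR and not how a distributed implementation would work; the function name `quantile_to_gaussian` and the mid-point plotting position `(rank + 0.5) / n` are assumptions chosen for illustration.

```python
from statistics import NormalDist


def quantile_to_gaussian(values):
    """Map each value to a standard-normal score via its empirical rank.

    Hypothetical helper for illustration only; not part of Spark or this PR.
    """
    n = len(values)
    # Indices of the values in ascending order (sorted() is stable,
    # so ties keep their original relative order).
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = [0.0] * n
    for rank, idx in enumerate(order):
        # Assumed plotting position: keeps p strictly inside (0, 1),
        # so the inverse CDF is always finite.
        p = (rank + 0.5) / n
        out[idx] = std_normal.inv_cdf(p)
    return out


# A heavily skewed sample: the outlier 100.0 is pulled in to a moderate score.
skewed = [1.0, 2.0, 4.0, 8.0, 100.0]
print(quantile_to_gaussian(skewed))
```

Because the mapping depends only on ranks, it needs no distributional parameters to be fitted, which is what makes it non-parametric. A production version would presumably estimate quantiles approximately rather than sorting the full dataset.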
