zhengruifeng commented on issue #26832: [SPARK-30202][ML][PYSPARK] impl 
QuantileTransform
URL: https://github.com/apache/spark/pull/26832#issuecomment-564963081
 
 
   @srowen Sorry for not discussing this algorithm before opening this PR.
   
   Distribution transformation is widely used in traditional statistical 
analysis, and it is implemented in many tools such as sklearn, R, SAS, SPSS, 
and MATLAB. 
   Recently my team encountered some cases in which a Gaussian distribution is 
a strict prerequisite. (In fact, many algorithms implicitly assume 
Gaussian-distributed data.)
   
   There is a well-known method for this, the 
[Box-Cox transformation](https://en.wikipedia.org/wiki/Power_transform#Box%E2%80%93Cox_transformation); 
however, it seems hard to implement in a distributed system (I am still 
working on it), so I switched to this non-parametric method. I think it will 
be helpful, since no such method is currently provided.
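
   To make the idea concrete, here is a minimal single-machine sketch of a 
non-parametric quantile transform to a Gaussian. It is not the code in this 
PR; the names `fit_quantiles` and `to_gaussian` and the default of 100 
quantiles are assumptions for illustration only. In a distributed setting the 
quantiles would presumably come from an approximate summary (something like 
`DataFrame.approxQuantile`) rather than a full sort.

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def fit_quantiles(sample, n_quantiles=100):
    """Return n_quantiles evenly spaced empirical quantiles of the sample."""
    xs = sorted(sample)
    last = len(xs) - 1
    return [xs[round(i * last / (n_quantiles - 1))] for i in range(n_quantiles)]


def to_gaussian(value, quantiles):
    """Map a value to a standard-normal score via its empirical CDF position."""
    n = len(quantiles)
    # Number of stored quantiles <= value, in the range 0..n.
    count = bisect_right(quantiles, value)
    # Mid-point probability, clipped away from 0 and 1 so inv_cdf stays finite.
    p = min(max((count - 0.5) / n, 0.5 / n), 1 - 0.5 / n)
    return NormalDist().inv_cdf(p)


# Illustrative, heavily right-skewed data (hypothetical, not from the PR).
data = [math.exp(i / 100) for i in range(1000)]
quantiles = fit_quantiles(data)
z = [to_gaussian(v, quantiles) for v in data]
```

   The transform is monotone, so the ordering of values is preserved while the 
output follows an approximately standard normal shape, whatever the input 
distribution was.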
   
   > I like some of the functionality we're adding to MLlib even though it's 
kind of in maintenance mode, but don't want it to get too sprawling.
   
   Yes, I think we could discuss a roadmap in JIRA. However, I notice that the 
bandwidth for MLlib is quite limited.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
