Hi everyone! I am a data scientist new to Spark, and I am interested in clustering mixed (numeric and categorical) variables. I am more used to R, where there are implementations such as daisy, PAM, etc. It is true that K-Means on dummy-encoded variables can do a decent job of clustering mixed data, but I don't think that is an entirely correct treatment of the categorical variables. So my question is whether a k-modes/k-prototypes implementation is planned for inclusion in MLlib.
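To make concrete what I mean by a more appropriate treatment: as far as I understand it, k-prototypes measures dissimilarity as the squared Euclidean distance on the numeric attributes plus a weighted count of mismatches on the categorical ones. Here is a minimal plain-Python sketch of that measure only (not Spark code and not a full clustering algorithm; the function name `mixed_dissimilarity`, the toy records, and the `gamma` value are my own illustrative choices):

```python
from typing import Sequence


def mixed_dissimilarity(
    num_a: Sequence[float],
    num_b: Sequence[float],
    cat_a: Sequence[str],
    cat_b: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Squared Euclidean distance on the numeric attributes plus
    gamma times the number of categorical attributes that differ."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatch_count = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric_part + gamma * mismatch_count


# Two toy records: two numeric attributes and two categorical attributes each.
d = mixed_dissimilarity(
    [1.0, 2.0], [2.0, 4.0],
    ["red", "suv"], ["red", "sedan"],
    gamma=0.5,  # assumed weight for the categorical part; normally tuned per dataset
)
print(d)
```

With dummy variables plus K-Means, every categorical mismatch is instead squeezed into the same Euclidean geometry as the numeric attributes, which is the part I am not comfortable with.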
I was able to find https://issues.apache.org/jira/browse/SPARK-4510, but it seems PAM does not scale particularly well. Perhaps k-prototypes would be a better fit. Regards,