Hi everyone!

I am a data scientist new to Spark, and I am interested in clustering
mixed (numeric and categorical) variables. I am more used to R, where there
are implementations like daisy, PAM, etc. It is true that dummy variables
combined with K-Means can do a decent job of clustering mixed variables, but
I don't think this is a fully correct treatment of the categorical ones. So
my question is whether any k-modes/k-prototypes implementation is planned
for inclusion in MLlib in the future.
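For context on what I mean by a different treatment of categorical variables: R's daisy (with the Gower metric) compares categories by match/mismatch instead of by distances between dummy vectors. Below is a minimal plain-Python sketch of the unweighted Gower dissimilarity, assuming no missing values and equal variable weights (daisy itself also handles weights, missing values and other variable types). The function name `gower` and its arguments are just for illustration.

```python
def gower(x_num, x_cat, y_num, y_cat, ranges):
    """Unweighted Gower dissimilarity between two mixed-type records.

    Assumes no missing values; `ranges` holds max - min of each numeric
    variable over the whole dataset (a hypothetical precomputed input).
    """
    terms = []
    for xv, yv, r in zip(x_num, y_num, ranges):
        # Numeric term: absolute difference scaled by the variable's range.
        terms.append(abs(xv - yv) / r if r > 0 else 0.0)
    for xv, yv in zip(x_cat, y_cat):
        # Nominal term: 0 if the categories match, 1 otherwise.
        terms.append(0.0 if xv == yv else 1.0)
    # The dissimilarity is the mean of the per-variable terms.
    return sum(terms) / len(terms)


# Terms are 2/4 = 0.5, 0/20 = 0.0 and 1.0 (mismatch), so the mean is 0.5.
print(gower([1.0, 10.0], ["red"], [3.0, 10.0], ["blue"], ranges=[4.0, 20.0]))
```

With dummy encoding and Euclidean distance, by contrast, a categorical mismatch contributes a distance that depends on how the dummies are scaled relative to the numeric columns.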

I have been able to find
https://issues.apache.org/jira/browse/SPARK-4510, but it seems PAM does not
scale well. Perhaps k-prototypes would be a better fit.
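To make the request concrete, here is a minimal single-machine sketch of a batch (Lloyd-style) k-prototypes loop in plain Python, not Spark. It assumes the usual k-prototypes dissimilarity: squared Euclidean distance on the numeric part plus a user-chosen weight `gamma` times the number of categorical mismatches, with prototypes updated as per-cluster means (numeric) and modes (categorical). All function names and parameters here are hypothetical, not an existing MLlib API.

```python
import random
from collections import Counter


def kproto_dist(x_num, x_cat, p_num, p_cat, gamma):
    # Squared Euclidean distance on numeric attributes...
    num = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    # ...plus gamma times the count of categorical mismatches.
    cat = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return num + gamma * cat


def k_prototypes(data_num, data_cat, k, gamma=1.0, max_iter=20, seed=0):
    rng = random.Random(seed)
    # Initialise prototypes from k randomly chosen records.
    protos = [(list(data_num[i]), list(data_cat[i]))
              for i in rng.sample(range(len(data_num)), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to its nearest prototype.
        new_labels = [
            min(range(k), key=lambda j: kproto_dist(xn, xc, protos[j][0], protos[j][1], gamma))
            for xn, xc in zip(data_num, data_cat)
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: mean of numeric attributes, mode of categorical ones.
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue  # keep the old prototype for an empty cluster
            mean = [sum(data_num[i][d] for i in members) / len(members)
                    for d in range(len(data_num[0]))]
            modes = [Counter(data_cat[i][d] for i in members).most_common(1)[0][0]
                     for d in range(len(data_cat[0]))]
            protos[j] = (mean, modes)
    return labels, protos


nums = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
cats = [["a"], ["a"], ["a"], ["b"], ["b"], ["b"]]
labels, protos = k_prototypes(nums, cats, k=2)
print(labels)
```

Both steps only need per-record distances and per-cluster sums and category counts, which is why k-prototypes seems easier to distribute than PAM's medoid swaps; this is my assumption about how a Spark version could be structured, not a description of any existing implementation.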

Regards,
