GitHub user mouendless commented on the pull request:
https://github.com/apache/spark/pull/13133#issuecomment-219405791
@srowen Thanks for taking the time to review, and sorry for the style problems.
I will re-check them.
But logically, I don't ignore the distance from a point to the previous centers:
KMeans.pointCost(curCenters, p) computes the shortest distance from p to any of
the current centers. During the initial center-picking phase the centers never
move; new ones are only added. So we can keep in memory, for every point, its
distance to its closest center, and at each step compare it with the distance
to the new center. If the new center is closer, we update the stored value.
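A minimal Python sketch of the caching idea above, under stated assumptions:
points are plain tuples, the cost is squared Euclidean distance, and
`update_costs` is a hypothetical helper, not the code in the pull request. It
checks that the incrementally maintained costs match a full recomputation over
all chosen centers.

```python
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (the cost measure assumed in this sketch)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def point_cost(centers: Sequence[Point], p: Point) -> float:
    """Full recomputation: cost of p against *all* current centers,
    playing the role KMeans.pointCost plays in the comment."""
    return min(sq_dist(c, p) for c in centers)


def update_costs(costs: List[float], points: Sequence[Point], new_center: Point) -> None:
    """Incremental update (hypothetical helper): old centers never move,
    so only the newly added center can lower a point's cached cost."""
    for i, p in enumerate(points):
        d = sq_dist(new_center, p)
        if d < costs[i]:  # the new center is closer: update the cache
            costs[i] = d


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
chosen = [points[0], points[2], points[3]]  # centers in the order they are added

# Seed the cache with distances to the first center, then fold in each new one.
costs = [sq_dist(chosen[0], p) for p in points]
for c in chosen[1:]:
    update_costs(costs, points, c)

full = [point_cost(chosen, p) for p in points]
print(costs)          # [0.0, 1.0, 0.0, 0.0]
print(costs == full)  # True
```

Each new center then costs one distance computation per point instead of one
per existing center. Because `min` returns one of its inputs exactly, the cached
value is bit-for-bit equal to a full recomputation as long as each
point-to-center distance is computed the same way, which is consistent with the
two versions producing identical results.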
I compared the sum and cumulativeScore computed by the two versions, and they
are exactly the same; these are the only values that could affect the
correctness of the result in the new version. I also compared the SE computed
by the two versions, and it is still the same. So, in my experiments, the new
version gives the same result in less time.
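The correctness argument above relies on the next center being chosen only from
the sum and the running cumulative score. A minimal sketch of that sampling
step, assuming unweighted points and a uniform draw r in [0, 1) (the function
name `pick_next_index` is hypothetical, not Spark's code), shows why identical
cost arrays imply identical picks:

```python
from typing import Sequence


def pick_next_index(costs: Sequence[float], r: float) -> int:
    """Pick an index with probability proportional to its cost (k-means++ style).

    r is assumed to be a uniform draw in [0, 1). The result depends only on
    costs and r, so two versions that produce identical cost arrays (hence the
    same sum and cumulative scores) pick the same index for the same r.
    """
    total = sum(costs)
    if total <= 0.0:
        raise ValueError("every point already coincides with a center")
    threshold = r * total
    cumulative_score = 0.0
    last_positive = 0
    for j, c in enumerate(costs):
        cumulative_score += c
        if c > 0.0:
            last_positive = j
        if cumulative_score > threshold:
            return j
    # Only reached if rounding pushes threshold up to total; fall back to the
    # last point that still has a positive cost.
    return last_positive


costs = [0.0, 1.0, 0.0, 4.0]
print(pick_next_index(costs, 0.0))  # 1
print(pick_next_index(costs, 0.1))  # 1
print(pick_next_index(costs, 0.5))  # 3
```

Points that already coincide with a center have zero cost and can never be
picked, since the cumulative score does not grow past the threshold on them.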