Github user sethah commented on the issue:
https://github.com/apache/spark/pull/17556
I don't mind the weighted midpoints. However, if we find that many points share
exactly the same value of a continuous feature, then by weighting the midpoint we
are assuming that the test set may contain points that are close to, but not
exactly at, those values. Since the training data was clustered at those
particular values, that may not be a good assumption. I could live with either
method, but I have a slight preference for matching the other libraries.
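To make the trade-off concrete, here is a minimal sketch contrasting the two ways of placing a split threshold between adjacent distinct values of a continuous feature. It assumes the "weighted midpoint" weights each neighbouring value by how many training points take that value. That formula is an illustrative assumption, not necessarily the exact one proposed in the PR, and `candidate_splits` is a hypothetical helper, not Spark's API.

```python
from collections import Counter


def candidate_splits(values, weighted=False):
    """Return split thresholds between each pair of adjacent distinct values.

    weighted=False: plain midpoint (a + b) / 2.
    weighted=True:  assumed count-weighted midpoint
                    (a * n_a + b * n_b) / (n_a + n_b),
                    which lands closer to the more frequent value.
    """
    # Sorted (distinct value, count) pairs for the feature.
    counts = sorted(Counter(values).items())
    splits = []
    for (a, n_a), (b, n_b) in zip(counts, counts[1:]):
        if weighted:
            splits.append((a * n_a + b * n_b) / (n_a + n_b))
        else:
            splits.append((a + b) / 2.0)
    return splits


# Nine training points clustered at 1.0 and a single point at 2.0.
train = [1.0] * 9 + [2.0]
print(candidate_splits(train))                 # plain midpoint: [1.5]
print(candidate_splits(train, weighted=True))  # weighted midpoint: about 1.1
```

With the training data clustered at 1.0, the weighted threshold sits just above the cluster. Any test value between roughly 1.1 and 1.5 is therefore routed differently by the two schemes, which is exactly the region where the weighted approach relies on unseen points near, but not at, the clustered value.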