GitHub user MLnick commented on the pull request:
https://github.com/apache/spark/pull/11601#issuecomment-215719811
@hhbyyh if you have time, could you create 2 follow-up JIRAs for:
* PySpark impl
* adding mode at a later date for categorical features
  * investigate efficiency of approaches using DataFrame/Dataset and/or
    approximate approaches such as `frequentItems` or Count-Min Sketch (will
    require an update to CMS to return "heavy-hitters").
  * investigate if we can use metadata to only allow mode for categorical
    features (or perhaps, as an easier alternative, allow mode for only
    Int/Long columns)
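
To make the exact-versus-approximate trade-off in the second item concrete, here is a minimal plain-Python sketch. It is not Spark code and not the implementation proposed in this PR: `exact_mode` and `heavy_hitter_candidates` are hypothetical names, and the second function is a Misra-Gries style summary, used here only as a stand-in for the kind of "heavy-hitters" query an approximate approach would need to answer.

```python
from collections import Counter


def exact_mode(values):
    """Exact mode: count every distinct non-missing value (None = missing)."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    best = max(counts.values())
    # Break ties by the smallest value so the result is deterministic.
    return min(v for v, c in counts.items() if c == best)


def heavy_hitter_candidates(values, k):
    """Misra-Gries style summary that keeps at most k - 1 counters.

    Any value occurring more than n / k times among the n non-missing
    values is guaranteed to be returned; false positives are possible,
    so a second pass is needed to confirm the true mode.
    """
    counters = {}
    for v in values:
        if v is None:
            continue
        if v in counters:
            counters[v] += 1
        elif len(counters) < k - 1:
            counters[v] = 1
        else:
            # No free counter: decrement all and drop those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


# Hypothetical integer-coded categorical column with two missing entries.
column = [3, 1, 3, None, 2, 3, 1, 3, None, 3]
mode = exact_mode(column)
candidates = heavy_hitter_candidates(column, k=3)
imputed = [mode if v is None else v for v in column]
```

The exact version needs one counter per distinct value, while the summary is bounded by `k - 1` counters but only yields candidates; that gap is what the efficiency investigation would have to weigh.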