Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen, I have added the parameter to control the feature selection
type.
The usage is like this:
var selector = new ChiSqSelector()
var model = selector.fit(df) // by default, the
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @avulanov. In general, FPR feature selection should not modify the
code of existing ChiSqSelector, as we have implemented in this PR. But if we
need to reuse the ChiSqTestResult
Github user avulanov commented on the issue:
https://github.com/apache/spark/pull/14597
Yes, it seems that the index sort can be done inside the model. With regard to
the sort by p-value, I have taken a brief look at chi-squared feature selection
in scikit-learn and Weka, and they don't seem
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14597
OK, makes sense @avulanov, though I'm not sure why the model can't sort the
indices if it requires this as an internal detail. No big deal. After this
change it may not matter. Conceptually though,
Github user avulanov commented on the issue:
https://github.com/apache/spark/pull/14597
@srowen I've checked our thread with @mengxr on that feature
https://github.com/apache/spark/pull/1484.
- We preserve the order of indexes to make the selection of features with
one loop
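
To illustrate the one-loop point, here is a minimal sketch in plain Scala, assuming both the vector's non-zero indices and the selected feature indices are sorted in ascending order. It uses bare arrays rather than Spark's vector classes, and the names `OneLoopSelection` and `selectSparse` are hypothetical; this is not the code from the PR.

```scala
object OneLoopSelection {

  /** Keeps only the entries of a sparse vector whose index appears in `selected`.
    *
    * Assumption: `indices` and `selected` are both sorted ascending. That is
    * what lets a single merge-style pass replace a lookup per feature.
    */
  def selectSparse(
      indices: Array[Int],
      values: Array[Double],
      selected: Array[Int]): (Array[Int], Array[Double]) = {
    val newIndices = Array.newBuilder[Int]
    val newValues = Array.newBuilder[Double]
    var i = 0 // position in the vector's non-zero entries
    var j = 0 // position in the selected feature indices
    while (i < indices.length && j < selected.length) {
      if (indices(i) == selected(j)) {
        // Feature selected(j) becomes column j of the reduced vector.
        newIndices += j
        newValues += values(i)
        i += 1
        j += 1
      } else if (indices(i) < selected(j)) {
        i += 1 // this non-zero entry belongs to a feature that was not selected
      } else {
        j += 1 // this selected feature is zero in the vector
      }
    }
    (newIndices.result(), newValues.result())
  }
}
```

If the selected indices were kept in p-value order instead, the same pass would not work as written and the model would need to sort them first or use a lookup.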
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/14597
Hi @srowen, I can modify the implementation in .ml to accommodate the new
params. Thanks.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14597
This would also need to modify the implementation in `.ml` to somehow
accommodate the new params.