[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-17 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, I have added the parameter to control the feature selection type. The usage is like this: var selector = new ChiSqSelector() var model = selector.fit(df) // by default, the

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-14 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @avulanov. In general, FPR feature selection should not modify the code of the existing ChiSqSelector, as we have implemented in this PR. But if we need to reuse the ChiSqTestResult

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-12 Thread avulanov
Github user avulanov commented on the issue: https://github.com/apache/spark/pull/14597 Yes, it seems that index sort can be done inside the model. With regard to the sort by p-value, I have taken a brief look at chi-squared feature selection in scikit-learn and Weka, and they don't seem

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14597 OK, makes sense @avulanov, though I'm not sure why the model can't sort the indices if it requires this as an internal detail. No big deal. After this change it may not matter. Conceptually though,

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-12 Thread avulanov
Github user avulanov commented on the issue: https://github.com/apache/spark/pull/14597 @srowen I've checked our thread with @mengxr on that feature https://github.com/apache/spark/pull/1484. - We preserve the order of indexes to make the selection of features with one loop

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-11 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/14597 Hi @srowen, I can modify the implementation in .ml to accommodate the new params. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark issue #14597: [SPARK-17017][MLLIB] add a chiSquare Selector based on F...

2016-08-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14597 This would also need to modify the implementation in `.ml` to somehow accommodate the new params.