GitHub user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/9965#discussion_r45916644
--- Diff: docs/ml-features.md ---
@@ -1949,3 +1949,52 @@ output.select("features", "label").show()
{% endhighlight %}
</div>
</div>
+
+## ChiSqSelector
+
+`ChiSqSelector` stands for Chi-Squared feature selection. It operates on labeled data with
+categorical features. ChiSqSelector orders features based on a
+[Chi-Squared test of independence](https://en.wikipedia.org/wiki/Chi-squared_test)
+from the class, and then filters (selects) the top features which the class label depends on the
+most. This is akin to yielding the features with the most predictive power.
+
+**Examples**
+
+Assume that we have a DataFrame with the columns `id`, `features`, and `clicked`:
--- End diff --
Explain what "clicked" means, just to be clear.
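
For illustration, here is a minimal plain-Python sketch of the selection the quoted section describes, assuming `clicked` is a binary label column (1 = clicked, 0 = not clicked). The data, the helper names `chi_squared` and `select_top_features`, and the choice to rank by the raw test statistic are all assumptions made for this example; this is not Spark's implementation.

```python
from collections import Counter


def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic for one categorical feature vs. the label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * c_count / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat


def select_top_features(rows, labels, num_top_features):
    """Return the indices of the highest-scoring features, plus all scores."""
    num_features = len(rows[0])
    scores = [chi_squared([r[j] for r in rows], labels) for j in range(num_features)]
    ranked = sorted(range(num_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:num_top_features]), scores


# Hypothetical data: one row of categorical features per id.
features = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [1, 0, 1],
]
# Hypothetical label: whether the item was clicked.
clicked = [1, 1, 0, 0]

selected, scores = select_top_features(features, clicked, 1)
print(selected, scores)
```

In this made-up data the feature at index 1 matches `clicked` on every row, so it gets the largest statistic and is the one selected.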