Github user imatiach-msft commented on a diff in the pull request:
https://github.com/apache/spark/pull/17084#discussion_r123924688
--- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala ---
@@ -36,12 +36,18 @@ import org.apache.spark.sql.types.DoubleType
 @Since("1.2.0")
 @Experimental
 class BinaryClassificationEvaluator @Since("1.4.0") (@Since("1.4.0") override val uid: String)
-  extends Evaluator with HasRawPredictionCol with HasLabelCol with DefaultParamsWritable {
+  extends Evaluator with HasRawPredictionCol with HasLabelCol
+    with HasWeightCol with DefaultParamsWritable {

   @Since("1.2.0")
   def this() = this(Identifiable.randomUID("binEval"))

   /**
+   * Default number of bins to use for binary classification evaluation.
+   */
+  val defaultNumberOfBins = 1000
--- End diff ---
It seemed like a good default value to use: for graphing the ROC curve it's not
too large for most plots, but it's not so small that the curve would look jagged.
The user can always specify a value to override the default. However, it's
usually not a good idea to sort over the entire set of label/score values, since
the dataset will probably be very large, the sort will be very slow, and the
extra points make no visible difference when visualizing the data, so by default
we should encourage the user to down-sample into a bounded number of bins.
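
To make the down-sampling concrete, here is a minimal sketch (illustration only,
not the code in this PR) of computing AUC through the existing mllib
`BinaryClassificationMetrics`, which down-samples the curve when given a positive
numBins; the helper name and its default value are assumptions for the example:

```scala
// Sketch only: how a numBins setting can bound the curve size via the
// existing mllib BinaryClassificationMetrics.
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.rdd.RDD

// scoreAndLabels: RDD of (rawScore, label) pairs.
// With numBins > 0, BinaryClassificationMetrics groups the sorted scores into
// roughly numBins bins, so the ROC/PR curves have a bounded number of points
// instead of one point per distinct score.
def evaluateAUC(scoreAndLabels: RDD[(Double, Double)], numBins: Int = 1000): Double = {
  val metrics = new BinaryClassificationMetrics(scoreAndLabels, numBins)
  try {
    metrics.areaUnderROC()
  } finally {
    metrics.unpersist() // release the cached intermediate RDDs
  }
}
```

With numBins = 0 the mllib metrics keep one curve point per distinct score,
which is exactly the full sort over the whole dataset that the default is meant
to avoid.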