I assume you mean that it is not wise to compute TP, FP and FN, and then
precision and recall, using cross_val_predict. If that is what you mean,
I'd like you to explain why: if there is high variance as a function of
the training set rather than of the test samples, I'd like to know.
On 03.04.19 at 23:46, Joel Nothman wrote:

Pull requests improving the documentation are always welcome. At a
minimum, users need to know that these compute different things.

Accuracy is not precision. Precision is the number of true positives
divided by the number of true positives plus false positives. It
therefore cannot be decomposed as a per-sample average.
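To make the non-decomposability concrete, here is a small sketch with
hand-picked fold counts (the numbers are hypothetical, not from the
thread): averaging per-fold precisions gives a different answer than
pooling the TP/FP counts over all folds first.

```python
# Hand-picked fold outcomes (hypothetical) showing that the mean of
# per-fold precisions differs from precision over pooled counts.

def precision(tp, fp):
    return tp / (tp + fp)

folds = [(9, 1),   # fold 1: 9 TP, 1 FP  -> precision 0.90
         (1, 3)]   # fold 2: 1 TP, 3 FP  -> precision 0.25

# Unweighted mean of per-fold precisions:
mean_of_folds = sum(precision(tp, fp) for tp, fp in folds) / len(folds)  # 0.575

# Precision from counts pooled over all folds (cross_val_predict-style):
tp_total = sum(tp for tp, _ in folds)   # 10
fp_total = sum(fp for _, fp in folds)   # 4
pooled = precision(tp_total, fp_total)  # 10/14, about 0.714
```

Because precision is a ratio of counts rather than an average of
per-sample losses, the two aggregation orders need not agree.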
On 03.04.19 at 13:59, Joel Nothman wrote:

The equations in Murphy and Hastie very clearly assume a metric
decomposable over samples (a loss function). Several popular metrics
are not.

For a metric like MSE it will be almost identical assuming the test
sets have almost the same size. For something like Recall (sensitivity)
it will be different.
On Wed, Apr 03, 2019 at 08:54:51AM -0400, Andreas Mueller wrote:
> If the loss decomposes, the result might be different b/c of different test
> set sizes, but I'm not sure if they are "worse" in some way?
Mathematically, a cross-validation estimates a double expectation: one
expectation is over the training sets, and one is over the test samples.
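The fold-size effect mentioned above can be seen numerically. In this
sketch the per-sample squared errors are made-up stand-ins for
(y_pred - y)**2 on each test fold: with equal fold sizes the unweighted
average of per-fold MSEs equals the pooled MSE exactly, with unequal
fold sizes it does not.

```python
import numpy as np

# Hypothetical per-sample squared errors on three test folds.
err_a = np.linspace(0.1, 2.0, 25)   # fold A, 25 test samples, mean 1.05
err_b = np.linspace(0.2, 1.8, 25)   # fold B, 25 test samples, mean 1.00
err_c = np.linspace(0.5, 5.0, 10)   # fold C, only 10 test samples, mean 2.75

# Equal fold sizes: mean of per-fold MSEs equals the pooled MSE.
mean_of_folds = (err_a.mean() + err_b.mean()) / 2          # 1.025
pooled = np.concatenate([err_a, err_b]).mean()             # 1.025

# Unequal fold sizes: the unweighted fold average drifts away from
# the pooled (per-sample) estimate.
mean_of_folds_uneq = (err_a.mean() + err_c.mean()) / 2     # 1.9
pooled_uneq = np.concatenate([err_a, err_c]).mean()        # 53.75/35, about 1.536
```

This is exactly the "different b/c of different test set sizes" case:
both are reasonable estimates, but they weight the folds differently.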
I use

    sum((cross_val_predict(model, X, y) - y)**2) / len(y)    (*)

to evaluate the performance of a model. This conforms to Murphy,
Machine Learning, section 6.5.3, and Hastie et al., The Elements of
Statistical Learning, eq. 7.48. However, according to the documentation
of cross_val_predict, it is not appropriate to pass these predictions
into an evaluation metric.
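For comparison, here is a runnable version of the estimator (*) next to
the per-fold average that cross_val_score reports; the model and the
synthetic regression data are illustrative choices, not taken from the
thread.

```python
# Pooled-MSE estimator (*) vs. cross_val_score's mean of per-fold MSEs,
# on synthetic data with a LinearRegression model (illustrative choices).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = LinearRegression()

# Pooled MSE over out-of-fold predictions, as in (*):
y_oof = cross_val_predict(model, X, y, cv=5)
mse_pooled = np.sum((y_oof - y) ** 2) / len(y)

# Average of per-fold MSEs, as cross_val_score reports:
mse_folds = -cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_squared_error").mean()

# With five equal-sized folds the two agree up to floating-point noise;
# for a non-decomposable metric such as precision they would not.
```

Because MSE decomposes over samples and the five folds here have equal
size, the two numbers coincide, which is why the disagreement only
surfaces for metrics like precision and recall.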