I'm looking for a simple way to set up a small pipeline for choosing a
tuning parameter using a modified cross-validation (CV) score for
regression-type problems.

The modification is pretty simple: for squared error or logistic deviance,
it is just a small change to a score that depends only on `Y` (the
response, or binary labels in the logistic case) and `X.dot(beta)` (the
linear predictor).
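
For reference, the unmodified scores I have in mind are plain functions of
those two quantities, e.g. (standard formulas, nothing scikit-learn
specific):

import numpy as np

# standard, unmodified versions -- both depend only on y and the
# linear predictor eta = X.dot(beta)

def squared_error(y, eta):
    return np.sum((y - eta) ** 2)

def logistic_deviance(y, eta):
    # y in {0, 1}; -2 * log-likelihood, up to a constant
    return -2 * np.sum(y * eta - np.logaddexp(0, eta))

My modification is a small tweak to these.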

I've been trying to understand how to use sklearn for this, since there is
no need for me to rewrite the basic CV machinery. I'd like to be able to
plug in my own custom estimator (so I guess I just need a subclass of
BaseEstimator with a `fit` method taking an (X, y) signature?), as well as
my own modification of the score.

To keep things concrete, I'd be happy to understand the case of an
estimator whose `fit` simply stores `np.zeros(X.shape[1])` as its fitted
parameters, together with a scoring function like:

import numpy as np

def score(estimator, X_test, y_test):
    # extract the linear predictor -- for my toy estimator the fitted
    # vector is just zeros
    beta = estimator.parameters_
    linpred = X_test.dot(beta)

    # or maybe this is the way I should be getting the linear predictor?
    # linpred = estimator.transform(X_test)

    return np.linalg.norm(y_test - linpred)
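
For concreteness, the toy estimator itself would be nothing more than
something like this (the class name and the `parameters_` attribute are
just my own placeholders, not anything scikit-learn requires):

import numpy as np
from sklearn.base import BaseEstimator

class ZeroEstimator(BaseEstimator):
    # toy estimator: "fitting" just stores a zero coefficient vector

    def fit(self, X, y):
        self.parameters_ = np.zeros(X.shape[1])
        return self

    def predict(self, X):
        # the linear predictor X.dot(beta), identically zero here
        return X.dot(self.parameters_)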

This would not be an interesting model, but it would help me understand how
things are evaluated in the CV loop. I have read the docs on creating a
custom scorer, but they do not seem to describe what `estimator` will be
inside the CV loop. I presume a custom scorer gets called with X_test and
y_test, and I suppose `estimator` will be a model that has already been fit
to X_train and y_train?
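
In other words, I'm hoping the whole thing reduces to something like the
sketch below (assuming I can simply pass my score callable as `scoring=`;
GridSearchCV over my tuning parameter would be the real use case):

import numpy as np
from sklearn.model_selection import cross_val_score

# toy data; ZeroEstimator and score defined as above
X = np.random.standard_normal((100, 5))
y = np.random.standard_normal(100)

cv_scores = cross_val_score(ZeroEstimator(), X, y, cv=5, scoring=score)
print(cv_scores)

(I realize I'd presumably want to return the negative of the norm from
`score`, since higher scores are treated as better when selecting a
parameter.)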
-- 
Jonathan Taylor
Dept. of Statistics
Sequoia Hall, 137
390 Serra Mall
Stanford, CA 94305
Tel:   650.723.9230
Fax:   650.725.8977
Web: http://www-stat.stanford.edu/~jtaylo