I'm looking for a simple way to set up a small pipeline for choosing a tuning parameter using a modification of CV for regression-type problems.
The modification is pretty simple: for squared error or logistic deviance, it is just a modified score computed from `Y` (binary labels) and `X.dot(beta)` (the linear predictor). I've been trying to understand how to use sklearn for this, since there is no need for me to rewrite the basic CV machinery. I'd like to be able to use my own custom estimator (so I guess I just need a subclass of `BaseEstimator` with a `fit` method having an (X, y) signature?), as well as my own modification of the score.

To make things concrete, I'd be happy to understand the code for an estimator whose fit does nothing but store `np.zeros(X.shape[1])` as its coefficients, together with a scoring function like

    import numpy as np

    def score(estimator, X_test, y_test):
        # estimator.parameters_ is just a zero vector for my toy estimator --
        # I guess this is the way I should extract the linear predictor
        beta = estimator.parameters_
        linpred = X_test.dot(beta)
        # or maybe?  linpred = estimator.transform(X_test)
        return np.linalg.norm(y_test - linpred)

This would not be an interesting model, but it would help me understand how things are evaluated in the CV loop. I have read in the docs how to create a custom scorer, but they do not seem to describe what `estimator` will be inside the CV loop. I presume a custom scorer gets called with X_test and y_test, and I suppose `estimator` will be a model fit to X_train and y_train?

--
Jonathan Taylor
Dept. of Statistics
Sequoia Hall, 137
390 Serra Mall
Stanford, CA 94305
Tel: 650.723.9230
Fax: 650.725.8977
Web: http://www-stat.stanford.edu/~jtaylo
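P.S. To make sure I'm asking the right question, here is a minimal sketch of the whole pipeline I have in mind, assuming `cross_val_score` and `GridSearchCV` will accept a plain callable with the (estimator, X_test, y_test) signature as their `scoring` argument. The class name `ZeroEstimator`, the attribute `parameters_`, and the hyperparameter `lam` are just placeholders I made up for illustration.

    import numpy as np
    from sklearn.base import BaseEstimator
    from sklearn.model_selection import GridSearchCV, cross_val_score


    class ZeroEstimator(BaseEstimator):
        """Toy estimator whose fit ignores the data and stores a zero coefficient vector."""

        def __init__(self, lam=1.0):
            # hyperparameter I might want to choose by CV (unused by this toy fit)
            self.lam = lam

        def fit(self, X, y):
            # a real estimator would solve for beta here; the toy version stores zeros
            self.parameters_ = np.zeros(X.shape[1])
            return self

        def predict(self, X):
            # linear predictor X.dot(beta)
            return X.dot(self.parameters_)


    def my_score(estimator, X_test, y_test):
        # called inside the CV loop with an estimator already fit on the training fold
        beta = estimator.parameters_
        linpred = X_test.dot(beta)
        # scorers are maximized, so negate the (modified) loss
        return -np.linalg.norm(y_test - linpred)


    if __name__ == "__main__":
        rng = np.random.RandomState(0)
        X = rng.standard_normal((100, 5))
        y = rng.standard_normal(100)

        # plain CV with the custom scorer
        print(cross_val_score(ZeroEstimator(), X, y, scoring=my_score, cv=5))

        # choosing the hyperparameter by CV with the same scorer
        grid = GridSearchCV(ZeroEstimator(), {"lam": [0.1, 1.0, 10.0]},
                            scoring=my_score, cv=5)
        grid.fit(X, y)
        print(grid.best_params_)

Is this roughly the intended way to plug a custom estimator and a custom score into the existing CV machinery, or is there a more idiomatic route?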