Hi, Piotr,
> These preprocessing steps have some parameters too, which I would like to
> tune.
> I know that it is possible to tune the parameters of the preprocessing steps,
> if they are part of my pipeline.
> E.g. if I am using PCA, I could tune the parameter n_components, right?
>
> But what if I have some "custom" preprocessing code with some parameters?
> Is it possible to create a scikit-compatible "object" of my custom code in
> order to tune the parameters in the pipeline with grid search?

Yes. Build the pipeline with the Pipeline class or the `make_pipeline` function, and wrap your custom code in an estimator that subclasses BaseEstimator, like so:

    from sklearn.base import BaseEstimator
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline

    class CustomEstimator(BaseEstimator):

        def __init__(self, my_param=None):
            # Store each argument under the same name; get_params and
            # set_params (and so GridSearchCV) rely on this attribute.
            self.my_param = my_param

        def fit_transform(self, X, y=None):
            return self.fit(X, y).transform(X)

        def transform(self, X, y=None):
            return X

        def fit(self, X, y=None):
            return self

    pipe = make_pipeline(CustomEstimator(), LogisticRegression())
    grid = {'customestimator__my_param': [3],
            'logisticregression__C': [0.1, 1.0, 10.0]}
    gsearch1 = GridSearchCV(estimator=pipe, param_grid=grid)
    gsearch1.fit(X, y)  # X, y: your own training data

Then you can put your preprocessing logic into fit and transform.

Best,
Sebastian

> On Sep 7, 2016, at 2:03 PM, Piotr Bialecki <piotr.biale...@hotmail.de> wrote:
>
> Hi all,
>
> I am currently tuning some parameters of my xgboost model using scikit's
> grid_search, e.g.:
>
> param_test1 = {'max_depth':range(3,10,2),
>                'min_child_weight':range(1,6,2)
>               }
> gsearch1 = GridSearchCV(estimator = XGBClassifier(learning_rate=0.1,
>                             n_estimators=762,
>                             max_depth=5, min_child_weight=1, gamma=0,
>                             subsample=0.8, colsample_bytree=0.8,
>                             objective='binary:logistic', nthread=4,
>                             scale_pos_weight=1, seed=2809),
>                         param_grid = param_test1,
>                         scoring='roc_auc',
>                         n_jobs=6,
>                         iid=False, cv=5)
>
> Before that I preprocessed my dataset X with some different methods.
> These preprocessing steps have some parameters too, which I would like to
> tune.
> I know that it is possible to tune the parameters of the preprocessing steps,
> if they are part of my pipeline.
> E.g. if I am using PCA, I could tune the parameter n_components, right?
>
> But what if I have some "custom" preprocessing code with some parameters?
> Is it possible to create a scikit-compatible "object" of my custom code in
> order to tune the parameters in the pipeline with grid search?
> Imagine I would like to write a custom method FeatureMultiplier() with a
> parameter multiplier_value.
> Is it possible to create a scikit-compatible class out of this method and
> tune it with grid search?
>
> I thought I saw a talk about exactly this topic at some PyData in 2016 or
> 2015, but unfortunately I cannot find the video of it.
> Maybe I misunderstood the presentation at that time.
>
>
> Best regards,
> Piotr
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
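
For the FeatureMultiplier case in the quoted question, a minimal sketch could look like the one below. It assumes the transformer simply multiplies every feature by multiplier_value (the original message does not say what it should do), uses a synthetic dataset from make_classification as a stand-in for the real X and y, and swaps in LogisticRegression for the XGBClassifier so that it runs with scikit-learn alone. The grid values are arbitrary placeholders.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline


class FeatureMultiplier(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: scales every feature by multiplier_value."""

    def __init__(self, multiplier_value=1.0):
        # Keep the argument unchanged under the same attribute name so that
        # get_params/set_params, and therefore GridSearchCV, can tune it.
        self.multiplier_value = multiplier_value

    def fit(self, X, y=None):
        # Nothing is learned in this toy step; a real preprocessing step
        # would estimate its state from X here.
        return self

    def transform(self, X):
        return np.asarray(X) * self.multiplier_value


# Synthetic stand-in data (assumption; replace with your own X and y).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = make_pipeline(FeatureMultiplier(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name, and the
# double underscore separates the step name from the parameter name.
grid = {
    "featuremultiplier__multiplier_value": [0.1, 1.0, 10.0],
    "logisticregression__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid=grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_)
```

Inheriting from TransformerMixin supplies fit_transform, so only fit and transform need to be written. Calling pipe.get_params().keys() lists every name that can go into the grid.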