Hi Andreas,

the best score is determined by computing each test fold's performance with 
the scoring metric you passed (neg_mean_squared_error in your case; R^2 is 
the default for regressors if you don't set one) and then averaging over the 
folds. Since you chose cv=10, you have 10 test folds, and the average 
performance over those is what is used to pick the best hyperparameter 
setting.
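
You can see those per-fold averages directly in cv_results_. A minimal 
sketch, assuming x_tr and y_tr are your training arrays (the grid mirrors 
yours):

    from sklearn import tree
    from sklearn.model_selection import GridSearchCV

    search = GridSearchCV(
        tree.DecisionTreeRegressor(random_state=42),
        param_grid={'min_samples_split': range(2, 10)},
        scoring='neg_mean_squared_error',
        cv=10,
    )
    search.fit(x_tr, y_tr)

    # mean_test_score holds, for each candidate, the average of the
    # 10 per-fold test scores; best_index_ marks the winner.
    print(search.cv_results_['mean_test_score'])
    print(search.cv_results_['params'][search.best_index_])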

Then, it looks like you are computing the performance manually:

> simple_tree.fit(x_tr,y_tr).score(x_tr,y_tr)

on the whole training set. Instead, I would take a look at the 
simple_tree.best_score_ attribute after fitting. If you do that, you will be 
comparing cross-validated scores rather than training-set scores, and since 
range(2, 10) contains range(2, 5), the broader search can only find an equal 
or better best_score_. The training-set score, on the other hand, can 
legitimately drop: the broader grid may select a larger min_samples_split 
that fits the training data less closely but generalizes better.
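
For example (again assuming your x_tr and y_tr):

    simple_tree.fit(x_tr, y_tr)

    # Cross-validated score of the winning parameter setting:
    # the mean neg_mean_squared_error over the 10 test folds.
    print(simple_tree.best_score_)
    print(simple_tree.best_params_)

    # Training-set score of the refit estimator -- this is what your
    # .score(x_tr, y_tr) call reports. It uses the same scorer, but on
    # data the tree was fit on, so it rewards overfitting.
    print(simple_tree.score(x_tr, y_tr))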

Best,
Sebastian

> On Mar 31, 2019, at 5:15 AM, Andreas Tosstorff <and...@hotmail.com> wrote:
> 
> Dear all,
> I am new to scikit-learn, so please excuse my ignorance. Using GridSearchCV I 
> am trying to optimize a DecisionTreeRegressor. The broader I make the 
> parameter space, the worse the scoring gets.
> Setting min_samples_split to range(2, 10) gives me a neg_mean_squared_error of 
> -0.04. When setting it to range(2, 5), the score is -0.004.
> simple_tree =GridSearchCV(tree.DecisionTreeRegressor(random_state=42), 
> n_jobs=4, param_grid={'min_samples_split': range(2, 10)}, 
> scoring='neg_mean_squared_error', cv=10, refit='neg_mean_squared_error')
> 
> simple_tree.fit(x_tr,y_tr).score(x_tr,y_tr)
> 
> I expect an equal or better score from the more extensive grid search 
> compared to the less extensive one.
> 
> I would really appreciate your help!
> 
> Kind regards,
> Andreas
