[scikit-learn] GridsearchCV returns worse scoring the broader parameter space gets

2019-03-31 Thread Andreas Tosstorff
Dear all, I am new to scikit-learn, so please excuse my ignorance. Using GridSearchCV, I am trying to optimize a DecisionTreeRegressor. The broader I make the parameter space, the worse the scoring gets. Setting min_samples_split to range(2,10) gives me a neg_mean_squared_error of -0.04. When se
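A minimal sketch of the setup the question describes, for readers who want to reproduce it. It assumes synthetic data from `make_regression` (the poster's dataset is not shown in the thread), a fixed `random_state`, and `cv=10` (the fold count mentioned in the reply). Note that `neg_mean_squared_error` is negated so that GridSearchCV can always maximize, so scores closer to 0 are better.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data standing in for the poster's (unknown) dataset.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Grid over min_samples_split, as in the question.
param_grid = {"min_samples_split": list(range(2, 10))}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),  # fixed seed for reproducible trees
    param_grid=param_grid,
    scoring="neg_mean_squared_error",  # negated MSE: closer to 0 is better
    cv=10,
)
search.fit(X, y)

# Mean cross-validated score for every candidate in the grid.
res = search.cv_results_
for params, score in zip(res["params"], res["mean_test_score"]):
    print(params, round(score, 3))

print("best:", search.best_params_, search.best_score_)
```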

Re: [scikit-learn] GridsearchCV returns worse scoring the broader parameter space gets

2019-03-31 Thread Sebastian Raschka
Hi Andreas, the best score is determined by computing the performance on each test fold (I think R^2 by default) and then averaging over the folds. Since you chose cv=10, you have 10 test folds, and the average performance over those folds is what is used to choose the best hyperparameter setting. Then,
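To make the averaging concrete, here is a small sketch (again assuming synthetic `make_regression` data and a fixed seed, not the poster's data) showing that `best_score_` is the mean of the ten per-fold test scores of the winning candidate, as stored in `cv_results_`. If `scoring` is omitted, the regressor's own `score` method (R^2) is used instead; here `neg_mean_squared_error` is passed explicitly, as in the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data; 300 samples split into 10 equal test folds.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    {"min_samples_split": list(range(2, 10))},
    scoring="neg_mean_squared_error",
    cv=10,
)
search.fit(X, y)

res = search.cv_results_
i = search.best_index_  # row of the best candidate in cv_results_

# Per-fold test scores of the best candidate: split0_test_score ... split9_test_score.
fold_scores = np.array([res[f"split{k}_test_score"][i] for k in range(10)])

print("per-fold:", np.round(fold_scores, 2))
print("mean of folds:", fold_scores.mean())
print("best_score_:  ", search.best_score_)
```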

Re: [scikit-learn] Can cluster based on the continuous access duration of an item?

2019-03-31 Thread Joel Nothman
When clustering, it's often a good idea to think not about the algorithm used to identify clusters, but about what distance metric might capture your intuitions about similar and dissimilar points.
HTH

On Fri., 29 Mar. 2019, 6:39 pm lampahome wrote:
> I have data which contain access duration of
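One way to act on this advice is to define the distance first and hand it to an algorithm that accepts a precomputed distance matrix. The sketch below is purely illustrative: it assumes each item's access duration is a hypothetical (start, end) interval, uses 1 minus the intersection-over-union of two intervals as the distance (one possible choice, not something proposed in the thread), and uses DBSCAN with `metric="precomputed"` as an example clusterer.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: one (start, end) access interval per item, e.g. in hours.
intervals = np.array([
    [0.0, 2.0],
    [0.5, 2.5],
    [1.0, 2.0],
    [10.0, 12.0],
    [10.5, 12.5],
    [30.0, 31.0],
])


def interval_distance(a, b):
    """1 - overlap/span: equals 1 - IoU for overlapping intervals, 1.0 for disjoint ones."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    span = max(a[1], b[1]) - min(a[0], b[0])  # assumes non-degenerate intervals
    return 1.0 - overlap / span


# Pairwise distance matrix; identical intervals get distance 0.
D = np.array([[interval_distance(a, b) for b in intervals] for a in intervals])

# DBSCAN works directly on the precomputed distances; -1 marks noise points.
labels = DBSCAN(eps=0.6, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)
```

Swapping in a different notion of similarity (e.g. comparing only durations, or start times) only requires changing `interval_distance`; the clustering call stays the same, which is the point of thinking about the metric first.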