Dear all,
I am new to scikit-learn, so please excuse my ignorance. Using GridSearchCV, I am
trying to optimize a DecisionTreeRegressor. The broader I make the parameter
space, the worse the scoring gets.
Setting min_samples_split to range(2,10) gives me a neg_mean_squared_error of
-0.04. When se
Hi Andreas,
the best score is determined by computing the performance on each test fold (I
think R^2 by default) and then averaging over the folds. Since you chose cv=10,
you have 10 test folds, and the best hyperparameter setting is the one with the
best average performance over those folds.
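For reference, here is a minimal sketch of the setup described in the question. It assumes synthetic data from make_regression, because the original data isn't in the thread. It runs GridSearchCV with scoring="neg_mean_squared_error" and cv=10, then checks that best_score_ is the average of the 10 per-fold test scores that cv_results_ stores for the winning setting (split0_test_score through split9_test_score).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data standing in for the poster's (unknown) dataset.
# 200 samples with cv=10 gives ten equally sized test folds of 20 samples.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

param_grid = {"min_samples_split": list(range(2, 10))}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # negated MSE: closer to 0 is better
    cv=10,
)
search.fit(X, y)

# Collect the ten per-fold test scores for the winning parameter setting.
i = search.best_index_
fold_scores = [search.cv_results_[f"split{k}_test_score"][i] for k in range(10)]

print(search.best_params_)
# best_score_ is the mean of those fold scores.
print(search.best_score_, np.mean(fold_scores))
```

Because neg_mean_squared_error is the negated MSE, scores closer to 0 are better, and best_score_ is the highest value in cv_results_["mean_test_score"].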
Then,
When clustering, it's often a good idea to think not about the algorithm
used to identify clusters, but about what distance metric might capture
your intuitions about similar and dissimilar points. HTH
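To make the distance-metric point concrete, here is a sketch. It assumes made-up durations (not the original poster's data) and a hypothetical log-ratio distance, under which 10 s vs 11 s counts as "similar" in the same way that 100 s vs 110 s does. The distance matrix is computed with pairwise_distances and passed to DBSCAN with metric="precomputed", so the clustering algorithm only sees the metric you chose.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Hypothetical 1-D data: durations in seconds, made up for illustration.
durations = np.array([1.0, 1.2, 0.9, 10.0, 11.0, 9.5, 100.0, 120.0]).reshape(-1, 1)

def log_ratio_distance(a, b):
    """Hypothetical metric: absolute difference of log-durations.

    a and b are single rows (1-element arrays) of `durations`.
    """
    return float(abs(np.log(a[0]) - np.log(b[0])))

# Full pairwise distance matrix under the chosen metric.
D = pairwise_distances(durations, metric=log_ratio_distance)

# eps is expressed in the units of the chosen metric (here, log-seconds).
labels = DBSCAN(eps=0.3, min_samples=2, metric="precomputed").fit_predict(D)
print(labels)
```

Swapping log_ratio_distance for a different function changes which points end up together without touching the DBSCAN call; eps has to be chosen on the scale of whatever distance you use.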
On Fri., 29 Mar. 2019, 6:39 pm lampahome wrote:
> I have data which contain access duration of