On 11/7/18 4:01 AM, William Heymann wrote:
Hello,

I am trying to tune the bandwidth for my KernelDensity. I need to find out what optimization goal to use.

I started with

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# x is a 1-D array of samples
grid = GridSearchCV(KernelDensity(),
                    {'bandwidth': np.linspace(0.1, 1.0, 30)},
                    cv=20)  # 20-fold cross-validation
grid.fit(x[:, None])
print(grid.best_params_)

From https://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/#Bandwidth-Cross-Validation-in-Scikit-Learn

I have also used RandomizedSearchCV to optimize the parameters.

The problem I have is that neither refines the answer, so if I don't sample the bandwidth range densely enough I don't get a good answer. What I would like to do is use the same goal but put it into a different global optimizer.

I have looked through the code for GridSearchCV and RandomizedSearchCV, but I have not yet been able to figure out what the actual optimization goal is.

Originally I thought the system was using something like

kde_bw = KernelDensity(kernel='gaussian', bandwidth=bw)
score = max(cross_val_score(kde_bw, data, cv=3))

That's basically what it's doing. It's maximizing the "score" method of KernelDensity. You could look at scikit-optimize for a more elaborate optimizer (or try using any of the scipy ones).
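
For reference, here is a rough sketch of handing that same goal to one of the scipy optimizers. One detail worth noting: GridSearchCV ranks candidates by the mean of the per-fold scores rather than the max, so the sketch uses the mean. The synthetic data, the helper name neg_mean_cv_score, the 5-fold split and the (0.1, 1.0) bounds are all assumptions made for illustration, not anything taken from the original code. Also, minimize_scalar with method="bounded" is a local search, so for a bumpy objective it would need restarts or a global routine instead.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KernelDensity


def neg_mean_cv_score(bandwidth, data, cv=5):
    """Negative mean cross-validated KernelDensity.score for one bandwidth.

    With no `scoring` argument, cross_val_score calls the estimator's own
    score method, which for KernelDensity is the total log-likelihood of
    the held-out fold. The sign is flipped because scipy minimizes.
    """
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth)
    return -cross_val_score(kde, data, cv=cv).mean()


# Synthetic 1-D sample, reshaped to (n_samples, 1) as scikit-learn expects.
# This stands in for the real data, which is not shown in the thread.
rng = np.random.default_rng(0)
data = rng.normal(size=200)[:, None]

# Bounded scalar minimization over the same range the grid covered.
result = minimize_scalar(
    neg_mean_cv_score,
    bounds=(0.1, 1.0),
    args=(data,),
    method="bounded",
)

best_bandwidth = result.x
print("best bandwidth:", best_bandwidth)
print("mean CV log-likelihood:", -result.fun)
```

The same objective function can be passed to scikit-optimize or to a scipy global optimizer if the bounded local search is not enough.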
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
