[Scikit-learn-general] pairwise_distances_argmin_min_n

2016-01-29 Thread Debanjan Bhattacharyya
Hi, I have written a method pairwise_distances_argmin_min_n in my "develop"-mode install. Its functionality is similar to pairwise_distances_argmin_min, but it returns the n smallest distances rather than only one (both the indices and the minima). It also does this in chunked (parallel) mode on sparse matrices, which needed some st…
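A minimal sketch of what such a helper might look like, assuming a dense distance matrix and leaving out the chunking; the name and signature simply mirror the description above, not the poster's actual code:

    import numpy as np
    from sklearn.metrics import pairwise_distances

    def pairwise_distances_argmin_min_n(X, Y, n=3, metric="euclidean"):
        # For each row of X, find the n closest rows of Y.
        D = pairwise_distances(X, Y, metric=metric)
        # argpartition places the n smallest distances of each row first
        # without fully sorting; only those n entries are then sorted.
        part = np.argpartition(D, n - 1, axis=1)[:, :n]
        order = np.argsort(np.take_along_axis(D, part, axis=1), axis=1)
        indices = np.take_along_axis(part, order, axis=1)
        distances = np.take_along_axis(D, indices, axis=1)
        return indices, distances

Computing D in row chunks, as the poster describes, would bound memory use; recent scikit-learn versions expose pairwise_distances_chunked for exactly that pattern.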

[Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

2016-01-29 Thread muhammad waseem
Hello all, I am new to scikit-learn and ML, and I am trying to train my model using the random forest and gradient boosting tree regressors. I was wondering what the best way to do hyperparameter tuning is: should I use GridSearchCV or RandomizedSearchCV? I have read that the performance of RandomizedSearchCV…
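For reference, a sketch contrasting the two options being asked about; the data, estimator, and parameter ranges are placeholders, and the imports assume scikit-learn >= 0.18 (older releases kept these classes in sklearn.grid_search):

    from scipy.stats import randint
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = make_regression(n_samples=500, n_features=20, noise=10, random_state=0)
    rf = RandomForestRegressor(random_state=0)

    # Exhaustive: every combination is tried (3 * 3 = 9 candidates per fold).
    grid = GridSearchCV(rf, {"max_depth": [5, 10, None],
                             "min_samples_leaf": [1, 5, 10]}, cv=3)
    grid.fit(X, y)

    # Randomized: a fixed budget of candidates drawn from distributions.
    rand = RandomizedSearchCV(rf, {"max_depth": randint(2, 20),
                                   "min_samples_leaf": randint(1, 30)},
                              n_iter=20, cv=3, random_state=0)
    rand.fit(X, y)

    print(grid.best_params_, rand.best_params_)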

Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

2016-01-29 Thread Sebastian Raschka
Hi, Waseem, with a fine-enough grid, GridSearchCV would be more "thorough" than the randomized search. However, the problem is essentially a combinatorial explosion of candidates. Typically, I start with a "rougher" grid (the values for each parameter are more "spaced out" relative to each other). A…
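A sketch of such a first, coarse pass, using the widely spaced values quoted later in this thread; the data and estimator are placeholders:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=500, n_features=20, noise=10, random_state=0)

    # Coarse pass: 4 * 4 = 16 widely spaced candidates instead of hundreds.
    coarse = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        {"max_depth": [1, 4, 10, 15],
         "min_samples_leaf": [1, 10, 20, 30]},
        cv=3)
    coarse.fit(X, y)
    print(coarse.best_params_)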

Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

2016-01-29 Thread muhammad waseem
Hi Sebastian, thanks for your reply. So this means I should start with, e.g., "max_depth": [1, 4, 10, 15] and "min_samples_leaf": [1, 10, 20, 30], and if the best result comes out at max_depth=10 and min_samples_leaf=10, then I should explore values close to these. Am I right? Shall I use a small number of estimators, whi…
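A sketch of the refinement pass this describes, zooming in around the coarse winner; the narrowed ranges are illustrative:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=500, n_features=20, noise=10, random_state=0)

    # Fine pass: explore values close to max_depth=10, min_samples_leaf=10.
    fine = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        {"max_depth": [8, 9, 10, 11, 12],
         "min_samples_leaf": [6, 8, 10, 12, 14]},
        cv=3)
    fine.fit(X, y)
    print(fine.best_params_, fine.best_score_)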

Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

2016-01-29 Thread Sebastian Raschka
> Thanks for your reply. So this means I should start with, e.g., "max_depth": [1, 4, 10, 15] and "min_samples_leaf": [1, 10, 20, 30], and if the best result comes out at max_depth=10 and min_samples_leaf=10, then I should explore values close to these. Am I right?

Yes, this would work. However, keep in mind that you…

Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

2016-01-29 Thread muhammad waseem
> Thanks for your reply. So this means I should start with, e.g., "max_depth": [1, 4, 10, 15] and "min_samples_leaf": [1, 10, 20, 30], and if the best result comes out at max_depth=10 and min_samples_leaf=10, then I should explore values close to these. Am I right?
>
> Yes, this would work. However, keep in mind that you…

Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

2016-01-29 Thread muhammad waseem
I meant, how do I make sure that I don't miss the "good" combination that you mentioned? Also, for the second point: maybe by considering computational time and then making sure that I have a large enough number of estimators in the parametric study?

On Fri, Jan 29, 2016 at 9:38 PM, muhammad waseem wrote:
> Th…
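On the number-of-estimators point: for gradient boosting, one way to keep it out of the grid entirely is a single fit with a generous n_estimators, then scoring every intermediate stage with staged_predict. A sketch on toy data (for random forests, extra trees generally only cost compute):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=20, noise=10, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # Fit once with a generous n_estimators; staged_predict then yields the
    # prediction after each boosting stage, so the validation curve over the
    # number of trees comes from a single fit rather than a grid axis.
    gbr = GradientBoostingRegressor(n_estimators=500, random_state=0)
    gbr.fit(X_tr, y_tr)
    errors = [mean_squared_error(y_val, y_pred)
              for y_pred in gbr.staged_predict(X_val)]
    print("best n_estimators:", int(np.argmin(errors)) + 1)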

Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

2016-01-29 Thread Sebastian Raschka
> On Jan 29, 2016, at 4:45 PM, muhammad waseem wrote:
>
> I meant, how do I make sure that I don't miss the "good" combination that you mentioned?

Here, we are back to an exhaustive search on an infinitely fine grid :). It's really about finding the "sweet spot" that is "practical" given you…

Re: [Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees

2016-01-29 Thread muhammad waseem
On Fri, Jan 29, 2016 at 9:51 PM, Sebastian Raschka wrote:
> On Jan 29, 2016, at 4:45 PM, muhammad waseem wrote:
>
> I meant, how do I make sure that I don't miss the "good" combination that you mentioned?
>
> Here, we are back to an exhaustive search on an infinitely fine grid :).
> It's r…