Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
Got it, thanks! One last question:

Are there heuristics or rules of thumb about which distributions tend to work 
best with gradient boosting classifiers (tree depth, minimum number of samples, 
learning rate, etc.)?

Thank you, 
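
For illustration, a minimal sketch of the kind of distribution-based parameter 
space this question is about. The ranges below are illustrative assumptions, 
not recommendations from this thread, and the sklearn.model_selection import 
path is the modern one (at the time of this thread the class lived in 
sklearn.grid_search):

    from scipy.stats import randint, uniform
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import RandomizedSearchCV

    # Illustrative ranges only: integer-valued parameters get discrete
    # distributions, continuous ones get continuous distributions.
    param_distributions = {
        "max_depth": randint(2, 8),           # uniform over the integers 2..7
        "min_samples_leaf": randint(1, 20),
        "learning_rate": uniform(0.01, 0.3),  # uniform on [0.01, 0.31]
        "subsample": uniform(0.5, 0.5),       # uniform on [0.5, 1.0]
    }
    search = RandomizedSearchCV(GradientBoostingClassifier(),
                                param_distributions, n_iter=50)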



Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
The User Guide has an example that better illustrates what Andy meant: for 
continuous parameters such as C and gamma in a Gaussian-kernel SVM, you should 
use a continuous distribution (e.g., exponential):


http://scikit-learn.org/stable/modules/grid_search.html#randomized-parameter-optimization

Vlad
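
A minimal sketch of that pattern, following the User Guide example (the 
exponential scales of 100 and 0.1 for C and gamma come from that example; the 
dataset and iteration count are arbitrary here, and sklearn.model_selection is 
the modern import path):

    from scipy.stats import expon
    from sklearn.datasets import load_iris
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Continuous parameters get continuous distributions rather than
    # a fixed list of candidate values.
    param_distributions = {"C": expon(scale=100), "gamma": expon(scale=0.1)}
    search = RandomizedSearchCV(SVC(kernel="rbf"), param_distributions,
                                n_iter=20, random_state=0)
    search.fit(X, y)
    print(search.best_params_)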



Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
The example you cite contains these lines:

  "max_features": sp_randint(1, 11),
  "min_samples_split": sp_randint(1, 11),
  "min_samples_leaf": sp_randint(1, 11),

Those are not lists but distribution objects from SciPy (see the import at the 
top of the example: `from scipy.stats import randint as sp_randint`).

RandomizedSearchCV accepts this kind of input, but GridSearchCV does not. As 
Andy says, this kind of parametrization is what you should use if you intend 
to do randomized parameter search. You can use other distributions from 
scipy.stats as well, if more appropriate.

Vlad
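
To make the contrast concrete, a hedged sketch (the estimator and iteration 
count are arbitrary choices here): the same search space written once as 
distributions for RandomizedSearchCV and once as explicit lists for 
GridSearchCV. Note that current scikit-learn requires min_samples_split >= 2, 
so the lower bound differs from the 2015 example quoted above:

    from scipy.stats import randint as sp_randint
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    # Distribution objects: accepted by RandomizedSearchCV only.
    param_distributions = {
        "max_features": sp_randint(1, 11),
        "min_samples_split": sp_randint(2, 11),
        "min_samples_leaf": sp_randint(1, 11),
    }
    random_search = RandomizedSearchCV(RandomForestClassifier(),
                                       param_distributions, n_iter=20)

    # GridSearchCV needs explicit lists of candidate values instead.
    param_grid = {
        "max_features": list(range(1, 11)),
        "min_samples_split": list(range(2, 11)),
        "min_samples_leaf": list(range(1, 11)),
    }
    grid_search = GridSearchCV(RandomForestClassifier(), param_grid)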



Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
Yes, I agree. From the example, though, my understanding is that you can only 
pass arrays, not functions; isn't that true?

Thank you, 



Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Andreas Mueller
If you have continuous parameters, you should really, really, really use 
continuous distributions!



Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Pagliari, Roberto
Hi Vlad,
When using randomized grid search, does sklearn consider intermediate values, 
or does it sample only from the values provided in the parameter grid?

Thank you, 




Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
Hi Roberto,

> what does None do for max_depth?

Copy-pasted from 
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

"If None, then nodes are expanded until all leaves are pure or until all leaves 
contain less than min_samples_split samples.”

> In particular, if lists are provided, does randomized grid search construct a 
> uniform probability distribution?

Yes

> If that's the case, I presume there is no advantage over GridSearchCV?

You still get roughly the same advantages (if some parameters matter way more 
than others, you can get good scores faster), as long as the grid you’re 
randomly sampling from is large enough. But if you have more informed 
distributions to specify, that’s even better.

For convenience, when I have computing power and time to spare, I often run a 
few tens or hundreds of iterations of RandomizedSearchCV on a large discrete 
grid, and if it seems promising, I run a full GridSearchCV overnight with 
minimal changes to the code (see the sketch below).

For practical purposes, it would probably be a better use of time to just do 
more random search, but if this is going into a paper, for some audiences it 
can be more convincing to say you searched a grid thoroughly.

Hope this makes sense,
Vlad
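
A sketch of that workflow (hypothetical grid, modern import path): one 
discrete grid serves both phases, so switching from the cheap random pass to 
the thorough overnight run is a one-line change.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    # Lists are sampled uniformly by RandomizedSearchCV and
    # enumerated exhaustively by GridSearchCV.
    param_grid = {
        "max_depth": [3, 5, 10, None],
        "max_features": [1, 3, 10],
        "min_samples_leaf": [1, 3, 10],
    }

    # Cheap first pass: a few tens of uniformly sampled candidates.
    quick = RandomizedSearchCV(RandomForestClassifier(), param_grid, n_iter=20)

    # If it looks promising, the thorough run reuses the same grid.
    thorough = GridSearchCV(RandomForestClassifier(), param_grid)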
