Re: [Scikit-learn-general] randomized grid search
Got it, thanks! One last question: are there heuristics or rules of thumb about which distributions work best for gradient boosting classifiers (tree depth, minimum number of samples, learning rate, etc.)?

Thank you,

-----Original Message-----
From: Vlad Niculae [mailto:zephy...@gmail.com]
Sent: Monday, April 20, 2015 3:48 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] randomized grid search

The User Guide has an example that better illustrates what Andy meant: for continuous parameters such as C and gamma in a Gaussian-kernel SVM, you should use a continuous distribution (e.g. exponential):
http://scikit-learn.org/stable/modules/grid_search.html#randomized-parameter-optimization

Vlad

> On 20 Apr 2015, at 15:34, Vlad Niculae wrote:
>
> The example you cite contains these lines:
>
>     "max_features": sp_randint(1, 11),
>     "min_samples_split": sp_randint(1, 11),
>     "min_samples_leaf": sp_randint(1, 11),
>
> Those are not lists, but distribution objects from scipy (see the top of the example: `from scipy.stats import randint as sp_randint`).
>
> RandomizedSearchCV takes this kind of input, but GridSearchCV does not. As Andy says, this kind of parametrization is what you should use if you intend to do randomized parameter search. You can use other distributions from scipy.stats as well, if more appropriate.
>
> Vlad
>
>> On 20 Apr 2015, at 15:16, Pagliari, Roberto wrote:
>>
>> Yes, I agree. From the example, though, my understanding is that you can only pass arrays, not functions, isn't that true?
>>
>> Thank you,
>>
>> From: Andreas Mueller [t3k...@gmail.com]
>> Sent: Monday, April 20, 2015 2:55 PM
>> To: scikit-learn-general@lists.sourceforge.net
>> Subject: Re: [Scikit-learn-general] randomized grid search
>>
>> If you have continuous parameters you should really, really, really use continuous distributions!
>>
>> On 04/20/2015 12:58 PM, Pagliari, Roberto wrote:
>>> Hi Vlad,
>>> When using randomized grid search, does sklearn look into intermediate values, or does it sample from the values provided in the parameter grid?
>>>
>>> Thank you,
>>>
>>> From: Vlad Niculae [zephy...@gmail.com]
>>> Sent: Monday, April 20, 2015 12:50 PM
>>> To: scikit-learn-general@lists.sourceforge.net
>>> Subject: Re: [Scikit-learn-general] randomized grid search
>>>
>>> Hi Roberto,
>>>
>>>> what does None do for max_depth?
>>> Copy-pasted from http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
>>>
>>> "If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples."
>>>
>>>> In particular, if lists are provided, does randomized grid search construct a uniform probability distribution?
>>> Yes.
>>>
>>>> If that's the case, I presume there is no advantage over GridSearchCV?
>>> You still get roughly the same advantages (if some parameters matter much more than others, you can reach good scores faster), as long as the grid you're randomly sampling from is large enough. But if you have more informed distributions to specify, that's even better.
>>>
>>> For convenience, when I have computing power and time to spare, I often run a few tens or hundreds of iterations of RandomizedSearchCV on large discrete grids, and if it seems promising, I run a full GridSearchCV overnight with minimal changes to the code.
>>>
>>> For practical purposes, it would probably be a better use of the time to just do more random search, but if this would go into a paper, for some audiences it can be more convincing to say you searched a grid thoroughly.
>>>
>>> Hope this makes sense,
>>> Vlad
>>>
>>>> Thank you,
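Putting the thread's advice together: integer-valued gradient boosting parameters (tree depth, minimum samples per leaf) take `scipy.stats.randint`, while continuous ones (learning rate, subsample fraction) take a continuous distribution. A minimal sketch follows; the ranges are illustrative assumptions, not recommendations from this thread, and the import path uses the current `sklearn.model_selection` module rather than the 2015-era `sklearn.grid_search`:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Discrete parameters get randint; continuous parameters get a
# continuous distribution. scipy's uniform(loc, scale) samples
# from [loc, loc + scale].
param_dist = {
    "max_depth": randint(2, 6),           # integers 2..5
    "min_samples_leaf": randint(1, 20),   # integers 1..19
    "learning_rate": uniform(0.01, 0.3),  # floats in [0.01, 0.31]
    "subsample": uniform(0.6, 0.4),       # floats in [0.6, 1.0]
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Each of the `n_iter` candidates is drawn independently from these distributions, so unlike GridSearchCV the search budget is decoupled from the number of parameters.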
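Vlad's point that a plain list is sampled uniformly while a scipy.stats object is sampled from its distribution can be checked directly with `ParameterSampler`, the helper RandomizedSearchCV uses internally (a sketch; `ParameterSampler` lives in the current `sklearn.model_selection` module):

```python
from scipy.stats import expon
from sklearn.model_selection import ParameterSampler

# A list is sampled uniformly over its entries; a frozen scipy.stats
# distribution is sampled via its .rvs() method. Both kinds can be
# mixed in a single specification.
param_dist = {
    "max_depth": [3, 5, None],          # uniform over the three values
    "learning_rate": expon(scale=0.1),  # continuous exponential
}
samples = list(ParameterSampler(param_dist, n_iter=4, random_state=0))
for s in samples:
    print(s)
```

This makes it easy to inspect what a randomized search would actually try before committing compute to fitting models.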