Re: [Scikit-learn-general] Optimal Subset Selection Code Contribution

Giuseppe Marco Randazzo Fri, 22 Aug 2014 02:56:57 -0700

Hi Mathieu,

I do not understand why you decided not to include these sampling 
techniques.


These sampling techniques are used to select a small, representative population 
of objects from a sample. They work on a multidimensional normed space, so a 
simple Euclidean space. 
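
To make the idea concrete, here is a minimal sketch of one common greedy form 
of dissimilarity selection (max-min picking) in a Euclidean space. The function 
name `maxmin_select`, the `start` parameter and the toy points are illustrative 
assumptions: this is not an existing scikit-learn API, and it is not 
necessarily the exact procedure described in the Lajiness reference quoted 
below.

```python
import math


def maxmin_select(points, n_select, start=0):
    """Greedily pick n_select indices so that each new point is as far as
    possible (Euclidean distance) from the points already chosen.

    Hypothetical helper for illustration only.
    """
    if not 0 < n_select <= len(points):
        raise ValueError("n_select must be between 1 and len(points)")
    selected = [start]
    chosen = {start}
    # min_dist[i] = distance from point i to its nearest selected point
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_select:
        # among unselected points, take the one farthest from the selection
        candidates = [i for i in range(len(points)) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # update nearest-selected distances with the newly added point
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Toy data: two tight pairs and one isolated point (assumed values).
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
picked = maxmin_select(X, 3)
print(picked)  # -> [0, 3, 4]
```

The result depends on the starting object, so in practice the seed (here simply 
index 0) is a choice that has to be made and documented.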

My view of grid search is the same as my view of sub-object selection: you 
can subselect your parameters to build a model, but this subselection means 
that you also lose information. The interaction between grid search and 
sub-object selection can also be evaluated; I know how to do this, and it will 
be explained in a future publication :-)

Marco


On 21 Aug 2014, at 10:31, Mathieu Blondel <[email protected]> wrote:

> There was a thread on the mailing-list a while ago on instance reduction 
> methods.
> It was decided to not include such methods for the time being as changing 
> n_samples is not supported by transformers or pipelines.
> It is also not clear yet how such methods would play with grid search, for 
> instance.
> 
> A separate project was created by Dayvid Victor for the time being:
> https://github.com/dvro/scikit-protopy
> 
> Mathieu
> 
> 
> On Wed, Aug 20, 2014 at 5:23 AM, Gael Varoquaux 
> <[email protected]> wrote:
> Hi Giuseppe,
> 
> Is there a specific highly-cited reference for these methods? I did a
> quick search on Google scholar, and it seemed that I could mostly find
> them used in chemistry.
> 
> Cheers,
> 
> Gaël
> 
> On Tue, Aug 19, 2014 at 12:05:13PM +0200, Giuseppe Marco Randazzo wrote:
> > Hello,
> 
> > I'm interested in contributing to scikit-learn by implementing some
> > algorithms that make an optimal selection of objects in an N-dimensional
> > space. These techniques are used when sampling is needed in large data
> > and when the sampling must be done with a specific criterion:
> 
> > - Most Descriptive Compound: The aim of this algorithm is to select a
> > subset of compounds which most effectively represents the compounds in
> > the original population [Hudson, B; Quantitative Structure-Activity
> > Relationships 1996, 15, 285]
> 
> > - Dissimilarity Selection: The aim of this algorithm is to select a
> > subset of compounds which are really different from each other [Lajiness, M;
> > Perspectives in Drug Discovery and Design 1997, 7(8), 65].
> 
> 
> > - others....
> 
> > I can implement Dissimilarity Selection and Most Descriptive
> > Compound for the moment, and maybe other algorithms later.
> 
> 
> > Are you interested?
> 
> > Giuseppe Marco Randazzo
> --
>     Gael Varoquaux
>     Researcher, INRIA Parietal
>     Laboratoire de Neuro-Imagerie Assistee par Ordinateur
>     NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
>     Phone:  ++ 33-1-69-08-79-68
>     http://gael-varoquaux.info            http://twitter.com/GaelVaroquaux
> 
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general