Hi Mathieu,
I don't understand why you have decided not to include these sampling
techniques.
These sampling techniques are used to select a small, representative population
of objects from a sample. They work in a multidimensional normed space, so a
simple Euclidean space.
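To make the idea concrete, here is a rough sketch of one common form of dissimilarity selection: a greedy max-min pick in Euclidean space. This is only an illustration under my own assumptions, not necessarily the exact procedure from the cited references; the function name `maxmin_select` and its parameters are made up for this example, and it uses only the Python standard library (Python 3.8+ for `math.dist`).

```python
import math


def maxmin_select(points, n_select, start=0):
    """Greedy max-min selection (illustrative sketch, hypothetical name).

    Starting from points[start], repeatedly pick the not-yet-selected point
    whose Euclidean distance to its nearest selected point is largest.
    Returns the list of selected indices in the order they were picked.
    """
    n = len(points)
    if not 1 <= n_select <= n:
        raise ValueError("n_select must be between 1 and len(points)")

    selected = [start]
    chosen = {start}
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(selected) < n_select:
        # Only consider points that have not been picked yet.
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Refresh nearest-selected distances using the newly added point.
        min_dist = [
            min(d, math.dist(p, points[nxt]))
            for d, p in zip(min_dist, points)
        ]
    return selected


if __name__ == "__main__":
    objects = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (5.0, 5.0)]
    print(maxmin_select(objects, 3))
```

Note that the function returns indices rather than a reduced array, so the same indices can be applied to the targets as well as to the data.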
What I think about grid search is the same as for sub-object selection: you
can subselect your parameters to build a model, but this subselection means
that you also lose information. The interaction between grid search and
sub-object selection can also be evaluated; I know how to do this, and it will
be explained in a future publication :-)
Marco
On 21 Aug 2014, at 10:31, Mathieu Blondel <[email protected]> wrote:
> There was a thread on the mailing-list a while ago on instance reduction
> methods.
> It was decided to not include such methods for the time being as changing
> n_samples is not supported by transformers or pipelines.
> It is also not clear yet how such methods would play with grid search, for
> instance.
>
> A separate project was created by Dayvid Victor for the time being:
> https://github.com/dvro/scikit-protopy
>
> Mathieu
>
>
> On Wed, Aug 20, 2014 at 5:23 AM, Gael Varoquaux
> <[email protected]> wrote:
> Hi Giuseppe,
>
> Is there a specific highly-cited reference for these methods? I did a
> quick search on Google scholar, and it seemed that I could mostly find
> them used in chemistry.
>
> Cheers,
>
> Gaël
>
> On Tue, Aug 19, 2014 at 12:05:13PM +0200, Giuseppe Marco Randazzo wrote:
> > Hello,
>
> > I'm interested in contributing to scikit-learn by implementing some
> > algorithms to make an optimal selection of objects in an N-dimensional
> > space. These techniques are used when sampling is needed in large data
> > sets and when the sampling must be done with a specific criterion:
>
> > - Most Descriptive Compound: The aim of this algorithm is to select a
> > subset of compounds which most effectively represents the compounds in
> > the original population [Hudson, B; Quantitative Structure-Activity
> > Relationships 1996, 15, 285].
>
> > - Dissimilarity Selection: The aim of this algorithm is to select a
> > subset of compounds which are very different from each other [Lajiness, M;
> > Perspectives in Drug Discovery and Design 1997, 7(8), 65].
>
>
> > - others....
>
> > I can implement Dissimilarity Selection and Most Descriptive Compound
> > for the moment, and maybe other algorithms later.
>
>
> > Are you interested?
>
> > Giuseppe Marco Randazzo
> --
> Gael Varoquaux
> Researcher, INRIA Parietal
> Laboratoire de Neuro-Imagerie Assistee par Ordinateur
> NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
> Phone: ++ 33-1-69-08-79-68
> http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general