Re: [Scikit-learn-general] Optimal Subset Selection Code Contribution

2014-08-22 Thread Giuseppe Marco Randazzo
Hi Mathieu, I did not understand why you have decided not to include these sampling techniques. These sampling techniques are used to select a small representative population of objects from a sample. They work on a multidimensional space of norm , so a simple Euclidean space. What I think abo

Re: [Scikit-learn-general] Optimal Subset Selection Code Contribution

2014-08-21 Thread Mathieu Blondel
There was a thread on the mailing list a while ago on instance reduction methods. It was decided not to include such methods for the time being, as changing n_samples is not supported by transformers or pipelines. It is also not clear yet how such methods would play with grid search, for instance.
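
The n_samples constraint mentioned above can be illustrated with a minimal sketch in plain Python. `FirstKSelector` is a hypothetical class invented for this illustration, not part of scikit-learn; it only mimics the fit/transform convention, in which transform receives and returns X alone, so a step that drops samples has no way to drop the matching targets.

```python
class FirstKSelector:
    """Hypothetical instance-reduction step: keeps only the first k samples."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y=None):
        # Nothing to learn in this toy example.
        return self

    def transform(self, X):
        # Only X is passed in and returned, so y cannot be reduced with it.
        return X[: self.k]


X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]]
y = [0, 1, 0, 1]

X_reduced = FirstKSelector(k=2).fit(X, y).transform(X)

# X now has fewer rows than y, so a later fit(X_reduced, y) step would
# receive inputs of inconsistent length.
mismatch = len(X_reduced) != len(y)
```

Any real selection criterion could replace the "first k" rule here; the length mismatch between the reduced X and the untouched y would remain.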

Re: [Scikit-learn-general] Optimal Subset Selection Code Contribution

2014-08-19 Thread Gael Varoquaux
Hi Giuseppe, Is there a specific highly-cited reference for these methods? I did a quick search on Google Scholar, and it seemed that I could mostly find them used in chemistry. Cheers, Gaël On Tue, Aug 19, 2014 at 12:05:13PM +0200, Giuseppe Marco Randazzo wrote: > Hello, > i'm interested to c

[Scikit-learn-general] Optimal Subset Selection Code Contribution

2014-08-19 Thread Giuseppe Marco Randazzo
Hello, I'm interested in contributing to scikit-learn by implementing some algorithms to make an optimal selection of objects in an N-dimensional space. These techniques are used when sampling is needed in large data and when the sampling must be done with a specific criterion: - Most Descriptive Com