Hello, I'm interested in contributing to scikit-learn by implementing some algorithms for the optimal selection of objects in an N-dimensional space. These techniques are used when a large data set has to be sampled and the sampling must follow a specific criterion:
- Most Descriptive Compound: the aim of this algorithm is to select a subset of compounds that most effectively represents the compounds in the original population [Hudson, B; Quantitative Structure-Activity Relationships 1996, 15, 285]
- Dissimilarity Selection: the aim of this algorithm is to select a subset of compounds that are as different from each other as possible [Lajiness, M; Perspectives in Drug Discovery and Design 1997, 7(8), 65]
- others...

For the moment I can implement Dissimilarity Selection and Most Descriptive Compound, and maybe other algorithms later.

Are you interested?

Giuseppe Marco Randazzo

--
Giuseppe Marco Randazzo, Chemist, Ph.D
Collaborateur Ens. Recherche - UniGE
Post-Doc Fellow
School of Pharmaceutical Sciences
University of Geneva, University of Lausanne
Pharmacochemistry and Pharmaceutical Analytical Chemistry
Pavillon des isotopes
20, Bd d'Yvoy
CH-1211 Geneva 4 (Switzerland)
Office: I20B
Portable: +41 76 262 67 12
Phone: +41 22 37 968 94
skype: gmrandazzo
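
P.S. To make the idea of dissimilarity selection more concrete, here is a minimal sketch. It assumes a greedy max-min rule with Euclidean distance, which is one common way to pick mutually distant objects and is not necessarily the exact procedure in the cited paper. The function name maxmin_select and its parameters are hypothetical and are not part of scikit-learn.

```python
import math


def maxmin_select(points, n_select, start=0):
    """Greedy max-min dissimilarity selection (hypothetical helper).

    points   : sequence of equal-length coordinate tuples
    n_select : number of objects to pick
    start    : index of the seed object (an assumption; other seeds work too)
    Returns the indices of the selected objects, in selection order.
    """
    n = len(points)
    if not 0 < n_select <= n:
        raise ValueError("n_select must be between 1 and len(points)")
    if not 0 <= start < n:
        raise ValueError("start must be a valid index into points")

    selected = [start]
    chosen = {start}
    # Distance from every object to its nearest already-selected object.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(selected) < n_select:
        # Pick the unselected object that is farthest from the current selection.
        candidates = [i for i in range(n) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        # Refresh the nearest-selected distances with the new object.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (10, 0), (0, 10), (0.5, 0.5)]
    print(maxmin_select(data, 3))
```

Each step costs one pass over the data, so selecting k objects out of n takes O(k * n) distance evaluations. A real contribution would presumably work on NumPy arrays and accept other distance metrics.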