Greetings,

I want to augment my training set while preserving the correlations between feature values. More specifically, my features are NMR resonances of the nuclei of a single amino acid. For example, for glutamic acid each observation has the following feature values:
[CA, HA, CB, HB, CG, HG]

where CA is the resonance of the alpha carbon, HA the resonance of the alpha proton, and so forth. The complication is that these feature values are not independent: HA is covalently bonded to CA, CB to CA, and so on. Therefore, if I sample a random CA value from the distribution of experimental CA values, I cannot pick just any HA value from the respective experimental distribution, because CA and HA are correlated. The same applies to CA and CB, CB and HB, CB and CG, and CG and HG.

Is there any algorithm that can generate [CA, HA, CB, HB, CG, HG] feature vectors that comply with both the per-atom distributions and their correlations? I saw that Gaussian Mixture Models have a function to generate random samples from the fitted Gaussian distribution (sklearn.mixture.GaussianMixture.sample), but it is not clear whether these samples retain the correlations between the features (nuclei in this case). If there is no such algorithm in scikit-learn, could you please point me to any other Python library that does this?

Thanks in advance.
Thomas

--
======================================================================
Dr Thomas Evangelidis
Post-doctoral Researcher
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/2S049,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr
       teva...@gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/
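
For concreteness, here is a minimal sketch of the GaussianMixture idea mentioned above, with a simple empirical check of whether the sampled vectors keep the pairwise correlations. Everything apart from GaussianMixture and its sample method is an assumption made for illustration: the data are synthetic correlated Gaussians (not real chemical shifts), and n_components=2, covariance_type="full" and the sample sizes are arbitrary choices. The "full" covariance type lets each component model off-diagonal covariance terms; the "diag" and "spherical" types do not.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
names = ["CA", "HA", "CB", "HB", "CG", "HG"]

# Synthetic stand-in for the experimental data: a correlated 6-D Gaussian.
# These are NOT real chemical shifts, only placeholders for the sketch.
A = rng.normal(size=(6, 6))
cov = A @ A.T + 0.5 * np.eye(6)  # symmetric positive definite covariance
X = rng.multivariate_normal(mean=np.zeros(6), cov=cov, size=2000)

# Fit a mixture with full covariance matrices (assumed settings).
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

# Draw new feature vectors; sample() returns (samples, component_labels).
X_new, _ = gmm.sample(5000)

# Compare the pairwise correlation matrices of original and sampled data.
corr_orig = np.corrcoef(X, rowvar=False)
corr_new = np.corrcoef(X_new, rowvar=False)
max_diff = np.abs(corr_orig - corr_new).max()
print("largest absolute difference in pairwise correlation:", max_diff)
```

On real data the same comparison of correlation matrices (or scatter plots of pairs such as CA against HA) would show how well the fitted mixture reproduces the dependencies; the number of components would need to be chosen for the data at hand, for example by comparing gmm.bic(X) across candidate values.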
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn