Another approach would be to pose this as a "ranking" problem and predict relative affinities rather than absolute affinities. E.g., if you have data from one (or more) molecules that has/have been tested under 2 or more experimental conditions, you can rank the other molecules accordingly or normalize. E.g., if you observe that the binding affinity of molecule A is -7 kcal/mol in assay 1 and -9 kcal/mol in assay 2, and say the binding affinities of molecule B are -10 and -12 kcal/mol, respectively, that should give you some information for normalizing the values from assay 2 (e.g., by adding 2 kcal/mol). Of course this is not a perfect solution and might be error prone, but so are experimental assays ... (when I sometimes look at the std error/CI of the data I get from collaborators ... well, it seems that absolute binding affinities always have to be taken with a grain of salt anyway)
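
Here is a minimal sketch of the normalization idea above, assuming the difference between two assays is a constant additive shift that can be estimated from the molecules measured in both. The values for molecules A and B are the ones from the example; molecules C and D, the array names, and the estimate_offset helper are made up for illustration. With only 10-50 molecules per dataset and few shared ones, the estimated offset will be noisy, so treat this as a starting point rather than a recipe.

```python
import numpy as np

# Hypothetical binding affinities in kcal/mol; NaN means "not measured in this assay".
# A and B follow the example in the text; C and D are invented for illustration.
molecules = ["A", "B", "C", "D"]
assay_1 = np.array([-7.0, -10.0, -8.5, np.nan])
assay_2 = np.array([-9.0, -12.0, np.nan, -11.0])


def estimate_offset(reference, other):
    """Mean of (reference - other) over the molecules measured in both assays."""
    shared = ~np.isnan(reference) & ~np.isnan(other)
    if not shared.any():
        raise ValueError("no shared molecules, so the offset cannot be estimated")
    return float(np.mean(reference[shared] - other[shared]))


# Shift assay 2 onto the scale of assay 1 (offset is +2.0 for A and B).
offset = estimate_offset(assay_1, assay_2)
assay_2_shifted = assay_2 + offset

# Pooled values: keep the assay 1 measurement where it exists,
# otherwise fall back to the shifted assay 2 measurement.
combined = np.where(np.isnan(assay_1), assay_2_shifted, assay_1)

for name, value in zip(molecules, combined):
    print(f"{name}: {value:.1f} kcal/mol")
```

If more than two assays are involved, the same idea can be applied pairwise against one reference assay, as long as each assay shares at least one molecule with it.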

Best,
Sebastian

> On Sep 5, 2017, at 1:02 PM, Jason Rudy <jcr...@gmail.com> wrote:
>
> Thomas,
>
> This is sort of related to the problem I did my M.S. thesis on years ago:
> cross-platform normalization of gene expression data. If you google that
> term you'll find some papers. The situation is somewhat different, though,
> because with microarrays or RNA-seq you get thousands of data points for each
> experiment, which makes it easier to estimate the batch effect. The
> principle is similar, however.
>
> If I were in your situation, I would consider whether I have any of the
> following advantages:
>
> 1. Some molecules that appear in multiple data sets
> 2. Detailed information about the different experimental conditions
> 3. Physical/chemical models of how experimental conditions influence binding
>    affinity
>
> If you have any of the above, you can potentially use them to improve your
> estimates. You could also consider using experiment ID as a categorical
> predictor in a sufficiently general regression method.
>
> Lastly, you may already know this, but the term "meta-analysis" is relevant
> here, and you can google for specific techniques. Most of these would be
> more limited than what you are envisioning, I think.
>
> Best,
>
> Jason
>
> On Tue, Sep 5, 2017 at 6:39 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
> Greetings,
>
> I am working on a problem that involves predicting the binding affinity of
> small molecules on a receptor structure (it is a regression problem, not
> classification). I have multiple small datasets of molecules with measured
> binding affinities on a receptor, but each dataset was measured in different
> experimental conditions and therefore I cannot use them all together as a
> training set. So, instead of using them individually, I was wondering
> whether there is a method to combine them all into a super training set.
> The first way I could think of is to convert the binding affinities to Z-scores
> and then combine all the small datasets of molecules. But this would be
> inaccurate because, firstly, the datasets are very small (10-50 molecules
> each), and secondly, the range of binding affinities differs in each
> experiment (some datasets contain really strong binders, while others do not,
> etc.). Is there any other approach to combine datasets with values coming
> from different sources? Maybe if someone points me to the right reference
> I could read and understand if it is applicable to my case.
>
> Thanks,
> Thomas
>
> --
> ======================================================================
> Dr Thomas Evangelidis
> Post-doctoral Researcher
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/2S049,
> 62500 Brno, Czech Republic
>
> email: tev...@pharm.uoa.gr
>        teva...@gmail.com
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn