Re: [scikit-learn] combining datasets from different sources

Sebastian Raschka Tue, 05 Sep 2017 10:37:50 -0700

Another approach would be to pose this as a "ranking" problem and predict 
relative affinities rather than absolute affinities. E.g., if you have data 
from one (or more) molecules that has/have been tested under 2 or more 
experimental conditions, you can rank the other molecules accordingly or 
normalize. E.g., if you observe that the binding affinity of molecule A is -7 
kcal/mol in assay 1 and -9 kcal/mol in assay 2, and say the binding affinities 
of molecule B are -10 and -12 kcal/mol, respectively, that should give you some 
information for normalizing the values from assay 2 (e.g., by adding 2 
kcal/mol). Of course this is not a perfect solution and might be error prone, 
but so are experimental assays ... (when I sometimes look at the std error/CI 
of the data I get from collaborators ... well, it seems that absolute binding 
affinities always have to be taken with a grain of salt anyway)
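
A minimal sketch of the offset idea above, assuming a constant additive shift between two assays and at least one molecule measured in both. The dictionaries, the molecule names C and D, and the helper `estimate_offset` are made up for illustration; the median is just one reasonable choice when several shared molecules disagree.

```python
import numpy as np

# Hypothetical binding affinities in kcal/mol, keyed by molecule name.
# A and B follow the numbers in the example above; C and D are invented.
assay1 = {"A": -7.0, "B": -10.0, "C": -8.5}
assay2 = {"A": -9.0, "B": -12.0, "D": -11.0}


def estimate_offset(reference, other):
    """Median of (reference - other) over molecules measured in both assays."""
    shared = sorted(set(reference) & set(other))
    if not shared:
        raise ValueError("no shared molecules to anchor the two assays")
    return float(np.median([reference[m] - other[m] for m in shared]))


# Shift every assay-2 value onto the assay-1 scale.
offset = estimate_offset(assay1, assay2)
assay2_on_assay1_scale = {m: v + offset for m, v in assay2.items()}

print(offset)
print(assay2_on_assay1_scale)
```

With only one or two shared molecules the offset inherits their full measurement error, so it is worth looking at the spread of the individual differences before trusting the shift.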


Best,
Sebastian

> On Sep 5, 2017, at 1:02 PM, Jason Rudy <jcr...@gmail.com> wrote:
> 
> Thomas,
> 
> This is sort of related to the problem I did my M.S. thesis on years ago: 
> cross-platform normalization of gene expression data.  If you google that 
> term you'll find some papers.  The situation is somewhat different, though, 
> because with microarrays or RNA-seq you get thousands of data points for each 
> experiment, which makes it easier to estimate the batch effect.  The 
> principle is similar, however.  
> 
> If I were in your situation, I would consider whether I have any of the 
> following advantages:
> 
> 1. Some molecules that appear in multiple data sets
> 2. Detailed information about the different experimental conditions
> 3. Physical/chemical models of how experimental conditions influence binding 
> affinity
> 
> If you have any of the above, you can potentially use them to improve your 
> estimates.  You could also consider using experiment ID as a categorical 
> predictor in a sufficiently general regression method.
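
One way the "experiment ID as a categorical predictor" suggestion could look in scikit-learn is sketched below. Everything here is an assumption for illustration: the data are synthetic, the column names `desc1`, `desc2` and `experiment` are invented, and Ridge is only a stand-in for whatever regression method is actually used. A linear model can only capture a constant shift per experiment; a more flexible model could also capture interactions between experiment and descriptors.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic data: two molecular descriptors plus an experiment label.
rng = np.random.RandomState(0)
n = 60
df = pd.DataFrame({
    "desc1": rng.normal(size=n),
    "desc2": rng.normal(size=n),
    "experiment": rng.choice(["exp1", "exp2", "exp3"], size=n),
})

# Each experiment adds its own constant shift to the "true" affinity.
shift = df["experiment"].map({"exp1": 0.0, "exp2": -2.0, "exp3": 1.0})
y = (-8.0 + 1.5 * df["desc1"] - 0.5 * df["desc2"] + shift
     + rng.normal(scale=0.1, size=n))

# One-hot encode the experiment ID; pass the descriptors through unchanged.
preprocess = ColumnTransformer(
    [("exp", OneHotEncoder(handle_unknown="ignore"), ["experiment"])],
    remainder="passthrough",
)
model = make_pipeline(preprocess, Ridge(alpha=1e-3))
model.fit(df, y)

# Same descriptors, different experiments: the gap between the two
# predictions is the model's estimate of the batch effect.
probe = pd.DataFrame({
    "desc1": [0.0, 0.0],
    "desc2": [0.0, 0.0],
    "experiment": ["exp1", "exp2"],
})
pred = model.predict(probe)
print(pred[0] - pred[1])
```

To predict for a new molecule you still have to pick which experiment's scale you want the answer on, which is the same choice of reference as in any other normalization scheme.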
> 
> Lastly, you may already know this, but the term "meta-analysis" is relevant 
> here, and you can google for specific techniques.  Most of these would be 
> more limited than what you are envisioning, I think.
> 
> Best,
> 
> Jason
> 
> On Tue, Sep 5, 2017 at 6:39 AM, Thomas Evangelidis <teva...@gmail.com> wrote:
> Greetings,
> 
> I am working on a problem that involves predicting the binding affinity of 
> small molecules on a receptor structure (it is a regression problem, not 
> classification). I have multiple small datasets of molecules with measured 
> binding affinities on a receptor, but each dataset was measured under different 
> experimental conditions and therefore I cannot use them all together as a 
> training set. So, instead of using them individually, I was wondering 
> whether there is a method to combine them all into a super training set. The 
> first way I could think of is to convert the binding affinities to Z-scores 
> and then combine all the small datasets of molecules. But this would be 
> inaccurate because, firstly, the datasets are very small (10-50 molecules 
> each), and secondly, the range of binding affinities differs in each 
> experiment (some datasets contain really strong binders, while others do not, 
> etc.). Is there any other approach to combine datasets with values coming 
> from different sources? Maybe if someone points me to the right reference I 
> could read it and understand whether it is applicable to my case.
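
The concern about per-dataset Z-scores can be seen in a small sketch. The two datasets below are invented (one with strong binders, one with weak binders) and the column names are assumptions; the point is only that standardizing within each dataset removes the difference in level between them.

```python
import pandas as pd

# Two hypothetical datasets measured on the same scale (kcal/mol):
# one contains only strong binders, the other only weak binders.
df = pd.DataFrame({
    "dataset": ["strong"] * 3 + ["weak"] * 3,
    "molecule": ["s1", "s2", "s3", "w1", "w2", "w3"],
    "affinity": [-12.0, -11.0, -10.0, -7.0, -6.0, -5.0],
})

# Z-score within each dataset (pandas uses the sample std, ddof=1).
grouped = df.groupby("dataset")["affinity"]
df["z"] = (df["affinity"] - grouped.transform("mean")) / grouped.transform("std")

print(df)
```

Here the best binder of each dataset receives the same Z-score even though the two affinities differ by 5 kcal/mol, so a model trained on the pooled Z-scores would treat them as equally strong.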
> 
> Thanks,
> Thomas
> 
> -- 
> ======================================================================
> Dr Thomas Evangelidis
> Post-doctoral Researcher
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/2S049, 
> 62500 Brno, Czech Republic 
> 
> email: tev...@pharm.uoa.gr
>               teva...@gmail.com
> 
> website: https://sites.google.com/site/thomasevangelidishomepage/
> 
> 
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
> 
