Re: [scikit-learn] combining datasets from different sources

2017-09-07 Thread Maciek Wójcikowski
2017-09-07 15:57 GMT+02:00 Thomas Evangelidis : > > > On 7 September 2017 at 15:29, Maciek Wójcikowski > wrote: > >> I think StandardScaler is what you want. For each assay you will get >> mean and var. Average mean would be the "optimal" shift and average >> variance the spread. But would this

Re: [scikit-learn] combining datasets from different sources

2017-09-07 Thread Thomas Evangelidis
On 7 September 2017 at 15:29, Maciek Wójcikowski wrote: > I think StandardScaler is what you want. For each assay you will get mean > and var. Average mean would be the "optimal" shift and average variance the > spread. But would this value make any physical sense? > > I think you missed my poi

Re: [scikit-learn] combining datasets from different sources

2017-09-07 Thread Maciek Wójcikowski
I think StandardScaler is what you want. For each assay you will get mean and var. Average mean would be the "optimal" shift and average variance the spread. But would this value make any physical sense? Considering the RF-Score-VS: In fact it's a regressor and it predicts a real value, not a cla
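A minimal sketch of the per-assay standardization suggested above, assuming each assay is a plain list of affinities. The assay names and values are hypothetical. scikit-learn's `StandardScaler` exposes the same per-column statistics through its `mean_` and `var_` attributes; the standard library is used here only to keep the example self-contained. Whether the resulting common scale has any physical meaning is the open question raised in the message itself.

```python
import math
from statistics import fmean, pvariance

# Hypothetical affinities measured in two different assays.
assays = {
    "assay_1": [5.0, 6.0, 7.0],
    "assay_2": [50.0, 60.0, 70.0],
}

# Per-assay mean and population variance (the quantities StandardScaler estimates).
stats = {name: (fmean(vals), pvariance(vals)) for name, vals in assays.items()}

# Average mean as the common shift, average variance as the common spread.
target_mean = fmean([mean for mean, _ in stats.values()])
target_var = fmean([var for _, var in stats.values()])


def to_common_scale(values, mean, var):
    """Standardize one assay, then map it onto the averaged mean and variance."""
    if var == 0:
        raise ValueError("assay has zero variance and cannot be standardized")
    std = math.sqrt(var)
    target_std = math.sqrt(target_var)
    return [(x - mean) / std * target_std + target_mean for x in values]


combined = {
    name: to_common_scale(vals, *stats[name]) for name, vals in assays.items()
}
```

After this step every assay shares the same mean and variance, so the values can be pooled, at the cost of assuming that all assays sample a comparable range of affinities.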

Re: [scikit-learn] combining datasets from different sources

2017-09-06 Thread Thomas Evangelidis
After some thought about this problem today, I think it is an objective function minimization problem, where the objective function can be the root mean square deviation (RMSD) between the affinities of the common molecules in the two data sets. I could work iteratively, first rescale and fit assa
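One way to sketch the RMSD objective described above, for a single pair of assays: assume the rescaling is linear (a scale and a shift), in which case ordinary least squares over the shared molecules gives the RMSD-minimizing parameters in closed form. The linear form, the function names, and the data are assumptions made for illustration; the iterative multi-assay procedure mentioned in the message is not reproduced here.

```python
import math


def fit_linear_rescaling(reference, other):
    """Return (scale, shift) minimizing the RMSD between `reference` and
    scale * other + shift over the molecules present in both assays."""
    common = sorted(set(reference) & set(other))
    if len(common) < 2:
        raise ValueError("need at least two shared molecules")
    x = [other[m] for m in common]
    y = [reference[m] for m in common]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("shared molecules have identical values in `other`")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    scale = sxy / sxx
    shift = mean_y - scale * mean_x
    return scale, shift


def rmsd(reference, other, scale=1.0, shift=0.0):
    """RMSD over shared molecules after applying scale and shift to `other`."""
    common = set(reference) & set(other)
    squared = [(reference[m] - (scale * other[m] + shift)) ** 2 for m in common]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical assays keyed by molecule identifier; mol2..mol4 are shared.
assay_a = {"mol1": 5.0, "mol2": 6.0, "mol3": 7.0, "mol4": 8.5}
assay_b = {"mol2": 2.0, "mol3": 4.0, "mol4": 7.0, "mol5": 9.0}

scale, shift = fit_linear_rescaling(assay_a, assay_b)
rescaled_b = {m: scale * v + shift for m, v in assay_b.items()}
rmsd_before = rmsd(assay_a, assay_b)
rmsd_after = rmsd(assay_a, assay_b, scale, shift)
```

With only two or three shared molecules the fit is poorly constrained, so the rescaled values for unshared molecules such as `mol5` should be treated with caution.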

Re: [scikit-learn] combining datasets from different sources

2017-09-05 Thread Thomas Evangelidis
Thanks Jason, Sebastian and Maciek! I believe that, of all the suggestions, the most feasible solution is to look for experimental assays which overlap by at least two compounds, and then adjust the binding affinities of one of them by looking at their differences in both assays. Sebastian mentioned the s

Re: [scikit-learn] combining datasets from different sources

2017-09-05 Thread Maciek Wójcikowski
Hi Thomas and others, It also really depends on how many data points you have for each compound. If you have more than a few, then there are a few options. If you get two very distinct activities for one ligand, I'd discard such samples as ambiguous or decide on one of the assays/experiments (the one wi

Re: [scikit-learn] combining datasets from different sources

2017-09-05 Thread Sebastian Raschka
Another approach would be to pose this as a "ranking" problem to predict relative affinities rather than absolute affinities. E.g., if you have data from one (or more) molecules that has/have been tested under 2 or more experimental conditions, you can rank the other molecules accordingly or no
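A rough sketch of how the ranking formulation could be set up, assuming that affinities are only compared within the same assay so that differences in absolute scale between assays never enter the training data. The data layout and the function name are hypothetical.

```python
from itertools import combinations


def within_assay_pairs(assays):
    """Build (molecule_i, molecule_j, label) triples, where label is 1 if
    molecule_i has the higher affinity and 0 otherwise. Pairs are formed
    only inside a single assay; ties are skipped."""
    pairs = []
    for affinities in assays.values():
        for mol_i, mol_j in combinations(sorted(affinities), 2):
            if affinities[mol_i] == affinities[mol_j]:
                continue  # a tie carries no ranking information
            label = 1 if affinities[mol_i] > affinities[mol_j] else 0
            pairs.append((mol_i, mol_j, label))
    return pairs


# Hypothetical assays on very different absolute scales.
assays = {
    "assay_1": {"mol1": 5.0, "mol2": 6.0, "mol3": 7.0},
    "assay_2": {"mol3": 40.0, "mol4": 90.0},
}

pairs = within_assay_pairs(assays)
```

Each triple could then feed a pairwise classifier trained on the difference of the two molecules' feature vectors, which is one common way to learn a ranking.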

Re: [scikit-learn] combining datasets from different sources

2017-09-05 Thread Jason Rudy
Thomas, This is sort of related to the problem I did my M.S. thesis on years ago: cross-platform normalization of gene expression data. If you google that term you'll find some papers. The situation is somewhat different, though, because with microarrays or RNA-seq you get thousands of data poin

[scikit-learn] combining datasets from different sources

2017-09-05 Thread Thomas Evangelidis
Greetings, I am working on a problem that involves predicting the binding affinity of small molecules to a receptor structure (it is a regression problem, not classification). I have multiple small datasets of molecules with measured binding affinities on a receptor, but each dataset was measured in di