Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-11 Thread Greg Landrum
I've been quiet on this one since I'm traveling this week, but I want to briefly weigh in on the fingerprint aspects, since I think some terms are being used incorrectly and that may be making things even more confusing. I believe that the term "collision" as applied to fingerprints normally

Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-10 Thread Peter S. Shenkin
It is very far from a solved problem, since it depends strongly on the interactions within the crystal. And it’s not terribly uncommon for a drug-like compound to exhibit different crystal forms, each with its own melting point and solubility. This has been an issue for drug formulation, where you

Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-10 Thread Chris Earnshaw
Hi. It sounds to me like you're already getting better results than you could reasonably expect. Prediction of melting point is a phenomenally difficult thing to do; you're trying to find the temperature at which a (generally undefined) solid crystalline phase is in equilibrium with a (probably

Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-10 Thread Thomas Evangelidis
Your radius and bitvector length are too small for such a big training set. You probably have bit collisions, or the radius is not large enough to capture the differences in substructures; that is why you see that artifact. Try radius 3, bitvector length 4096. I think that you have enough training
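To put the bit-collision concern in rough numbers, here is a minimal sketch. It assumes that each distinct substructure feature is hashed uniformly at random into the bit vector, which is an idealization and not a statement about how RDKit hashes features. The function names and the feature count of 60 are hypothetical and chosen only for illustration.

```python
def expected_bits_set(n_features: int, n_bits: int) -> float:
    """Expected number of distinct bits set when n_features distinct
    features are hashed uniformly at random into n_bits positions."""
    return n_bits * (1.0 - (1.0 - 1.0 / n_bits) ** n_features)


def expected_collisions(n_features: int, n_bits: int) -> float:
    """Expected number of features that land on an already-set bit."""
    return n_features - expected_bits_set(n_features, n_bits)


# Hypothetical molecule with 60 distinct features (assumption, not from the thread).
for n_bits in (1024, 2048, 4096):
    print(n_bits, round(expected_collisions(60, n_bits), 2))
```

Under this idealized model, doubling the bit vector length roughly halves the expected number of colliding features within one molecule. It says nothing about whether two different molecules end up with identical fingerprints, which is a separate question.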

Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-10 Thread Pavel
Hi Michal,
I think that if you can provide several examples of structures having identical bitstrings, this will help a lot to better understand the issue.
Pavel.
On 10/10/18 14:15, Michal Krompiec wrote:
Hi Thomas, Radius 2, 2048 bits, 5200 data points.
On Wed, 10 Oct 2018 at 13:13, Thomas
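One way to collect such examples is to group structures by their bitstring and keep only the groups with more than one member. The sketch below is a plain-Python illustration: the helper name `group_identical`, the molecule names, and the 8-bit strings are all hypothetical. In practice the bitstrings might come from an RDKit bit vector (for example via its `ToBitString()` method), but that step is left out here.

```python
from collections import defaultdict


def group_identical(fingerprints: dict[str, str]) -> dict[str, list[str]]:
    """Map each bitstring to the names of the structures that share it,
    keeping only bitstrings shared by two or more structures."""
    groups = defaultdict(list)
    for name, bits in fingerprints.items():
        groups[bits].append(name)
    return {bits: names for bits, names in groups.items() if len(names) > 1}


# Hypothetical toy data: names and bitstrings are made up for illustration.
toy = {
    "mol_a": "01011000",
    "mol_b": "01011000",
    "mol_c": "11000001",
}
duplicates = group_identical(toy)
for bits, names in duplicates.items():
    print(bits, names)
```

Listing the structures in each group side by side makes it easier to judge whether identical bitstrings come from genuinely similar molecules or from information lost in the fingerprint.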

Re: [Rdkit-discuss] Fingerprint collision and machine learning

2018-10-10 Thread Michal Krompiec
Hi Thomas,
Radius 2, 2048 bits, 5200 data points.
On Wed, 10 Oct 2018 at 13:13, Thomas Evangelidis wrote:
> What's your bitvector length and radius? How many training samples do you have?
>
> On Wed, 10 Oct 2018 at 13:51, Michal Krompiec wrote:
>> Hi all,
>> I have a slightly off-topic