Dear all,
this may seem trivial, but it is also worth checking the sanity of the original
melting point data. Crystallographers sometimes enter the melting point in
degrees Celsius when degrees Fahrenheit are expected, so cross-checking against
the crystallization temperature etc. can be quite useful.
Best wishes,
Maria
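A minimal sketch of such a cross-check, under stated assumptions: each record
carries a melting point already reported in degrees Celsius plus a
crystallization temperature, and a melting point at or below the
crystallization temperature is treated as suspect. The field names (id, mp_c,
cryst_c), the function names and the toy values are hypothetical and not taken
from any real database. The "candidate" value only undoes a possible
Fahrenheit-to-Celsius conversion and would still need checking against the
original source.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def celsius_to_fahrenheit(temp_c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def flag_suspect_melting_points(records):
    """Return (id, candidate_mp_c) for records that look unit-confused.

    A crystal grown at cryst_c should melt above that temperature, so a
    reported melting point at or below cryst_c is flagged. The candidate
    is the number that would have been typed in if a Celsius value had
    been entered into a Fahrenheit field and then converted.
    """
    suspects = []
    for rec in records:
        if rec["mp_c"] <= rec["cryst_c"]:
            suspects.append((rec["id"], celsius_to_fahrenheit(rec["mp_c"])))
    return suspects


# Toy records with made-up values, for illustration only.
records = [
    {"id": "mol-a", "mp_c": 150.0, "cryst_c": 20.0},  # plausible
    {"id": "mol-b", "mp_c": 10.0, "cryst_c": 20.0},   # melts below growth temperature
]

suspects = flag_suspect_melting_points(records)
for mol_id, candidate in suspects:
    print(f"{mol_id}: suspect, candidate melting point {candidate:.1f} C")
```

The check is deliberately conservative: it only catches entries that are
physically inconsistent and says nothing about values that are wrong but
still above the crystallization temperature.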
On Wednesday, 10 October 2018, 15:28:06 BST, Michal Krompiec
<[email protected]> wrote:
Dear All,
Thank you all very much for your feedback! Actually, the number of
collisions didn't decrease when I increased the bit length, though increasing
the radius to 3 did help a bit. Overall, it is good to know that great results
are not to be expected.
Best wishes,
Michal
On Wed, 10 Oct 2018 at 13:31, Chris Earnshaw <[email protected]> wrote:
Hi
It sounds to me like you're already getting better results than you could
reasonably expect.
Prediction of melting point is a phenomenally difficult thing to do; you're
trying to find the temperature at which a (generally undefined) solid
crystalline phase is in equilibrium with a (probably even less defined) liquid
phase. You also need to consider that the crystalline form of your solid phase
is not necessarily truly constant - what polymorph is involved? Melting points
of alternative polymorphs can be radically different and this is one of the
real bugbears of pharmaceutical and agrochemical development. If you haven't
found the most stable form early in the development process there can be very
nasty surprises downstream.
Expecting to handle all these challenges with a descriptor as simple as a
molecular fingerprint - regardless of bit length, collisions etc. - is probably
over-optimistic...
Regards,
Chris Earnshaw
On Wed, 10 Oct 2018 at 13:16, Michal Krompiec <[email protected]> wrote:
Hi Thomas,
Radius 2, 2048 bits, 5200 data points.
On Wed, 10 Oct 2018 at 13:13, Thomas Evangelidis <[email protected]> wrote:
What's your bitvector length and radius? How many training samples do you have?
On Wed, 10 Oct 2018 at 13:51, Michal Krompiec <[email protected]> wrote:
Hi all,
I have a slightly off-topic question. I'm trying to train a neural
network on a dataset of small molecules and their melting points. I did get a
not-so-bad accuracy with Morgan fingerprints, but I've realised that regardless
of FP radius and bitvector length, several dozen molecules have the same
fingerprints but wildly different melting points. I am pretty sure this is a
"solved problem" so I don't want to reinvent the wheel. What is the
recommended/usual way of dealing with this?
Thanks,
Michal
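One possible first step, sketched here as an assumption rather than as the
recommended method: group the training data by fingerprint, merge groups whose
melting points agree, and set aside groups whose melting points conflict for
manual inspection. The fingerprint is represented by any hashable key (with
RDKit, the string returned by the bit vector's ToBitString() would be one
option). The function names, the 10-degree threshold and the toy data are
hypothetical.

```python
from collections import defaultdict
from statistics import median


def group_by_fingerprint(records):
    """Map each fingerprint key to the list of melting points seen for it."""
    groups = defaultdict(list)
    for fp_key, melting_point in records:
        groups[fp_key].append(melting_point)
    return dict(groups)


def split_conflicts(groups, max_spread=10.0):
    """Separate fingerprints with consistent labels from conflicting ones.

    consistent:  fp_key -> median melting point (spread <= max_spread)
    conflicting: fp_key -> sorted list of melting points (spread > max_spread)
    """
    consistent, conflicting = {}, {}
    for fp_key, melting_points in groups.items():
        spread = max(melting_points) - min(melting_points)
        if spread > max_spread:
            conflicting[fp_key] = sorted(melting_points)
        else:
            consistent[fp_key] = median(melting_points)
    return consistent, conflicting


# Toy data: short bit strings stand in for real 2048-bit fingerprints,
# and the melting points are made up for illustration.
records = [
    ("1010", 80.0),
    ("1010", 82.0),
    ("0110", 45.0),
    ("0110", 190.0),
    ("0011", 120.0),
]

consistent, conflicting = split_conflicts(group_by_fingerprint(records))
print("usable for training:", consistent)
print("needs inspection:", conflicting)
```

Whether the conflicting groups are then dropped, relabelled or given a richer
descriptor is a separate decision; the sketch only makes them visible.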
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
--
======================================================================
Dr Thomas Evangelidis
Research Scientist
IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy of
Sciences
Prague, Czech Republic & CEITEC - Central European Institute of Technology
Brno, Czech Republic
email: [email protected]
website:https://sites.google.com/site/thomasevangelidishomepage/