Dear all,

This may seem trivial, but it is also worth sanity-checking the original melting point data. Crystallographers sometimes enter the melting point in degrees Celsius when degrees Fahrenheit are expected, so cross-checking against the crystallization temperature etc. can be quite useful.

Best wishes,
Maria
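A minimal sketch of the cross-check described above, assuming each entry carries a melting point and a crystallization temperature, both already expressed in degrees Celsius. The record layout, the function names and the `margin_c` parameter are hypothetical and only serve the illustration. The idea is that a crystal grown at a given temperature was solid at that temperature, so a recorded melting point below it deserves a second look.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def flag_suspect_melting_points(records, margin_c=0.0):
    """Return the ids of records whose melting point looks inconsistent.

    records: iterable of hypothetical (entry_id, mp_c, cryst_temp_c) tuples,
    with both temperatures in degrees Celsius (None if missing).
    margin_c: tolerance in degrees Celsius before an entry is flagged.
    """
    suspects = []
    for entry_id, mp_c, cryst_temp_c in records:
        if mp_c is None or cryst_temp_c is None:
            continue  # nothing to cross-check against
        # A crystal cannot be grown above its own melting point.
        if mp_c < cryst_temp_c - margin_c:
            suspects.append(entry_id)
    return suspects


if __name__ == "__main__":
    records = [
        ("A", 120.0, 20.0),
        # A value of 40 typed in Celsius but read as Fahrenheit ends up far too low.
        ("B", fahrenheit_to_celsius(40.0), 20.0),
        ("C", 85.0, None),
    ]
    print(flag_suspect_melting_points(records))
```

Flagged entries are only candidates for manual inspection; a low value can also be genuine, for example when the recorded crystallization temperature is itself wrong.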
On Wednesday, 10 October 2018, 15:28:06 BST, Michal Krompiec <michal.kromp...@gmail.com> wrote:

Dear All,

Thank you all very much for your feedback! Actually, the number of collisions didn't decrease when I increased the bit length, though increasing the radius to 3 did help a bit. Overall, it is good to know that great results are not to be expected.

Best wishes,
Michal

On Wed, 10 Oct 2018 at 13:31, Chris Earnshaw <cgearns...@gmail.com> wrote:

Hi

It sounds to me like you're already getting better results than you could reasonably expect. Prediction of melting point is a phenomenally difficult thing to do; you're trying to find the temperature at which a (generally undefined) solid crystalline phase is in equilibrium with a (probably even less defined) liquid phase.

You also need to consider that the crystalline form of your solid phase is not necessarily truly constant - what polymorph is involved? Melting points of alternative polymorphs can be radically different and this is one of the real bugbears of pharmaceutical and agrochemical development. If you haven't found the most stable form early in the development process there can be very nasty surprises downstream.

Expecting to handle all these challenges with a descriptor as simple as a molecular fingerprint - regardless of bit-length, collisions etc. - is probably over-optimistic...

Regards,
Chris Earnshaw

On Wed, 10 Oct 2018 at 13:16, Michal Krompiec <michal.kromp...@gmail.com> wrote:

Hi Thomas,

Radius 2, 2048 bits, 5200 data points.

On Wed, 10 Oct 2018 at 13:13, Thomas Evangelidis <teva...@gmail.com> wrote:

What's your bitvector length and radius? How many training samples do you have?

On Wed, 10 Oct 2018 at 13:51, Michal Krompiec <michal.kromp...@gmail.com> wrote:

Hi all,

I have a slightly off-topic question. I'm trying to train a neural network on a dataset of small molecules and their melting points. I did get reasonable accuracy with Morgan fingerprints, but I've realised that regardless of FP radius and bitvector length, several dozen molecules have the same fingerprints but wildly different melting points. I am pretty sure this is a "solved problem", so I don't want to reinvent the wheel. What is the recommended/usual way of dealing with this?

Thanks,
Michal

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

--
======================================================================
Dr Thomas Evangelidis
Research Scientist
IOCB - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
& CEITEC - Central European Institute of Technology, Brno, Czech Republic
email: teva...@gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/
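The thread does not settle on a recommended way of handling identical fingerprints with different melting points. One possible approach, sketched below under stated assumptions, is to group the training samples by fingerprint before fitting, replace near-agreeing duplicates with their median, and set aside groups that disagree strongly for manual inspection. Since the duplicates persisted at longer bit lengths, they may be genuinely identical descriptors (for example stereoisomers, polymorphs or repeated entries) rather than hash collisions, which is why inspection seems safer than silent averaging. The function name, the `(fp_key, melting_point)` layout and the 20-degree `max_spread` threshold are hypothetical; `fp_key` can be any hashable form of the fingerprint, such as the string returned by `ToBitString()` on an RDKit bit vector.

```python
from collections import defaultdict
from statistics import median


def merge_duplicate_fingerprints(samples, max_spread=20.0):
    """Collapse samples that share a fingerprint.

    samples: iterable of (fp_key, melting_point) pairs; fp_key is any
    hashable representation of the fingerprint (e.g. a bit string).
    max_spread: largest max-min difference (same unit as the melting
    points) for which duplicates are merged; an arbitrary assumption.

    Returns (merged, conflicts):
      merged    maps fp_key -> median melting point of consistent groups
      conflicts maps fp_key -> sorted melting points of inconsistent groups
    """
    groups = defaultdict(list)
    for fp_key, mp in samples:
        groups[fp_key].append(mp)

    merged, conflicts = {}, {}
    for fp_key, mps in groups.items():
        if max(mps) - min(mps) <= max_spread:
            merged[fp_key] = median(mps)
        else:
            conflicts[fp_key] = sorted(mps)
    return merged, conflicts


if __name__ == "__main__":
    samples = [
        ("0101", 100.0), ("0101", 104.0),  # duplicates that roughly agree
        ("1100", 50.0), ("1100", 210.0),   # duplicates that disagree wildly
        ("0011", 75.0),                    # unique fingerprint
    ]
    merged, conflicts = merge_duplicate_fingerprints(samples)
    print(merged)
    print(conflicts)
```

The conflicting groups are the ones worth checking against the original data source before deciding whether to drop them, correct them, or add a descriptor that distinguishes the molecules.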