The paper is pretty vague on implementation details. However, note that the code is copyright Novartis Institutes for BioMedical Research Inc. It was released into the public domain, and at that point (2013) it was the implementation used internally at Novartis. You can therefore use the Python implementation in RDKit as the reference for this method. I would not spend any more time trying to find the discrepancy.
Best,
Peter

> On Nov 15, 2020, at 11:01 AM, Gustavo Seabra <gustavo.sea...@gmail.com> wrote:
>
> So, basically, your code perfectly reproduces RDKit's Python implementation.
> However, those results (both yours and RDKit's) *do not* match the original
> paper.
>
> It does look like a constant shift, but it is not: some molecules have a
> different shift than others.
>
> Questions:
>
> 1. Are those the same molecules as in the original paper?
> 2. How well defined are the equations in the original paper?
>
> I'm guessing RDKit's implementation is *not* 100% the same as in the
> original paper, as stated on the GitHub page
> (https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/sascorer.py):
>
> # several small modifications to the original paper are included
> # particularly slightly different formula for marocyclic penalty
> # and taking into account also molecule symmetry (fingerprint density)
>
> --
> Gustavo Seabra
>
> From: Steven Pak <steven....@stonybrook.edu>
> Sent: Saturday, November 14, 2020 12:20:47 PM
> To: Greg Landrum <greg.land...@gmail.com>
> Cc: rdkit-discuss@lists.sourceforge.net
> Subject: Re: [Rdkit-discuss] Hello questions about the Synthetic Accessibility score
>
> Blue dots are RDKit's Python code vs. my CPP implementation. Orange
> dots are my CPP implementation vs. scores extracted from the original
> paper (Estimation of synthetic accessibility score of drug-like molecules
> based on molecular complexity and fragment contributions). My CPP
> implementation of the SA_score is based on RDKit's Python version, and I am
> trying to match the RDKit values exactly (which appears to be working).
> That is why I am a bit confused: why do the orange dots appear to be
> shifted by a constant value?
>
> As for the open source comment, I will let you know. I also did the same
> thing for the QED scoring function, and I have a couple of questions about
> that too, which I will send in an email soon. I must talk to my team about
> this before we can move forward.
>
> Thanks!
>
> On Sat, Nov 14, 2020 at 2:29 AM Greg Landrum <greg.land...@gmail.com> wrote:
> Steven,
>
> Wow cool! Any thoughts about making that implementation open source?
>
> Did you recalculate the Python SA score with the same version of the RDKit
> you used for the CPP version? Did you do your implementation based on the
> Python code (hopefully) or the algorithm description in the paper?
>
> If the answer to both those questions is “yes”, then I’m going to guess
> we’d need to see the code to diagnose the problem.
>
> Best,
> -greg
>
> On Sat, 14 Nov 2020 at 00:06, Steven Pak <steven....@stonybrook.edu> wrote:
> Hello.
>
> I have been working on a CPP version of the SA score. Results are fantastic!
> <image.png>
> As you can see in the image, the blue dots represent the SA_scores from
> Python vs. scores from my CPP version. The scores are perfectly in line with
> each other, which is great! The orange dots, however, are the values from
> RDKit vs. the original paper's. These are the original 40 compounds that I
> found in the original paper. I was just wondering why the orange dots seem
> to have a constant shift throughout the graph. What part of the code was
> changed to cause this? I am just curious.
>
> Thank you,
> --
> Steven Pak Pharm.D
> Ph.D Student | Rizzo Lab
> Stony Brook University (SUNY)
> Department of Pharmacological Sciences
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
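Whether the offset in the plot is constant (Steven's reading of the orange dots) or molecule-dependent (Gustavo's reading) can be checked directly from the paired scores instead of by eye. The sketch below is a minimal, hypothetical helper (`offset_summary` is not part of RDKit) and assumes both lists hold scores for the same molecules in the same order. A truly constant shift gives a spread of about zero; the sample numbers are made up for illustration and are not the paper's values.

```python
from statistics import mean, pstdev


def offset_summary(scores_a, scores_b):
    """Summarize per-molecule differences a - b between two paired score lists.

    Hypothetical helper: both lists must be in the same molecule order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return {
        "mean_offset": mean(diffs),                  # average shift
        "spread": pstdev(diffs),                     # ~0 only for a constant shift
        "max_abs_diff": max(abs(d) for d in diffs),  # worst single molecule
    }


# Made-up numbers for illustration; these are not the paper's values
cpp_scores = [2.30, 3.10, 4.75]
python_scores = [2.30, 3.10, 4.75]
paper_scores = [2.00, 2.80, 4.00]
print(offset_summary(cpp_scores, python_scores))  # spread 0: implementations agree
print(offset_summary(cpp_scores, paper_scores))   # non-zero spread: shift varies
```

Run on the CPP vs. Python scores (blue dots), the spread should be essentially zero; on the CPP vs. paper scores (orange dots), a clearly non-zero spread would support Gustavo's point that the shift differs from molecule to molecule.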
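The two departures from the paper named in the sascorer.py header comment quoted above (the macrocycle penalty and the fingerprint-density term) can be sketched in isolation. This is a minimal sketch based on my reading of sascorer.py, not a replacement for it: the function names are hypothetical, the inputs are counts you would normally get from RDKit (number of rings with more than 8 atoms, atom count, number of distinct radius-2 Morgan environments), and the scaling constants are assumed from the file and should be checked against it. It illustrates one way the offset from the paper can vary per molecule: the density term depends on each molecule's symmetry, and the two macrocycle forms only diverge once two or more macrocycles are present.

```python
import math

# Hypothetical helpers mirroring, as far as I can tell, the relevant lines of
# RDKit's Contrib/SA_Score/sascorer.py. Inputs are precomputed counts.


def macrocycle_penalty_paper(n_macrocycles: int) -> float:
    """Paper form, as quoted in a sascorer.py comment: log10(n + 1)."""
    return math.log10(n_macrocycles + 1)


def macrocycle_penalty_rdkit(n_macrocycles: int) -> float:
    """sascorer.py form: a flat log10(2) once any ring has more than 8 atoms."""
    return math.log10(2) if n_macrocycles > 0 else 0.0


def density_correction(n_atoms: int, n_environments: int) -> float:
    """Not in the paper: a bonus for symmetric molecules, i.e. those with
    fewer distinct Morgan (radius 2) environments than atoms."""
    if n_atoms > n_environments:
        return 0.5 * math.log(n_atoms / n_environments)
    return 0.0


def scale_to_sa(raw: float) -> float:
    """Map a raw score onto 1 (easy) .. 10 (hard); constants assumed."""
    lo, hi = -4.0, 2.5
    sa = 11.0 - (raw - lo + 1.0) / (hi - lo) * 9.0
    if sa > 8.0:
        sa = 8.0 + math.log(sa - 8.0)  # smooth the hard end of the scale
    return min(10.0, max(1.0, sa))


# Same hypothetical raw score for a molecule that gets no density correction
# (20 atoms, 20 environments) and a highly symmetric one (20 atoms, 10)
raw_paper = -1.0
for n_atoms, n_env in [(20, 20), (20, 10)]:
    raw_rdkit = raw_paper + density_correction(n_atoms, n_env)
    shift = scale_to_sa(raw_paper) - scale_to_sa(raw_rdkit)
    print(n_atoms, n_env, round(shift, 3))

# The two macrocycle forms agree for 0 or 1 macrocycles and differ from 2 on
for n in range(4):
    print(n, macrocycle_penalty_paper(n) - macrocycle_penalty_rdkit(n))
```

With these made-up inputs the molecule without a density correction is unchanged, while the symmetric one comes out easier by about 0.48 SA units, so a fixed raw-score change does not translate into the same shift for every molecule.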