Hi Paolo! Nice to hear from you -- and thanks for the lightning-fast fix and working example. Very helpful, as usual. (I don't imagine you need me to open a GitHub issue on this, but I'd be happy to if you think it would be helpful or want to keep a record.)
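For reference, a minimal sketch of the four-step pipeline from the earlier message quoted below (Cleanup, FragmentParent, Uncharge, canonical tautomer), with the reionize step in question as an optional flag. The function name `standardize` and the flag `reionize_after_uncharge` are made up for illustration, and this is not necessarily the exact code from the blog post.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize(smiles, reionize_after_uncharge=False):
    """Sketch of the Cleanup -> FragmentParent -> Uncharge -> tautomer pipeline.

    `reionize_after_uncharge` is a hypothetical switch for the optional
    reionize step asked about in this thread.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # 0. Cleanup: remove Hs, disconnect metals, normalize, reionize
    mol = rdMolStandardize.Cleanup(mol)

    # 1. Keep the largest fragment (FragmentParent also runs its own
    #    cleanup by default, so step 0 is partly redundant here)
    mol = rdMolStandardize.FragmentParent(mol)

    # 2. Neutralize charges where possible
    mol = rdMolStandardize.Uncharger().uncharge(mol)

    # Optional: reionize again after uncharging
    if reionize_after_uncharge:
        mol = rdMolStandardize.Reionize(mol)

    # 3. Pick the canonical tautomer
    return rdMolStandardize.TautomerEnumerator().Canonicalize(mol)


if __name__ == "__main__":
    for flag in (False, True):
        result = standardize("CC(=O)[O-].[Na+]", reionize_after_uncharge=flag)
        print(flag, Chem.MolToSmiles(result))
```

For a simple salt such as sodium acetate both variants end up at acetic acid, so the flag only matters (if at all) for molecules with several ionizable groups.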
Any thoughts on whether it is useful to reionize after neutralizing charges in the pipeline above?

Many thanks,

On Thu, 24 Jun 2021 at 18:58, Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:

> Hi JP,
>
> the problem is caused by the reaction SMARTS that standardizes pyridine
> *N*-oxides being not very specific and also hitting your molecule, which
> is not actually an *N*-oxide but rather an *N*-hydroxypyridinium ion.
> I will submit a PR to fix the reaction pattern; in the meantime you can
> fix the problem by loading a custom list of normalization reaction SMARTS
> as shown in this gist:
>
> https://gist.github.com/ptosco/2b19142ff8fd6afdfee12836cec73d4f
>
> HTH, cheers
> p.
>
> On Thu, Jun 24, 2021 at 11:40 AM JP Ebejer <jean.p.ebe...@um.edu.mt>
> wrote:
>
>> Apologies that I took my sweet time to reply; I went down the
>> standardization rabbit-hole and went through most of the material (thanks
>> Matthew and Francois, but also links from other notebooks). The recording
>> of the OpenScience session is excellent and crystal clear as usual, Greg.
>> I enjoyed that.
>>
>> I have collated code to do the standardization as follows (I am putting
>> this here, for when my future self searches this list for the same thing in
>> 6 years' time*):
>>
>> 0. Cleanup
>> 1. FragmentParent
>> 2. Uncharge
>> 3. Canonicalize Tautomer
>>
>> My only remaining question is whether I should reionize between steps 2
>> and 3. What do you think? My opinion is, probably, that there is no harm in
>> doing so (so I should do it). Earlier, Greg said that cleanup does
>> reionization, but perhaps it is worth redoing after the uncharge step? Or
>> is this just a waste of CPU cycles? Any thoughts?
>>
>> Also, there is something slightly weird going on. A (successfully)
>> sanitized mol from SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O", when
>> passed to Cleanup(...), starts spitting out "can't kekulize" errors.
>> I have created a Jupyter notebook to highlight this:
>> https://nbviewer.jupyter.org/gist/jp-um/7cd80faa794b3545e8aedf838a1e7f6b
>> Any ideas what is going on? IMHO cleanup should not choke on sanitized
>> (correct) molecules. Is there a way to catch when these errors happen? As
>> a bonus, FragmentParent(...) on the original sanitized molecule also
>> exhibits this unexpected behaviour (not shown in the notebook). Could this
>> be because it's doing an internal cleanup?
>>
>> * The exact code is here:
>> https://bitsilla.com/blog/2021/06/standardizing-a-molecule-using-rdkit/
>>
>> On Fri, 18 Jun 2021 at 15:08, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> Hi JP,
>>>
>>> On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer <jean.p.ebe...@um.edu.mt>
>>> wrote:
>>>
>>>> I am trying to standardize(/normalize?) some molecules from different
>>>> sources, to generate a set of descriptors for them. I have done this a
>>>> number of times, and each time I find the process slightly confusing. I
>>>> have the following questions please, if you don't mind:
>>>
>>> As a starting point, in case you want more information about this topic:
>>> I did a webinar/presentation on it earlier this year as part of
>>> the RSC Open Science series.
>>>
>>> My materials for that are on GitHub:
>>> https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
>>> and there's a YouTube recording:
>>> https://www.youtube.com/watch?v=eWTApNX8dJQ
>>>
>>>> 1. What is the relation between MolVS and RDKit? (I remember there was
>>>> an integration project between the two a while back.) When I call
>>>> rdMolStandardize, does RDKit code or MolVS code get called? The GitHub
>>>> repo for MolVS hasn't been updated in a while (2 yrs), but
>>>> rdMolStandardize has.
>>>
>>> When you call operations from rdMolStandardize it invokes RDKit code.
>>> That code was started by Susan Leung as a Google Summer of Code project and
>>> we have continued to improve and expand that code since then.
>>>
>>>> 2. What is the difference between standardization and normalization of
>>>> a molecule? Does one automatically imply the other, or should both
>>>> processes be run on a molecule?
>>>
>>> I would be surprised if there were universal agreement about this, but
>>> when I use the terms, normalization typically refers to making changes to
>>> molecules to get "functional groups" (loosely defined) into a normal form,
>>> while standardization is getting the molecules into a standard form in
>>> preparation for doing something with them. Normalization is often part of
>>> standardization; standardization can also include things like stripping
>>> salts, neutralizing molecules, etc.
>>> Normalization involves applying transformations like converting -N(=O)=O
>>> to -[N+](=O)[O-] and converting -[S+2]([O-])[O-] to -S(=O)=O.
>>>
>>>> 3. Specifically, what is the difference between
>>>> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
>>>> rdMolStandardize.Normalize(mol)? Should I call any of these three
>>>> manually after I run "standardization/cleaning operations" such as
>>>> uncharging, reionizing, etc.?
>>>
>>> SanitizeMol() is different from the others: it does a small amount of
>>> normalization (fixing groups like nitro, which are commonly drawn in a
>>> hypervalent state but which can be represented in a charge-separated form
>>> without needing weird valences), some validation (rejecting molecules
>>> with atoms that have non-physical valences, rejecting molecules that cannot
>>> be kekulized), and a bunch of chemistry perception (ring finding,
>>> calculating valences, finding aromatic systems, etc.).
>>>
>>> rdMolStandardize.Normalize() applies a bunch of standard transformations
>>> to a molecule.
>>>
>>> rdMolStandardize.Cleanup() does a number of standardization operations:
>>> - removeHs
>>> - disconnect metal atoms
>>> - normalize the molecule
>>> - reionize the molecule
>>>
>>>> 4. I understand what uncharge does, but what does reionizer do?
>>>
>>> Reionizing does two things:
>>> 1. Adds a charge to a small set of free atoms which are likely
>>> counterions. These include Na, Mg, Cl, etc.
>>> 1a. If the above added a positive charge, removes an H from an acidic
>>> group to neutralize the positive charge that was added.
>>> 2. Moves negative charges from less acidic groups to more acidic groups.
>>>
>>>> 5. Is there a way to chain operations together
>>>> standardize+ChooseLargestFragment+uncharge+normalize (am not sure the order
>>>> makes sense here), other than creating a class instance for each, calling
>>>> the method, returning a new mol and using this mol in the next operation?
>>>
>>> The easy "pipeline" type functions in rdMolStandardize are the xxxParent
>>> functions.
>>> - fragmentParent: cleanup(), pick largest fragment
>>> - chargeParent: fragmentParent(); uncharge()
>>>
>>> Note that this list will be more complete in the 2021.09 release.
>>>
>>>> Apologies for the many questions. Have I missed the documentation
>>>> about this? I have found some excellent examples here:
>>>> https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
>>>> (thanks!). This is not exactly a cleaning pipeline, but still quite
>>>> helpful to understand these methods.
>>>
>>> The GitHub link I provide above has some more up-to-date information
>>> about what the code currently does.
>>> This all needs to land in the RDKit documentation.
>>>
>>> -greg
>>
>> --
>> <https://www.um.edu.mt/>
>> Dr Jean-Paul Ebejer | Senior Lecturer
>> BSc (Hons) (Melita), MSc (Imperial), DPhil (Oxon.)
>> Centre for Molecular Medicine and Biobanking
>> Office 320, Biomedical Sciences Building,
>> University of Malta, Msida, MSD 2080. MALTA.
>> T: (00356) 2340 3263
>>
>> *Associate Member*
>> Department of Artificial Intelligence
>>
>> Where am I? <https://bitsilla.com/blog/where-to-find-me/>
>> <https://twitter.com/dr_jpe> <https://bitsilla.com/blog/> <https://github.com/jp-um>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss