Hi Paolo!

Nice to hear from you -- and thanks for the lightning-fast fix and working
example.  Very helpful as usual.  (I don't imagine you need me to open a
GitHub issue on this, but I'd be happy to if you think that is helpful or
want to keep a record).

Any thoughts on whether it is useful to reionize after neutralizing charges
in the pipeline above?
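
To make the question concrete, here is a minimal sketch of that pipeline with
the extra step as an optional switch.  The function name `standardize` and the
flag `reionize_after_uncharge` are made up for illustration (they are not RDKit
names), and sodium benzoate is just a convenient input.  It only shows where
the step would go, not whether it is worth doing.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize(smiles, reionize_after_uncharge=False):
    """Cleanup -> FragmentParent -> Uncharge -> (optional Reionize) -> canonical tautomer."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = rdMolStandardize.Cleanup(mol)          # step 0
    mol = rdMolStandardize.FragmentParent(mol)   # step 1
    mol = rdMolStandardize.Uncharger().uncharge(mol)  # step 2
    if reionize_after_uncharge:
        # The step being asked about: reionize between steps 2 and 3.
        mol = rdMolStandardize.Reionizer().reionize(mol)
    mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)  # step 3
    return mol


smi = "[Na+].[O-]C(=O)c1ccccc1"
without_step = Chem.MolToSmiles(standardize(smi))
with_step = Chem.MolToSmiles(standardize(smi, reionize_after_uncharge=True))
print(without_step, with_step, without_step == with_step)
```

Running both variants over a real dataset and counting the molecules whose
output SMILES differ would show whether the extra step ever changes anything.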

Many thanks,

On Thu, 24 Jun 2021 at 18:58, Paolo Tosco <paolo.tosco.m...@gmail.com>
wrote:

> Hi JP,
>
> the problem is caused by the reaction SMARTS that standardizes pyridine
> *N*-oxides being not very specific and also hitting your molecule, which
> is not actually an *N*-oxide but rather a *N*-hydroxypyridinium ion.
> I will submit a PR to fix the reaction pattern; in the meantime you can
> fix the problem by loading a custom list of normalization reaction SMARTS
> as shown in this gist:
>
> https://gist.github.com/ptosco/2b19142ff8fd6afdfee12836cec73d4f
>
> HTH, cheers
> p.
>
> On Thu, Jun 24, 2021 at 11:40 AM JP Ebejer <jean.p.ebe...@um.edu.mt>
> wrote:
>
>> Apologies that I took my sweet time to reply; I went down the
>> standardization rabbit-hole and went through most of the material (thanks
>> Matthew and Francois, but also links from other notebooks).  The recording
>> of the OpenScience session is excellent and crystal clear as usual, Greg.
>> I enjoyed that.
>>
>> I have collated code to do the standardization as follows (I am putting
>> this here for when my future self searches this list for the same thing in
>> 6 years' time*):
>>
>> 0. Cleanup
>> 1. FragmentParent
>> 2. Uncharge
>> 3. Canonicalize Tautomer
>>
>> My only remaining question is whether I should reionize between steps 2
>> and 3.  What do you think?  My opinion is that there is probably no harm in
>> doing so (so I should do it).  Earlier, Greg said that cleanup does
>> reionization, but perhaps it is worth redoing after the uncharge step?  Or
>> is this just a waste of CPU cycles?  Any thoughts?
>>
>> Also, there is something slightly weird going on.  A (successfully)
>> sanitized mol from SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O" starts
>> spitting out "can't kekulize" errors when passed to Cleanup(...).  I have
>> created a Jupyter notebook to highlight this:
>> https://nbviewer.jupyter.org/gist/jp-um/7cd80faa794b3545e8aedf838a1e7f6b.
>> Any ideas what is going on?  IMHO cleanup should not choke on sanitized
>> (correct) molecules.  Is there a way to catch when these errors happen?  As
>> a bonus, FragmentParent(...) on the original sanitized molecule also
>> exhibits this unexpected behaviour (not shown in the notebook). Could this
>> be because it's doing an internal cleanup?
>>
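One way to catch such failures is sketched below.  It assumes that a failing
Cleanup() raises a Python exception, which may depend on the RDKit version (it
might only log a message), so the result is also re-checked with
Chem.SanitizeMol(..., catchErrors=True).  The helper name `safe_cleanup` is
hypothetical.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def safe_cleanup(mol):
    """Return (cleaned_mol, None) on success or (None, reason) on failure."""
    try:
        cleaned = rdMolStandardize.Cleanup(mol)
    except Exception as exc:  # assumption: the failure surfaces as an exception
        return None, f"Cleanup raised: {exc}"
    # Second line of defence: ask which sanitization step (if any) fails.
    failed_op = Chem.SanitizeMol(cleaned, catchErrors=True)
    if failed_op != Chem.SanitizeFlags.SANITIZE_NONE:
        return None, f"sanitization failed at: {failed_op}"
    return cleaned, None


cleaned, problem = safe_cleanup(Chem.MolFromSmiles("CCO"))
print(Chem.MolToSmiles(cleaned) if cleaned is not None else problem)
```

In a batch run the `reason` string can be logged next to the input SMILES so
that problem molecules are easy to find afterwards.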
>> * The exact code is here:
>> https://bitsilla.com/blog/2021/06/standardizing-a-molecule-using-rdkit/
>>
>>
>>
>>
>> On Fri, 18 Jun 2021 at 15:08, Greg Landrum <greg.land...@gmail.com>
>> wrote:
>>
>>> Hi JP,
>>>
>>> On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer <jean.p.ebe...@um.edu.mt>
>>> wrote:
>>>
>>>>
>>>> I am trying to standardize(/normalize?) some molecules from different
>>>> sources, to generate a set of descriptors for them.  I have done this a
>>>> number of times, and each time I find the process slightly confusing.  I
>>>> have the following questions please, if you don't mind:
>>>>
>>>>
>>> As a starting point, in case you want more information about this topic:
>>> I did a webinar/presentation on it earlier this year as part of the RSC
>>> Open Science series.
>>>
>>> My materials for that are in github:
>>> https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
>>> and there's a youtube recording:
>>> https://www.youtube.com/watch?v=eWTApNX8dJQ
>>>
>>>
>>>
>>>> 1.  What is the relation between molvs and rdkit?  (I remember there was
>>>> an integration project between the two a while back.)  When I call
>>>> rdMolStandardize does rdkit code or molvs code get called?  The github repo
>>>> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
>>>>
>>>
>>> When you call operations from rdMolStandardize it invokes RDKit code.
>>> That code was started by Susan Leung as a Google Summer of Code project and
>>> we have continued to improve and expand that code since then.
>>>
>>>
>>>> 2.  What is the difference between standardization and normalization of
>>>> a molecule?  Does one automatically imply the other or should these two
>>>> processes be both run on a molecule?
>>>>
>>>
>>> I would be surprised if there were universal agreement about this, but
>>> when I use the terms, normalization typically refers to making changes to
>>> molecules to get "functional groups" (loosely defined) into a normal form,
>>> while standardization is getting the molecules into a standard form in
>>> preparation for doing something with them. Normalization is often part of
>>> standardization; standardization can also include things like stripping
>>> salts, neutralizing molecules, etc.
>>> Normalization involves applying transformations like converting -N(=O)=O
>>> to -[N+](=O)[O-] and converting -[S+2]([O-])[O-] to -S(=O)=O.
>>>
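The two transformations just mentioned can be tried out with a short sketch.
The example molecules (nitromethane and dimethyl sulfone) are chosen here for
illustration.  Note that the nitro case is already handled by the sanitization
that runs when the SMILES is parsed, so only the sulfone goes through
rdMolStandardize.Normalize().

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Nitro drawn with a pentavalent N: sanitization on parsing rewrites it
# to the charge-separated form.
nitro = Chem.MolFromSmiles("CN(=O)=O")
print(Chem.MolToSmiles(nitro))

# Sulfone drawn in a charge-separated form: Normalize() converts it to
# the neutral S(=O)=O form.
sulfone = Chem.MolFromSmiles("C[S+2]([O-])([O-])C")
normalized = rdMolStandardize.Normalize(sulfone)
print(Chem.MolToSmiles(normalized))
```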
>>>
>>>> 3.  Specifically, what is the difference between
>>>> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
>>>> rdMolStandardize.Normalize(mol)?  Should I call any of these three
>>>> manually after I run "standardization/cleaning operations" such as
>>>> uncharging, reionizing, etc?
>>>>
>>>
>>> SanitizeMol() is different from the others: it does a small amount of
>>> normalization - fixing groups like nitro which are commonly drawn in a
>>> hypervalent state but which can be represented in a charge-separated form
>>> without needing weird valences - and some validation - rejecting molecules
>>> with atoms that have non-physical valences, rejecting molecules that cannot
>>> be kekulized - and a bunch of chemistry perception - ring finding,
>>> calculating valences, finding aromatic systems, etc.
>>>
>>> rdMolStandardize.Normalize() applies a bunch of standard transformations
>>> to a molecule.
>>>
>>> rdMolStandardize.Cleanup() does a number of standardization operations:
>>> - removeHs
>>> - disconnect metal atoms
>>> - normalize the molecule
>>> - reionize the molecule
>>>
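As a rough illustration of the list above, the sketch below runs Cleanup() in
one call and then approximates the same four steps by hand.  The by-hand
sequence is an assumption about what Cleanup() does internally, not a copy of
its implementation (the explicit SanitizeMol call in particular is a
precaution added here), so the two results are printed rather than assumed to
be identical.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Sodium benzoate with the sodium drawn covalently bound to oxygen.
mol = Chem.MolFromSmiles("[Na]OC(=O)c1ccccc1")

# One call.
cleaned = rdMolStandardize.Cleanup(mol)

# Approximately the same steps, one at a time.
step = Chem.RemoveHs(mol)                                     # removeHs
step = rdMolStandardize.MetalDisconnector().Disconnect(step)  # disconnect metals
Chem.SanitizeMol(step)                                        # precaution, see above
step = rdMolStandardize.Normalize(step)                       # normalize
step = rdMolStandardize.Reionize(step)                        # reionize

print(Chem.MolToSmiles(cleaned))
print(Chem.MolToSmiles(step))
```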
>>>> 4.  I understand what uncharge does, but what does reionizer do?
>>>>
>>>
>>> Reionizing does two things:
>>> 1. Adds a charge to a small set of free atoms which are likely
>>> counterions. These include Na, Mg, Cl, etc.
>>> 1a. If the above added a positive charge: removes an H from an acidic
>>> group to neutralize the positive charge that was added.
>>> 2. Moves negative charges from less acidic groups to more acidic groups.
>>>
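Point 2 can be seen on a small example.  The molecule below is picked for
illustration: it carries a deprotonated sulfinic acid next to a neutral
sulfonic acid, and since the sulfonic acid is the more acidic group the
negative charge should end up there after reionization.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Sulfinate anion (less acidic group) plus neutral sulfonic acid (more acidic).
mol = Chem.MolFromSmiles("C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O")

reionized = rdMolStandardize.Reionizer().reionize(mol)

print("before:", Chem.MolToSmiles(mol))
print("after: ", Chem.MolToSmiles(reionized))
```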
>>>> 5.  Is there a way to chain operations together
>>>> standardize+ChooseLargestFragment+uncharge+normalize (am not sure the order
>>>> makes sense here), other than creating a class instance for each, calling
>>>> the method, returning a new mol and using this mol in the next operation?
>>>>
>>>
>>> The easy "pipeline" type functions in rdMolStandardize are the xxxParent
>>> functions.
>>> - fragmentParent: cleanup(), pick largest fragment
>>> - chargeParent: fragmentParent(); uncharge()
>>>
>>> Note that this list will be more complete in the 2021.09 release.
>>>
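A short sketch of the two parent functions listed above, using sodium benzoate
as an illustrative input.  In Python the functions are spelled FragmentParent
and ChargeParent.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

mol = Chem.MolFromSmiles("[Na]OC(=O)c1ccccc1")

# Cleanup, then keep the largest fragment (the benzoate anion).
fragment_parent = rdMolStandardize.FragmentParent(mol)

# FragmentParent, then neutralize the remaining charge.
charge_parent = rdMolStandardize.ChargeParent(mol)

print(Chem.MolToSmiles(fragment_parent))
print(Chem.MolToSmiles(charge_parent))
```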
>>>
>>>>
>>>> Apologies for the many questions.  Have I missed the documentation
>>>> about this?  I have found some excellent examples here:
>>>> https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
>>>> (thanks!).  This is not exactly a cleaning pipeline, but still quite
>>>> helpful to understand these methods.
>>>>
>>>>
>>> The github link I provided above has some more up-to-date information
>>> about what the code currently does.
>>> This all needs to land in the RDKit documentation.
>>>
>>> -greg
>>>
>>>
>>
>> --
>>
>> <https://www.um.edu.mt/>
>>
>> Dr Jean-Paul Ebejer | Senior Lecturer
>>
>> BSc (Hons) (Melita), MSc (Imperial), DPhil (Oxon.)
>>
>> Centre for Molecular Medicine and Biobanking
>>
>> Office 320, Biomedical Sciences Building,
>>
>> University of Malta, Msida, MSD 2080.  MALTA.
>>
>> T: (00356) 2340 3263
>>
>>
>> *Associate Member*
>>
>> Department of Artificial Intelligence
>>
>>
>> Where am I? <https://bitsilla.com/blog/where-to-find-me/>
>>
>> [image: https://twitter.com/dr_jpe] <https://twitter.com/dr_jpe> [image:
>> https://bitsilla.com/blog/] <https://bitsilla.com/blog/> [image:
>> https://github.com/jp-um] <https://github.com/jp-um>
>>
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>

