Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol
Hi JP,

you are welcome, and thanks a lot for reporting the problem with a reproducible example! No need to bother filing a GitHub issue; I have already done that and also submitted a fix: https://github.com/rdkit/rdkit/pull/4282

Reionizing is useful to make sure that charges are shuffled around if needed and localized on the most appropriate groups based on their acidity/basicity. I normally run the Reionizer as part of the standardization pipeline, even though in most cases it will not actually change the molecule.

Cheers,
p.

On Mon, Jun 28, 2021 at 10:43 AM JP Ebejer wrote:
> Any thoughts on whether it is useful to reionize after neutralizing
> charges in the pipeline above?
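The reionization step discussed in this reply can be sketched roughly as follows. This is a minimal illustration rather than code from the thread: the helper name `reionize_smiles` is hypothetical, and the RDKit imports are placed inside the function only so that the helper can be defined on machines where RDKit is not installed.

```python
def reionize_smiles(smiles):
    """Return the canonical SMILES of `smiles` after charge redistribution."""
    # Imported lazily so the helper can be defined even where RDKit is absent.
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # The Reionizer moves charges towards the most acidic groups and
    # returns a new molecule; the input molecule is left untouched.
    reionized = rdMolStandardize.Reionizer().reionize(mol)
    return Chem.MolToSmiles(reionized)
```

As a usage example, calling `reionize_smiles("C1=C(C=CC(=C1)[S]([O-])=O)[S](O)(=O)=O")` is expected to move the negative charge from the sulfinate onto the more acidic sulfonic acid group. On molecules that carry no misplaced charges the call should simply return the input structure, which matches the remark that the Reionizer usually does nothing.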
Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol
Hi Paolo!

Nice to hear from you -- and thanks for the lightning fix and working example. Very helpful as usual. (I don't imagine you need me to open a GitHub issue on this, but I'd be happy to if you think that is helpful or want to keep a record.)

Any thoughts on whether it is useful to reionize after neutralizing charges in the pipeline above?

Many thanks,

On Thu, 24 Jun 2021 at 18:58, Paolo Tosco wrote:
> I will submit a PR to fix the reaction pattern; in the meantime you can
> fix the problem by loading a custom list of normalization reaction SMARTS
> as shown in this gist:
>
> https://gist.github.com/ptosco/2b19142ff8fd6afdfee12836cec73d4f
>
> On Thu, Jun 24, 2021 at 11:40 AM JP Ebejer wrote:
>
>> 0. Cleanup
>> 1. FragmentParent
>> 2. Uncharge
>> 3. Canonicalize Tautomer
Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol
Hi JP,

the problem is caused by the reaction SMARTS that standardizes pyridine *N*-oxides not being specific enough: it also matches your molecule, which is not actually an *N*-oxide but rather an *N*-hydroxypyridinium ion. I will submit a PR to fix the reaction pattern; in the meantime you can work around the problem by loading a custom list of normalization reaction SMARTS, as shown in this gist:

https://gist.github.com/ptosco/2b19142ff8fd6afdfee12836cec73d4f

HTH, cheers
p.

On Thu, Jun 24, 2021 at 11:40 AM JP Ebejer wrote:
> Also, there is something slightly weird going on. A (successfully)
> sanitized mol from SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O", which when
> passed to Cleanup(...) starts spitting out can't kekulize errors.
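Loading a custom list of normalization transforms, as suggested in this reply, might look roughly like the sketch below. It is not the content of the linked gist: the two transforms are illustrative entries in the style of the MolVS/RDKit defaults, the names `CUSTOM_TRANSFORMS` and `normalize_with_custom_transforms` are hypothetical, and the tab-separated "name, reaction SMARTS" layout is my understanding of what `NormalizerFromData` expects, so please check it against your RDKit version.

```python
# One transform per line: "<name>\t<reaction SMARTS>".
# Illustrative entries only; this is NOT the list from the gist.
CUSTOM_TRANSFORMS = (
    "Nitro to N+(O-)=O\t"
    "[N,P,As,Sb;X3:1](=[O,S,Se,Te:2])=[O,S,Se,Te:3]>>[*+1:1]([*-1:2])=[*:3]\n"
    "Sulfone to S(=O)(=O)\t"
    "[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])\n"
)


def normalize_with_custom_transforms(smiles, transforms=CUSTOM_TRANSFORMS):
    """Normalize `smiles` using only the transforms given in `transforms`."""
    # Imported lazily so the helper can be defined even where RDKit is absent.
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # Build a Normalizer from the in-memory transform list instead of
    # the built-in defaults, then apply it to the molecule.
    params = rdMolStandardize.CleanupParameters()
    normalizer = rdMolStandardize.NormalizerFromData(transforms, params)
    return Chem.MolToSmiles(normalizer.normalize(mol))
```

The point of the design is that an over-eager default transform can be left out of, or replaced in, the custom list without patching RDKit itself.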
Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol
Apologies that I took my sweet time to reply; I went down the standardization rabbit-hole and went through most of the material (thanks Matthew and Francois, but also links from other notebooks). The recording of the OpenScience session is excellent and crystal clear as usual, Greg. I enjoyed that.

I have collated code to do the standardization as follows (I am putting this here for when my future self searches this list for the same thing in 6 years' time*):

0. Cleanup
1. FragmentParent
2. Uncharge
3. Canonicalize Tautomer

My only remaining question is whether I should reionize between steps 2 and 3. What do you think? My opinion is, probably, that there is no harm in doing so (so I should do it). Earlier, Greg said that cleanup does reionization, but perhaps it is worth redoing after the uncharge step? Or is this just a waste of CPU cycles? Any thoughts?

Also, there is something slightly weird going on. A (successfully) sanitized mol from SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O", when passed to Cleanup(...), starts spitting out "can't kekulize" errors. I have created a Jupyter notebook to highlight this: https://nbviewer.jupyter.org/gist/jp-um/7cd80faa794b3545e8aedf838a1e7f6b. Any ideas what is going on? IMHO cleanup should not choke on sanitized (correct) molecules. Is there a way to catch when these errors happen? As a bonus, FragmentParent(...) on the original sanitized molecule also exhibits this unexpected behaviour (not shown in the notebook). Could this be because it's doing an internal cleanup?

* The exact code is here: https://bitsilla.com/blog/2021/06/standardizing-a-molecule-using-rdkit/
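The four-step protocol listed in this message (Cleanup, FragmentParent, Uncharge, Canonicalize Tautomer) can be sketched as below. This is not the code from the linked blog post, just a minimal illustration; the function names `standardize` and `standardize_smiles` and the optional `reionize` flag are assumptions introduced here, and the RDKit imports are kept inside the functions only so the helpers can be defined where RDKit is missing.

```python
def standardize(mol, reionize=False):
    """Apply Cleanup -> FragmentParent -> Uncharge -> canonical tautomer."""
    # Imported lazily so the helper can be defined even where RDKit is absent.
    from rdkit.Chem.MolStandardize import rdMolStandardize

    # 0. Cleanup: remove Hs, disconnect metals, normalize, reionize.
    clean = rdMolStandardize.Cleanup(mol)
    # 1. FragmentParent: keep the largest (parent) fragment.
    parent = rdMolStandardize.FragmentParent(clean)
    # 2. Uncharge: neutralize charges where possible.
    uncharged = rdMolStandardize.Uncharger().uncharge(parent)
    # Optional extra step debated in the thread: reionize after uncharging.
    if reionize:
        uncharged = rdMolStandardize.Reionizer().reionize(uncharged)
    # 3. Pick the canonical tautomer.
    return rdMolStandardize.TautomerEnumerator().Canonicalize(uncharged)


def standardize_smiles(smiles, reionize=False):
    """Convenience wrapper: SMILES in, standardized canonical SMILES out."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return Chem.MolToSmiles(standardize(mol, reionize=reionize))
```

For instance, `standardize_smiles("[Na+].CC(=O)[O-]")` should drop the sodium counterion and return neutral acetic acid. Each RDKit call returns a new molecule, so the steps chain naturally without mutating the input.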
Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol
Dear JP,

To confuse you even more, you can also have a look at the ChEMBL open-source molecular standardizer:

https://github.com/chembl/ChEMBL_Structure_Pipeline/blob/master/chembl_structure_pipeline/standardizer.py

No need to thank me. :D

On 18/06/2021 03:12, JP Ebejer wrote:
> Dear all,
>
> I am trying to standardize(/normalize?) some molecules from different
> sources, to generate a set of descriptors for them. I have done this a
> number of times, and each time I find the process slightly confusing. I
> have the following questions please, if you don't mind:
>
> 1. What is the relation between molvs and rdkit (I remember there was an
> integration project between the two a while back). When I call
> rdMolStandardize does rdkit code or molvs code get called? The github repo
> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
> 2. What is the difference between standardization and normalization of a
> molecule? Does one automatically imply the other or should these two
> processes be both run on a molecule?
> 3. Specifically, what is the difference between
> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol),
> rdMolStandardize.Normalize(mol). Should I call any of these three manually
> after I run "standardization/cleaning operations" such as uncharging,
> reionizing, etc?
> 4. I understand what uncharge does, but what does reionizer do?
> 5. Is there a way to chain operations together
> standardize+ChooseLargestFragment+uncharge+normalize (am not sure the order
> makes sense here), other than creating a class instance for each calling
> the method, returning a new mol and using this mol in the next operation?
>
> Apologies for the many questions. Have I missed the documentation about
> this? I have found some excellent examples here:
> https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
> (thanks!). This is not exactly a cleaning pipeline, but still quite
> helpful to understand these methods.
>
> Many thanks,
> JP

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol
Hi JP,

On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer wrote:
> I am trying to standardize(/normalize?) some molecules from different
> sources, to generate a set of descriptors for them. I have done this a
> number of times, and each time I find the process slightly confusing. I
> have the following questions please, if you don't mind:

As a starting point, in case you want more information about this topic: I did a webinar/presentation on it earlier this year as part of the RSC Open Science series.

My materials for that are on GitHub: https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
and there's a YouTube recording: https://www.youtube.com/watch?v=eWTApNX8dJQ

> 1. What is the relation between molvs and rdkit (I remember there was an
> integration project between the two a while back). When I call
> rdMolStandardize does rdkit code or molvs code get called? The github repo
> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.

When you call operations from rdMolStandardize it invokes RDKit code. That code was started by Susan Leung as a Google Summer of Code project and we have continued to improve and expand that code since then.

> 2. What is the difference between standardization and normalization of a
> molecule? Does one automatically imply the other or should these two
> processes be both run on a molecule?

I would be surprised if there were universal agreement about this, but when I use the terms, normalization typically refers to making changes to molecules to get "functional groups" (loosely defined) into a normal form, while standardization is getting the molecules into a standard form in preparation for doing something with them. Normalization is often part of standardization; standardization can also include things like stripping salts, neutralizing molecules, etc.
Normalization involves applying transformations like converting -N(=O)=O to -[N+](=O)[O-] and converting -[S+2]([O-])[O-] to -S(=O)=O.

> 3. Specifically, what is the difference between
> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol),
> rdMolStandardize.Normalize(mol). Should I call any of these three manually
> after I run "standardization/cleaning operations" such as uncharging,
> reionizing, etc?

SanitizeMol() is different from the others: it does a small amount of normalization - fixing groups like nitro which are commonly drawn in a hypervalent state but which can be represented in a charge-separated form without needing weird valences - and some validation - rejecting molecules with atoms that have non-physical valences, rejecting molecules that cannot be kekulized - and a bunch of chemistry perception - ring finding, calculating valences, finding aromatic systems, etc.

rdMolStandardize.Normalize() applies a bunch of standard transformations to a molecule.

rdMolStandardize.Cleanup() does a number of standardization operations:
- removeHs
- disconnect metal atoms
- normalize the molecule
- reionize the molecule

> 4. I understand what uncharge does, but what does reionizer do?

Reionizing does two things:
1. Adds a charge to a small set of free atoms which are likely counterions. These include Na, Mg, Cl, etc.
   1a. If the above added a positive charge: removes an H from an acidic group to neutralize the positive charge that was added.
2. Moves negative charges from less acidic groups to more acidic groups.

> 5. Is there a way to chain operations together
> standardize+ChooseLargestFragment+uncharge+normalize (am not sure the order
> makes sense here), other than creating a class instance for each calling
> the method, returning a new mol and using this mol in the next operation?

The easy "pipeline" type functions in rdMolStandardize are the xxxParent functions.
- fragmentParent: cleanup(), pick largest fragment
- chargeParent: fragmentParent(); uncharge()
Note that this list will be more complete in the 2021.09 release.

> Apologies for the many questions. Have I missed the documentation about
> this? I have found some excellent examples here:
> https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
> (thanks!). This is not exactly a cleaning pipeline, but still quite
> helpful to understand these methods.

The GitHub link I provided above has some more up-to-date information about what the code currently does. This all needs to land in the RDKit documentation.

-greg
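To make the differences described in this reply concrete, the sketch below runs the same input through Normalize(), Cleanup() and the ChargeParent() pipeline function and collects the resulting SMILES. It is an illustration under assumptions, not code from the thread: the helper name `compare_standardizers` is hypothetical, and in Python the parent functions are spelled with a capital letter (`FragmentParent`, `ChargeParent`).

```python
def compare_standardizers(smiles):
    """Return the SMILES produced by several rdMolStandardize entry points."""
    # Imported lazily so the helper can be defined even where RDKit is absent.
    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    # MolFromSmiles already calls SanitizeMol() by default, so the nitro-style
    # fixes and validation described above happen at this point.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    return {
        "input": Chem.MolToSmiles(mol),
        # Only the functional-group transformations.
        "Normalize": Chem.MolToSmiles(rdMolStandardize.Normalize(mol)),
        # removeHs, metal disconnection, normalization, reionization.
        "Cleanup": Chem.MolToSmiles(rdMolStandardize.Cleanup(mol)),
        # Cleanup + largest fragment + uncharge in a single call.
        "ChargeParent": Chem.MolToSmiles(rdMolStandardize.ChargeParent(mol)),
    }
```

Trying it on `"[Na]OC(=O)c1ccc(C[S+2]([O-])([O-]))cc1"` should show Normalize() rewriting the charge-separated sulfone while leaving the Na-O bond in place, Cleanup() additionally disconnecting the sodium as a separate ion, and ChargeParent() returning only the neutral organic parent.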
Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol
Hi JP,

Lots of good questions, and it is quite an involved topic. I'll let others who are more knowledgeable about the background answer questions on the history and relationship between the tools.

One resource that may be helpful is the https://github.com/chembl/ChEMBL_Structure_Pipeline repo, which calls many of the functions you mentioned. Looking into the code explains the order of steps quite well. It also has an open access article linked in the README that explains at least how one group (ChEMBL) handles the process: https://doi.org/10.1186/s13321-020-00456-1

Best,
Matt

On Thu, Jun 17, 2021 at 2:37 PM JP Ebejer wrote:
> I am trying to standardize(/normalize?) some molecules from different
> sources, to generate a set of descriptors for them. I have done this a
> number of times, and each time I find the process slightly confusing.