Re: [Rdkit-discuss] Chembience
Hello,

I have released Chembience 0.2.6. It switches Python from 3.6 to 3.7 and updates RDKit to 2018.09.1. The Docker images of all previous releases are also still available from Docker Hub.

https://github.com/chembience/chembience/releases/tag/v0.2.6
https://twitter.com/markussitzmann/status/105216581521409

Markus

On Tue, Apr 24, 2018 at 10:44 AM Markus Sitzmann wrote:
> Hello,
>
> since it includes RDKit as one of its major components I am happy to
> announce the first release of my new open-source project Chembience:
>
> A Docker-based, cloudable platform for the development of
> chemoinformatics-centric web applications and microservices.
>
> https://github.com/chembience/chembience
>
> (unfortunately it is still on RDKit 2017.09_3, I failed releasing it
> before 2018.03 :-) ).
>
> Best,
> Markus

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Sometimes one sanitization is not enough?
Hi Greg,

Thanks for the detailed explanation. You are right that this is not a real molecule; it came from applying a user-supplied reaction SMARTS. (The reaction SMARTS was perhaps not the best-written, but that's tangential...)

I normally sanitize the products and skip those that fail the sanitization, but in this case I was surprised when the sanitized molecule caused issues later while trying to compute descriptors.

I look forward to a fix, but in the meantime maybe I'll consider running SanitizeMol twice. :-)

Best,
Ivan

On Wed, Oct 31, 2018 at 2:41 AM Greg Landrum wrote:
> Hi Ivan,
>
> Short answer: I would not normally expect a second sanitization to fail if
> the first succeeds, but your input SMILES is very odd and triggers a bug.
>
> This is an interesting edge case for the sanitization code because it
> includes a weird mix of aromatic and aliphatic atoms and bonds. I do hope
> this came out of some computational process and isn't a "real" molecule.
> You almost couldn't have picked a better example to highlight the situation
> that's causing the problem here. Some form of congratulations is in order.
> :-)
>
> Here's an explanation of what's going on with your molecule C1=n(C)-c=Cn1
> The fundamental problem is that atom 1 (the first nitrogen) has a valence
> of 4 and is neutral...
> If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization
> process produces, I don't think you'd be surprised that the RDKit
> sanitization fails (and your second call to sanitize does fail).
>
> To understand why it passes the first time, you need to understand the
> flow of the sanitization process, described here:
> https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization
> Step 3, updatePropertyCache(), is the part that reports valency errors.
> There's a special case in this code for aromatic atoms that allows atoms
> like the N in Cn1cccc1 to pass sanitization even though they are formally
> four-valent (2x1.5 for the aromatic bonds +1 for the C). Your molecule is
> triggering that special case because atom 1 is aromatic in the input
> SMILES. Incorrect aromatic rings that get through this step normally end up
> getting caught later when the molecule is kekulized (step 5). In your case
> there are no aromatic bonds to kekulize, so no error is thrown. The
> aromaticity perception (step 6) does not consider the ring to be aromatic,
> so the final molecule is the equivalent of C1=N(C)C=CN1.
>
> It ought to be possible to clear this up in the sanitization code relatively
> easily; I just need to think about it a bit and do a bunch of testing.
>
> -greg
>
>
> On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi,
>>
>> I was surprised to see that a (dubious) structure that goes through
>> SanitizeMol OK can fail a subsequent sanitization call:
>>
>>   print("Start")
>>   mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
>>   print("Before first sanitization")
>>   Chem.SanitizeMol(mol)
>>   print("Before second sanitization")
>>   Chem.SanitizeMol(mol)
>>   print("Done")
>>
>>
>> The output is:
>>
>>   Start
>>   Before first sanitization
>>   Before second sanitization
>>   [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
>>   Traceback (most recent call last):
>>     File "./san.py", line 9, in <module>
>>       Chem.SanitizeMol(mol)
>>   ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
>>   greater than permitted
>>
>>
>> Is this an unavoidable aspect of the way SanitizeMol works, since it does
>> several operations (Kekulize, check valencies, set aromaticity, conjugation
>> and hybridization) in a certain order, or should this be considered a bug?
>>
>> Best,
>> Ivan
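
The stopgap Ivan mentions, running SanitizeMol more than once and skipping anything that fails either pass, could be sketched roughly as below. The helper name `sanitize_twice` and its `passes` parameter are illustrative assumptions, not part of RDKit; only `Chem.Mol`, `Chem.MolFromSmiles` and `Chem.SanitizeMol` are RDKit calls. The sketch relies on sanitization failures being raised as `ValueError` (or a subclass), as in the traceback quoted in this thread.

```python
from rdkit import Chem


def sanitize_twice(mol, passes=2):
    """Return a sanitized copy of `mol`, or None if any pass fails.

    Hypothetical helper for illustration; not an RDKit function.
    """
    work = Chem.Mol(mol)  # copy, so the caller's molecule is left untouched
    for _ in range(passes):
        try:
            Chem.SanitizeMol(work)
        except ValueError:
            # Sanitization errors surface as ValueError (or a subclass).
            return None
    return work


# Filter a few unsanitized candidates, e.g. reaction products.
candidates = ["c1ccccc1", "CC(C)(C)(C)C", "C1=n(C)-c=Cn1"]
for smi in candidates:
    mol = Chem.MolFromSmiles(smi, sanitize=False)
    kept = mol is not None and sanitize_twice(mol) is not None
    print(smi, "kept" if kept else "skipped")
```

Working on a copy keeps a failed attempt from leaving the input half-modified. What happens to the third SMILES depends on the RDKit version in use, so no expected output is given for it.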
Re: [Rdkit-discuss] Adding certain atomic indexes to functional groups
Thanks, I'll check it out. So far, it works pretty well!

By the way, I'm curious about RWMol. What does it do, and what is the difference between GetMol() and EditableMol()?

On Tue, Oct 30, 2018 at 4:36 PM Greg Landrum wrote:
> Hi,
>
> I think this gist does what you're looking for:
> https://gist.github.com/greglandrum/fd488309268cb085be218f26178e13b8
>
> -greg
>
>
> On Tue, Oct 30, 2018 at 7:20 AM Noki Lee wrote:
>
>> Hi rdkit-discuss,
>>
>> I'm struggling to add functional groups.
>>
>> What I want is a new function like below.
>>
>> Using the new func(smiles1, smiles2, smiles3, **)
>> it will return as a combined molecule (smiles1 + smiles2 + smiles3) as a
>> new smiles
>> connecting between atomic indexes (i1, j), (i2, k).
>> The smileses should be arbitrary but we know their index orders for each
>> isolated smiles.
>>
>> For example, func('c1ccccc1','C(=O)O','Cl', {3:0}, {5:0})
>> => 'c1cc(C(=O)O)cc(Cl)c1' (This might be wrong, but result will be smiles
>> format)
>>
>> The struggle part is to keep the persistent indexes of the core molecule.
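
For readers following the thread, here is a minimal sketch of the kind of function Noki describes. It is not the code from Greg's gist. The name `attach_fragments` and its argument layout are assumptions made for illustration, every new bond is assumed to be a single bond, and the core in Noki's example is assumed to be benzene (c1ccccc1). RWMol is RDKit's editable molecule class; GetMol() returns an ordinary Mol from it.

```python
from rdkit import Chem


def attach_fragments(core_smiles, attachments):
    """Bond fragments onto a core and return the canonical product SMILES.

    `attachments` is a list of (fragment_smiles, core_idx, frag_idx) tuples.
    Hypothetical helper for illustration; single bonds only.
    """
    combined = Chem.RWMol(Chem.MolFromSmiles(core_smiles))
    for frag_smiles, core_idx, frag_idx in attachments:
        frag = Chem.MolFromSmiles(frag_smiles)
        # CombineMols appends the fragment's atoms after the existing ones,
        # so core indices never change and fragment indices shift by `offset`.
        offset = combined.GetNumAtoms()
        combined = Chem.RWMol(Chem.CombineMols(combined, frag))
        combined.AddBond(core_idx, offset + frag_idx, Chem.BondType.SINGLE)
    product = combined.GetMol()
    Chem.SanitizeMol(product)  # recompute valences and implicit hydrogens
    return Chem.MolToSmiles(product)


print(attach_fragments("c1ccccc1", [("C(=O)O", 3, 0), ("Cl", 5, 0)]))
```

Because the core atoms always come first in the combined molecule, the core's atom indices stay valid for every later attachment, which is the "persistent indexes" requirement in the question.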
Re: [Rdkit-discuss] Sometimes one sanitization is not enough?
Hi Ivan,

Short answer: I would not normally expect a second sanitization to fail if the first succeeds, but your input SMILES is very odd and triggers a bug.

This is an interesting edge case for the sanitization code because it includes a weird mix of aromatic and aliphatic atoms and bonds. I do hope this came out of some computational process and isn't a "real" molecule. You almost couldn't have picked a better example to highlight the situation that's causing the problem here. Some form of congratulations is in order. :-)

Here's an explanation of what's going on with your molecule C1=n(C)-c=Cn1

The fundamental problem is that atom 1 (the first nitrogen) has a valence of 4 and is neutral... If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization process produces, I don't think you'd be surprised that the RDKit sanitization fails (and your second call to sanitize does fail).

To understand why it passes the first time, you need to understand the flow of the sanitization process, described here:
https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization

Step 3, updatePropertyCache(), is the part that reports valency errors. There's a special case in this code for aromatic atoms that allows atoms like the N in Cn1cccc1 to pass sanitization even though they are formally four-valent (2x1.5 for the aromatic bonds +1 for the C). Your molecule is triggering that special case because atom 1 is aromatic in the input SMILES. Incorrect aromatic rings that get through this step normally end up getting caught later when the molecule is kekulized (step 5). In your case there are no aromatic bonds to kekulize, so no error is thrown. The aromaticity perception (step 6) does not consider the ring to be aromatic, so the final molecule is the equivalent of C1=N(C)C=CN1.

It ought to be possible to clear this up in the sanitization code relatively easily; I just need to think about it a bit and do a bunch of testing.

-greg

On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman <ivan.tubert-broh...@schrodinger.com> wrote:
> Hi,
>
> I was surprised to see that a (dubious) structure that goes through
> SanitizeMol OK can fail a subsequent sanitization call:
>
>   print("Start")
>   mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
>   print("Before first sanitization")
>   Chem.SanitizeMol(mol)
>   print("Before second sanitization")
>   Chem.SanitizeMol(mol)
>   print("Done")
>
>
> The output is:
>
>   Start
>   Before first sanitization
>   Before second sanitization
>   [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
>   Traceback (most recent call last):
>     File "./san.py", line 9, in <module>
>       Chem.SanitizeMol(mol)
>   ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
>   greater than permitted
>
>
> Is this an unavoidable aspect of the way SanitizeMol works, since it does
> several operations (Kekulize, check valencies, set aromaticity, conjugation
> and hybridization) in a certain order, or should this be considered a bug?
>
> Best,
> Ivan
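
One way to see which sanitization step rejects a molecule is to call SanitizeMol with catchErrors=True, which returns the flag of the failing operation instead of raising. The sketch below is only illustrative: the wrapper name `failed_sanitize_step` is an assumption, and what the two passes report for the SMILES from this thread depends on the RDKit version, so no expected output is shown for it.

```python
from rdkit import Chem


def failed_sanitize_step(mol):
    """Sanitize `mol` in place and return the flag of the step that failed.

    Returns SanitizeFlags.SANITIZE_NONE when every step succeeds.
    Hypothetical wrapper for illustration; not an RDKit function.
    """
    return Chem.SanitizeMol(
        mol,
        sanitizeOps=Chem.SanitizeFlags.SANITIZE_ALL,
        catchErrors=True,  # report the failing step instead of raising
    )


# Run two consecutive passes over the molecule discussed in this thread.
mol = Chem.MolFromSmiles("C1=n(C)-c=Cn1", sanitize=False)
for attempt in (1, 2):
    print("pass", attempt, "->", failed_sanitize_step(mol))
```

A return value of SANITIZE_PROPERTIES corresponds to the valence check done in updatePropertyCache(), while SANITIZE_KEKULIZE corresponds to the kekulization step.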