[Rdkit-discuss] Try to reproduce a code working in January
Dear community,

I am trying to reproduce the code from
https://iwatobipen.wordpress.com/2019/01/18/generate-possible-molecules-from-a-dataset-chemoinformatics-rdkit/
but I get an error from pandas / RDKit during generation:

    frame = frame[["ROMol", "Smiles", "Core", "R1", "R2", "R3"]]
    frame['Core']=frame['Core'].apply(Chem.RemoveHs)
    frame.head(2)

    RDKit ERROR: [05:02:02]
    RDKit ERROR:
    RDKit ERROR:
    RDKit ERROR: Pre-condition Violation
    RDKit ERROR: getExplicitValence() called without call to calcExplicitValence()
    RDKit ERROR: Violation occurred on line 161 in file /opt/conda/conda-bld/rdkit_1561471048963/work/Code/GraphMol/Atom.cpp
    RDKit ERROR: Failed Expression: d_explicitValence > -1
    RDKit ERROR:
    RDKit ERROR:
    RDKit ERROR: [05:05:04] Explicit valence for atom # 6 N, 5, is greater than permitted

    ValueError                                Traceback (most recent call last)
          1 frame = frame[["ROMol", "Smiles", "Core", "R1", "R2", "R3"]]
    ----> 2 frame['Core']=frame['Core'].apply(Chem.RemoveHs)
          3 frame.head(2)

    ~/miniconda/envs/py37/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
       3589         else:
       3590             values = self.astype(object).values
    -> 3591             mapped = lib.map_infer(values, f, convert=convert_dtype)
       3592
       3593         if len(mapped) and isinstance(mapped[0], Series):

    pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

    ValueError: Sanitization error: Explicit valence for atom # 6 N, 5, is greater than permitted

Any idea why?

BR
Guillaume

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
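One way to narrow down an error like the one above is to apply RemoveHs entry by entry and collect the failures, rather than letting the first bad molecule abort the whole `apply`. The following is only a minimal sketch: `try_remove_hs` is a hypothetical helper (it is not part of RDKit or of the blog post), and the pentavalent-nitrogen SMILES is a made-up stand-in for whichever core triggers the error in the real data frame.

```python
from rdkit import Chem, RDLogger

RDLogger.DisableLog("rdApp.*")  # silence RDKit's own error log for this demo


def try_remove_hs(mol):
    """Hypothetical helper: return (mol_without_Hs, None) or (None, message)."""
    try:
        return Chem.RemoveHs(mol), None
    except (ValueError, RuntimeError) as err:
        # Sanitization failures (e.g. an impossible valence) end up here.
        return None, str(err)


# A well-behaved molecule and a deliberately broken one (5-valent N, parsed
# without sanitization so that the SMILES parser does not reject it up front).
good = Chem.AddHs(Chem.MolFromSmiles("CCO"))
bad = Chem.MolFromSmiles("C[N](C)(C)(C)C", sanitize=False)

results = [try_remove_hs(m) for m in (good, bad)]
for idx, (mol, error) in enumerate(results):
    if mol is None:
        print(f"entry {idx}: RemoveHs failed: {error}")
    else:
        print(f"entry {idx}: ok, {Chem.MolToSmiles(mol)}")
```

With a pandas column the same helper could be passed to `apply`, after which the rows whose first element is `None` point at the cores that need a closer look.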
Re: [Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()
Hi Ivan,

I agree that there is a bug here, but I think the problem is actually that the double bond is being assigned stereochemistry at all in this case.

    In [2]: m = Chem.MolFromSmiles('[H]/C=C/F')

    In [3]: m.Debug()
    Atoms:
        0 1 H chg: 0  deg: 1 exp: 1 imp: 0 hyb: 1 arom?: 0 chi: 0
        1 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
        2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
        3 9 F chg: 0  deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0
    Bonds:
        0 0->1 order: 1 dir: 4 conj?: 0 aromatic?: 0
        1 1->2 order: 2 stereo: 3 stereoAts: (0 3) conj?: 0 aromatic?: 0
        2 2->3 order: 1 dir: 4 conj?: 0 aromatic?: 0

Given that the two substituents on the first C are the same, the double bond shouldn't be marked as STEREOE at all. I'll get this fixed.

-greg

On Wed, Nov 6, 2019 at 4:34 PM Ivan Tubert-Brohman <ivan.tubert-broh...@schrodinger.com> wrote:
> Hi,
>
> For reasons too complicated to get into here, I ended up with a molecule
> containing a =CH2 in which one of the hydrogens was explicit and had E/Z
> stereo info. For example, consider [H]/C=C/F.
>
> I was surprised that RemoveHs() refused to remove the hydrogen, although
> later I found that that's the documented behavior, and generally it makes
> sense as a way to prevent the loss of stereochemical information.
>
> For example, compare these two:
>
> In [7]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]/C=C/F')))
> Out[7]: '[H]/C=C/F'
>
> In [8]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]C=C/F')))
> Out[8]: 'C=CF'
>
> A chemist would say that these two are obviously the same molecule, and
> arguably the second representation is better, because a double bond ending
> in =CH2 can't have geometric isomers. Maybe it's unreasonable to expect
> RDKit to make that kind of inference, but still I wonder, what would be a
> good automated way to get from [H]/C=C/F to C=CF?
>
> One idea is to add a "=CH2 cleanup" step, perhaps implemented by applying
> this reaction:
>
> [H][C:1]=[*:2]>>[CH2:1]=[*:2]
>
> but perhaps there's a better way?
>
> Best,
> Ivan
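As an alternative to the reaction-based cleanup proposed in the quoted message, the stereo marks could be cleared directly on the bonds before calling RemoveHs(). The sketch below is an assumption-laden illustration, not an RDKit feature: `clear_terminal_ch2_stereo` is a hypothetical helper, it treats any double-bond end carrying two or more hydrogens as non-stereogenic, and it has not been checked on conjugated systems where a single bond's direction is shared between two double bonds.

```python
from rdkit import Chem


def clear_terminal_ch2_stereo(mol):
    """Hypothetical cleanup: drop E/Z marks on double bonds ending in =CH2."""
    mol = Chem.Mol(mol)  # work on a copy, leave the input untouched
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        ends = (bond.GetBeginAtom(), bond.GetEndAtom())
        # Count implicit Hs plus explicit H neighbours on each end atom.
        if any(a.GetTotalNumHs(includeNeighbors=True) >= 2 for a in ends):
            bond.SetStereo(Chem.BondStereo.STEREONONE)
            for atom in ends:
                for nbr_bond in atom.GetBonds():
                    if nbr_bond.GetBondType() == Chem.BondType.SINGLE:
                        nbr_bond.SetBondDir(Chem.BondDir.NONE)
    return Chem.RemoveHs(mol)


cleaned = clear_terminal_ch2_stereo(Chem.MolFromSmiles("[H]/C=C/F"))
print(Chem.MolToSmiles(cleaned))
```

Double bonds whose ends each carry at most one hydrogen, such as the one in F/C=C/F, are left alone by this helper.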
Re: [Rdkit-discuss] The "confID" for "MMFFOptimizeMoleculeConfs"
Hi Paolo,

Thanks for helping me! Appreciate it.

Best,
Leon

On Tue, Nov 19, 2019 at 5:01 PM Paolo Tosco wrote:
> Hi Leon,
>
> you are right, that's a documentation bug: The confId parameter is
> actually ignored, as you have already found out.
>
> Thanks for reporting this, cheers
> p.
>
> On 19/11/2019 20:56, topgunhaides . wrote:
> > Hi guys,
> >
> > Does the "confID" argument actually work for "MMFFOptimizeMoleculeConfs"?
> > [...]
Re: [Rdkit-discuss] The "confID" for "MMFFOptimizeMoleculeConfs"
Hi Leon,

you are right, that's a documentation bug: The confId parameter is actually ignored, as you have already found out.

Thanks for reporting this, cheers
p.

On 19/11/2019 20:56, topgunhaides . wrote:
> Hi guys,
>
> Does the "confID" argument actually work for "MMFFOptimizeMoleculeConfs"?
> [...]
> In the document for MMFFOptimizeMoleculeConfs: "confId : indicates which
> conformer to optimize". However, in all three cases, it still optimize
> all conformers and give me the "whole" thing:
> [...]
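For optimizing one conformer at a time, the single-conformer function MMFFOptimizeMolecule takes a confId argument that selects the conformer to minimize. The snippet below is a minimal sketch of that approach using the molecule from the question; the fixed randomSeed of 42 and the `status` dictionary are choices made for this illustration only.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mh = Chem.AddHs(Chem.MolFromSmiles("OCCCN"))
cids = list(AllChem.EmbedMultipleConfs(mh, numConfs=3, randomSeed=42))

status = {}
for cid in cids:
    # MMFFOptimizeMolecule minimizes only the conformer selected by confId.
    # Return value: 0 = converged, 1 = more iterations needed, -1 = setup failed.
    status[cid] = AllChem.MMFFOptimizeMolecule(
        mh, mmffVariant="MMFF94s", maxIters=1000, confId=cid
    )
print(status)
```

Unlike MMFFOptimizeMoleculeConfs, this call returns a single convergence flag per conformer and no energy.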
[Rdkit-discuss] The "confID" for "MMFFOptimizeMoleculeConfs"
Hi guys,

Does the "confID" argument actually work for "MMFFOptimizeMoleculeConfs"? Try the following code:

    from rdkit import Chem
    from rdkit.Chem import AllChem

    mh = Chem.AddHs(Chem.MolFromSmiles('OCCCN'))
    cids = AllChem.EmbedMultipleConfs(mh, numConfs=3, maxAttempts=1000,
                                      pruneRmsThresh=0.5, numThreads=0,
                                      randomSeed=-1)

    # try to optimize one conformer at a time in the loop:
    for cid in cids:
        mmffopt_1 = AllChem.MMFFOptimizeMoleculeConfs(mh, confId=cid,
                                                      maxIters=1000,
                                                      mmffVariant='MMFF94s',
                                                      numThreads=0)
        print(mmffopt_1)

    # just optimize one specific conformer (ID = 0):
    mmffopt_2 = AllChem.MMFFOptimizeMoleculeConfs(mh, confId=0, maxIters=1000,
                                                  mmffVariant='MMFF94s',
                                                  numThreads=0)
    print(mmffopt_2)

    # Or optimize all conformers:
    mmffopt_3 = AllChem.MMFFOptimizeMoleculeConfs(mh, confId=-1, maxIters=1000,
                                                  mmffVariant='MMFF94s',
                                                  numThreads=0)
    print(mmffopt_3)

The documentation for MMFFOptimizeMoleculeConfs says: "confId : indicates which conformer to optimize". However, in all three cases, it still optimizes all conformers and gives me the "whole" thing:

    [(0, 1.0966514172064503), (0, -1.5120724826923375), (0, 0.6847373779429624)]
    [(0, 1.0966514171119535), (0, -1.512072483200475), (0, 0.6847373779078172)]
    [(0, 1.0966514168939838), (0, -1.5120724834832924), (0, 0.6847373779001575)]
    [(0, 1.0966514168498929), (0, -1.512072483655178), (0, 0.6847371291858746)]
    [(0, 1.096651416829605), (0, -1.5120724837465005), (0, 0.6847371291858746)]

Thank you.

Best,
Leon
[Rdkit-discuss] assign all bond directions in SMILES
Hi all,

Is there any way to assign all bond directions (E/Z stereochemistry) to the output SMILES string?

For example, here's a structure:

    >>> mol = Chem.MolFromSmiles(r"F/C(Cl)=C(O)/N")
    >>> Chem.MolToSmiles(mol)
    'N/C(O)=C(/F)Cl'

It's a minimal definition, in that I could have specified the directions for all of the bonds:

    >>> mol = Chem.MolFromSmiles(r"F/C(/Cl)=C(\O)/N")
    >>> Chem.MolToSmiles(mol)
    'N/C(O)=C(/F)Cl'

Note that RDKit figured out which bond directions were minimal.

The underlying code checks for conflicting assignments:

    >>> mol = Chem.MolFromSmiles(r"F/C(/Cl)=C(/O)/N")
    [18:25:25] Conflicting single bond directions around double bond at index 2.
    [18:25:25] BondStereo set to STEREONONE and single bond directions set to NONE.
    >>> Chem.MolToSmiles(mol)
    'NC(O)=C(F)Cl'

What I want is some way to go from

    N/C(O)=C(/F)Cl

to a fully specified

    F/C(/Cl)=C(\O)/N

Andrew
da...@dalkescientific.com
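The snippet below does not answer the question of how to write every bond direction; it is only a small sketch showing where the parsed E/Z information lives once the SMILES has been read, which is why the writer is free to pick a minimal set of direction marks on output. It uses the first structure from the message and standard bond accessors.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles(r"F/C(Cl)=C(O)/N")
double_bonds = [
    b for b in mol.GetBonds() if b.GetBondType() == Chem.BondType.DOUBLE
]
for bond in double_bonds:
    # The stereo label and its two reference atoms are stored on the double
    # bond itself, independent of how many / and \ marks the input carried.
    print(bond.GetIdx(), bond.GetStereo(), list(bond.GetStereoAtoms()))
```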
Re: [Rdkit-discuss] Folding count vectors
Hello Francois,

I am trying to replicate some of the functionality of CreateDifferenceFingerprintForReaction [Ref 1] for my own understanding of how the code works. The function CreateDifferenceFingerprintForReaction allows three difference-fingerprint representations of the molecules: AtomPair, Morgan, and TopologicalTorsion [Ref 2]. All three are count vectors [Ref 3], and the function allows for a variable fingerprint output size.

I was following this post [Ref 4], which describes how to create reaction difference fingerprints using different fingerprint representations. Using the code from the post I can create reaction difference fingerprints using either Morgan or AtomPair, but comparing the output of the code from the post [Ref 4] with that of CreateDifferenceFingerprintForReaction gives fingerprints of different sizes, with different values within the fingerprint and different densities. I assume this is due to folding the count vector down to the default fingerprint size of 2048.

Example code snippet:

    # The below defs are from the post https://sourceforge.net/p/rdkit/mailman/message/35240736/
    from rdkit import Chem
    from rdkit.Chem import AllChem
    from rdkit import DataStructs
    import copy

    def _createFP(mol, maxSize, fpType='AP'):
        mol.UpdatePropertyCache(False)
        if fpType == 'AP':
            return AllChem.GetAtomPairFingerprint(mol, minLength=1, maxLength=maxSize)
        else:
            Chem.GetSSSR(mol)
            rinfo = mol.GetRingInfo()
            return AllChem.GetMorganFingerprint(mol, radius=maxSize)

    def getSumFps(fps):
        summedFP = copy.deepcopy(fps[0])
        for fp in fps[1:]:
            summedFP += fp
        return summedFP

    def buildReactionFP(rxn, maxSize=3, fpType='AP'):
        reactants = rxn.GetReactants()
        products = rxn.GetProducts()
        rFP = getSumFps([_createFP(mol, maxSize, fpType=fpType) for mol in reactants])
        pFP = getSumFps([_createFP(mol, maxSize, fpType=fpType) for mol in products])
        return pFP - rFP

    >>> rxn1 = AllChem.ReactionFromSmarts('[C:1]C1C1>>[N:1]C1C1', useSmiles=True)
    >>> rxfp1 = buildReactionFP(rxn1, maxSize=2)
    >>> rxfp1.GetNonzeroElements()
    {558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
    >>> rxfp1.GetLength()
    8388608

    # Same reaction now using CreateDifferenceFingerprintForReaction
    >>> rxn1_fp = AllChem.CreateDifferenceFingerprintForReaction(rxn1)
    >>> rxn1_fp.GetNonzeroElements()
    {1048: 10, 1310: -20, 1325: 20, 1372: -10, 1390: 20, 1692: -10, 1757: -20, 1772: 10}
    >>> print(rxn1_fp.GetLength(),rxfp1.GetLength())
    2048 8388608

References
1. https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html#rdkit.Chem.rdChemReactions.CreateDifferenceFingerprintForReaction
2. https://www.rdkit.org/docs/cppapi/structRDKit_1_1ReactionFingerprintParams.html
3. https://www.rdkit.org/docs/GettingStartedInPython.html#morgan-fingerprints-circular-fingerprints
4. https://sourceforge.net/p/rdkit/mailman/message/35240736/

v/r,
Ben

On Mon, Nov 18, 2019 at 10:13 PM Francois Berenger wrote:
> On 19/11/2019 03:34, Benjamin Datko wrote:
> > Hello all,
> >
> > I am curious on how to fold a count vector fingerprint. I understand
> > when folding bit vectors the most common way is to split the vector in
> > half, and apply a bitwise OR operation. I think this is how the
> > function rdkit.DataStructs.FoldFingerprint works in RDKit, correct me
> > if I am wrong.
> >
> > How does RDKit and or what is the appropriate way to fold count
> > vectors such as AtomPair, Morgan, and Topological torsion?
>
> Can you give us some context? Why do you want to do that?
>
> Maybe, you can use the following in order to create
> shorter "fingerprints" for which the Tanimoto distance is
> still computable (despite becoming approximate then):
>
> ---
> Shrivastava, A. (2016).
> Simple and efficient weighted minwise hashing.
> In Advances in Neural Information Processing Systems (pp. 1498-1506).
>
> https://papers.nips.cc/paper/6472-simple-and-efficient-weighted-minwise-hashing.pdf
> ---
>
> Regards,
> F.
>
> > I thought about turning the fingerprint into a bit vector using their
> > respected "AsBitVect" method then folding using
> > rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't
> > have a "AsBitVect" method
> > [https://www.rdkit.org/docs/GettingStartedInPython.html].
> >
> > For an explicit example using AtomPair fingerprint we can see the
> > fingerprint is extremely sparse. Could this AtomPair fingerprint be
> > folded to increase the density?
> >
> > >>> from rdkit import Chem
> > >>> from rdkit.Chem import AllChem
> > >>> mol = Chem.MolFromSmiles('CC1C1')
> > >>> ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1, maxLength=3)
> > >>> number_of_nonzero_elements = len(ap_fp.GetNonzeroElements().values())
> > >>> print((ap_fp.GetLength(),number_of_nonzero_elements))
> > (8388608,9)
> >
> > Very Respectfully,
> >
> > Ben
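One simple way to fold a sparse count vector, sketched below in plain Python, is to map every index to its value modulo the target size and add up the counts that collide. This is an assumption made for illustration: nothing in the thread confirms that RDKit folds count vectors this way, `fold_counts` is a hypothetical helper, and the result is not expected to match the values returned by CreateDifferenceFingerprintForReaction. The input dictionary is the GetNonzeroElements() output quoted in the message above.

```python
from collections import defaultdict


def fold_counts(nonzero, n_bits=2048):
    """Fold a sparse {index: count} mapping by taking each index modulo n_bits."""
    folded = defaultdict(int)
    for idx, count in nonzero.items():
        folded[idx % n_bits] += count  # colliding indices add their counts
    # Drop entries whose counts cancelled out after a collision.
    return {idx: count for idx, count in folded.items() if count != 0}


unfolded = {558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
folded = fold_counts(unfolded)
print(folded)
```

Summing on collision keeps the total count unchanged, whereas the bitwise OR used for bit vectors would discard it; for signed difference fingerprints this also means that colliding positive and negative entries can cancel.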
Re: [Rdkit-discuss] Anaconda installation without hard dependency on Intel MKL (windows)
Hi all,

In the last couple of days there has been increased focus on this issue (MKL crippling Ryzen) on certain tech/social media sites; MATLAB, for example, is also affected. Some of you might have seen it, but there seems to be a very simple workaround to get MKL to run properly on AMD Ryzen. One simply needs to create a system environment variable:

    MKL_DEBUG_CPU_TYPE=5

Anything using MKL will then use the AVX2 code path (if applicable) and run much faster, faster than with OpenBLAS. Again, no extensive testing done, but in my opinion this would be the simplest workaround.

Best Regards,
Thomas
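If changing the system-wide environment is not convenient, the same variable can be set for a single Python process. The lines below are a minimal sketch under two assumptions: that the variable is read when the MKL-backed library is first loaded, so the assignment has to come before that import, and that the installed MKL build honors the variable at all, which is not verified here. The commented-out numpy import is only a placeholder for whatever MKL-linked package is in use.

```python
import os

# Must run before the MKL-backed library is first imported in this process.
os.environ["MKL_DEBUG_CPU_TYPE"] = "5"

# import numpy as np  # placeholder: any MKL-linked import goes after the assignment

print(os.environ["MKL_DEBUG_CPU_TYPE"])
```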