Re: [Rdkit-discuss] canonical SMILES of a fragment
Hi Pavel,

It is, unfortunately, not that easy. The canonicalization algorithm does not use atomic aromaticity when determining atom ordering, so as far as it is concerned there is no difference between atoms 0 and 2 in either of your examples. What does get used is the number of hydrogens, so you need to use that in order to get the results you are looking for.[1] For technical reasons, you also need to tell the RDKit that the atoms should not have implicit Hs attached.

Here's a gist that works for me: https://gist.github.com/greglandrum/f4e2f2f2ad311560d8ab36874d503843

Two notes:
1) I don't set the number of Hs on atom 1 in that gist, but I would suggest doing that too.
2) If atoms 0 and 2 have the same number of Hs attached, this still is not going to work if you're building things from fragments. The canonicalization code was not really designed to be used in situations like this.

-greg

[1] The details of the canonicalization algorithm, including the contents of the atom invariants, are described here: http://dx.doi.org/10.1021/acs.jcim.5b00543

On Tue, Aug 1, 2017 at 2:53 PM, Pavel Polishchuk wrote:
> Hi all,
>
> canonicalization of fragment SMILES does not work properly. Below there
> are two examples of identical fragments. The only difference is the order
> of atoms (indices). However, it seems that RDKit canonicalization does not
> take into account atom types.
>
> Does someone have an idea how to solve this issue with small losses?
--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
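The gist itself is not reproduced in this thread, so the following is only a minimal sketch of the approach Greg describes, not the gist's contents. It rebuilds Pavel's fragment in both atom orders, marks every carbon with `SetNoImplicit(True)`, and gives each one an explicit H count with `SetNumExplicitHs`. The specific H counts (0 on the aromatic ring carbon, 1 on the central atom 1, 3 on the methyl) and the helper name `build_fragment` are assumptions chosen for illustration.

```python
from rdkit import Chem


def build_fragment(aromatic_idx, explicit_hs):
    """Hypothetical helper: rebuild Pavel's C-C(-C)-[*:1] fragment.

    aromatic_idx: which terminal carbon (0 or 2) is flagged aromatic.
    explicit_hs: dict of atom index -> explicit H count (illustrative values).
    """
    m = Chem.RWMol()
    for _ in range(3):
        m.AddAtom(Chem.Atom(6))
    m.AddAtom(Chem.Atom(0))  # attachment point
    m.GetAtomWithIdx(aromatic_idx).SetIsAromatic(True)
    m.GetAtomWithIdx(3).SetAtomMapNum(1)
    m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
    m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
    m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)
    # Fix the H count on each carbon and forbid implicit Hs, so the
    # H count (which the canonicalizer does use) distinguishes atoms 0 and 2.
    for idx, n_h in explicit_hs.items():
        atom = m.GetAtomWithIdx(idx)
        atom.SetNoImplicit(True)
        atom.SetNumExplicitHs(n_h)
    m.UpdatePropertyCache(strict=False)  # molecule is deliberately unsanitized
    return m


# Example 1: atom 0 is the aromatic ring carbon, atom 2 is a methyl.
smi1 = Chem.MolToSmiles(build_fragment(0, {0: 0, 1: 1, 2: 3}))
# Example 2: same fragment, atom order swapped.
smi2 = Chem.MolToSmiles(build_fragment(2, {2: 0, 1: 1, 0: 3}))
print(smi1, smi2, smi1 == smi2)
```

As Greg's second note warns, this only helps when the two terminal atoms end up with different H counts; if they tie, the output can still depend on the input atom order.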
Re: [Rdkit-discuss] Using inchikey as entry
I believe the InChIKey uses a one-way hash (SHA-256), so what you are asking for is basically impossible. That is, going from an InChIKey to a molecule requires already having a table of molecules corresponding to the InChIKeys. There are various services online that have such lookup tables for a large number of molecules (e.g. the NCI CADD resolver, PubChem).

-David

> On Aug 1, 2017, at 9:05 PM, Kazmierczak Stéphane wrote:
>
> Hello, I would like to draw molecules with RDKit, but I only have InChIKeys.
> I compiled RDKit with InChI support, but it seems that I can only output
> InChIKeys, not import them.
>
> Is there an API function that I am missing?
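Since the key cannot be inverted, the workable route inside RDKit is the lookup table David describes, built from molecules you already have. A minimal local sketch, assuming RDKit was compiled with InChI support (so `Chem.MolToInchiKey` is available); the helper names and the three example SMILES are hypothetical stand-ins for a real compound collection.

```python
from rdkit import Chem


def build_inchikey_index(smiles_list):
    """Hypothetical helper: map InChIKey -> canonical SMILES for known molecules."""
    index = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip input RDKit cannot parse
        index[Chem.MolToInchiKey(mol)] = Chem.MolToSmiles(mol)
    return index


def mol_from_inchikey(index, key):
    """Return a molecule for `key`, or None if the key is not in the table."""
    smi = index.get(key)
    return Chem.MolFromSmiles(smi) if smi is not None else None


# Hypothetical local "database"; in practice this would be your own collection.
index = build_inchikey_index(["CCO", "c1ccccc1", "CC(=O)O"])

# Look up ethanol using a key generated from a differently written SMILES.
query_key = Chem.MolToInchiKey(Chem.MolFromSmiles("OCC"))
found = mol_from_inchikey(index, query_key)
print(query_key, Chem.MolToSmiles(found))
```

Any key not in the table simply returns `None`; a molecule that is found can then be passed to the usual drawing functions (e.g. `rdkit.Chem.Draw.MolToImage`).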
[Rdkit-discuss] Using inchikey as entry
Hello, I would like to draw molecules with RDKit, but I only have InChIKeys. I compiled RDKit with InChI support, but it seems that I can only output InChIKeys, not import them. Is there an API function that I am missing?
[Rdkit-discuss] canonical SMILES of a fragment
Hi all,

canonicalization of fragment SMILES does not work properly. Below are two examples of identical fragments. The only difference is the order of the atoms (indices). However, it seems that RDKit canonicalization does not take atom types into account.

Does anyone have an idea how to solve this issue with small losses?

from rdkit import Chem
from rdkit.Chem import RWMol, Atom

#1 ===

m = RWMol()
for i in range(3):
    a = Atom(6)
    m.AddAtom(a)
a = Atom(0)
m.AddAtom(a)

m.GetAtomWithIdx(0).SetIsAromatic(True)  # set atom 0 as aromatic
m.GetAtomWithIdx(3).SetAtomMapNum(1)

m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m)
# OUTPUT: 'cC(C)[*:1]'

#2 ===

m2 = RWMol()
for i in range(3):
    a = Atom(6)
    m2.AddAtom(a)
a = Atom(0)
m2.AddAtom(a)

m2.GetAtomWithIdx(2).SetIsAromatic(True)  # set atom 2 as aromatic
m2.GetAtomWithIdx(3).SetAtomMapNum(1)

m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m2)
# OUTPUT: 'CC(c)[*:1]'

Pavel.
Re: [Rdkit-discuss] kekulize AllChem.CanonSmiles error and workaround
When I cut the bonds in the Daylight implementation, I add new single bonds to xenon atoms. This means the 'imidazole' fragment would be [Xe]n1ccnc1, which *is* a valid SMILES. The following Python code could then be used to convert these Xe atoms to hydrogen:

from rdkit import Chem

HYDROGEN = 1
XENON = 54

m = Chem.MolFromSmiles('[Xe]n1ccnc1')
# Replace the Xe placeholder(s) with hydrogen, then remove the explicit H atoms
for a in m.GetAtoms():
    if a.GetAtomicNum() == XENON:
        a.SetAtomicNum(HYDROGEN)
n = Chem.RemoveHs(m)
print(Chem.MolToSmiles(n))

Hopefully that is useful. Please fire away if you have any further questions or would like help with any other aspects of this; it will be great to have an RDKit version available for people to use ☺

Rich

From: Konrad Koehler [mailto:konrad.koeh...@icloud.com]
Sent: 01 August 2017 05:29
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] kekulize AllChem.CanonSmiles error and workaround

Hi,

I am having trouble canonicalizing SMILES with ambiguous heteroaromatic tautomers such as imidazole. For example:

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> smiles = 'n1cncc1'
>>> AllChem.CanonSmiles(smiles)
[21:42:52] Can't kekulize mol. Unkekulized atoms: 0 1 2 3 4

As a workaround, one can first canonicalize with Open Babel pybel to remove the ambiguity and then canonicalize with RDKit:

>>> import pybel
>>> pybel.readstring("smi", "n1cncc1").write("can")
'c1ncc[nH]1\t\n'
>>> AllChem.CanonSmiles('c1ncc[nH]1\t\n')
'c1c[nH]cn1'

or in one line:

>>> AllChem.CanonSmiles(pybel.readstring("smi", "n1cncc1").write("can"))
'c1c[nH]cn1'

It would be nice if RDKit could do this without the assistance of pybel.

This problem arose when implementing the algorithm described in the following paper:

Hall RJ, Murray CW, Verdonk ML. The Fragment Network: A Chemistry Recommendation Engine Built Using a Graph Database. J Med Chem. 2017;60(14):6440-50. PMID: 28712298, doi: 10.1021/acs.jmedchem.7b00809

Details of the algorithm are contained in the supporting information: http://pubs.acs.org/doi/suppl/10.1021/acs.jmedchem.7b00809/suppl_file/jm7b00809_si_001.pdf

The algorithm fragments the molecule at acyclic bonds connected to rings, and it is necessary to canonicalize both the parent and child fragments. The algorithm is recursive, and fortunately the SMILES can be recursively processed by AllChem.CanonSmiles once it has been disambiguated:

>>> AllChem.CanonSmiles('c1c[nH]cn1')
'c1c[nH]cn1'

I eventually plan to donate the RDKit Fragment Network script to the community after testing and optimization.

Best,
Konrad
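One way to stay entirely within RDKit, sketched here as an assumption rather than as what Rich or Konrad actually implemented: cut with `Chem.FragmentOnBonds`, then, instead of renaming the dummy atoms to Xe/H, move each cut into an explicit H on the neighbouring atom and delete the dummy. An aromatic nitrogen that lost its substituent therefore comes out as [nH] and kekulizes. The bond rule (acyclic bonds touching a ring) follows Konrad's description, restricted here to single bonds since one H per cut only makes sense for those; the function name `cut_and_cap` is hypothetical, and only one level of the recursion is shown.

```python
from rdkit import Chem


def cut_and_cap(smiles):
    """Hypothetical helper: cut acyclic single bonds attached to rings and
    return the sorted canonical SMILES of the H-capped fragments."""
    mol = Chem.MolFromSmiles(smiles)
    cut_bonds = [
        b.GetIdx() for b in mol.GetBonds()
        if b.GetBondType() == Chem.BondType.SINGLE
        and not b.IsInRing()
        and (b.GetBeginAtom().IsInRing() or b.GetEndAtom().IsInRing())
    ]
    if not cut_bonds:
        return [Chem.MolToSmiles(mol)]
    pieces = Chem.FragmentOnBonds(mol, cut_bonds, addDummies=True)
    results = []
    for frag in Chem.GetMolFrags(pieces, asMols=True):
        rw = Chem.RWMol(frag)
        dummies = [a.GetIdx() for a in rw.GetAtoms() if a.GetAtomicNum() == 0]
        # Turn each cut point into an explicit H on the atom it was bonded to.
        for idx in dummies:
            for nbr in rw.GetAtomWithIdx(idx).GetNeighbors():
                nbr.SetNumExplicitHs(nbr.GetNumExplicitHs() + 1)
        # Remove dummies from the highest index down so the others stay valid.
        for idx in sorted(dummies, reverse=True):
            rw.RemoveAtom(idx)
        Chem.SanitizeMol(rw)  # re-perceive rings and aromaticity
        results.append(Chem.MolToSmiles(rw))
    return sorted(results)


# 1-methylimidazole: the N-CH3 bond is cut, leaving imidazole and methane.
print(cut_and_cap("Cn1ccnc1"))
```

Because the H is recorded on the ring nitrogen before sanitization, the imidazole fragment never reaches the ambiguous 'n1cncc1' form that failed to kekulize above.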