Re: [Rdkit-discuss] canonical SMILES of a fragment

2017-08-01 Thread Greg Landrum
Hi Pavel,

It is, unfortunately, not that easy.
The canonicalization algorithm does not use atomic aromaticity when
determining atom ordering, so as far as it is concerned there is no
difference between atoms 0 and 2 in either of your examples. What does get
used is the number of hydrogens, so you need to use that in order to get
the results you are looking for.[1] For technical reasons, you also need to
tell the RDKit that the atoms should not have implicit Hs attached. Here's
a gist that works for me:
https://gist.github.com/greglandrum/f4e2f2f2ad311560d8ab36874d503843

Two notes:
 1) I don't set the number of Hs on atom 1 in that gist, but I would
suggest doing that too.
 2) If atoms 0 and 2 have the same number of Hs attached, this is still not
going to work if you're building things from fragments. The
canonicalization code was not really designed to be used in situations like
this.
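A minimal sketch of the hydrogen-count idea described above (not the contents of the gist). It rebuilds Pavel's fragment with the aromatic carbon at index 0 or at index 2. The helper name `build_fragment` and the specific H counts are assumptions for illustration: every atom gets a fixed count via `SetNumExplicitHs` plus `SetNoImplicit(True)`, so atoms 0 and 2 now differ in an invariant that the canonicalizer does use, and both atom orderings should produce the same SMILES.

```python
from rdkit import Chem


def build_fragment(aromatic_idx):
    """Hypothetical helper: Pavel's fragment (three carbons plus a [*:1]
    dummy) with the aromatic carbon at index 0 or 2 and fixed H counts."""
    mol = Chem.RWMol()
    for _ in range(3):
        mol.AddAtom(Chem.Atom(6))          # carbons 0, 1, 2
    mol.AddAtom(Chem.Atom(0))              # dummy atom 3
    mol.GetAtomWithIdx(3).SetAtomMapNum(1)
    mol.AddBond(0, 1, Chem.BondType.SINGLE)
    mol.AddBond(1, 2, Chem.BondType.SINGLE)
    mol.AddBond(1, 3, Chem.BondType.SINGLE)

    methyl_idx = 2 if aromatic_idx == 0 else 0
    mol.GetAtomWithIdx(aromatic_idx).SetIsAromatic(True)

    # Assumed H counts: aromatic C has none, the other terminal C is a
    # methyl, and the central C keeps one H (note 1 suggests setting it too).
    h_counts = {aromatic_idx: 0, methyl_idx: 3, 1: 1, 3: 0}
    for idx, n_hs in h_counts.items():
        atom = mol.GetAtomWithIdx(idx)
        atom.SetNumExplicitHs(n_hs)
        atom.SetNoImplicit(True)           # no implicit Hs on top of these
    mol.UpdatePropertyCache(strict=False)  # compute valences without checks
    return mol


smi_a = Chem.MolToSmiles(build_fragment(0))
smi_b = Chem.MolToSmiles(build_fragment(2))
print(smi_a)
print(smi_b)
print(smi_a == smi_b)
```

If both terminal carbons were given the same H count, note 2 applies and the result may again depend on the input atom order.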

-greg
[1] The details of the canonicalization algorithm, including the contents
of the atom invariants, are described here:
http://dx.doi.org/10.1021/acs.jcim.5b00543
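One way to check which atoms the canonicalizer can tell apart is `Chem.CanonicalRankAtoms` with `breakTies=False`: atoms that remain equivalent under the atom invariants share a rank. A small sketch, assuming Pavel's first fragment is built from the aliphatic SMILES `CC(C)[*:1]` with atom 0 then flagged aromatic, and assuming H counts of 0, 1, 3 and 0 for atoms 0 to 3:

```python
from rdkit import Chem

# Start from the aliphatic skeleton of Pavel's first fragment, then flag
# atom 0 as aromatic (setting the flag directly avoids sanitization, which
# would reject an aromatic atom outside a ring).
mol = Chem.MolFromSmiles('CC(C)[*:1]')
mol.GetAtomWithIdx(0).SetIsAromatic(True)

# Assumed H counts for atoms 0..3: aromatic C, central C, methyl C, dummy.
for idx, n_hs in enumerate([0, 1, 3, 0]):
    atom = mol.GetAtomWithIdx(idx)
    atom.SetNumExplicitHs(n_hs)
    atom.SetNoImplicit(True)   # no implicit Hs on top of the explicit count
mol.UpdatePropertyCache(strict=False)

# breakTies=False: atoms the invariants cannot distinguish share a rank.
ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
print(ranks)
print("atoms 0 and 2 distinguished:", ranks[0] != ranks[2])
```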


On Tue, Aug 1, 2017 at 2:53 PM, Pavel Polishchuk 
wrote:

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Using inchikey as entry

2017-08-01 Thread David Hall
I believe the InChIKey is a one-way hash (SHA-256 based) of the InChI, so what
you are asking for is essentially impossible. Going from an InChIKey back to a
molecule requires already having a table that maps molecules to their
InChIKeys. Various online services provide such lookup tables for a large
number of molecules (e.g. the NCI/CADD Chemical Identifier Resolver, PubChem).
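Because the key cannot be decoded, an RDKit-only workaround is a local lookup table built from structures you already have. A minimal sketch, assuming a list of candidate SMILES is available and RDKit was built with InChI support; the helper names `build_inchikey_index` and `mol_from_inchikey` are hypothetical:

```python
from rdkit import Chem


def build_inchikey_index(smiles_list):
    """Map InChIKey -> canonical SMILES for molecules you already have."""
    index = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:            # skip unparsable input
            continue
        # A later molecule with the same key overwrites the earlier entry.
        index[Chem.MolToInchiKey(mol)] = Chem.MolToSmiles(mol)
    return index


def mol_from_inchikey(inchikey, index):
    """Look the key up; the structure cannot be computed from the key alone."""
    smi = index.get(inchikey)
    return Chem.MolFromSmiles(smi) if smi is not None else None


index = build_inchikey_index(['CCO', 'c1ccccc1', 'CC(=O)O'])
key = Chem.MolToInchiKey(Chem.MolFromSmiles('OCC'))  # ethanol, written differently
mol = mol_from_inchikey(key, index)
print(key, Chem.MolToSmiles(mol))
```

Any key not in the table simply returns `None`, which is the same limitation the online resolvers have, just with a much smaller table.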

-David


> On Aug 1, 2017, at 9:05 PM, Kazmierczak Stéphane  
> wrote:



[Rdkit-discuss] Using inchikey as entry

2017-08-01 Thread Kazmierczak Stéphane
Hello, I would like to draw molecules with RDKit, but I only have InChIKeys.
I compiled RDKit with InChI support, but it seems that I can only output
InChIKeys, not import them.

Is there an API function that I am missing?


[Rdkit-discuss] canonical SMILES of a fragment

2017-08-01 Thread Pavel Polishchuk

Hi all,

  Canonicalization of fragment SMILES does not work properly. Below are
two examples of identical fragments; the only difference is the order of
the atoms (their indices). However, it seems that RDKit canonicalization
does not take atom types into account.


  Does anyone have an idea how to solve this issue with minimal losses?

#1 ===

from rdkit import Chem
from rdkit.Chem import RWMol, Atom

m = RWMol()

# three carbons (indices 0-2) followed by one dummy atom (index 3)
for i in range(3):
    a = Atom(6)
    m.AddAtom(a)
a = Atom(0)
m.AddAtom(a)

m.GetAtomWithIdx(0).SetIsAromatic(True)  # set atom 0 as aromatic
m.GetAtomWithIdx(3).SetAtomMapNum(1)

m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m)

OUTPUT: 'cC(C)[*:1]'

#2 ===

m2 = RWMol()

# three carbons (indices 0-2) followed by one dummy atom (index 3)
for i in range(3):
    a = Atom(6)
    m2.AddAtom(a)
a = Atom(0)
m2.AddAtom(a)

m2.GetAtomWithIdx(2).SetIsAromatic(True)  # set atom 2 as aromatic
m2.GetAtomWithIdx(3).SetAtomMapNum(1)

m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m2)

OUTPUT: 'CC(c)[*:1]'


Pavel.



Re: [Rdkit-discuss] kekulize AllChem.CanonSmiles error and workaround

2017-08-01 Thread Richard Hall
When I cut the bonds in the Daylight implementation, I add new single bonds to
xenon atoms. This means the ‘imidazole’ fragment would be [Xe]n1ccnc1, which
*is* a valid SMILES. The following Python code can then be used to convert
these Xe atoms to hydrogen:

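from rdkit import Chem

HYDROGEN = 1
XENON = 54

m = Chem.MolFromSmiles('[Xe]n1ccnc1')
# turn every xenon cap into a hydrogen...
for a in m.GetAtoms():
    if a.GetAtomicNum() == XENON:
        a.SetAtomicNum(HYDROGEN)
# ...then remove the explicit H so the ring nitrogen is written as [nH]
n = Chem.RemoveHs(m)
print(Chem.MolToSmiles(n))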

Hopefully that is useful. Please fire away if you have any further questions
or would like help with any other aspects of this; it will be great to have an
RDKit version available for people to use ☺
Rich

From: Konrad Koehler [mailto:konrad.koeh...@icloud.com]
Sent: 01 August 2017 05:29
To: rdkit-discuss@lists.sourceforge.net
Subject: [Rdkit-discuss] kekulize AllChem.CanonSmiles error and workaround

Hi,

I am having trouble canonicalizing SMILES with ambiguous heteroaromatic
tautomers such as imidazole. For example:

>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> smiles = 'n1cncc1'
>>> AllChem.CanonSmiles(smiles)
[21:42:52] Can't kekulize mol.  Unkekulized atoms: 0 1 2 3 4


As a workaround, one can first canonicalize with Open Babel's pybel to remove
the ambiguity and then canonicalize with RDKit:

>>> import pybel
>>> pybel.readstring("smi", "n1cncc1").write("can")
'c1ncc[nH]1\t\n'
>>> AllChem.CanonSmiles('c1ncc[nH]1\t\n')
'c1c[nH]cn1'


or in one line:


>>> AllChem.CanonSmiles(pybel.readstring("smi", "n1cncc1").write("can"))
'c1c[nH]cn1'

It would be nice if RDKit could do this without the assistance of pybel.
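One RDKit-only alternative to the pybel round trip is to parse without sanitization and, if kekulization fails, try putting an H on each neutral, two-connected aromatic nitrogen in turn, keeping the variants that sanitize. This is a heuristic sketch, not an RDKit feature: the helper name is hypothetical, it only handles a single missing H, and when several tautomers work it returns the smallest canonical SMILES string so the choice does not depend on input atom order.

```python
from rdkit import Chem

OK = Chem.SanitizeFlags.SANITIZE_NONE


def canon_smiles_with_nh_guess(smiles):
    """Hypothetical helper: canonical SMILES, guessing one [nH] if needed."""
    base = Chem.MolFromSmiles(smiles, sanitize=False)
    if base is None:
        return None
    trial = Chem.Mol(base)
    if Chem.SanitizeMol(trial, catchErrors=True) == OK:
        return Chem.MolToSmiles(trial)  # already fine: plain canonical SMILES
    candidates = []
    for atom in base.GetAtoms():
        # candidate: neutral aromatic N with two neighbours and no H yet
        if (atom.GetAtomicNum() == 7 and atom.GetIsAromatic()
                and atom.GetDegree() == 2 and atom.GetFormalCharge() == 0
                and atom.GetNumExplicitHs() == 0):
            trial = Chem.Mol(base)  # fresh copy for every attempt
            trial.GetAtomWithIdx(atom.GetIdx()).SetNumExplicitHs(1)
            if Chem.SanitizeMol(trial, catchErrors=True) == OK:
                candidates.append(Chem.MolToSmiles(trial))
    # smallest string: an arbitrary but reproducible tautomer choice
    return min(candidates) if candidates else None


print(canon_smiles_with_nh_guess('n1cncc1'))
```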



This problem arose when implementing the algorithm described in the following 
paper:

Hall RJ, Murray CW, Verdonk ML. The Fragment Network: A Chemistry 
Recommendation Engine Built Using a Graph Database. J Med Chem. 2017; 
60(14):6440-50. PMID: 28712298, doi: 10.1021/acs.jmedchem.7b00809
Details of the algorithm are given in the supporting information:
http://pubs.acs.org/doi/suppl/10.1021/acs.jmedchem.7b00809/suppl_file/jm7b00809_si_001.pdf

The algorithm fragments the molecule at acyclic bonds connected to rings, and
both the parent and child fragments must be canonicalized. The algorithm is
recursive; fortunately, once a SMILES has been disambiguated it can be
processed recursively by AllChem.CanonSmiles:

>>> AllChem.CanonSmiles('c1c[nH]cn1')
'c1c[nH]cn1'
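A minimal sketch of the cut-and-cap step in RDKit alone, assuming that "acyclic bonds connected to rings" means acyclic single bonds with at least one ring atom, and cutting all of them in one pass rather than recursively (the paper's supporting information has the full rules). It swaps Richard's xenon caps for the dummy atoms that `Chem.FragmentOnBonds` adds, turns each dummy into a hydrogen, and calls `Chem.RemoveHs` so a capped aromatic nitrogen is written as `[nH]`. The helper names are hypothetical:

```python
from rdkit import Chem


def ring_substituent_bonds(mol):
    """Indices of acyclic single bonds that touch at least one ring atom."""
    return [
        bond.GetIdx()
        for bond in mol.GetBonds()
        if not bond.IsInRing()
        and bond.GetBondType() == Chem.BondType.SINGLE
        and (bond.GetBeginAtom().IsInRing() or bond.GetEndAtom().IsInRing())
    ]


def cut_and_cap_with_h(smiles):
    """Cut every ring-substituent bond, cap both ends with H, and return
    the sorted canonical SMILES of the pieces."""
    mol = Chem.MolFromSmiles(smiles)
    bonds = ring_substituent_bonds(mol)
    if not bonds:
        return [Chem.MolToSmiles(mol)]
    # FragmentOnBonds adds an isotope-labelled dummy atom (*) at each cut end.
    cut = Chem.FragmentOnBonds(mol, bonds, addDummies=True)
    for atom in cut.GetAtoms():
        if atom.GetAtomicNum() == 0:
            atom.SetAtomicNum(1)   # dummy -> H (the role Xe plays for Richard)
            atom.SetIsotope(0)     # drop the label so RemoveHs removes it
    pieces = Chem.GetMolFrags(cut, asMols=True)  # each piece is sanitized
    # updateExplicitCount=True turns each removed H into an H count on its
    # neighbour, so a capped aromatic N comes out as [nH].
    return sorted(
        Chem.MolToSmiles(Chem.RemoveHs(piece, updateExplicitCount=True))
        for piece in pieces
    )


print(cut_and_cap_with_h('Cn1ccnc1'))
```

For 1-methylimidazole this should give the methyl piece as `C` and the ring as imidazole, without ever producing the unkekulizable `n1cncc1`.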

I eventually plan to donate the RDKit Fragment Network script to the community 
after testing and optimization.

Best,

Konrad