On Feb 3, 2016, at 6:42 AM, Greg Landrum wrote:
> 1) in the code you have this snippet:
> # This gives: c1ccc(nc1)-n1ncc2ccc(nc21)C1CC1
> # That SMILES appears to be incorrect!
> Why do you think that's true?

I was incorrect in saying "incorrect". I should have said "not canonical". I 
expect the default output from MolToSmiles() to be canonical.

>>> Chem.MolToSmiles(Chem.MolFromSmiles("c1ccc(nc1)-n1ncc2ccc(nc21)C1CC1"))
'c1ccc(-n2ncc3ccc(C4CC4)nc32)nc1'
>>> Chem.MolToSmiles(Chem.MolFromSmiles("c1ccc(nc1)-n1ncc2ccc(nc21)C1CC1"),
...                  isomericSmiles=True)
'c1ccc(-n2ncc3ccc(C4CC4)nc32)nc1'
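
The same check can be written as a small round-trip test. This is only a 
minimal sketch: canonical_smiles() and is_canonical() are hypothetical helper 
names, not RDKit functions, and it assumes the default MolToSmiles() settings.

```python
from rdkit import Chem

def canonical_smiles(smiles):
    """Return RDKit's default SMILES output, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol)

def is_canonical(smiles):
    """True if the string is unchanged by a parse/write round trip."""
    return canonical_smiles(smiles) == smiles

# Two different spellings of the same molecule should canonicalize
# to the same string.
a = canonical_smiles("c1ccc(nc1)-n1ncc2ccc(nc21)C1CC1")
b = canonical_smiles("c1ccc(-n2ncc3ccc(C4CC4)nc32)nc1")
print(a == b)
```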



> 2) If you add a call to Chem.SanitizeMol(hydrogren_mol) before any of the 
> calls to SMILES generation, it clears up the problem. The calls to 
> SetNumExplicitHs() are not necessary.

I think the calls to SetNumExplicitHs() are still necessary. Consider C4PN 
(NP(C)(C)(C)C as SMILES), where I want to replace the carbons with hydrogens. 
How is SanitizeMol() supposed to know that the P is 5-valent, not 3-valent?

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("NP(C)(C)(C)C")
>>> emol = Chem.EditableMol(mol)
>>> emol.RemoveBond(1, 2)
>>> emol.RemoveBond(1, 3)
>>> emol.RemoveBond(1, 4)
>>> emol.RemoveBond(1, 5)
>>> mol2 = emol.GetMol()
>>> Chem.SanitizeMol(mol2)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolToSmiles(mol2)
'C.C.C.C.NP'

But the P in the SMILES "NP" is 3-valent:

>>> mol3 = Chem.MolFromSmiles("NP([H])([H])([H])[H]")
>>> Chem.MolToSmiles(mol3)
'N[PH4]'
>>> mol4 = Chem.MolFromSmiles("NP([H])[H]")
>>> Chem.MolToSmiles(mol4)
'NP'

What I want is to have mol2 generate "C.C.C.C.N[PH4]", and the only way I 
figured out how to do that was by calling SetNumExplicitHs():

>>> mol = Chem.MolFromSmiles("NP(C)(C)(C)C")
>>> emol = Chem.EditableMol(mol)
>>> emol.RemoveBond(1, 2)
>>> emol.RemoveBond(1, 3)
>>> emol.RemoveBond(1, 4)
>>> emol.RemoveBond(1, 5)
>>> mol2 = emol.GetMol()
>>> mol2.GetAtomWithIdx(1).SetNumExplicitHs(4)
>>> Chem.MolToSmiles(mol2)
'C.C.C.C.N[PH4]'
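
The steps above could be wrapped in a small helper. This is a sketch under 
stated assumptions, not a general solution: the function name is hypothetical, 
it assumes every removed bond is a single bond (one hydrogen per cut), and the 
SanitizeMol() call at the end is my addition to refresh the valence 
bookkeeping after the explicit hydrogen count has been set.

```python
from rdkit import Chem

def replace_neighbors_with_hydrogens(mol, center_idx, neighbor_indices):
    """Cut the bonds from center_idx to each neighbor and keep the
    center's valence by adding one explicit H per removed bond.

    Assumes all removed bonds are single bonds.
    """
    emol = Chem.EditableMol(mol)  # works on a copy; `mol` is not modified
    for nbr_idx in neighbor_indices:
        emol.RemoveBond(center_idx, nbr_idx)
    new_mol = emol.GetMol()

    # Record the hydrogens explicitly so that later valence perception
    # cannot fall back to the lower default valence.
    center = new_mol.GetAtomWithIdx(center_idx)
    center.SetNumExplicitHs(center.GetNumExplicitHs() + len(neighbor_indices))

    # Assumption: sanitizing after the H count is set keeps the explicit Hs.
    Chem.SanitizeMol(new_mol)
    return new_mol

mol = Chem.MolFromSmiles("NP(C)(C)(C)C")
mol2 = replace_neighbors_with_hydrogens(mol, 1, [2, 3, 4, 5])
print(Chem.MolToSmiles(mol2))
```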


> 3) I suspect that you should be using Chem.FragmentOnBonds(). It's likely 
> more efficient than what you're currently doing.

Thank you for pointing out that function. I had only seen it in passing and 
had not looked at it closely.

It is close to what I want, but it too produces a 3-valent P:

>>> mol = Chem.MolFromSmiles("NP(C)(C)(C)C")
>>> mol2 = Chem.FragmentOnBonds(mol, [1,2,3,4], addDummies=False)
>>> Chem.MolToSmiles(mol2)
'C.C.C.C.NP'
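
One possible workaround is to combine FragmentOnBonds() with the same 
SetNumExplicitHs() fix-up. This is only a sketch: the function name and the 
h_counts argument are hypothetical, the caller must supply the hydrogen counts, 
and it relies on addDummies=False leaving the atom indices unchanged.

```python
from rdkit import Chem

def fragment_with_explicit_hs(mol, bond_indices, h_counts):
    """Fragment `mol` on the given bond indices, then add explicit Hs.

    h_counts maps atom index -> number of explicit Hs to add to that atom.
    With addDummies=False no atoms are added, so the indices of the
    original molecule are assumed to still apply to the result.
    """
    frag = Chem.FragmentOnBonds(mol, bond_indices, addDummies=False)
    for atom_idx, num_hs in h_counts.items():
        atom = frag.GetAtomWithIdx(atom_idx)
        atom.SetNumExplicitHs(atom.GetNumExplicitHs() + num_hs)
    # Recompute valences now that the explicit H counts are in place.
    Chem.SanitizeMol(frag)
    return frag

mol = Chem.MolFromSmiles("NP(C)(C)(C)C")
# Bonds 1-4 are the four P-C bonds; atom 1 is the phosphorus.
frag = fragment_with_explicit_hs(mol, [1, 2, 3, 4], {1: 4})
print(Chem.MolToSmiles(frag))
```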


https://github.com/rdkit/rdkit/issues/511 ("FragmentOnBonds() producing 
incorrect chirality") suggests that it has the same chirality issue I ran into 
in my previous email. This would not prevent me from using the function as I 
can still fix the chirality flag after the fact.

By the way, the 'cutsPerAtom' parameter doesn't seem to do anything:

>>> mol = Chem.MolFromSmiles("NP(C)(C)(C)C")
>>> x = []
>>> mol2 = Chem.FragmentOnBonds(mol, [1,2,3], addDummies=True, cutsPerAtom=x)
>>> x
[]
>>> Chem.MolToSmiles(mol2)
'[*]C.[*]C.[*]C.[*]P([*])([*])(C)N'
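
Independently of cutsPerAtom, the number of cuts at each atom can be derived 
from the bond indices on the original molecule. A minimal sketch follows; 
count_cuts_per_atom() is a hypothetical helper, not part of RDKit.

```python
from collections import Counter
from rdkit import Chem

def count_cuts_per_atom(mol, bond_indices):
    """Return a list with, for each atom index, how many of the given
    bonds have that atom as an endpoint."""
    counts = Counter()
    for bond_idx in bond_indices:
        bond = mol.GetBondWithIdx(bond_idx)
        counts[bond.GetBeginAtomIdx()] += 1
        counts[bond.GetEndAtomIdx()] += 1
    return [counts[i] for i in range(mol.GetNumAtoms())]

mol = Chem.MolFromSmiles("NP(C)(C)(C)C")
cuts = count_cuts_per_atom(mol, [1, 2, 3])
print(cuts)
```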

Cheers,

                                Andrew
                                [email protected]



_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
