Hi Fabian,
On Fri, Aug 10, 2012 at 3:21 PM, Fabian Dey <[email protected]> wrote:
>
>
> I am in the process of a scaffold analysis, printing the scaffolds in smiles
> format, and came across some unexpected behaviour:
> Whenever I remove hydrogens with RemoveHs() and print out the smiles string,
> some hydrogens remain attached (irrespective of whether the original input
> file type is SDF or smiles):
>
>
>
> from rdkit import Chem
> from rdkit.Chem.Scaffolds import MurckoScaffold
>
> # molecule from zinc:
> suppl = Chem.SDMolSupplier("zinc_69443014.sdf");
> for mol in suppl:
>     mol = Chem.RemoveHs(mol)
>     print 'Mol1: %s' %(Chem.MolToSmiles(mol))
>
> mol = Chem.MolFromSmiles("c1cc(C[NH2+]CC2CNc3ccnn3C2)[nH]n1")
> mol = Chem.RemoveHs(mol)
> print 'Mol2: %s' %(Chem.MolToSmiles(mol))
>
>
> Output:
> Mol1: Cn1nccc1C[NH2+]CC1CNc2ccnn2C1
> Mol2: c1cc(C[NH2+]CC2CNc3ccnn3C2)[nH]n1
The function RemoveHs() goes through the molecular graph and removes
hydrogens that are present as explicit atoms. It doesn't affect the
hydrogen count on any given atom though. Any other behavior would
change the chemistry of the molecule, which is definitely not the
intent.
What are you trying to do? What output would you expect above?
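To make the distinction concrete, here is a small sketch using your SMILES. It assumes a reasonably recent RDKit and Python 3; the variable names are just for illustration. Hydrogens written inside brackets in a SMILES are stored as counts on the heavy atom, so RemoveHs() has nothing to remove until they are turned into real atoms with AddHs():

```python
from rdkit import Chem

smiles = "c1cc(C[NH2+]CC2CNc3ccnn3C2)[nH]n1"
mol = Chem.MolFromSmiles(smiles)

# Parsed from SMILES: hydrogens exist only as counts on the heavy atoms.
n_heavy = mol.GetNumAtoms()
h_count_before = sum(a.GetTotalNumHs() for a in mol.GetAtoms())

# AddHs() turns those counts into explicit atoms in the molecular graph.
mol_h = Chem.AddHs(mol)
n_with_h = mol_h.GetNumAtoms()

# RemoveHs() deletes the explicit H atoms again; the hydrogen count on
# each heavy atom (and so the chemistry) is preserved.
mol_back = Chem.RemoveHs(mol_h)
n_back = mol_back.GetNumAtoms()
h_count_after = sum(a.GetTotalNumHs() for a in mol_back.GetAtoms())

print("atoms: parsed=%d, with Hs=%d, after RemoveHs=%d"
      % (n_heavy, n_with_h, n_back))
print("total H count before/after: %d / %d" % (h_count_before, h_count_after))
print(Chem.MolToSmiles(mol_back))
```

The [NH2+] and [nH] in the output SMILES are those hydrogen counts being written out, not leftover hydrogen atoms.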
> This is also the case when extracting the carbon scaffold of a molecule
> (even after resetting all formal charges to zero):
>
> mol = Chem.MolFromSmiles("c1cc(C[NH2+]CC2CNc3ccnn3C2)[nH]n1")
> mol = Chem.RemoveHs(mol,implicitOnly=False)
> sc = [MurckoScaffold.MakeScaffoldGeneric(mol)]
> print 'Scaffold1: %s' %(Chem.MolToSmiles(sc[0]))
>
>
> mol = Chem.MolFromSmiles("c1cc(C[NH2+]CC2CNc3ccnn3C2)[nH]n1")
> sc2 = [MurckoScaffold.MakeScaffoldGeneric(mol)]
> sc2[0] = Chem.RemoveHs(sc2[0],implicitOnly=False)
> print 'Scaffold2: %s' %(Chem.MolToSmiles(sc2[0]))
>
>
> mol = Chem.MolFromSmiles("c1cc(C[NH2+]CC2CNc3ccnn3C2)[nH]n1")
> mol = Chem.RemoveHs(mol,implicitOnly=False)
> for atom in mol.GetAtoms():
>     atom.SetFormalCharge(0)
> sc3 = [MurckoScaffold.MakeScaffoldGeneric(mol)]
> sc3[0] = Chem.RemoveHs(sc3[0],implicitOnly=False)
> print 'Scaffold3: %s' %(Chem.MolToSmiles(sc3[0]))
>
> Output:
> Scaffold1: [CH]1CCCC1C[CH2+]CC1CCC2CCCC2C1
> Scaffold2: [CH]1CCCC1C[CH2+]CC1CCC2CCCC2C1
> Scaffold3: [CH]1CCCC1CCCC1CCC2CCCC2C1
That looks like it's a bug in MurckoScaffold.MakeScaffoldGeneric(); it
should be removing the explicit H counts and setting charges to zero.
Thanks for reporting it.
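Until that is fixed, a possible workaround is to clean up the atoms of the generic scaffold yourself. This is only a sketch, written for Python 3: clean_generic_scaffold() is a hypothetical helper name, not part of the RDKit, and it simply does by hand what MakeScaffoldGeneric() should be doing (on versions where the bug is fixed it should be harmless):

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def clean_generic_scaffold(mol):
    """Hypothetical helper: generic scaffold with charges and H counts reset."""
    scaffold = Chem.Mol(MurckoScaffold.MakeScaffoldGeneric(mol))
    for atom in scaffold.GetAtoms():
        atom.SetFormalCharge(0)         # drop charges such as [CH2+]
        atom.SetNumExplicitHs(0)        # drop explicit H counts such as [CH]
        atom.SetNoImplicit(False)       # let implicit Hs be recomputed
        atom.SetNumRadicalElectrons(0)  # clear any radical left behind
    # Recompute valences and implicit hydrogens after the edits.
    Chem.SanitizeMol(scaffold)
    return scaffold


mol = Chem.MolFromSmiles("c1cc(C[NH2+]CC2CNc3ccnn3C2)[nH]n1")
scaffold = clean_generic_scaffold(mol)
scaffold_smiles = Chem.MolToSmiles(scaffold)
print("Scaffold: %s" % scaffold_smiles)
```

Doing the cleanup after MakeScaffoldGeneric() rather than before avoids changing the input molecule.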
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss