Thanks for looking into this, Greg. I'm skipping the sanitization step for a
different reason: I want to catch and process sanitization errors at a
different point in my program flow from where I load molecules. For most
of the molecules I need to process I can work around this requirement, so
I will. The way hydrogens were represented was just a byproduct of the
simplest example I could create to illustrate the issue.
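
For anyone finding this thread later, here is a minimal sketch of the
deferred-sanitization pattern described above. It is an illustration under
assumptions, not a confirmed fix for the bug: the helper names
load_then_sanitize and make_3d_block are hypothetical, the input is a 3D mol
block generated in the script rather than the attached SD file, and the
explicit calls to Chem.AssignAtomChiralTagsFromStructure and
Chem.AssignStereochemistry are my guess at a workaround, not something Greg
suggested in this thread.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def make_3d_block(smiles, seed=42):
    """Hypothetical helper: build a 3D mol block so the sketch is self-contained."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, randomSeed=seed)  # fixed seed for repeatability
    return Chem.MolToMolBlock(mol)


def load_then_sanitize(mol_block):
    """Parse without sanitizing, sanitize later, then perceive stereo from 3D.

    Returns the molecule, or None if parsing or sanitization fails.
    """
    mol = Chem.MolFromMolBlock(mol_block, sanitize=False, removeHs=False)
    if mol is None:
        return None
    try:
        # Sanitization happens here, separately from loading, so errors
        # can be caught and processed at this point in the program flow.
        Chem.SanitizeMol(mol)
    except ValueError as err:  # RDKit sanitization errors derive from ValueError
        print("sanitization failed: %s" % err)
        return None
    # Assumed workaround: set chiral tags from the 3D coordinates explicitly,
    # then recompute the stereochemistry labels.
    Chem.AssignAtomChiralTagsFromStructure(mol)
    Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
    return mol


l_ala = "C[C@H](N)C(=O)O"
d_ala = "C[C@@H](N)C(=O)O"
mol_l = load_then_sanitize(make_3d_block(l_ala))
mol_d = load_then_sanitize(make_3d_block(d_ala))
smi_l = Chem.MolToSmiles(Chem.RemoveHs(mol_l), isomericSmiles=True)
smi_d = Chem.MolToSmiles(Chem.RemoveHs(mol_d), isomericSmiles=True)
print(smi_l)
print(smi_d)
```

The same function could be applied to each molecule from an SDMolSupplier
opened with sanitize=False and removeHs=False. The hydrogens are removed only
at the end so that the 3D geometry around the stereocentre is still complete
when the chiral tags are assigned.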
Thanks,
Toby
--
InhibOx Ltd
On 14 August 2013 16:19, Greg Landrum <greg.land...@gmail.com> wrote:
> Hi Toby,
>
> On Wed, Aug 14, 2013 at 2:00 PM, Toby Wright <toby.wri...@inhibox.com> wrote:
>
>> Hi,
>>
>> I think the following behaviour is a bug but feel free to correct me. I
>> have an SD file (attached) with two stereoisomers of alanine (built by
>> Open Babel from the SMILES). I want to read it and write its contents as
>> isomeric SMILES. I execute the following:
>>
>> import rdkit
>> from rdkit import Chem
>>
>> smiles_writer = Chem.SmilesWriter("ChiralTest.smi", includeHeader=False,
>> isomericSmiles=True)
>> suppl = Chem.SDMolSupplier("ChiralTest3D.sdf", sanitize=False)
>> for mol in suppl:
>>     Chem.SanitizeMol(mol)
>>     smiles_writer.write(mol)
>>
>> smiles_writer.flush()
>> smiles_writer.close()
>>
>> smiles_writer2 = Chem.SmilesWriter("ChiralTest2.smi",
>> includeHeader=False, isomericSmiles=True)
>> suppl2 = Chem.SDMolSupplier("ChiralTest3D.sdf", sanitize=True)
>> for mol in suppl2:
>>     smiles_writer2.write(mol)
>>
>> smiles_writer2.flush()
>> smiles_writer2.close()
>>
>> The file ChiralTest.smi now contains:
>> [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H] L-alanine
>> [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H] D-alanine
>>
>> and ChiralTest2.smi contains:
>> C[C@H](N)C(=O)O L-alanine
>> C[C@@H](N)C(=O)O D-alanine
>>
>>
>> My question is why do I get different outputs depending on when
>> sanitization was performed?
>>
>
> It's a bug, as you correctly assumed.
>
> It's actually not the sanitization step per se. If that were the case,
> this would work:
> In [14]: s = Chem.SDMolSupplier('ChiralTest3D.sdf',sanitize=False)
>
> In [15]: for m in s: Chem.SanitizeMol(m)
>
> In [16]: for m in s: print Chem.MolToSmiles(m,True)
> [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H]
> [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H]
>
> There is a step in the mol file parser that handles the stereochemistry
> information from the CTAB. This step is only called if you do sanitization.
> That is (probably) fixable; I'll definitely look into it.
>
> In the meantime, if the only reason you are skipping the sanitization step
> is to avoid having hydrogens removed, you can avoid that as follows:
>
> In [18]: s = Chem.SDMolSupplier('ChiralTest3D.sdf',removeHs=False)
>
> In [19]: for m in s: print Chem.MolToSmiles(m,True)
> [H]OC(=O)[C@@]([H])(N([H])[H])C([H])([H])[H]
> [H]OC(=O)[C@]([H])(N([H])[H])C([H])([H])[H]
>
> -greg
>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss