Toby,

I just realized that I never replied to point out that this one has been
fixed:
https://github.com/rdkit/rdkit/issues/82
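
For anyone finding this thread later, here is a minimal sketch of the
deferred-sanitization workflow discussed below, assuming a reasonably recent
RDKit. The function name deferred_isomeric_smiles is hypothetical. The
explicit AssignAtomChiralTagsFromStructure / AssignStereochemistry calls are
an assumption about one defensive way to recover stereo from 3D coordinates
after a late SanitizeMol, not a description of what the fix itself changed.

```python
def deferred_isomeric_smiles(sdf_path):
    """Load an SD file unsanitized, sanitize later, return isomeric SMILES.

    Returns a list of (name, smiles, error) tuples. When sanitization fails,
    smiles is None and error holds the exception message.
    """
    # Imported here so the module can be loaded where RDKit is not installed.
    from rdkit import Chem

    results = []
    for mol in Chem.SDMolSupplier(sdf_path, sanitize=False, removeHs=False):
        if mol is None:  # the record could not be parsed at all
            results.append((None, None, "unparseable record"))
            continue
        name = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        try:
            # Sanitization happens here, not at load time, so errors can be
            # collected and handled by the caller.
            Chem.SanitizeMol(mol)
        except Exception as err:  # deliberately broad: record any failure
            results.append((name, None, str(err)))
            continue
        # Assumption: re-derive chiral tags from the 3D coordinates, then
        # clean up and (re)assign stereochemistry before writing SMILES.
        Chem.AssignAtomChiralTagsFromStructure(mol)
        Chem.AssignStereochemistry(mol, cleanIt=True, force=True)
        mol = Chem.RemoveHs(mol)
        results.append((name, Chem.MolToSmiles(mol, isomericSmiles=True), None))
    return results
```

Called as deferred_isomeric_smiles("ChiralTest3D.sdf"), it returns one tuple
per record, so sanitization failures can be handled wherever is convenient in
the program flow.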

-greg



On Fri, Aug 16, 2013 at 5:21 PM, Toby Wright <toby.wri...@inhibox.com> wrote:

> Thanks for looking into this, Greg. I'm skipping the sanitization step for
> a different reason: I want to catch and process sanitization errors at a
> different point in my program flow from where I load molecules. For most
> of the molecules I need to process I can work around this requirement, so
> I will. How hydrogens were represented was just a byproduct of the simplest
> example I could create to illustrate the issue.
>
> Thanks,
>
> Toby
>
> --
> InhibOx Ltd
>
>
> On 14 August 2013 16:19, Greg Landrum <greg.land...@gmail.com> wrote:
>
>> Hi Toby,
>>
>> On Wed, Aug 14, 2013 at 2:00 PM, Toby Wright <toby.wri...@inhibox.com> wrote:
>>
>>> Hi,
>>>
>>> I think the following behaviour is a bug, but feel free to correct me. I
>>> have an SD file (attached) with two stereoisomers of alanine (built by
>>> Open Babel from the SMILES). I want to read it and write its contents as
>>> isomeric SMILES. I execute the following:
>>>
>>> import rdkit
>>> from rdkit import Chem
>>>
>>> smiles_writer = Chem.SmilesWriter("ChiralTest.smi", includeHeader=False,
>>> isomericSmiles=True)
>>> suppl = Chem.SDMolSupplier("ChiralTest3D.sdf", sanitize=False)
>>> for mol in suppl:
>>>     Chem.SanitizeMol(mol)
>>>     smiles_writer.write(mol)
>>>
>>> smiles_writer.flush()
>>> smiles_writer.close()
>>>
>>> smiles_writer2 = Chem.SmilesWriter("ChiralTest2.smi",
>>> includeHeader=False, isomericSmiles=True)
>>> suppl2 = Chem.SDMolSupplier("ChiralTest3D.sdf", sanitize=True)
>>> for mol in suppl2:
>>>     smiles_writer2.write(mol)
>>>
>>> smiles_writer2.flush()
>>> smiles_writer2.close()
>>>
>>> The file ChiralTest.smi now contains:
>>> [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H] L-alanine
>>> [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H] D-alanine
>>>
>>> and ChiralTest2.smi contains:
>>> C[C@H](N)C(=O)O L-alanine
>>> C[C@@H](N)C(=O)O D-alanine
>>>
>>>
>>> My question is why do I get different outputs depending on when
>>> sanitization was performed?
>>>
>>
>> It's a bug, as you correctly assumed.
>>
>> It's actually not the sanitization step per se. If that were the case,
>> this would work:
>> In [14]: s = Chem.SDMolSupplier('ChiralTest3D.sdf',sanitize=False)
>>
>> In [15]: for m in s: Chem.SanitizeMol(m)
>>
>> In [16]: for m in s: print Chem.MolToSmiles(m,True)
>> [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H]
>> [H]OC(=O)C([H])(N([H])[H])C([H])([H])[H]
>>
>> There is a step in the mol file parser that handles the stereochemistry
>> information from the CTAB. This step is only called if you do sanitization.
>> That is (probably) fixable; I'll definitely look into it.
>>
>> In the meantime, if the only reason you are skipping the sanitization
>> step is to avoid having hydrogens removed, you can avoid that as follows:
>>
>> In [18]: s = Chem.SDMolSupplier('ChiralTest3D.sdf',removeHs=False)
>>
>> In [19]: for m in s: print Chem.MolToSmiles(m,True)
>> [H]OC(=O)[C@@]([H])(N([H])[H])C([H])([H])[H]
>> [H]OC(=O)[C@]([H])(N([H])[H])C([H])([H])[H]
>>
>> -greg
>>
>>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss