Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing

2019-12-04 Thread Greg Landrum
On Tue, Dec 3, 2019 at 9:58 PM Rasmus "Termo" Lundsgaard <
termope...@gmail.com> wrote:

> Hi Greg.
>
> Thanks for your gist. I guess the line:
>
> nbrs = [(x.GetOtherAtomIdx(1),x.GetIdx()) for x in atom.GetBonds()]
>
> should be:
> nbrs = [(x.GetOtherAtomIdx(atom.GetIdx()),x.GetIdx()) for x in atom.GetBonds()]
>
>
You're absolutely right, thanks for catching that. I updated the gist.

-greg



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing

2019-12-03 Thread Rasmus "Termo" Lundsgaard
Hi Greg.

Thanks for your gist. I guess the line:

nbrs = [(x.GetOtherAtomIdx(1),x.GetIdx()) for x in atom.GetBonds()]

should be:
nbrs = [(x.GetOtherAtomIdx(atom.GetIdx()),x.GetIdx()) for x in atom.GetBonds()]
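
The hard-coded 1 is only correct when the atom being examined happens to have index 1, because GetOtherAtomIdx() expects the index of one end of the bond and returns the index of the other end. A minimal sketch of the corrected lookup as a standalone helper follows; the names neighbor_bond_pairs and demo are made up for illustration and are not taken from the gist.

```python
def neighbor_bond_pairs(atom):
    """Return (neighbor atom index, bond index) pairs in the atom's bond order."""
    this_idx = atom.GetIdx()
    # Ask each bond for the atom at its *other* end, relative to this atom.
    return [(bond.GetOtherAtomIdx(this_idx), bond.GetIdx())
            for bond in atom.GetBonds()]


def demo():
    """Print the pairs for every atom of alanine (requires RDKit)."""
    from rdkit import Chem  # imported here so the helper loads without RDKit

    mol = Chem.MolFromSmiles("C[C@@H](N)C(=O)O")
    for atom in mol.GetAtoms():
        print(atom.GetIdx(), atom.GetSymbol(), neighbor_bond_pairs(atom))
```

Calling demo() with RDKit installed prints, for each atom, its neighbors in the same order in which the atom's bonds are stored, which is the order the chiral tag refers to.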


On Tue, Dec 3, 2019 at 5:38 PM Greg Landrum  wrote:

> What's going on here is that the RDKit defines stereochemistry based on
> the ordering of the bonds around each atom, not on atom indices.
> This has come up on the list multiple times; a relatively recent instance
> is here:
>
> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg08955.html
>
> Here's a gist that I have lying around that may help here:[1]
> https://gist.github.com/greglandrum/9f0e068e53171174b6797348eca64b3e
>
>
> -greg
> [1] Now if only I could find *why* I have that gist lying around...


Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing

2019-12-03 Thread Greg Landrum
What's going on here is that the RDKit defines stereochemistry based on the
ordering of the bonds around each atom, not on atom indices.
This has come up on the list multiple times; a relatively recent instance
is here:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg08955.html

Here's a gist that I have lying around that may help here:[1]
https://gist.github.com/greglandrum/9f0e068e53171174b6797348eca64b3e
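
To illustrate the point about bond ordering, here is a toy model in plain Python. It is not RDKit code and does not reproduce RDKit's actual CW/CCW convention. It only assumes that a tetrahedral tag is read relative to the order in which the neighbors are listed, so that listing the same neighbors in an order that differs by an odd permutation reverses the meaning of the tag. The function names and neighbor labels are invented for the example.

```python
def permutation_parity(reference, other):
    """Return 0 if `other` is an even permutation of `reference`, 1 if odd."""
    if len(set(reference)) != len(reference) or sorted(reference) != sorted(other):
        raise ValueError("both lists must contain the same distinct labels")
    position = {label: i for i, label in enumerate(reference)}
    perm = [position[label] for label in other]
    # The parity of a permutation equals the parity of its inversion count.
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2


def same_handedness(tag_a, order_a, tag_b, order_b):
    """True if two (tag, neighbor order) descriptions mean the same geometry."""
    flipped = permutation_parity(order_a, order_b) == 1
    # Equal tags agree only if the orderings have the same parity;
    # different tags agree only if the orderings have opposite parity.
    return (tag_a == tag_b) != flipped


# Same neighbors around the stereocenter, listed in two different bond orders.
file_1 = ["CH3", "N", "COOH", "H"]
file_2 = ["N", "CH3", "COOH", "H"]   # first two neighbors swapped

print(same_handedness("CW", file_1, "CW", file_2))    # False
print(same_handedness("CW", file_1, "CCW", file_2))   # True
```

Under this model, two files that carry the same tag but list the bonds around the stereocenter in a different order can describe opposite configurations, which matches the symptom reported in this thread.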


-greg
[1] Now if only I could find *why* I have that gist lying around...


On Tue, Dec 3, 2019 at 3:32 PM Rasmus "Termo" Lundsgaard <
termope...@gmail.com> wrote:

> Hi Paolo,
>
> thank you for the heads-up that removeHs is not honored when not
> sanitizing (and that RemoveHs has to be called explicitly to solve that
> issue here).
>
> Now I tried the same molecule but with the order of the atoms changed
> (attached as Ran1_neworder.sdf), and here I still get a different isomeric
> SMILES, even though the chiral tag is the same:
>
> # set_correct_Chiral_flags() is assumed to be defined as in the attached script
> from rdkit import Chem
>
> for f in ['Ran1.sdf', 'Ran2.sdf', 'Ran1_neworder.sdf']:
>     m = Chem.MolFromMolFile(f, sanitize=False)
>     m = Chem.RemoveHs(m, sanitize=False)
>     print(Chem.MolToSmiles(set_correct_Chiral_flags(m), isomericSmiles=True))
>
>
> C[C@@H](N)C(=O)O
> C[C@@H](N)C(=O)O
> C[C@H](N)C(=O)O


Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing

2019-12-03 Thread Rasmus "Termo" Lundsgaard
Hi Paolo,

thank you for the heads-up that removeHs is not honored when not
sanitizing (and that RemoveHs has to be called explicitly to solve that
issue here).

Now I tried the same molecule but with the order of the atoms changed
(attached as Ran1_neworder.sdf), and here I still get a different isomeric
SMILES, even though the chiral tag is the same:

# set_correct_Chiral_flags() is assumed to be defined as in the attached script
from rdkit import Chem

for f in ['Ran1.sdf', 'Ran2.sdf', 'Ran1_neworder.sdf']:
    m = Chem.MolFromMolFile(f, sanitize=False)
    m = Chem.RemoveHs(m, sanitize=False)
    print(Chem.MolToSmiles(set_correct_Chiral_flags(m), isomericSmiles=True))


C[C@@H](N)C(=O)O
C[C@@H](N)C(=O)O
C[C@H](N)C(=O)O


On Tue, Dec 3, 2019 at 2:58 PM Paolo Tosco 
wrote:

> Hi Rasmus,
>
> the problem is that, as stated in the rdmolfiles.MolFromMolFile() docs,
> the removeHs option is only honored when sanitize is True.
>
> So to obtain sensible results without sanitizing you should rather do
> something like:
>
> m1 = Chem.MolFromMolFile('Ran1.sdf', sanitize=False)
> m1 = Chem.RemoveHs(m1, sanitize=False)
> print( Chem.MolToSmiles(set_correct_Chiral_flags(m1), isomericSmiles=True) )
> m2 = Chem.MolFromMolFile('Ran2.sdf', sanitize=False)
> m2 = Chem.RemoveHs(m2, sanitize=False)
> print( Chem.MolToSmiles(set_correct_Chiral_flags(m2), isomericSmiles=True) )
>
> You may check the individual sanitization operations here:
>
> https://www.rdkit.org/docs/source/rdkit.Chem.rdmolops.html?highlight=rdmolops%20sanitizeflags#rdkit.Chem.rdmolops.SanitizeFlags
>
> Cheers,
> p.


Ran1_neworder.sdf
Description: StarMath document


Re: [Rdkit-discuss] handling of stereo information from mol files when not sanitizing

2019-12-03 Thread Paolo Tosco

Hi Rasmus,

the problem is that, as stated in the rdmolfiles.MolFromMolFile()
docs, the removeHs option is only honored when sanitize is True.


So to obtain sensible results without sanitizing you should rather do 
something like:


m1 = Chem.MolFromMolFile('Ran1.sdf', sanitize=False)
m1 = Chem.RemoveHs(m1, sanitize=False)
print( Chem.MolToSmiles(set_correct_Chiral_flags(m1), isomericSmiles=True) )
m2 = Chem.MolFromMolFile('Ran2.sdf', sanitize=False)
m2 = Chem.RemoveHs(m2, sanitize=False)
print( Chem.MolToSmiles(set_correct_Chiral_flags(m2), isomericSmiles=True) )

You may check the individual sanitization operations here:
https://www.rdkit.org/docs/source/rdkit.Chem.rdmolops.html?highlight=rdmolops%20sanitizeflags#rdkit.Chem.rdmolops.SanitizeFlags
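
As a rough sketch of how those flags can be used: SanitizeMol() accepts a sanitizeOps bit mask, so individual steps can be switched off while the rest still run. The helper name sanitize_except and the choice of alanine as input are illustrative assumptions, and RDKit is imported inside the functions only so that the snippet can be loaded without it.

```python
def sanitize_except(mol, skip_names):
    """Sanitize `mol` in place, skipping the named SanitizeFlags steps.

    `skip_names` holds flag names such as "SANITIZE_KEKULIZE" or
    "SANITIZE_CLEANUPCHIRALITY".
    """
    from rdkit import Chem

    ops = int(Chem.SanitizeFlags.SANITIZE_ALL)
    for name in skip_names:
        # Clear the bit that belongs to the step we want to leave out.
        ops &= ~int(getattr(Chem.SanitizeFlags, name))
    Chem.SanitizeMol(mol, sanitizeOps=ops)
    return mol


def demo():
    """Run everything except the chirality cleanup on an unsanitized molecule."""
    from rdkit import Chem

    mol = Chem.MolFromSmiles("C[C@@H](N)C(=O)O", sanitize=False)
    sanitize_except(mol, ["SANITIZE_CLEANUPCHIRALITY"])
    print(Chem.MolToSmiles(mol))
```

Whether a partial sanitization like this is enough depends on which properties the downstream code relies on, so it is worth checking the result on a few known structures first.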

Cheers,
p.

On 03/12/2019 12:46, Rasmus "Termo" Lundsgaard wrote:

Hi all

I would like to avoid sanitizing the SDF files, as the information in
these files should be treated as the ground truth.


However, I have some problems figuring out how to read and set the
chiral information from the file and have RDKit behave consistently.
Attached are two SDF files with no 3D information and only
stereo information in the atoms section for R-alanine. The only
difference, as far as I can see, is the order of the lines of the bond
information.
Even so, I get two different isomeric SMILES back when
not sanitizing.


Attached is also the minimal Python code, which for me at least outputs:

not setting chiral flags
CC(N)C(=O)O
CC(N)C(=O)O

setting chiral flags
[H]OC(=O)[C@]([H])(N([H])[H])C([H])([H])[H]
[H]OC(=O)[C@@]([H])(N([H])[H])C([H])([H])[H]

setting chiral flags and sanitize
C[C@@H](N)C(=O)O
C[C@@H](N)C(=O)O


Any ideas as to why this happens and how I can handle it strictly? Also,
what exactly does sanitizing do?


Regards Rasmus


