Thank you Dan; it's done:

https://github.com/rdkit/rdkit/issues/5165#issue-1192683590

brg
Giovanni

From: Dan Nealschneider <dan.nealschnei...@schrodinger.com>
Sent: 04 April 2022 23:48
To: Giovanni Tricarico <giovanni.tricar...@glpg.com>
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] issue with V3000 SD files containing enhanced stereochemistry information

Giovanni -
Thanks for reporting this! I've heard a report like this, but in that instance 
the reporter wasn't able to share the structure. Would you mind creating an 
issue on the RDKit github issues page and attaching your problematic structure? 
Please add it as an attachment - my hunch from those other reports was that 
this is something fishy with whitespace characters.
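
One way to test the whitespace hunch before attaching the file is to scan its raw bytes for anything other than printable ASCII and plain LF line endings. The sketch below is only an illustration: the helper name `find_unusual_whitespace` and the sample bytes are made up for this example, and the tab and carriage return in the sample are hypothetical, not taken from the reported file.

```python
def find_unusual_whitespace(data: bytes):
    """Return (line_number, column, byte_value) for every suspicious byte.

    Lines are split on LF only, so a carriage return, a tab, or any
    non-ASCII byte (for example a non-breaking space) is reported.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        # Iterating over bytes yields integers in Python 3.
        for col, byte in enumerate(line, start=1):
            if not 0x20 <= byte <= 0x7E:
                hits.append((line_no, col, byte))
    return hits


# Hypothetical sample: a CR before the first LF and a tab in the second line.
sample = (
    b"M  V30 BEGIN COLLECTION\r\n"
    b"M  V30 MDLV30/STERAC1\tATOMS=(2 5 8)\n"
    b"M  V30 END COLLECTION\n"
)

for line_no, col, byte in find_unusual_whitespace(sample):
    print(f"line {line_no}, column {col}: byte 0x{byte:02x}")
```

For a real file, the bytes could be read with `open(path, 'rb').read()` so that no newline translation hides the characters being looked for.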

Thanks!



dan nealschneider | senior staff developer

he/him/his

[Schrödinger, Inc.]<https://schrodinger.com/>


On Mon, Apr 4, 2022 at 11:02 AM Giovanni Tricarico <giovanni.tricar...@glpg.com> wrote:
Hello,
I am trying to process V3000 MolBlocks from some SD files, and I seem to 
encounter issues when enhanced stereochemistry information is present, 
depending on the source of the SD file.

To check that the round trip from molecule to SDF and back works within RDKit, 
I ran this code:

    import pandas as pd
    from rdkit import Chem
    from rdkit.Chem import Draw
    from rdkit.Chem import PandasTools

    # 1. convert to molecule a CXSMILES with encoded enhanced stereochemistry
    m = Chem.MolFromSmiles(
        'O=C(NC[C@@H]1CC[C@H](C2=CC=CC=C2)O1)N[C@@H]1COC[C@@H]1O '
        '|&1:4,7,&2:16,20|'
    )

    # check that the V3K molblock contains the enhanced stereochemistry information
    print(Chem.MolToV3KMolBlock(m))

    # 2. write the molecule to an SDF
    writer = Chem.SDWriter('m_with_enh_stereo.sdf')
    writer.SetForceV3000(True)
    writer.write(m)
    writer.close()

    # 3. read the molecule back into a list ms
    with Chem.SDMolSupplier('m_with_enh_stereo.sdf') as SDF:
        ms = [m for m in SDF if m is not None]

    # check that the V3000 molblock is OK
    print(Chem.MolToV3KMolBlock(ms[0]))

This worked well.
The content of the SD file made by this script ('m_with_enh_stereo.sdf') was:


     RDKit          2D

  0  0  0  0  0  0  0  0  0  0999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 22 24 0 0 0
M  V30 BEGIN ATOM
M  V30 1 O 7.414605 -6.052405 0.000000 0
M  V30 2 C 6.201079 -6.934083 0.000000 0
M  V30 3 N 4.830761 -6.323978 0.000000 0
M  V30 4 C 4.673969 -4.832195 0.000000 0
M  V30 5 C 3.303650 -4.222090 0.000000 0
M  V30 6 C 2.004612 -4.972090 0.000000 0
M  V30 7 C 0.889895 -3.968394 0.000000 0
M  V30 8 C 1.500000 -2.598076 0.000000 0
M  V30 9 C 0.750000 -1.299038 0.000000 0
M  V30 10 C 1.500000 0.000000 0.000000 0
M  V30 11 C 0.750000 1.299038 0.000000 0
M  V30 12 C -0.750000 1.299038 0.000000 0
M  V30 13 C -1.500000 0.000000 0.000000 0
M  V30 14 C -0.750000 -1.299038 0.000000 0
M  V30 15 O 2.991783 -2.754869 0.000000 0
M  V30 16 N 6.357872 -8.425866 0.000000 0
M  V30 17 C 7.728190 -9.035971 0.000000 0
M  V30 18 C 9.027228 -8.285971 0.000000 0
M  V30 19 O 10.141946 -9.289667 0.000000 0
M  V30 20 C 9.531841 -10.659985 0.000000 0
M  V30 21 C 8.040058 -10.503192 0.000000 0
M  V30 22 O 7.036362 -11.617910 0.000000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3
M  V30 3 1 3 4
M  V30 4 1 5 4 CFG=3
M  V30 5 1 5 6
M  V30 6 1 6 7
M  V30 7 1 8 7 CFG=3
M  V30 8 1 8 9
M  V30 9 2 9 10
M  V30 10 1 10 11
M  V30 11 2 11 12
M  V30 12 1 12 13
M  V30 13 2 13 14
M  V30 14 1 8 15
M  V30 15 1 2 16
M  V30 16 1 17 16 CFG=3
M  V30 17 1 17 18
M  V30 18 1 18 19
M  V30 19 1 19 20
M  V30 20 1 20 21
M  V30 21 1 21 22 CFG=3
M  V30 22 1 15 5
M  V30 23 1 21 17
M  V30 24 1 14 9
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)
M  V30 MDLV30/STERAC2 ATOMS=(2 17 21)
M  V30 END COLLECTION
M  V30 END CTAB
M  END
>  <_CXSMILES_Data>  (1)
|&1:4,7,&2:16,20|

$$$$

Then I tried reading an SD file for the exact same molecule, made by some other 
software.
The content of that SD file ('mol_with_enhanced_stereo_2_And_groups.sdf') was:

2 And groups, from CXSMILES
  SciTegic04042214202D

  0  0  0  0  0  0            999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 22 24 0 0 0
M  V30 BEGIN ATOM
M  V30 1 O 7.4146 -6.05241 0 0
M  V30 2 C 6.20108 -6.93408 0 0
M  V30 3 N 4.83076 -6.32398 0 0
M  V30 4 C 4.67397 -4.83219 0 0
M  V30 5 C 3.30365 -4.22209 0 0 CFG=2
M  V30 6 C 2.00461 -4.97209 0 0
M  V30 7 C 0.8899 -3.96839 0 0
M  V30 8 C 1.5 -2.59808 0 0 CFG=2
M  V30 9 C 0.75 -1.29904 0 0
M  V30 10 C 1.5 0 0 0
M  V30 11 C 0.75 1.29904 0 0
M  V30 12 C -0.75 1.29904 0 0
M  V30 13 C -1.5 0 0 0
M  V30 14 C -0.75 -1.29904 0 0
M  V30 15 O 2.99178 -2.75487 0 0
M  V30 16 N 6.35787 -8.42587 0 0
M  V30 17 C 7.72819 -9.03597 0 0 CFG=2
M  V30 18 C 9.02723 -8.28597 0 0
M  V30 19 O 10.14195 -9.28967 0 0
M  V30 20 C 9.53184 -10.65999 0 0
M  V30 21 C 8.04006 -10.50319 0 0 CFG=2
M  V30 22 O 7.03636 -11.61791 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 2 1 2
M  V30 2 1 2 3
M  V30 3 1 3 4
M  V30 4 1 5 4 CFG=3
M  V30 5 1 5 6
M  V30 6 1 6 7
M  V30 7 1 8 7 CFG=3
M  V30 8 1 8 9
M  V30 9 2 9 10
M  V30 10 1 10 11
M  V30 11 2 11 12
M  V30 12 1 12 13
M  V30 13 2 13 14
M  V30 14 1 8 15
M  V30 15 1 2 16
M  V30 16 1 17 16 CFG=3
M  V30 17 1 17 18
M  V30 18 1 18 19
M  V30 19 1 19 20
M  V30 20 1 20 21
M  V30 21 1 21 22 CFG=3
M  V30 22 1 15 5
M  V30 23 1 21 17
M  V30 24 1 14 9
M  V30 END BOND
M  V30 BEGIN COLLECTION
M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)
M  V30 MDLV30/STERAC2 ATOMS=(2 17 21)
M  V30 END COLLECTION
M  V30 END CTAB
M  END
> <Name>
2 And groups, from CXSMILES

$$$$

If I run this code:

    # 4. read the same molecule from an SDF made by different software into list ms2
    with Chem.SDMolSupplier('mol_with_enhanced_stereo_2_And_groups.sdf') as SDF:
        ms2 = [m for m in SDF if m is not None]

I get the warnings below, and the resulting MolBlock is wrong (it does not 
contain the enhanced stereochemistry information).

RDKit WARNING: [16:51:59] Skipping unrecognized collection type at line 58: MDLV30/STERAC1 ATOMS=(2 5 8)
RDKit WARNING: [16:51:59] Skipping unrecognized collection type at line 59: MDLV30/STERAC2 ATOMS=(2 17 21)
[16:51:59] Skipping unrecognized collection type at line 58: MDLV30/STERAC1 ATOMS=(2 5 8)
[16:51:59] Skipping unrecognized collection type at line 59: MDLV30/STERAC2 ATOMS=(2 17 21)

> Does anybody know why this might be the case?
> Is there something in the V3000 format in the second file that makes rdkit 
> not process it correctly?
I compared them side by side, and the main differences I can see are the CFG 
flags added to the atom block, and the name in the first line. Hard to imagine 
how either of these things could have an impact on the collection block, which 
looks identical in the two SD files.
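
Since the two collection blocks look identical on screen, a byte-level comparison may show differences that a visual check misses. The following is a minimal sketch, assuming the cause could be an invisible character; `collection_block` and `compare_blocks` are hypothetical helper names, and the CRLF line endings in the second sample are an assumption made for illustration, not a confirmed property of the second file.

```python
from itertools import zip_longest


def collection_block(text: str):
    """Return the lines between BEGIN COLLECTION and END COLLECTION."""
    block, inside = [], False
    # Split on LF only so that a stray '\r' stays attached to its line.
    for line in text.split("\n"):
        stripped = line.strip()
        if stripped == "M  V30 BEGIN COLLECTION":
            inside = True
        elif stripped == "M  V30 END COLLECTION":
            inside = False
        elif inside:
            block.append(line)
    return block


def compare_blocks(text_a: str, text_b: str):
    """List (position, repr_a, repr_b) for collection lines that differ."""
    pairs = zip_longest(collection_block(text_a), collection_block(text_b))
    return [
        (i, repr(a), repr(b))
        for i, (a, b) in enumerate(pairs, start=1)
        if a != b
    ]


# Hypothetical inputs: same content, but the second uses CRLF line endings.
lf_text = (
    "M  V30 BEGIN COLLECTION\n"
    "M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)\n"
    "M  V30 END COLLECTION\n"
)
crlf_text = lf_text.replace("\n", "\r\n")

for position, a, b in compare_blocks(lf_text, crlf_text):
    print(position, a, b)
```

When reading the actual files, opening them with `open(path, newline='')` keeps the original line endings, so `repr()` can expose them.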

I am using SD files made by that 'other software' in many other contexts, and 
they seem to be processed correctly.
In fact I am also using those SD files for some work in rdkit; this test made 
me discover that I am losing information (the warnings often do not imply that, 
so I tend to ignore them, but in this case they do).

Thanks
This e-mail and its attachment(s) (if any) may contain confidential and/or 
proprietary information and is intended for its addressee(s) only. Any 
unauthorized use of the information contained herein (including, but not 
limited to, alteration, reproduction, communication, distribution or any other 
form of dissemination) is strictly prohibited. If you are not the intended 
addressee, please notify the originator promptly and delete this e-mail and its 
attachment(s) (if any) subsequently. Neither Galapagos nor any of its 
affiliates shall be liable for direct, special, indirect or consequential 
damages arising from alteration of the contents of this message (by a third 
party) or as a result of a virus being passed on.
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss