Giovanni -
Thanks for reporting this! I've heard a similar report before, but in that
instance the reporter wasn't able to share the structure. Would you mind
creating an issue on the RDKit GitHub issues page and attaching your
problematic structure? Please add it as an attachment: my hunch from that
earlier report was that this is something fishy with whitespace characters.
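
In case it helps while preparing the attachment, here is a minimal sketch for
spotting invisible characters in an SD file. It assumes the whitespace hunch is
right, which is not confirmed, and the helper name find_suspicious_whitespace
is made up for illustration. It uses only the Python standard library and reads
the file as raw bytes so that nothing gets normalized on the way in.

```python
from pathlib import Path


def find_suspicious_whitespace(path):
    """Return a list of (line_number, problems, raw_line) for lines with unusual bytes."""
    data = Path(path).read_bytes()
    findings = []
    for number, line in enumerate(data.split(b'\n'), start=1):
        problems = []
        if line.endswith(b'\r'):
            problems.append('carriage return (CRLF line ending)')
        if b'\t' in line:
            problems.append('tab character')
        if any(byte > 0x7F for byte in line):
            problems.append('non-ASCII byte (e.g. a non-breaking space)')
        if line.rstrip(b'\r').endswith(b' '):
            problems.append('trailing space')
        if problems:
            findings.append((number, problems, line))
    return findings
```

Calling find_suspicious_whitespace('mol_with_enhanced_stereo_2_And_groups.sdf')
and printing each raw line (bytes print with escapes such as \r and \t visible)
should show whether the COLLECTION lines differ from the RDKit-written file in
ways that are invisible in an email.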

Thanks!


dan nealschneider | senior staff developer

*he/him/his*

Schrödinger, Inc. <https://schrodinger.com/>


On Mon, Apr 4, 2022 at 11:02 AM Giovanni Tricarico <
giovanni.tricar...@glpg.com> wrote:

> Hello,
>
> I am trying to process V3000 MolBlocks from some SD files, and I seem to
> encounter issues when enhanced stereochemistry information is present,
> depending on the source of the SD file.
>
>
>
> To test that the round trip from molecule to SDF and back within RDKit was
> working, I ran this code:
>
>
>
>     import pandas as pd
>
>     from rdkit import Chem
>
>     from rdkit.Chem import Draw
>
>     from rdkit.Chem import PandasTools
>
>
>
>     # 1. convert a CXSMILES with encoded enhanced stereochemistry to a molecule
>
>     m = Chem.MolFromSmiles('O=C(NC[C@@H]1CC[C@H](C2=CC=CC=C2)O1)N[C@@H]1COC[C@@H]1O |&1:4,7,&2:16,20|')
>
>
>
>     # check that the V3K molblock contains the enhanced stereochemistry information
>
>     print(Chem.MolToV3KMolBlock(m))
>
>
>
>     # 2. write the molecule to an SDF
>
>     writer = Chem.SDWriter('m_with_enh_stereo.sdf')
>
>     writer.SetForceV3000(True)
>
>     writer.write(m)
>
>     writer.close()
>
>
>
>     # 3. read the molecule back into a list ms
>
>     with Chem.SDMolSupplier('m_with_enh_stereo.sdf') as SDF:
>
>         ms = [m for m in SDF if m is not None]
>
>
>
>     # check that the V3000 molblock is OK
>
>     print(Chem.MolToV3KMolBlock(ms[0]))
>
>
>
> This worked well.
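
The round trip above can also be checked programmatically instead of by reading
the printed molblock. The following is a minimal sketch, assuming an in-memory
round trip through a V3000 molblock is an acceptable stand-in for writing and
re-reading the SD file. The helper name stereo_group_atoms is hypothetical;
GetStereoGroups, MolToV3KMolBlock and MolFromMolBlock are existing RDKit calls.

```python
from rdkit import Chem


def stereo_group_atoms(mol):
    """Return the enhanced stereo groups as a sorted list of atom-index tuples."""
    return sorted(
        tuple(sorted(atom.GetIdx() for atom in group.GetAtoms()))
        for group in mol.GetStereoGroups()
    )


# Same CXSMILES as in the quoted message: two AND groups.
smiles = 'O=C(NC[C@@H]1CC[C@H](C2=CC=CC=C2)O1)N[C@@H]1COC[C@@H]1O |&1:4,7,&2:16,20|'
m = Chem.MolFromSmiles(smiles)
before = stereo_group_atoms(m)

# Round trip through a V3000 molblock held in memory.
roundtrip = Chem.MolFromMolBlock(Chem.MolToV3KMolBlock(m))
after = stereo_group_atoms(roundtrip)

# The group type (AND/OR/ABS) is available via group.GetGroupType() if needed.
print(before, after, before == after)
```

Note that the indices here are 0-based atom indices, while the ATOMS=(...)
entries in the molblock are 1-based, so (4, 7) corresponds to ATOMS=(2 5 8).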
>
> The content of the SD file made by this script ('m_with_enh_stereo.sdf')
> was:
>
>
>
>
>
>      RDKit          2D
>
>
>
>   0  0  0  0  0  0  0  0  0  0999 V3000
>
> M  V30 BEGIN CTAB
>
> M  V30 COUNTS 22 24 0 0 0
>
> M  V30 BEGIN ATOM
>
> M  V30 1 O 7.414605 -6.052405 0.000000 0
>
> M  V30 2 C 6.201079 -6.934083 0.000000 0
>
> M  V30 3 N 4.830761 -6.323978 0.000000 0
>
> M  V30 4 C 4.673969 -4.832195 0.000000 0
>
> M  V30 5 C 3.303650 -4.222090 0.000000 0
>
> M  V30 6 C 2.004612 -4.972090 0.000000 0
>
> M  V30 7 C 0.889895 -3.968394 0.000000 0
>
> M  V30 8 C 1.500000 -2.598076 0.000000 0
>
> M  V30 9 C 0.750000 -1.299038 0.000000 0
>
> M  V30 10 C 1.500000 0.000000 0.000000 0
>
> M  V30 11 C 0.750000 1.299038 0.000000 0
>
> M  V30 12 C -0.750000 1.299038 0.000000 0
>
> M  V30 13 C -1.500000 0.000000 0.000000 0
>
> M  V30 14 C -0.750000 -1.299038 0.000000 0
>
> M  V30 15 O 2.991783 -2.754869 0.000000 0
>
> M  V30 16 N 6.357872 -8.425866 0.000000 0
>
> M  V30 17 C 7.728190 -9.035971 0.000000 0
>
> M  V30 18 C 9.027228 -8.285971 0.000000 0
>
> M  V30 19 O 10.141946 -9.289667 0.000000 0
>
> M  V30 20 C 9.531841 -10.659985 0.000000 0
>
> M  V30 21 C 8.040058 -10.503192 0.000000 0
>
> M  V30 22 O 7.036362 -11.617910 0.000000 0
>
> M  V30 END ATOM
>
> M  V30 BEGIN BOND
>
> M  V30 1 2 1 2
>
> M  V30 2 1 2 3
>
> M  V30 3 1 3 4
>
> M  V30 4 1 5 4 CFG=3
>
> M  V30 5 1 5 6
>
> M  V30 6 1 6 7
>
> M  V30 7 1 8 7 CFG=3
>
> M  V30 8 1 8 9
>
> M  V30 9 2 9 10
>
> M  V30 10 1 10 11
>
> M  V30 11 2 11 12
>
> M  V30 12 1 12 13
>
> M  V30 13 2 13 14
>
> M  V30 14 1 8 15
>
> M  V30 15 1 2 16
>
> M  V30 16 1 17 16 CFG=3
>
> M  V30 17 1 17 18
>
> M  V30 18 1 18 19
>
> M  V30 19 1 19 20
>
> M  V30 20 1 20 21
>
> M  V30 21 1 21 22 CFG=3
>
> M  V30 22 1 15 5
>
> M  V30 23 1 21 17
>
> M  V30 24 1 14 9
>
> M  V30 END BOND
>
> M  V30 BEGIN COLLECTION
>
> M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)
>
> M  V30 MDLV30/STERAC2 ATOMS=(2 17 21)
>
> M  V30 END COLLECTION
>
> M  V30 END CTAB
>
> M  END
>
> >  <_CXSMILES_Data>  (1)
>
> |&1:4,7,&2:16,20|
>
>
>
> $$$$
>
>
>
> Then I tried reading an SD file for the exact same molecule, made by some
> other software.
>
> The content of that SD file ('mol_with_enhanced_stereo_2_And_groups.sdf')
> was:
>
>
>
> 2 And groups, from CXSMILES
>
>   SciTegic04042214202D
>
>
>
>   0  0  0  0  0  0            999 V3000
>
> M  V30 BEGIN CTAB
>
> M  V30 COUNTS 22 24 0 0 0
>
> M  V30 BEGIN ATOM
>
> M  V30 1 O 7.4146 -6.05241 0 0
>
> M  V30 2 C 6.20108 -6.93408 0 0
>
> M  V30 3 N 4.83076 -6.32398 0 0
>
> M  V30 4 C 4.67397 -4.83219 0 0
>
> M  V30 5 C 3.30365 -4.22209 0 0 CFG=2
>
> M  V30 6 C 2.00461 -4.97209 0 0
>
> M  V30 7 C 0.8899 -3.96839 0 0
>
> M  V30 8 C 1.5 -2.59808 0 0 CFG=2
>
> M  V30 9 C 0.75 -1.29904 0 0
>
> M  V30 10 C 1.5 0 0 0
>
> M  V30 11 C 0.75 1.29904 0 0
>
> M  V30 12 C -0.75 1.29904 0 0
>
> M  V30 13 C -1.5 0 0 0
>
> M  V30 14 C -0.75 -1.29904 0 0
>
> M  V30 15 O 2.99178 -2.75487 0 0
>
> M  V30 16 N 6.35787 -8.42587 0 0
>
> M  V30 17 C 7.72819 -9.03597 0 0 CFG=2
>
> M  V30 18 C 9.02723 -8.28597 0 0
>
> M  V30 19 O 10.14195 -9.28967 0 0
>
> M  V30 20 C 9.53184 -10.65999 0 0
>
> M  V30 21 C 8.04006 -10.50319 0 0 CFG=2
>
> M  V30 22 O 7.03636 -11.61791 0 0
>
> M  V30 END ATOM
>
> M  V30 BEGIN BOND
>
> M  V30 1 2 1 2
>
> M  V30 2 1 2 3
>
> M  V30 3 1 3 4
>
> M  V30 4 1 5 4 CFG=3
>
> M  V30 5 1 5 6
>
> M  V30 6 1 6 7
>
> M  V30 7 1 8 7 CFG=3
>
> M  V30 8 1 8 9
>
> M  V30 9 2 9 10
>
> M  V30 10 1 10 11
>
> M  V30 11 2 11 12
>
> M  V30 12 1 12 13
>
> M  V30 13 2 13 14
>
> M  V30 14 1 8 15
>
> M  V30 15 1 2 16
>
> M  V30 16 1 17 16 CFG=3
>
> M  V30 17 1 17 18
>
> M  V30 18 1 18 19
>
> M  V30 19 1 19 20
>
> M  V30 20 1 20 21
>
> M  V30 21 1 21 22 CFG=3
>
> M  V30 22 1 15 5
>
> M  V30 23 1 21 17
>
> M  V30 24 1 14 9
>
> M  V30 END BOND
>
> M  V30 BEGIN COLLECTION
>
> M  V30 MDLV30/STERAC1 ATOMS=(2 5 8)
>
> M  V30 MDLV30/STERAC2 ATOMS=(2 17 21)
>
> M  V30 END COLLECTION
>
> M  V30 END CTAB
>
> M  END
>
> > <Name>
>
> 2 And groups, from CXSMILES
>
>
>
> $$$$
>
>
>
> If I run this code:
>
>
>
>     # 4. read the same molecule from an SDF made by different software into list ms2
>
>     with Chem.SDMolSupplier('mol_with_enhanced_stereo_2_And_groups.sdf') as SDF:
>
>         ms2 = [m for m in SDF if m is not None]
>
>
>
> I get the error messages below, and the MolBlock is wrong (does not
> contain the enhanced stereochemistry information).
>
>
>
> RDKit WARNING: [16:51:59] Skipping unrecognized collection type at line 58: MDLV30/STERAC1 ATOMS=(2 5 8)
>
> RDKit WARNING: [16:51:59] Skipping unrecognized collection type at line 59: MDLV30/STERAC2 ATOMS=(2 17 21)
>
> [16:51:59] Skipping unrecognized collection type at line 58: MDLV30/STERAC1 ATOMS=(2 5 8)
>
> [16:51:59] Skipping unrecognized collection type at line 59: MDLV30/STERAC2 ATOMS=(2 17 21)
>
>
>
> Does anybody know why this might be the case?
>
> Is there something in the V3000 format in the second file that makes
> RDKit not process it correctly?
>
> I compared them side by side, and the main differences I can see are the
> CFG flags added to the atom block and the name in the first line. It is hard
> to imagine how either of these could affect the collection block, which
> looks identical in the two SD files.
>
>
>
> I am using SD files made by that ‘other software’ in many other contexts,
> and they seem to be processed correctly.
>
> In fact I am also using those SD files for some work in rdkit; this test
> made me discover that I am losing information (the warnings often do not
> imply that, so I tend to ignore them, but in this case they do).
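
Since the warnings are easy to overlook, one option is to audit each record
explicitly. Below is a minimal sketch, assuming it is enough to compare the
number of enhanced stereo collections declared in the text (STEABS, STERELn,
STERACn) with the number of stereo groups RDKit reports after parsing. The
names STEREO_COLLECTION, split_sdf_records and audit_stereo_groups are
hypothetical helpers, not part of RDKit.

```python
import re

from rdkit import Chem

# Matches the V3000 enhanced stereo collection names wherever they occur on a line.
STEREO_COLLECTION = re.compile(r'MDLV30/STE(?:ABS|REL\d+|RAC\d+)')


def split_sdf_records(sdf_text):
    """Split SD file text into records at the '$$$$' delimiter lines."""
    parts = re.split(r'^\$\$\$\$[ \t]*\r?\n?', sdf_text, flags=re.MULTILINE)
    return [part for part in parts if part.strip()]


def audit_stereo_groups(sdf_text):
    """Return (record_index, declared, parsed) for records whose counts differ.

    parsed is None when RDKit could not build a molecule from the record.
    """
    mismatches = []
    for index, record in enumerate(split_sdf_records(sdf_text)):
        declared = len(STEREO_COLLECTION.findall(record))
        mol = Chem.MolFromMolBlock(record)
        parsed = None if mol is None else len(mol.GetStereoGroups())
        if parsed != declared:
            mismatches.append((index, declared, parsed))
    return mismatches
```

Passing the text of an SD file to audit_stereo_groups returns an empty list
when nothing was dropped. A record with two declared collections and zero
parsed groups would be reported as (index, 2, 0). This is a rough check only:
it does not verify that the atoms inside each group survived.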
>
>
>
> Thanks
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>