Hi Jan,
Many thanks for your answer.
Your convention and your "dirty" hack are both interesting, even though
each requires specific handling.
At first glance I would prefer the second solution, since it keeps an
S-group notation (which makes it easier to further separate/classify
compounds based on the molfile). If no other solution is possible, I will
try to implement this one.
Thanks again,
Cheers,
Lionel
On 30/11/2017 at 12:11, Jan Holst Jensen wrote:
On 2017-11-30 11:45, Lionel Colliandre wrote:
1- For polymers (brackets with an n label), the ctab is not considered
valid and the mol_from_ctab function does not work (example of a
ctab at the end of the email). I think that it is the "M STY 1 1
SRU" block that is problematic. To the best of my knowledge no
cartridge is able to search a polymer directly, but I would simply
like to be able to search for the monomeric motif. Even with a big
warning, is there a way to read and search such polymeric molecules
with RDKit?
Hi Lionel,
I have the exact same challenge in a molecule database. We decided to
go with a convention that end users will register polymers with
exactly 3 repeats so you can do sensible SSS searches.
In the registered polymers we use MUL (multiple-group) S-group
brackets instead of the SRU bracket type which RDKit (most sensibly)
refuses to process. We have various checks in place that ensure that
polymers get registered this way. The checks are implemented with
regular expressions - I know, processing molfiles via regexp parsing is
crazy :-) but it works OK.
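As an illustration only, one such regexp check might look like the sketch below. It is not the actual set of checks described above; the function name has_sru_sgroup is hypothetical, and it only tests whether any "M  STY" property line of a V2000 molblock still declares an SRU S-group.

```python
import re

# Hypothetical helper, not the production check mentioned in the email.
# Matches any "M  STY" property line that lists an S-group of type SRU.
_SRU_ON_STY_LINE = re.compile(r"^M\s+STY\b.*\bSRU\b", re.MULTILINE)


def has_sru_sgroup(molblock: str) -> bool:
    """Return True if the molblock declares at least one SRU S-group."""
    return _SRU_ON_STY_LINE.search(molblock) is not None
```

A registration step could reject any molfile for which this returns True and ask the user to redraw the polymer with MUL brackets instead.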
If you want a simple dirty hack you can try to process the polymers
this way:
1) Replace "SRU" S-group types with "MUL".
2) Change the "n" label to "1".
That will change your example molfile into:
"
Mrv1718011301710072D
4 3 0 0 0 0 999 V2000
-6.3839 2.3661 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-5.7428 1.8469 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.9726 2.1425 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-4.3314 1.6233 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
M STY 1 1 MUL
M SCN 1 1 HT
M SAL 1 2 2 3
M SDI 1 4 -4.3971 2.3134 -5.0201 1.5441
M SDI 1 4 -6.3183 1.6760 -5.6953 2.4454
M SBL 1 2 1 3
M SMT 1 1
M END
"
which RDKit will process and you get the single monomer unit.
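The two replacement steps can be sketched in Python roughly as follows. This is a minimal sketch under a few assumptions: the input is a V2000 molblock, the polymer label is a literal lowercase "n" on an "M  SMT" line, and the function name sru_to_mul is made up for this example. It only rewrites text and does not call RDKit itself.

```python
import re


def sru_to_mul(molblock: str) -> str:
    """Rewrite SRU S-groups as MUL and change their "n" label to "1".

    Hypothetical helper illustrating the two-step hack; not an RDKit API.
    """
    lines = molblock.splitlines()

    # Pass 1: collect the indices of S-groups declared as SRU.
    # An "M  STY" line holds a count followed by (index, type) pairs.
    sru_ids = set()
    for line in lines:
        tok = line.split()
        if tok[:2] == ["M", "STY"]:
            pairs = tok[3:]
            for idx, sgroup_type in zip(pairs[0::2], pairs[1::2]):
                if sgroup_type == "SRU":
                    sru_ids.add(idx)

    # Pass 2: rewrite the type, and the label of those S-groups only.
    out = []
    for line in lines:
        tok = line.split()
        if tok[:2] == ["M", "STY"]:
            line = re.sub(r"\bSRU\b", "MUL", line)
        elif (tok[:2] == ["M", "SMT"] and len(tok) == 4
              and tok[2] in sru_ids and tok[3] == "n"):
            line = re.sub(r"n\s*$", "1", line)
        out.append(line)
    return "\n".join(out)
```

The returned string could then be handed to RDKit, for example via Chem.MolFromMolBlock in Python or mol_from_ctab in the cartridge. Keep in mind that the result represents only a single monomer unit, so it is suitable for searching the motif rather than for describing the polymer itself.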
Cheers
-- Jan Holst Jensen
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss