Hi all,
I have a database with very diverse molecules and I would like as many
of them as possible to be searchable. I have "classical" organic
compounds, polymers, catalysts, resins and other materials. I encounter
specific problems that I would like to understand and, if possible, resolve.
To summarize my methodology, I follow the ChEMBL example with a PostgreSQL
database and the RDKit cartridge. I have a main table with a mol/ctab
field, and I create the rdk.mols table with a command based on the
mol_from_ctab function. Beforehand, I test the validity of the ctab with
the is_valid_ctab function. The search is made through the cartridge
with the SQL command: "SELECT id FROM rdk.mols WHERE
m@>qmol_from_smiles('c1ccccc1')"
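For reference, the workflow described above could be written out roughly as follows. This is only a sketch: rdk.mols, its column m, and the cartridge functions are the ones named in this email, while the main table name, its column names, and the %s placeholder style (as used by DB-API drivers such as psycopg2) are assumptions made for illustration.

```python
# Sketch of the workflow as SQL text. Only rdk.mols, its column m, and the
# cartridge functions come from the description above; the main table and
# its column names are hypothetical placeholders.
MAIN_TABLE = "main_table"  # hypothetical name of the main table
ID_COLUMN = "id"           # hypothetical primary-key column
CTAB_COLUMN = "ctab"       # hypothetical column holding the mol/ctab text


def build_mols_table_sql(main_table=MAIN_TABLE, id_col=ID_COLUMN,
                         ctab_col=CTAB_COLUMN):
    """Return SQL that fills rdk.mols from the ctabs that pass validation."""
    return (
        f"SELECT {id_col}, mol_from_ctab({ctab_col}::cstring) AS m "
        f"INTO rdk.mols FROM {main_table} "
        f"WHERE is_valid_ctab({ctab_col}::cstring)"
    )


def substructure_search_sql():
    """Return the substructure search from this email with a placeholder."""
    return "SELECT id FROM rdk.mols WHERE m @> qmol_from_smiles(%s::cstring)"


if __name__ == "__main__":
    print(build_mols_table_sql())
    print(substructure_search_sql(), ("c1ccccc1",))
```

The query pattern is passed as a parameter rather than pasted into the SQL text, so the same statement can be reused for every pattern listed further down.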
1- For polymers (brackets with an n label), the ctab is not considered
valid and the mol_from_ctab function does not work (example of a ctab
at the end of the email). I think that it is the "M STY 1 1 SRU"
block that is problematic. To the best of my knowledge, no cartridge is
able to search a polymer directly, but I would simply like to be able to
search the monomeric motif. Even with a big warning, is there a way to
read and search such polymeric molecules with RDKit?
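If the Sgroup block really is what makes the ctab invalid (this is only a suspicion, not something verified here), one possible workaround would be to drop the Sgroup property lines before handing the ctab to mol_from_ctab, so that only the atoms and bonds of the repeat unit remain. The sketch below does that with plain string handling. The function name strip_sgroup_lines is hypothetical, the tag list is taken from the V2000 Sgroup property names, and all polymer information (the brackets and the n label) is lost in the process.

```python
# Sgroup property tags of the V2000 molfile format ("M  STY", "M  SAL", ...).
SGROUP_TAGS = {
    "STY", "SST", "SLB", "SCN", "SDS", "SAL", "SBL", "SPA", "SMT", "SPL",
    "SNC", "SBV", "SDT", "SDD", "SCD", "SED", "SDI", "SAP", "SCL", "SBT",
}


def strip_sgroup_lines(ctab: str) -> str:
    """Return the ctab without its Sgroup property lines.

    Hypothetical helper: lines are matched on their first two
    whitespace-separated tokens, so it also tolerates ctabs whose
    fixed-width spacing has been collapsed.
    """
    kept = []
    for line in ctab.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "M" and tokens[1] in SGROUP_TAGS:
            continue  # drop Sgroup bookkeeping such as "M STY 1 1 SRU"
        kept.append(line)
    result = "\n".join(kept)
    if ctab.endswith("\n"):
        result += "\n"  # preserve a trailing newline if there was one
    return result
```

Applied to the example at the end of this email, this would keep the header, the four atom lines, the three bond lines and "M END", and remove the seven lines from "M STY" to "M SMT". Any hit on such a record would then be a hit on the monomeric motif only.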
2- For a lot of compounds, the ctab is valid and I can convert them into
mol and obtain the SMILES in the rdk.mols table. However, I cannot find
them when I search for part of the SMILES.
** First, for molecules with metals:
m1 = [Mn+2].[Zn+2]...
m2 = [Ag+].[Na+]...
m3 = [Ca+2]....
m4 = [Na+].c1ccc([B-](c2ccccc2)(c2ccccc2)c2ccccc2)cc1
m5 = [V+2]=O
m6 = [Rh+]...
m7 = [Cu].[Zn]
m8 = [Fe+2]...
For a database containing those molecules, these searches give:
[Mn] or [Mn+2] => 0 results (bad)
[Zn] => 0 (bad) but [Zn+2] => m1 (ok)
[Ag] or [Ag+] => m2 (ok)
[Na] => 0 (bad) why is Ag found and not Na in the same molecule?
but [Na+] => m2 + m4 (ok)
[Ca] => 0 (bad) but [Ca+2] => m3 (ok)
[B] or [B-] => 0 (bad)
[V] or [V+2] => 0 (bad)
[Rh] or [Rh+] => m6 (ok)
[Cu] => m7 (ok) but [Zn] => 0 (bad)
[Fe] => m8 (ok) but [Fe+2] => 0 (bad)
I cannot find any logic: sometimes the atom is found and not the ion,
sometimes it is the reverse, and sometimes, in the same molecule, one can
be found and not the other. Does anyone have an explanation?
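One way to narrow this down might be to run every pattern above through more than one query constructor and compare the hits. The sketch below only generates the statements. It assumes the cartridge functions mol_from_smiles and mol_from_smarts are available alongside qmol_from_smiles, and it makes no claim about which of them gives the expected answer. The names PATTERNS, CONSTRUCTORS and diagnostic_queries are made up for illustration.

```python
# The single-atom patterns tried above.
PATTERNS = [
    "[Mn]", "[Mn+2]", "[Zn]", "[Zn+2]", "[Ag]", "[Ag+]", "[Na]", "[Na+]",
    "[Ca]", "[Ca+2]", "[B]", "[B-]", "[V]", "[V+2]", "[Rh]", "[Rh+]",
    "[Cu]", "[Fe]", "[Fe+2]",
]

# Query constructors to compare (assumed to exist in the installed cartridge).
CONSTRUCTORS = {
    "qmol_from_smiles": "qmol_from_smiles(%s::cstring)",
    "mol_from_smiles": "mol_from_smiles(%s::cstring)",
    "mol_from_smarts": "mol_from_smarts(%s::cstring)",
}


def diagnostic_queries(patterns=PATTERNS):
    """Return (pattern, constructor name, sql, params) for every combination."""
    queries = []
    for pattern in patterns:
        for name, expr in CONSTRUCTORS.items():
            sql = f"SELECT id FROM rdk.mols WHERE m @> {expr}"
            queries.append((pattern, name, sql, (pattern,)))
    return queries
```

Each tuple can be executed with any DB-API cursor as cursor.execute(sql, params), and the returned ids grouped by pattern to see whether the inconsistency depends on how the query molecule is built.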
** Second, for N3:
m9 = [N-]=[N+]=[N-]
The following searches give:
[N-] or [N+] => 0 (bad)
[N-]=N => m9 (ok)
[N-]=[N+] => 0 (bad)
[N-]=[N+]=N => m9 (ok)
[N-]=[N+]=[N-] => m9 (ok)
Once again, I cannot find any logic. Does anyone have an explanation?
Thanks in advance for your help,
Lionel
Example of a ctab for a polymer:
"
Mrv1718011301710072D
4 3 0 0 0 0 999 V2000
-6.3839 2.3661 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-5.7428 1.8469 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-4.9726 2.1425 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
-4.3314 1.6233 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
M STY 1 1 SRU
M SCN 1 1 HT
M SAL 1 2 2 3
M SDI 1 4 -4.3971 2.3134 -5.0201 1.5441
M SDI 1 4 -6.3183 1.6760 -5.6953 2.4454
M SBL 1 2 1 3
M SMT 1 n
M END
"
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss