Hi all,

I have a database with very diverse molecules, and I would like as many of them as possible to be searchable. I have "classical" organic compounds, polymers, catalysts, resins and other materials. I encounter specific problems that I would like to understand and, if possible, resolve.

To summarize my methodology, I follow the ChEMBL example with a PostgreSQL database and the RDKit cartridge. I have a main table with a mol/ctab field, and I create the rdk.mols table with a command based on the mol_from_ctab function. Beforehand, I test the validity of each ctab with the is_valid_ctab function. The search is made through the cartridge with the SQL command: "SELECT id FROM rdk.mols WHERE m@>qmol_from_smiles('c1ccccc1')"


1- For polymers (brackets with an n label), the ctab is not considered valid and the mol_from_ctab function does not work (an example ctab is at the end of this email). I think the "M  STY  1   1 SRU" block is the problem. To the best of my knowledge, no cartridge can search a polymer directly, but I would simply like to be able to search for the monomeric motif. Even if it comes with a big warning, is there a way to read and search such polymeric molecules with RDKit?


2- For a lot of compounds, the ctab is valid and I can convert them into mol and obtain the SMILES in the rdk.mols table. However, I cannot find them when I search for part of the SMILES.

** First, for molecules with metals:

m1 = [Mn+2].[Zn+2]...

m2 = [Ag+].[Na+]...

m3 = [Ca+2]....

m4 = [Na+].c1ccc([B-](c2ccccc2)(c2ccccc2)c2ccccc2)cc1

m5 = [V+2]=O

m6 = [Rh+]...

m7 = [Cu].[Zn]

m8 = [Fe+2]...

For a database containing those molecules, these searches give:

[Mn] or [Mn+2] => 0 results (bad)

[Zn] => 0 (bad) but [Zn+2] => m1 (ok)

[Ag] or [Ag+] => m2 (ok)

[Na] => 0 (bad). Why is Ag found and not Na in the same molecule?

but [Na+] => m2 + m4 (ok)

[Ca] => 0 (bad) but [Ca+2] => m3 (ok)

[B] or [B-] => 0 (bad)

[V] or [V+2] => 0 (bad)

[Rh] or [Rh+] => m6 (ok)

[Cu] => m7 (ok) but [Zn] => 0 (bad)

[Fe] => m8 (ok) but [Fe+2] => 0 (bad)

I cannot find any logic: sometimes the atom is found and not the ion, sometimes it is the reverse, and sometimes within the same molecule one can be found and not the other. Does anyone have an explanation?


** Second, for N3:

m9 = [N-]=[N+]=[N-]

The following searches give:

[N-] or [N+] => 0 (bad)

[N-]=N => m9 (ok)

[N-]=[N+] => 0 (bad)

[N-]=[N+]=N => m9 (ok)

[N-]=[N+]=[N-] => m9 (ok)

Once again, I cannot find any logic. Does anyone have an explanation?


Thanks in advance for your help,

Lionel




Example of a ctab for a polymer:

"
  Mrv1718011301710072D

  4  3  0  0  0  0            999 V2000
   -6.3839    2.3661    0.0000 O   0  0  0  0  0  0  0  0  0  0 0  0
   -5.7428    1.8469    0.0000 C   0  0  0  0  0  0  0  0  0  0 0  0
   -4.9726    2.1425    0.0000 O   0  0  0  0  0  0  0  0  0  0 0  0
   -4.3314    1.6233    0.0000 H   0  0  0  0  0  0  0  0  0  0 0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
M  STY  1   1 SRU
M  SCN  1   1 HT
M  SAL   1  2   2   3
M  SDI   1  4   -4.3971    2.3134   -5.0201    1.5441
M  SDI   1  4   -6.3183    1.6760   -5.6953    2.4454
M  SBL   1  2   1   3
M  SMT   1 n
M  END
"
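
For point 1, one workaround I could imagine is to remove the S-group lines from the ctab before calling mol_from_ctab, so that only the atoms and bonds of the repeating unit are stored. Below is a minimal Python sketch of that idea, using the example ctab above. The function name strip_sgroup_lines and the list of prefixes are my own assumptions (taken only from the lines present in this example), and I have not verified that the resulting ctab is accepted by is_valid_ctab.

```python
# Hypothetical helper: drop the S-group property lines from a V2000 ctab.
# The prefix list is an assumption based only on the lines in the example ctab.
SGROUP_PREFIXES = ("M  STY", "M  SCN", "M  SAL", "M  SDI", "M  SBL", "M  SMT")


def strip_sgroup_lines(ctab: str) -> str:
    """Return the ctab with every line starting with an S-group prefix removed."""
    kept = [
        line
        for line in ctab.split("\n")
        if not line.startswith(SGROUP_PREFIXES)
    ]
    return "\n".join(kept)


# The polymer ctab from this email, rebuilt line by line.
example_ctab = "\n".join([
    "",
    "  Mrv1718011301710072D",
    "",
    "  4  3  0  0  0  0            999 V2000",
    "   -6.3839    2.3661    0.0000 O   0  0  0  0  0  0  0  0  0  0 0  0",
    "   -5.7428    1.8469    0.0000 C   0  0  0  0  0  0  0  0  0  0 0  0",
    "   -4.9726    2.1425    0.0000 O   0  0  0  0  0  0  0  0  0  0 0  0",
    "   -4.3314    1.6233    0.0000 H   0  0  0  0  0  0  0  0  0  0 0  0",
    "  1  2  1  0  0  0  0",
    "  2  3  1  0  0  0  0",
    "  3  4  1  0  0  0  0",
    "M  STY  1   1 SRU",
    "M  SCN  1   1 HT",
    "M  SAL   1  2   2   3",
    "M  SDI   1  4   -4.3971    2.3134   -5.0201    1.5441",
    "M  SDI   1  4   -6.3183    1.6760   -5.6953    2.4454",
    "M  SBL   1  2   1   3",
    "M  SMT   1 n",
    "M  END",
])

# Header, counts line, atom block, bond block and "M  END" are left untouched.
stripped = strip_sgroup_lines(example_ctab)
print(stripped)
```

The atom and bond counts on the counts line stay correct because only property lines are removed. The obvious drawback is that the polymer information (the brackets and the n label) is lost, so the stored structure would only represent the drawn fragment.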

