Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)
Hi Paolo,

I finally found that my molfile is the problem. It contains brackets around the Zn atom, and these prevent the molecule from being found.

    In : mol3 = Chem.MolFromMolFile('molfile.mol')
    In : mol3.GetSubstructMatch(querySmiles)
    Out[20]: ()
    In : mol3.GetSubstructMatch(querySmarts)
    Out[21]: (1,)
    In : newmolfile = Chem.MolToMolBlock(mol3)
    In : mol4 = Chem.MolFromMolBlock(newmolfile)
    In : mol4.GetSubstructMatch(querySmiles)
    Out[28]: (1,)
    In : mol4.GetSubstructMatch(querySmarts)
    Out[29]: (1,)

So the problem was my input and not the search function!

Many thanks,
Lionel

--
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
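The round trip Lionel describes (write the molecule back out as a mol block, re-read it, and search again) can be wrapped in a small helper. This is only a sketch: the function name `matches_before_and_after` is made up for illustration, and `N[Zn]N` stands in for the contents of `molfile.mol`, which is not available here.

```python
from rdkit import Chem


def matches_before_and_after(mol, queries):
    """Run each query on mol and on a copy rebuilt from its mol block.

    Returns {query name: (match on original, match on rebuilt copy)}.
    """
    rebuilt = Chem.MolFromMolBlock(Chem.MolToMolBlock(mol))
    return {
        name: (mol.GetSubstructMatch(query), rebuilt.GetSubstructMatch(query))
        for name, query in queries.items()
    }


# Assumption: a stand-in molecule; in the thread this was
# Chem.MolFromMolFile('molfile.mol').
mol = Chem.MolFromSmiles('N[Zn]N')
queries = {
    "smiles": Chem.MolFromSmiles('[Zn]'),
    "smarts": Chem.MolFromSmarts('[Zn]'),
}
results = matches_before_and_after(mol, queries)
for name, (before, after) in results.items():
    print(name, "before:", before, "after:", after)
```

If the two matches differ for a given query, the input file probably carries information that is not written back out by `MolToMolBlock`, which is what happened here.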
Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)
Hi Lionel,

my guess (but it is only a guess) is that the molecules which have a [Zn] atom with no charge might feature bonds between the zinc and the atoms that form the complex with the metal, e.g.:

    In [1]: from rdkit import Chem
    In [2]: querySmiles = Chem.MolFromSmiles('[Zn]')
    In [3]: querySmarts = Chem.MolFromSmarts('[Zn]')
    In [4]: mol = Chem.MolFromSmiles('N[Zn]N')
    In [5]: mol.GetSubstructMatch(querySmiles)
    Out[5]: ()
    In [6]: mol.GetSubstructMatch(querySmarts)
    Out[6]: (1,)
    In [7]: znAtom = mol.GetAtomWithIdx(1)
    In [8]: znAtom.GetFormalCharge()
    Out[8]: 0

Best,
Paolo
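To follow up on Paolo's guess, it can help to print a few per-atom properties for both the query and the target, not only the formal charge. The sketch below is an illustration, not something from the thread: `describe_atoms` is a hypothetical helper, and the choice of properties (charge, degree, radical electrons) is an assumption about what might differ between the two zinc atoms.

```python
from rdkit import Chem


def describe_atoms(mol):
    """Return one dict per atom with properties that may influence matching."""
    return [
        {
            "idx": atom.GetIdx(),
            "symbol": atom.GetSymbol(),
            "charge": atom.GetFormalCharge(),
            "degree": atom.GetDegree(),
            "radicals": atom.GetNumRadicalElectrons(),
        }
        for atom in mol.GetAtoms()
    ]


query = Chem.MolFromSmiles('[Zn]')
target = Chem.MolFromSmiles('N[Zn]N')
for label, mol in (("query", query), ("target", target)):
    for row in describe_atoms(mol):
        print(label, row)
```

Comparing the printed rows for the two Zn atoms shows which properties the SMILES-derived query carries that the target atom does not.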
Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)
Hi Paolo,

I am not sure I understand. If I concentrate on these searches:

    (q)mol_from_smiles('[Zn]')   => does not find molecules containing [Zn] or [Zn+2]
    (q)mol_from_smiles('[Zn+2]') => finds molecules containing [Zn+2]
    mol_from_smarts('[Zn]')      => finds molecules containing [Zn] or [Zn+2]
    mol_from_smarts('[Zn+2]')    => finds molecules containing [Zn+2]

I understand all the results except the first one: why is at least [Zn] not retrieved? To me, both molecules should be retrieved, as with the SMARTS search.

Cheers,
Lionel
Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)
Hi Lionel,

whether a SMILES search succeeds depends on whether you specify the exact formal charge present in the database molecule, which in turn depends on whether (and how) it was set in the input molecule when it was loaded into the database. SMARTS searches based on the element only will succeed whatever the formal charge is, as they do not take the formal charge into account at all.

Best,
p.
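The point about SMARTS and formal charge can be checked directly in Python. A minimal sketch, assuming two stand-in targets (a neutral zinc taken from m7 and a bare `[Zn+2]` ion); the `hits` helper and the target labels are made up for illustration.

```python
from rdkit import Chem

# Stand-in targets: one neutral Zn (from m7) and one Zn(2+) ion.
targets = {
    "neutral Zn": Chem.MolFromSmiles('[Cu].[Zn]'),
    "Zn(2+)": Chem.MolFromSmiles('[Zn+2]'),
}


def hits(pattern):
    """Return the labels of the targets matched by a SMARTS pattern."""
    query = Chem.MolFromSmarts(pattern)
    return [name for name, mol in targets.items() if mol.HasSubstructMatch(query)]


# [Zn]   : element only, any charge
# [Zn+0] : element with formal charge 0
# [Zn+2] : element with formal charge +2
for pattern in ('[Zn]', '[Zn+0]', '[Zn+2]'):
    print(pattern, hits(pattern))
```

Writing the charge explicitly in the SMARTS (`+0` or `+2`) is one way to get a charge-specific search while still using a SMARTS query.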
Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)
Hi all,

For the question of molecules that cannot be searched, I finally found a solution by treating my queries as SMARTS:

    SELECT id FROM rdk.mols WHERE m@>mol_from_smarts('[Zn]');

All the queries presented now give the expected results, even though I am not sure what changes when I treat the query as SMARTS instead of SMILES, i.e. the queries are valid SMILES.

Lionel

2- For a lot of compounds, the ctab is valid and I can convert them into mol and obtain the SMILES in the rdk.mols table. However, I cannot find them when I search for part of the SMILES.

** First, for molecules with metals:

    m1 = [Mn+2].[Zn+2]...
    m2 = [Ag+].[Na+]...
    m3 = [Ca+2]
    m4 = [Na+].c1ccc([B-](c2c2)(c2c2)c2c2)cc1
    m5 = [V+2]=O
    m6 = [Rh+]...
    m7 = [Cu].[Zn]
    m8 = [Fe+2]...

For a database containing those molecules, these searches give:

    [Mn] or [Mn+2] => 0 results (bad)
    [Zn]           => 0 (bad), but [Zn+2] => m1 (ok)
    [Ag] or [Ag+]  => m2 (ok)
    [Na]           => 0 (bad), but [Na+] => m2 + m4 (ok)
                      (why is Ag found and not Na in the same molecule?)
    [Ca]           => 0 (bad), but [Ca+2] => m3 (ok)
    [B] or [B-]    => 0 (bad)
    [V] or [V+2]   => 0 (bad)
    [Rh] or [Rh+]  => m6 (ok)
    [Cu]           => m7 (ok), but [Zn] => 0 (bad)
    [Fe]           => m8 (ok), but [Fe+2] => 0 (bad)

I cannot find the logic: sometimes the atom is found and not the ion, sometimes it is the reverse, and sometimes within the same molecule one can be found and not the other. Does anyone have an explanation?

** Second, for N3:

    m9 = [N-]=[N+]=[N-]

The following searches give:

    [N-] or [N+]    => 0 (bad)
    [N-]=N          => m9 (ok)
    [N-]=[N+]       => 0 (bad)
    [N-]=[N+]=N     => m9 (ok)
    [N-]=[N+]=[N-]  => m9 (ok)

Once again, I cannot find the logic. Does anyone have an explanation?
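The SMARTS-based database search can be mimicked outside PostgreSQL with a few lines of Python, which makes it easier to test individual queries. This is a sketch under stated assumptions: only the molecules whose SMILES are given in full above (m3, m5, m7, m9) are included, and the in-memory dictionary and the `search` function are illustrative stand-ins for the `rdk.mols` table and the `m@>mol_from_smarts(...)` query, not the cartridge itself.

```python
from rdkit import Chem

# Only the fully specified molecules from the message; the truncated
# ones (m1, m2, m6, m8) and m4 are left out.
SMILES_BY_ID = {
    "m3": "[Ca+2]",
    "m5": "[V+2]=O",
    "m7": "[Cu].[Zn]",
    "m9": "[N-]=[N+]=[N-]",
}
MOLS = {mol_id: Chem.MolFromSmiles(smi) for mol_id, smi in SMILES_BY_ID.items()}


def search(smarts):
    """Return the ids of all molecules containing the SMARTS pattern."""
    query = Chem.MolFromSmarts(smarts)
    if query is None:
        raise ValueError(f"could not parse SMARTS: {smarts!r}")
    return [mol_id for mol_id, mol in MOLS.items() if mol.HasSubstructMatch(query)]


for pattern in ("[Zn]", "[Cu]", "[N-]=[N+]"):
    print(pattern, search(pattern))
```

Running the same patterns here and in the database is a quick way to tell whether an unexpected result comes from the query or from how a molecule was loaded.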