Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)

2017-11-30 Thread Lionel Colliandre

Hi Paolo,

In the end I found that my molfile is the problem. It contains brackets 
around the Zn atom, which prevent the molecule from being found.



In : mol3 = Chem.MolFromMolFile('molfile.mol')

In : mol3.GetSubstructMatch(querySmiles)
Out[20]: ()

In : mol3.GetSubstructMatch(querySmarts)
Out[21]: (1,)

In : newmolfile = Chem.MolToMolBlock(mol3)

In : mol4 = Chem.MolFromMolBlock(newmolfile)

In : mol4.GetSubstructMatch(querySmiles)
Out[28]: (1,)

In : mol4.GetSubstructMatch(querySmarts)
Out[29]: (1,)

So the problem was my input and not the search function!

Many thanks
Lionel
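
A round trip like the one in the session above can be scripted to check whether re-writing a molfile changes what a query finds. This is only a minimal sketch, not a diagnosis of the original file: the helper name `compare_round_trip` is hypothetical, and `N[Zn]N` is a stand-in molecule (an assumption), since the problematic `molfile.mol` is not shown in the thread.

```python
from rdkit import Chem


def compare_round_trip(mol, queries):
    """Return {name: (match_before, match_after)} for each query.

    `mol` is written to a mol block and read back, mirroring the
    MolToMolBlock / MolFromMolBlock steps in the session above.
    """
    rewritten = Chem.MolFromMolBlock(Chem.MolToMolBlock(mol))
    return {
        name: (mol.GetSubstructMatch(q), rewritten.GetSubstructMatch(q))
        for name, q in queries.items()
    }


# Stand-in molecule (assumption): the original molfile is not available.
mol = Chem.MolFromSmiles('N[Zn]N')
queries = {
    'smiles': Chem.MolFromSmiles('[Zn]'),
    'smarts': Chem.MolFromSmarts('[Zn]'),
}
results = compare_round_trip(mol, queries)
for name, (before, after) in results.items():
    print(name, before, after)
```

For a real file, `mol` would come from `Chem.MolFromMolFile('molfile.mol')` instead. A query that matches only after the round trip suggests the input file is the cause, not the search.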



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)

2017-11-30 Thread Paolo Tosco

Hi Lionel,

My guess (and it is only a guess) is that the molecule that has a [Zn] 
atom with no charge might feature bonds between the zinc and the atoms 
that form the complex with the metal, e.g.:


In [1]: from rdkit import Chem

In [2]: querySmiles = Chem.MolFromSmiles('[Zn]')

In [3]: querySmarts = Chem.MolFromSmarts('[Zn]')

In [4]: mol = Chem.MolFromSmiles('N[Zn]N')

In [5]: mol.GetSubstructMatch(querySmiles)
Out[5]: ()

In [6]: mol.GetSubstructMatch(querySmarts)
Out[6]: (1,)

In [7]: znAtom = mol.GetAtomWithIdx(1)

In [8]: znAtom.GetFormalCharge()
Out[8]: 0

Best,
Paolo
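
To see how the query atom and the target atom differ, their properties can be printed side by side. This is a hedged sketch: the helper `describe_atoms` is hypothetical, and the thread does not establish which property explains the empty SMILES match. The radical-electron count is included only as one more value to compare (an assumption), and it may differ between RDKit versions.

```python
from rdkit import Chem


def describe_atoms(mol, symbol):
    """List (index, formal charge, degree, radical electrons) per matching atom."""
    return [
        (a.GetIdx(), a.GetFormalCharge(), a.GetDegree(),
         a.GetNumRadicalElectrons())
        for a in mol.GetAtoms()
        if a.GetSymbol() == symbol
    ]


query = Chem.MolFromSmiles('[Zn]')     # the SMILES-derived query from above
target = Chem.MolFromSmiles('N[Zn]N')  # the example molecule from above

print('query :', describe_atoms(query, 'Zn'))
print('target:', describe_atoms(target, 'Zn'))
```

Any value that differs between the two lines is worth checking when a SMILES-derived query fails but the SMARTS query succeeds.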




Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)

2017-11-30 Thread Lionel Colliandre

Hi Paolo,

I am not sure I understand. If I focus on these searches:

(q)mol_from_smiles('[Zn]')   => does not find the mol containing [Zn] or the mol containing [Zn+2]

(q)mol_from_smiles('[Zn+2]') => finds the mol containing [Zn+2]

mol_from_smarts('[Zn]')      => finds the mol containing [Zn] and the mol containing [Zn+2]

mol_from_smarts('[Zn+2]')    => finds the mol containing [Zn+2]

I understand all the results except the first one: why is at least the 
[Zn] molecule not retrieved? To me, both molecules should be retrieved, 
as with the SMARTS search.


Cheers,

Lionel




Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)

2017-11-30 Thread Paolo Tosco

Hi Lionel,

The success or failure of the SMILES searches depends on whether you 
specify the exact formal charge present in the database molecule, which 
in turn depends on whether (and how) it was set in the input molecule 
when it was loaded into the database. The SMARTS searches based on the 
element only will succeed whatever the formal charge is, because they do 
not take the formal charge into account at all.


Best,
p.
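
The SMARTS half of this explanation can be checked directly in Python. The sketch below is only an illustration: the two target molecules are examples chosen here, not entries from Lionel's database. It compares an element-only SMARTS with one that also specifies the charge.

```python
from rdkit import Chem

neutral = Chem.MolFromSmiles('N[Zn]N')  # Zn with formal charge 0
cation = Chem.MolFromSmiles('[Zn+2]')   # Zn with formal charge +2

element_only = Chem.MolFromSmarts('[Zn]')    # constrains the element only
with_charge = Chem.MolFromSmarts('[Zn+2]')   # element and formal charge

for label, target in (('neutral', neutral), ('cation', cation)):
    print(label,
          target.HasSubstructMatch(element_only),
          target.HasSubstructMatch(with_charge))
```

The element-only pattern matches both targets, while the charged pattern matches only the cation, which is consistent with the SMARTS results reported in this thread.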




Re: [Rdkit-discuss] Strange behaviour of the substructure search (not valid ctab)

2017-11-30 Thread Lionel Colliandre

Hi all,


On the question of molecules that cannot be found by searching, I finally 
found a solution: treating my queries as SMARTS:


SELECT id FROM rdk.mols WHERE m@>mol_from_smarts('[Zn]');

All the queries I presented now give the expected results, although I am 
not sure what changes when I treat the query as SMARTS instead of SMILES, 
given that the queries are valid SMILES.



Lionel
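
The same kind of search can be tried outside the database. The sketch below is an in-memory analogue of the SQL query above, not the cartridge itself: the `search` helper is hypothetical, and only three molecules that are written out in full in the original question are used (an assumed subset).

```python
from rdkit import Chem

# Molecules written out in full in the original question (assumed subset).
mols = {
    'm3': Chem.MolFromSmiles('[Ca+2]'),
    'm7': Chem.MolFromSmiles('[Cu].[Zn]'),
    'm9': Chem.MolFromSmiles('[N-]=[N+]=[N-]'),
}


def search(smarts):
    """Return the ids of the molecules that contain the SMARTS pattern."""
    pattern = Chem.MolFromSmarts(smarts)
    return sorted(name for name, m in mols.items()
                  if m.HasSubstructMatch(pattern))


for smarts in ('[Zn]', '[Cu]', '[N-]', '[N+]'):
    print(smarts, search(smarts))
```

Swapping `Chem.MolFromSmarts` for `Chem.MolFromSmiles` inside `search` is a quick way to compare the two kinds of query on the same molecules.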


2- For a lot of compounds, the ctab is valid and I can convert them 
into mol and obtain the SMILES in the rdk.mols table. However, I cannot 
find them when I search for part of the SMILES.


** First, for molecules with metals:

m1 = [Mn+2].[Zn+2]...

m2 = [Ag+].[Na+]...

m3 = [Ca+2]

m4 = [Na+].c1ccc([B-](c2c2)(c2c2)c2c2)cc1

m5 = [V+2]=O

m6 = [Rh+]...

m7 = [Cu].[Zn]

m8 = [Fe+2]...

For a database containing those molecules, these searches give:

[Mn] or [Mn+2] => 0 results (bad)

[Zn] => 0 (bad) but [Zn+2] => m1 (ok)

[Ag] or [Ag+] => m2 (ok)

[Na] => 0 (bad) why is Ag found and not Na in the same molecule?

but [Na+] => m2 + m4 (ok)

[Ca] => 0 (bad) but [Ca+2] => m3 (ok)

[B] or [B-] => 0 (bad)

[V] or [V+2] => 0 (bad)

[Rh] or [Rh+] => m6 (ok)

[Cu] => m7 (ok) but [Zn] => 0 (bad)

[Fe] => m8 (ok) but [Fe+2] => 0 (bad)

I cannot find any logic: sometimes the atom is found and not the ion, 
sometimes the reverse, and sometimes within the same molecule one can be 
found and not the other. Does anyone have an explanation?



** Second, for N3

m9 = [N-]=[N+]=[N-]

The following searches give:

[N-] or [N+] => 0 (bad)

[N-]=N => m9 (ok)

[N-]=[N+] => 0 (bad)

[N-]=[N+]=N => m9 (ok)

[N-]=[N+]=[N-] => m9 (ok)

Once again I cannot find any logic. Does anyone have an explanation?