Hi Greg,
Thank you for your feedback.
The tests you mentioned worked fine for me, and both molecules are
matched using the specified SMILES. The matching problem turned out to
be really silly: I was expecting to match both molecules in the ChEMBL
database I downloaded (i.e. CHEMBL1517804 and CHEMBL2442053), which are
accessible through a search using the web interface of ChEMBL. However,
for some reason compound CHEMBL2442053 is not present in the
downloadable database (and so obviously cannot be matched).
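For anyone who runs into the same thing, a lookup along these lines
should show whether a compound is in the local copy at all. This is
only a sketch: it assumes the standard ChEMBL schema, where the
molecule_dictionary table has a chembl_id column, and your table names
may differ if you loaded the data another way.

```sql
-- Assumption: standard ChEMBL schema with molecule_dictionary.chembl_id.
-- Any ID missing from the result is not in the local database,
-- so no substructure query can return it.
select chembl_id
  from molecule_dictionary
 where chembl_id in ('CHEMBL1517804', 'CHEMBL2442053');
```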
Best regards,
Alfredo
On 31/05/2018 at 1:09, Greg Landrum wrote:
Hi Alfredo,
I can't think of any reason this would be true based on the molecules
you provide.
Certainly each of the molecules has a substructure match:
chembl_23=# select
'CCc1ccc(-n2nc3ccc(NC(=O)c4ccc5c(c4)OCO5)cc3n2)cc1'::mol@>'c1c[nH]nn1'::mol;
?column?
----------
t
(1 row)
chembl_23=# select
'COc1ncc(-c2ccc(N(Cc3ccsc3)C(=O)Cn3nnc4ccccc43)cc2)cn1'::mol@>'c1c[nH]nn1'::mol;
?column?
----------
t
(1 row)
And if I put them in a small table, add an index, and search, I also
get the expected results:
chembl_23=# create temporary table twomols (smiles text,m mol);
CREATE TABLE
chembl_23=# insert into twomols values
('CCc1ccc(-n2nc3ccc(NC(=O)c4ccc5c(c4)OCO5)cc3n2)cc1',
'CCc1ccc(-n2nc3ccc(NC(=O)c4ccc5c(c4)OCO5)cc3n2)cc1'::mol);
INSERT 0 1
chembl_23=# insert into twomols values
('COc1ncc(-c2ccc(N(Cc3ccsc3)C(=O)Cn3nnc4ccccc43)cc2)cn1',
'COc1ncc(-c2ccc(N(Cc3ccsc3)C(=O)Cn3nnc4ccccc43)cc2)cn1'::mol);
INSERT 0 1
chembl_23=# select smiles from twomols where m@>'c1c[nH]nn1'::mol;
smiles
-------------------------------------------------------
CCc1ccc(-n2nc3ccc(NC(=O)c4ccc5c(c4)OCO5)cc3n2)cc1
COc1ncc(-c2ccc(N(Cc3ccsc3)C(=O)Cn3nnc4ccccc43)cc2)cn1
(2 rows)
chembl_23=# create index tidx on twomols using gist(m);
CREATE INDEX
chembl_23=# select smiles from twomols where m@>'c1c[nH]nn1'::mol;
smiles
-------------------------------------------------------
CCc1ccc(-n2nc3ccc(NC(=O)c4ccc5c(c4)OCO5)cc3n2)cc1
COc1ncc(-c2ccc(N(Cc3ccsc3)C(=O)Cn3nnc4ccccc43)cc2)cn1
(2 rows)
Can you please check to see if this simple test works for you?
To do more detailed troubleshooting I will need to know which version
of the cartridge you are using and on which operating system.
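As a sketch of one way to collect that information from psql (assuming
the cartridge's rdkit_version() function is available in your install):

```sql
-- Reports the RDKit cartridge version string.
select rdkit_version();
-- Built-in PostgreSQL function; its output includes the server version
-- and the platform it was built for.
select version();
```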
Best,
-greg
On Tue, May 29, 2018 at 8:00 PM Alfredo Quevedo wrote:
Dear users,
I am trying to perform a substructure search using SMILES notation on
the ChEMBL database that I have loaded into my PostgreSQL database.
Here are two sample molecules in SMILES format, as read by the RDKit
cartridge into the database:
Molecule 1: CCc1ccc(-n2nc3ccc(NC(=O)c4ccc5c(c4)OCO5)cc3n2)cc1
Molecule 2: COc1ncc(-c2ccc(N(Cc3ccsc3)C(=O)Cn3nnc4ccccc43)cc2)cn1
Both molecules contain a triazole scaffold, and I am trying to select
both compounds from the whole database using the following SMILES
generated by RDKit for a triazole: 'c1c[nH]nn1'
My problem is that the search matches molecule 1 but not molecule 2.
What could be the problem? Since I am searching a database of
compounds previously processed with the RDKit cartridge, shouldn't the
substructure match?
Thanks in advance for the help.
Regards,
Alfredo
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
[email protected]
<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss