Ok, I am able to reproduce this.

Here's a small reproducible (I'm using postgresql 9.4.2):

chembl_20=# drop table if exists foo; select * into temporary table foo
from rdk.mols order by molregno asc limit 100; create index midx on foo
using gist(m);select count(*) from foo where m@
='Cc1cc(-n2ncc(=O)[nH]c2=O)ccc1C(=O)c1ccccc1Cl'::mol;
DROP TABLE
SELECT 100
CREATE INDEX
 count
-------
     0
(1 row)


My first guess would be that this is something to do with the way molecular
equality is defined, but if I turn off the index things work:

chembl_20=# set enable_bitmapscan=off;
SET
chembl_20=# set enable_indexscan=off;
SET
chembl_20=# drop table if exists foo; select * into temporary table foo
from rdk.mols order by molregno asc limit 100; create index midx on foo
using gist(m);select count(*) from foo where m@
='Cc1cc(-n2ncc(=O)[nH]c2=O)ccc1C(=O)c1ccccc1Cl'::mol;
DROP TABLE
SELECT 100
CREATE INDEX
 count
-------
     1
(1 row)



And if I shrink the size of the table things work:

chembl_20=# drop table if exists foo; select * into temporary table foo
from rdk.mols order by molregno asc limit 98; create index midx on foo
using gist(m);select count(*) from foo where m@
='Cc1cc(-n2ncc(=O)[nH]c2=O)ccc1C(=O)c1ccccc1Cl'::mol;
DROP TABLE
SELECT 98
CREATE INDEX
 count
-------
     1
(1 row)


(including 99 rows fails)

So it looks like either a bug in the indexing or in the way equality is
integrated with the indexing. I will investigate and see if I can track it
down.

For the purposes of mychembl: The @= operator is ok if you don't have
anything else, but I think it would be a good idea to have an
RDKit-generated SMILES column in the database, putting an index on it, and
doing equality checking using that. It's going to end up being faster to
query (the string comparison will always be faster than the
cheminformatics) and it's useful to have access to that canonical smiles
for other things. Here's a quick demo using my example from before:

chembl_20=# drop table if exists foo; select
molregno,m,mol_to_smiles(m)::text smiles  into temporary table foo from
rdk.mols order by molregno asc limit 100; create index midx on foo
(smiles); select count(*) from foo where
smiles='Cc1cc(-n2ncc(=O)[nH]c2=O)ccc1C(=O)c1ccccc1Cl'::mol::text;DROP TABLE
SELECT 100
CREATE INDEX
 count
-------
     1
(1 row)



Best,
-greg


On Thu, Jun 18, 2015 at 6:01 PM, Michał Nowotka <[email protected]> wrote:

> Hi,
>
> I'm working on mychembl_20 and I have latest stable RDKit version
> ("Release_2015_03_1") installed and compiled as postgres cartridge.
>
> Some facts:
>
> 1. All tests went fine, including tests for postgres extension.
> 2. Substructure (@>) and similarity (tanimoto_sml) searches are working
> fine.
>
> Now I'm trying to execute this SQL statement:
>
> SELECT COUNT(*)
> FROM "mols_rdkit"
> WHERE (m@='C');
>
> This runs without errors but returns 0. It returns 0 for any SMILES
> string, even if I know that the structure exists in the database.
>
> Table "mold_rdkit" exists, is not empty and has column "m" (as I said,
> substructure and similarity are working fine).
>
> Am I doing something wrong? Is there anything I can do to verify if
> the problem is related to RDKit or something else?
>
> Kind regards,
>
> Michał Nowotka
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to