Dear RDKit community,

I'm getting unexpected results when combining SMARTS substructure
comparisons in SQL statements, and I'd like to ask for feedback to help me
understand what's going on.

Given an element, say Au, when I make a query like this:

SELECT cpds.cid FROM cpds
WHERE (cpds.molecule @> '[Au]'::qmol)
  AND NOT (cpds.molecule @> '[C,c]~[C,c]'::qmol)
  AND NOT (cpds.molecule @> '[C!H0,c!H0]'::qmol)

I don't expect to see any compounds with C-C or C-H bonds in the results.
Yet I get results like [(P(C5F5)3)4Au]Cl [1], or for example with Se,
[(CH3)3Se]+ [2]. Why?

It seems that my 'unexpected' results usually match one of the two
"AND NOT" conditions but not both (see the console output below), although
I haven't checked this systematically. I want the query to return only
molecules for which both of the last two substructure conditions are
false. Is my understanding of SQL conjunctions mistaken?
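
As a sanity check on the boolean logic alone, here is a minimal sketch. It
assumes SQLite (from the Python standard library) as a stand-in for
PostgreSQL, and it replaces the rdkit substructure operators with plain 0/1
columns. The column names has_element, has_cc and has_ch are made up for
illustration. The flags for the Se and Au compounds are copied from the
console output at the end of this message; the "control" row is
hypothetical. It only tests how "A AND NOT B AND NOT C" is evaluated, not
how the rdkit extension evaluates "@>".

```python
import sqlite3

# Match flags copied from the console output in this message:
#   [C,c]~[C,c]  -> Se compound False, Au compound True
#   [C!H0,c!H0]  -> Se compound True,  Au compound False
# has_element is assumed to be 1 for both, since both came back from the query.
# The "control" row is hypothetical: it shows what a passing row looks like.
rows = [
    ("Se_cpd", 1, 0, 1),
    ("Au_cpd", 1, 1, 0),
    ("control", 1, 0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cpds (cid TEXT, has_element INTEGER, "
    "has_cc INTEGER, has_ch INTEGER)"
)
conn.executemany("INSERT INTO cpds VALUES (?, ?, ?, ?)", rows)

# Same shape as the WHERE clause above, with precomputed flags
# in place of the substructure comparisons.
query = """
    SELECT cid FROM cpds
    WHERE (has_element) AND NOT (has_cc) AND NOT (has_ch)
"""
selected = [cid for (cid,) in conn.execute(query)]
print(selected)
```

With these flags, only the hypothetical control row is selected, so under
ordinary SQL semantics neither the Se nor the Au compound should pass the
filter.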

I'm using RDKit 2016-03 and the rdkit extension on PostgreSQL 9.4. I'm
probably not using RDKit in the way it was intended, but I'm certainly
grateful that it exists and is free software. I'd very much appreciate any
feedback on this question.

Best regards,



Some console output regarding those compounds:

In [3]: mSe = Chem.MolFromSmiles('C[Se+](C)C')

In [4]: mAu =
   ...: )F)F)F.Cl[Au]')

In [5]: mSe.HasSubstructMatch(Chem.MolFromSmarts('[C,c]~[C,c]'))
Out[5]: False

In [6]: mAu.HasSubstructMatch(Chem.MolFromSmarts('[C,c]~[C,c]'))
Out[6]: True

In [7]: mSe.HasSubstructMatch(Chem.MolFromSmarts('[C!H0,c!H0]'))
Out[7]: True

In [8]: mAu.HasSubstructMatch(Chem.MolFromSmarts('[C!H0,c!H0]'))
Out[8]: False

Akos Kokai
PhD candidate, Department of Environmental Science, Policy & Management
Fellow, Berkeley Center for Green Chemistry
University of California, Berkeley
Rdkit-discuss mailing list
