Re: [Rdkit-discuss] Substructure searching (possible bug ouch)

Greg Landrum Thu, 17 Mar 2011 11:51:20 -0700

Hi JP,

On Thu, Mar 17, 2011 at 2:55 PM, JP <jeanpaul.ebe...@inhibox.com> wrote:
> I am using RDKit 2010_12_1, in particular the database cartridge - but I am
> quite positive this is an RDKit core problem.


Well, problem... feature... it's all a matter of perspective. :-S

> Now - I am trying a substructure search (using the '@>' operator on the RDKit
> molecule table) with the SMILES [H]N([H])C(=O)C(=O)C (so two explicit hydrogens
> on the N, which is bound to a C with a double bond to O), and the following
> molecule is returned:
> I have the following SMILES string:
> O=C(C(=O)N1CC[NH2+]CC1)c1ccc(cc1C)C
> which, as an RDKit::mol object, is:
> Cc1ccc(C(C(N2CC[NH2+]CC2)=O)=O)c(C)c1
> But it shouldn't ! (No ?)
> Any help will be as usual - much appreciated.

Here's what happens: when you construct the query molecule from SMILES
the Hs are removed, so the queries for [H]N([H])C(=O)C(=O)C and
NC(=O)C(=O)C end up being identical.
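A minimal sketch of this effect, assuming the RDKit Python API rather than the cartridge (the variable names are only illustrative). It parses both query SMILES with the default settings and checks them against the molecule from the question:

```python
from rdkit import Chem

# Parse both query SMILES with the default settings, which remove explicit Hs.
query_with_hs = Chem.MolFromSmiles("[H]N([H])C(=O)C(=O)C")
query_without_hs = Chem.MolFromSmiles("NC(=O)C(=O)C")

# After parsing, both queries hold heavy atoms only and share one canonical SMILES.
atom_counts = (query_with_hs.GetNumAtoms(), query_without_hs.GetNumAtoms())
same_query = Chem.MolToSmiles(query_with_hs) == Chem.MolToSmiles(query_without_hs)

# The molecule from the question: its amide N carries no hydrogens at all.
target = Chem.MolFromSmiles("O=C(C(=O)N1CC[NH2+]CC1)c1ccc(cc1C)C")

# A SMILES-derived query places no constraint on hydrogen counts,
# so the tertiary amide is still reported as a hit.
target_is_hit = target.HasSubstructMatch(query_with_hs)

print(atom_counts, same_query, target_is_hit)
```

Both queries should come out as the same six heavy atoms, which is why the molecule is returned.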

At the moment, if you want to include the Hs in the query you have to
do a SMARTS query. This has the unfortunate side effect of making the
query substantially slower.
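As a rough illustration of the SMARTS route, again using the RDKit Python API rather than the cartridge: the hydrogen count is written on the nitrogen as [NH2] instead of as separate [H] atoms. The SMARTS patterns and the primary-amide control molecule below are assumptions chosen for the example, not taken from the original question.

```python
from rdkit import Chem

# The molecule from the question (tertiary amide N, no hydrogens on it).
target = Chem.MolFromSmiles("O=C(C(=O)N1CC[NH2+]CC1)c1ccc(cc1C)C")

# Hypothetical control molecule with a genuine primary amide (N with two Hs).
primary_amide = Chem.MolFromSmiles("NC(=O)C(=O)c1ccccc1")

# SMARTS query: [NH2] requires a total hydrogen count of 2 on the nitrogen.
# [#6] is used for the last atom so that aromatic carbons are allowed too.
smarts_with_h = Chem.MolFromSmarts("[NH2]C(=O)C(=O)[#6]")

# Same pattern without the hydrogen constraint, for comparison.
smarts_no_h = Chem.MolFromSmarts("NC(=O)C(=O)[#6]")

target_matches_with_h = target.HasSubstructMatch(smarts_with_h)
target_matches_no_h = target.HasSubstructMatch(smarts_no_h)
control_matches_with_h = primary_amide.HasSubstructMatch(smarts_with_h)

print(target_matches_with_h, target_matches_no_h, control_matches_with_h)
```

With the hydrogen count in the pattern, the molecule from the question should no longer match, while the primary amide still does.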

I agree that this behavior is "suboptimal". I'm going to have to think
(and read) a bit to come up with a reasonable solution.

-greg
