Hi JP,

On Thu, Mar 17, 2011 at 2:55 PM, JP <jeanpaul.ebe...@inhibox.com> wrote:
> I am using RDKit 2010_12_1, in particular the database cartridge - but I am
> quite positive this is an RDkit core problem.

Well, problem... feature... it's all a matter of perspective. :-S

> Now - I am trying a substructure search (using '@>' operator on the RDkit
> molecule table) using smiles [H]N([H])C(=O)C(=O)C (so two explicit hydrogens
> on the N which is bound to a C with double bond to O) the following
> molecule is returned:
> I have the following smiles string:
> O=C(C(=O)N1CC[NH2+]CC1)c1ccc(cc1C)C
> Which RDKit::mol object
> Cc1ccc(C(C(N2CC[NH2+]CC2)=O)=O)c(C)c1
> But it shouldn't ! (No ?)
> Any help will be as usual - much appreciated.

Here's what happens: when you construct the query molecule from SMILES, the Hs are removed, so the queries for [H]N([H])C(=O)C(=O)C and NC(=O)C(=O)C end up being identical. At the moment, if you want to include the Hs in the query, you have to use a SMARTS query. This has the unfortunate side effect of making the query substantially slower.

I agree that this behavior is "suboptimal". I'm going to have to think (and read) a bit to come up with a reasonable solution.

-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
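A minimal sketch of the behavior Greg describes, assuming a recent RDKit Python installation rather than the 2010_12_1 release or the PostgreSQL cartridge from this thread (their behavior may differ). Parsing the query from SMILES drops the explicit [H] atoms, so it still matches the ring amide nitrogen in JP's molecule, which carries no hydrogens. A SMARTS query that demands two hydrogens on that nitrogen via [NH2] does not match. The variable names and the primary-amide control molecule NC(=O)C(=O)c1ccccc1 are illustrative assumptions, not part of the original report.

```python
from rdkit import Chem

# Molecule returned by the '@>' search in the original report.
target = Chem.MolFromSmiles("O=C(C(=O)N1CC[NH2+]CC1)c1ccc(cc1C)C")

# Two SMILES queries, with and without explicit Hs on the N.
# MolFromSmiles removes the explicit [H] atoms by default,
# so both parse to the same 6-atom molecule.
q_explicit_h = Chem.MolFromSmiles("[H]N([H])C(=O)C(=O)C")
q_plain = Chem.MolFromSmiles("NC(=O)C(=O)C")

# SMARTS query: [NH2] requires a total H count of 2 on the N.
# The last atom is [#6] (any carbon) because a plain C in SMARTS
# only matches aliphatic carbon, and in the target it is aromatic.
q_smarts = Chem.MolFromSmarts("[NH2]C(=O)C(=O)[#6]")

# Illustrative control: a primary amide that should satisfy the SMARTS query.
control = Chem.MolFromSmiles("NC(=O)C(=O)c1ccccc1")

same_query = Chem.MolToSmiles(q_explicit_h) == Chem.MolToSmiles(q_plain)
smiles_hit = target.HasSubstructMatch(q_explicit_h)
smarts_hit = target.HasSubstructMatch(q_smarts)
control_hit = control.HasSubstructMatch(q_smarts)

print("explicit-H and plain SMILES queries identical:", same_query)
print("SMILES query matches target:", smiles_hit)
print("SMARTS [NH2] query matches target:", smarts_hit)
print("SMARTS [NH2] query matches primary amide control:", control_hit)
```

Because the SMARTS atom [NH2] carries an H-count constraint that a plain SMILES atom does not, the SMILES-derived query accepts the hydrogen-free amide N in the target while the SMARTS query rejects it. This is the trade-off mentioned in the reply: SMARTS can express the Hs, at the cost of a slower query.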