Yeah, this is exactly the case where using qmol_from_ctab() should help. Below is a short example demonstrating this by querying my local ChEMBL instance. Notice that the first form of the query, which uses mol_from_ctab(), matches what you describe: the results include amides, esters, etc. The second query, which uses qmol_from_ctab(), only returns molecules which have an aldehyde.
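For comparison, roughly the same search can be written as a SMARTS query cast to qmol, which skips the ctab entirely. This is only a sketch under assumptions: it assumes the same rdk.mols table (columns molregno and m) used in this thread, and the pattern [CX3H1](=O)[#6] is an illustrative choice for "carbonyl carbon with exactly one H and a carbon neighbor", not something taken from the sketched query, so the hits may differ somewhat from what qmol_from_ctab() returns.

```sql
-- Assumption: rdk.mols(molregno, m) is the molecule table used in this thread.
-- The SMARTS below is an illustrative aldehyde pattern, not the user's query:
--   [CX3H1]  aliphatic carbon with three connections and exactly one hydrogen
--   (=O)     double-bonded to oxygen
--   [#6]     attached to any carbon (aliphatic or aromatic)
select molregno, m
  from rdk.mols
 where m @> '[CX3H1](=O)[#6]'::qmol
 limit 5;
```

Because the hydrogen count is part of the pattern itself, this form does not depend on how an explicit H atom in a sketch is handled.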
I hope this helps,
-greg

chembl_28=# select * from rdk.mols where m@>mol_from_ctab('aldehyde query
  MJ192500

  4  3  0  0  0  0  0  0  0  0999 V2000
   -2.8123    1.5508    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5267    1.1383    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.2412    1.5508    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5267    0.3133    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  2  4  2  0  0  0  0
  2  3  1  0  0  0  0
M  END
') limit 5;
 molregno |                               m
----------+----------------------------------------------------------------
   310993 | O=C(NO)c1cc(CS(=O)(=O)c2ccc(Cl)cc2)on1
   310992 | O=C(NO)c1cc(CS(=O)(=O)c2cccc(Cl)c2)on1
   318822 | CCC(NC(=O)C[C@H](N)C(=O)N1CCC[C@H]1C#N)c1ccccc1
   310016 | O=C(CCNC(=O)c1ccccc1)NC1CCN(Cc2ccc(Cl)cc2)C1
   319381 | CCOC(=O)/C=C/c1ccc(CN(C(=O)C2CCCCC2)c2cccc(/C=C/C(=O)OC)c2)cc1
(5 rows)

chembl_28=# select * from rdk.mols where m@>qmol_from_ctab('aldehyde query
  MJ192500

  4  3  0  0  0  0  0  0  0  0999 V2000
   -2.8123    1.5508    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5267    1.1383    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.2412    1.5508    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5267    0.3133    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  2  4  2  0  0  0  0
  2  3  1  0  0  0  0
M  END
') limit 5;
 molregno | m
----------+--------------------------------------------------------------------------------
   284772 | COC(=O)NC1[C@H](C)O[C@@H](O[C@H]2C/C=C(\C)[C@@H]3C=C[C@@H]4[C@@H](O)[C@@H](C)C[C@H](C)[C@H]4[C@]3(C)/C(O)=C3\C(=O)O[C@]4(CC(C=O)=C[C@H](OC(C)=O)[C@H]4/C=C\2C)C3=O)CC1(C)[N+](=O)[O-]
   284633 | COC(=O)NC1[C@H](C)O[C@@H](O[C@H]2C/C=C(\C)[C@@H]3C=C[C@@H]4[C@@H](O[C@H]5CCCCO5)[C@@H](C)C[C@H](C)[C@H]4[C@]3(C)/C(O)=C3\C(=O)O[C@]4(CC(C=O)=C[C@H](OC(C)=O)[C@H]4/C=C\2C)C3=O)CC1(C)[N+](=O)[O-]
   284865 | COC(=O)NC1[C@H](C)O[C@@H](O[C@H]2C/C=C(\C)[C@@H]3C=C[C@@H]4[C@@H](OCc5ccc(OC)cc5)[C@@H](C)C[C@H](C)[C@H]4[C@]3(C)/C(O)=C3\C(=O)O[C@]4(CC(C=O)=C[C@H](OC(C)=O)[C@H]4/C=C\2C)C3=O)CC1(C)[N+](=O)[O-]
   299586 | CC1(C)C2CC[C@]3(C)C(CC=C4C5CC(C)(C)[C@@H](OC(=O)c6ccccc6)[C@H](OC(=O)/C=C/c6ccccc6)[C@]5(C=O)[C@H](O)C[C@]43C)[C@@]2(C)CC[C@@H]1O
   317613 | Cn1cncc1C=O
(5 rows)

On Tue, Jul 20, 2021 at 11:55 PM Webster Homer <webster.ho...@milliporesigma.com> wrote:

> I should have included the query. It looks like RDKit is ignoring the H
> atom.
>
> The user put in an explicit H.
>
> ===========MOL file after this
> aldehyde query
>   MJ192500
>
>   4  3  0  0  0  0  0  0  0  0999 V2000
>    -2.8123    1.5508    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -3.5267    1.1383    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -4.2412    1.5508    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
>    -3.5267    0.3133    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
>   2  1  1  0  0  0  0
>   2  4  2  0  0  0  0
>   2  3  1  0  0  0  0
> M  END
> =================MOL file above this
>
> *From:* Greg Landrum <greg.land...@gmail.com>
> *Sent:* Friday, July 16, 2021 11:38 PM
> *To:* Webster Homer <webster.ho...@milliporesigma.com>
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] Substructure search for an aldehyde
> returns ketones and acids
>
> Hi Webster,
>
> Without seeing an actual query I am inclined to believe that it’s not a
> bug. The problem is more likely a query which has not been drawn explicitly
> or an easily made mistake in the way the cartridge is being used.
>
> Assuming that the aldehyde queries have been drawn with an explicit H atom
> connected to the C (apologies for not showing this, I’m on my phone and
> don’t have a sketcher available), you should be calling the cartridge
> function qmol_from_ctab(), not mol_from_ctab(), before doing the query.
> qmol_from_ctab() will use the H to help define the query.
>
> If you’re doing this and still seeing incorrect search results, please
> share a query and the way you’re doing the search and we can try to help
> (or diagnose the bug if there is one).
>
> Best,
> -greg
>
> On Fri, 16 Jul 2021 at 17:53, Webster Homer <webster.ho...@milliporesigma.com> wrote:
>
> We use the RDKit PostgreSQL cartridge as our substructure searcher. When a
> user sketches an aldehyde and submits the mol file as the query, RDKit
> returns aldehydes, but also returns ketones and acids. Is this a bug?
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person. If you have received this transmission in error, please
> notify the sender immediately and delete the message and any attachment
> from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith.
>
> Click merckgroup.com/disclaimer
> <https://www.merckgroup.com/en/legal-disclaimer/mail-disclaimer.html> to
> access the German, French, Spanish, Portuguese, Turkish, Polish and Slovak
> versions of this disclaimer.
>
> Please find our Privacy Statement information by clicking here
> merckgroup.com/en/privacy-statement.html
> <https://www.merckgroup.com/en/privacy-statement.html>
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss