On Mon, Nov 4, 2019 at 5:33 PM Webster Homer <webster.ho...@milliporesigma.com> wrote:
> We are currently using Mol files, they’re large especially when
> urlencoded,

That's a point. They tend to be less problematic when embedded in the body of a POST request, but that has a different set of issues attached to it.

> we would prefer smiles and smarts for queries, but we would need to be
> able to normalize them.

I think it should be possible to add that so that it behaves sensibly for at least most SMARTS features (at least the same way it works with Mol files), but it will take some thought. I think we need a function that is, essentially:

NormalizeMolFromDrawingPackageSMARTS()

This would make some assumptions about the semantics of the SMARTS and, essentially, do the normalization that is done to queries read from Mol files.
Sound right?

An aside.

> As to sketchers Marvin JS seems to do a pretty good job with SMARTS.

Yes, but that's a big part of the problem here: it does a good job of translating what the user drew into SMARTS. When the user draws:

[image: image.png]

and asks for SMARTS they get:

[#6]-1=[#6]-[#6]=[#6]-[#6]=[#6]-1

That is what they drew, and it is a legal SMARTS, but it's not what they meant. Unfortunately, I would guess that asking them to draw what they meant is not an option. If it were, this:

[image: image.png]

produces this SMARTS:

c1ccccc1

which is fine.

-greg

> Webster
>
> *From:* Greg Landrum <greg.land...@gmail.com>
> *Sent:* Friday, November 01, 2019 11:21 PM
> *To:* Webster Homer <webster.ho...@milliporesigma.com>
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] SMARTS Query Normalization?
>
> Hi Webster,
>
> That's a really good question.
> At the moment there isn't any way to do SMARTS normalization. The
> assumption throughout the code is that if you've gone to the trouble to
> create a SMARTS then you captured the aromaticity that you intend to search
> for there.
> I think your use case makes sense though, so this would be an
> interesting thing for us to take a look at for a future release.
>
> What you might be able to do in the meantime, and what I usually suggest
> when coming from a chemical sketcher, is to get an MDL molfile from the
> sketcher and then use that to do your queries. You can use mol_from_ctab()
> in the cartridge along with mol_adjust_query_properties():
>
> chembl_25=# select * from rdk.mols where
> m@>mol_adjust_query_properties(mol_from_ctab('
>   Mrv1810 11021905152D
>
>   9  9  0  0  0  0            999 V2000
>    -2.2782   -0.0547    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.9927   -0.4672    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.9927   -1.2922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.2782   -1.7047    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.5637   -1.2922    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -1.5637   -0.4672    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.2782    0.7703    0.0000 A   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.8493   -0.0547    0.0000 A   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.8493   -1.7047    0.0000 A   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0  0  0  0
>   2  3  2  0  0  0  0
>   3  4  1  0  0  0  0
>   4  5  2  0  0  0  0
>   5  6  1  0  0  0  0
>   1  6  2  0  0  0  0
>   1  7  1  0  0  0  0
>   6  8  1  0  0  0  0
>   5  9  1  0  0  0  0
> M  END
> ')) limit 5;
>
> The chemical sketchers that I have tried tend to do a better job of
> generating queries in Mol files, and the RDKit deals with converting from
> Kekulé to aromatic form for you.
>
> Does that help?
> -greg
>
> On Thu, Oct 31, 2019 at 5:42 PM Webster Homer <webster.ho...@milliporesigma.com> wrote:
>
> I am working on evaluating the RDKit PostgreSQL data cartridge for use as
> the back end of a web application. The app will use a JavaScript sketcher
> to allow the user to input a SMILES or SMARTS that will be sent to the RDKit
> cartridge. In evaluating RDKit I found that it doesn’t support
> aromatic normalization on SMARTS.
> As a test case I used Marvin JS to generate a SMARTS:
> C(=CN=C1)C(=C1N2)N=C2
>
> I used it as a query:
>
> select structure_id from rdk.mols where
> m@>mol_adjust_query_properties(mol_from_smarts('C(=CN=C1)C(=C1N2)N=C2'));
>  structure_id
> --------------
> (0 rows)
>
> Not surprisingly, it had no hits. I looked at the output of the
> mol_adjust_query_properties function:
>
> select mol_adjust_query_properties(mol_from_smarts('C(=CN=C1)C(=C1N2)N=C2'));
>  mol_adjust_query_properties
> -----------------------------
>  c1cc2ncnc2cn1
>
> That looked good.
>
> select structure_id from rdk.mols where
> m@>mol_adjust_query_properties(mol_from_smarts('c1cc2ncnc2cn1'));
>  structure_id
> --------------
>  30183725
> (1 row)
>
> But wait, there should be more hits!
>
> select count(*) from rdk.mols where m@>'c1cc2ncnc2cn1'::qmol;
>  count
> -------
>  27
>
> Then I tried this:
>
> select structure_id from rdk.mols where
> m@>mol_adjust_query_properties(mol_from_smarts('c1cc2ncnc2cn1'),'{"adjustDegree":false}');
>
> (27 rows)
>
> OK, but what I really need to have work is this:
>
> select structure_id from rdk.mols where
> m@>mol_adjust_query_properties(mol_from_smarts('C(=CN=C1)C(=C1N2)N=C2'),'{"adjustDegree":false}');
>  structure_id
> --------------
> (0 rows)
>
> It does not. Is mol_adjust_query_properties misnamed? It doesn’t
> really seem to want a query. Am I missing an option? Unless I can make this
> work I don’t see how I can use RDKit in my application.
>
> What am I missing? Or does RDKit just not allow for normalizing SMARTS?
>
> Thanks
> Webster Homer
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person.
> If you have received this transmission in error, please
> notify the sender immediately and delete the message and any attachment
> from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith. Click http://www.merckgroup.com/disclaimer to access the
> German, French, Spanish and Portuguese versions of this disclaimer.
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
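One possible client-side stop-gap, until something like the NormalizeMolFromDrawingPackageSMARTS() function Greg describes exists, is to pre-process the sketcher's output before it reaches the cartridge. The Python sketch below (standard library only) assumes that a SMARTS built solely from plain atomic-number atoms such as [#6] and ordinary bonds describes a drawn structure rather than a deliberate query, and rewrites it as SMILES so that aromaticity can be perceived when it is parsed. Anything containing genuine query features is left alone. The function name sketcher_smarts_to_smiles, its atom table, and its token whitelist are hypothetical illustrations, not part of the RDKit.

```python
import re
from typing import Optional

# Hypothetical table: SMARTS atomic-number primitives that can be written as
# organic-subset SMILES symbols without brackets.
_ATOMIC_NUMBER_TO_SYMBOL = {
    5: "B", 6: "C", 7: "N", 8: "O", 9: "F",
    15: "P", 16: "S", 17: "Cl", 35: "Br", 53: "I",
}

# A bracket atom that contains nothing but an atomic number, e.g. "[#6]".
_ATOMIC_NUMBER_ATOM = re.compile(r"\[#(\d+)\]")

# Whitelist of tokens allowed after rewriting: organic-subset atoms, aromatic
# atoms, ring-closure digits, ordinary bonds, branches and dots. Anything else
# (leftover brackets, ';', ',', '!', '~', '@', '$', 'A', '*', ...) is treated
# as a sign that the input uses real query features.
_PLAIN_SMILES = re.compile(r"(?:Cl|Br|[BCNOPSFI]|[bcnops]|%\d\d|\d|[-=#()/\\.])+")


def sketcher_smarts_to_smiles(smarts: str) -> Optional[str]:
    """Return a SMILES string if `smarts` only describes a drawn structure,
    otherwise None (meaning: keep using the SMARTS as-is)."""

    def replace(match: re.Match) -> str:
        symbol = _ATOMIC_NUMBER_TO_SYMBOL.get(int(match.group(1)))
        # Unknown atomic numbers keep their brackets, so the whitelist rejects them.
        return symbol if symbol is not None else match.group(0)

    candidate = _ATOMIC_NUMBER_ATOM.sub(replace, smarts)
    return candidate if _PLAIN_SMILES.fullmatch(candidate) else None


if __name__ == "__main__":
    for text in ("[#6]-1=[#6]-[#6]=[#6]-[#6]=[#6]-1",
                 "C(=CN=C1)C(=C1N2)N=C2",
                 "[#6;R]-[#7]"):
        print(text, "->", sketcher_smarts_to_smiles(text))
```

When the helper returns a string, a caller could pass it to the cartridge's mol_from_smiles() instead of mol_from_smarts(), e.g. inside mol_adjust_query_properties(..., '{"adjustDegree":false}'), and fall back to mol_from_smarts() when it returns None. This has not been checked against the hit counts reported in the thread. Note the deliberate change in meaning: in SMARTS [#6] matches aromatic or aliphatic carbon, whereas rewriting it as C relies on aromaticity perception during SMILES parsing to decide.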