Alex,

I'm glad that looks right. Unfortunately, those changes are in the 2016.09 version of the RDKit, which was just finalized today. We haven't completed the Anaconda builds for it yet.
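Since `conda update rdkit` reported 2016.03.4 in this case, a quick way to tell whether an installed build is new enough is to compare release numbers. This is a minimal Python sketch. It assumes, per this thread, that the query-adjustment functionality first ships in the 2016.09 release. The helpers `parse_rdkit_version` and `has_adjust_query_properties` are hypothetical names, not part of the RDKit API:

```python
import re

# Assumption taken from this thread: the query-adjustment functions first
# ship with the 2016.09 release (release numbers are year.month[.patch]).
FIRST_RELEASE_WITH_ADJUST = (2016, 9)


def parse_rdkit_version(version):
    """Turn a version string such as '2016.03.4' into a tuple (2016, 3, 4).

    Only the leading digits of each dot-separated piece are kept, so a
    non-numeric suffix does not break parsing; a piece with no leading
    digits counts as 0.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_adjust_query_properties(version):
    """True if the release (year, month) is at least FIRST_RELEASE_WITH_ADJUST."""
    return parse_rdkit_version(version)[:2] >= FIRST_RELEASE_WITH_ADJUST
```

With the RDKit's Python wrappers installed, `from rdkit import rdBase` followed by `has_adjust_query_properties(rdBase.rdkitVersion)` checks the build in use. The cartridge extension version reported by PostgreSQL (3.4 here) is a separate number that this sketch does not check.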
-greg

On Wed, Nov 23, 2016 at 3:29 PM, Alexander Klenner-Bajaja <aklen...@epo.org> wrote:
> Dear Greg,
>
> Thank you very much. Judging by the results, that function is exactly what I was looking for; the only problem is that I can't find it in my updated Anaconda installation.
>
> “conda update rdkit” tells me I have the latest version (2016.03.4), and PostgreSQL tells me I have version 3.4 of the RDKit extension.
>
> If I understand your blog post correctly, it should be in the 2016.03 version? What am I missing?
>
> Best,
>
> Alex
>
> From: Greg Landrum [mailto:greg.land...@gmail.com]
> Sent: Wednesday, November 23, 2016 11:42 AM
> To: Alexander Klenner-Bajaja
> Cc: rdkit-discuss@lists.sourceforge.net
> Subject: Re: [Rdkit-discuss] smarts vs smiles database queries and explicit hydrogens
>
> Hi Alex,
>
> The new version of the cartridge has some capabilities that, I think, address this.
>
> There's a blog post about this: http://rdkit.blogspot.com/2016/07/tuning-substructure-queries-ii.html
> but the short version is that you can do the kind of queries it seems like you want to do quite simply:
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1ncccn1') limit 3;
>  molregno |                                           m
> ----------+----------------------------------------------------------------------------------------
>    601707 | CCCc1nc(-c2ccc(F)cc2)oc1C(=O)NC(CC)CN1CCN(c2ncccn2)CC1
>    289103 | CC1C(=N)/C(=N/Nc2ccc(S(=O)(=O)Nc3ncccn3)cc2)C(=O)C(C)C1=O
>    607646 | CCNC(=O)[C@@H]1OC(n2cnc3c(NC(=O)Nc4ccc(S(=O)(=O)Nc5ncccn5)cc4)ncnc32)[C@@H](O)[C@H]1O
> (3 rows)
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1nc(*)ccn1') limit 3;
>  molregno |                           m
> ----------+-------------------------------------------------------
>    158659 | CCNc1nccc(-c2c(-c3ccc(F)cc3)ncn2C2CCN(C)CC2)n1
>    158743 | Nc1nccc(-c2c(-c3ccc(F)cc3)ncn2C2CCN(Cc3ccccc3)CC2)n1
>    158843 | CC1(C)CC(n2cnc(-c3ccc(F)cc3)c2-c2ccnc(N)n2)CC(C)(C)N1
> (3 rows)
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1nc(*)cc(*)n1') limit 3;
>  molregno |                                    m
> ----------+--------------------------------------------------------------------------
>    726443 | CN=C(S)NNc1nc(C)cc(C)n1
>    561136 | C[C@H](Nc1cc(NC2CCCCCC2)nc(C(F)(F)F)n1)[C@@H](Cc1ccc(Cl)cc1)c1cccc(Br)c1
>    205784 | CCN(CC)C(=O)CSc1nc(N)cc(Cl)n1
> (3 rows)
>
> There's more detail in the blog post, but the default behavior is to convert dummies into generic query atoms and to constrain the substitution at any other *ring* position.
>
> Best Regards,
> -greg
>
> On Wed, Nov 23, 2016 at 9:20 AM, Alexander Klenner-Bajaja <aklen...@epo.org> wrote:
> > Hi all,
> >
> > I am currently exploring the possibilities of the RDKit database cartridge for substructure search. I installed everything following the tutorial at http://www.rdkit.org/docs/Install.html
> >
> > Very nice tutorial; it worked perfectly.
> >
> > Since we are exploring solutions for browser-based GUI searches, I created a test page using Ketcher (http://lifescience.opensource.epam.com/ketcher/), which communicates with the database through PHP.
> >
> > Ketcher returns a SMILES representation of the drawn molecule. The molecules in the database are stored as RDKit canonical SMILES generated by the RDKit KNIME node (they are text-mined from patents).
> >
> > When doing substructure searches, the results make sense as long as we query for well-defined compounds; however, when we look at R-groups (R1, …), things get a little odd.
> >
> > I found a very old discussion on the mailing list from 2009 about this, and I understood from it that a “*” in SMILES is interpreted as a dummy atom, so the same dummy atom has to be present in the searched molecules to produce a hit.
> > In a SMARTS representation of the same string, by contrast, “*” matches any atom at that position.
> >
> > I ended up with a rather cumbersome query. I am sure there are more elegant ways of doing this using the ::qmol notation, but as I said, I am still exploring :)
> >
> > This is the query in question (PHP, for PostgreSQL):
> >
> > $search_result = pg_query($dbconn, "select m from pat.mols where m@>mol_from_smarts(mol_to_smiles(mol_from_smiles('".$_POST['smiles']."'))) LIMIT 20;");
> >
> > Stripping it down to the RDKit functions leaves:
> >
> > m@>mol_from_smarts(mol_to_smiles(mol_from_smiles('".$_POST['smiles']."')))
> >
> > and with a SMILES string filled in to make it more readable:
> >
> > m@>mol_from_smarts(mol_to_smiles(mol_from_smiles('C([*])1=CC=CC=C1')))
> >
> > (This is how Ketcher creates the SMILES string, using explicit double bonds.)
> >
> > This query does work and returns correct structures (I visually inspected a few examples).
> >
> > The same query without the molecule conversion functions returns nothing:
> >
> > m@>'C([*])1=CC=CC=C1'
> >
> > I guess the reason is that the string is interpreted as SMILES by default, so the search looks for actual dummy atoms in the database (there are none).
> >
> > That's my first question: is this assumption correct?
> >
> > My next issue is a query with explicit hydrogens. Using
> >
> > C([*])1=C([H])C([H])=C([H])C([H])=C1[H]
> >
> > as the query, with all the molecule conversions shown above to turn it into SMARTS, returns, among others:
> >
> > C(C)1=CC=C(C)C=C1
> >
> > That would be correct if the hydrogens were implicit, but not with explicit ones, so my guess is that they are lost.
> >
> > Can I force the cartridge, at query time, to respect explicit hydrogens, so that the only hits are molecules that differ in the substituent at the “*” position?
> >
> > I could not find a predefined function for that.
> >
> > Thank you very much for any hints or solutions,
> >
> > Best regards,
> >
> > Alex
> >
> > Dr. Alexander Garvin Klenner-Bajaja
> > Administrator Requirements Engineering-Solution Design | Dir. 2.8.3.3
> > European Patent Office
> > Patentlaan 3-9 | 2288 EE Rijswijk | The Netherlands
> > Tel. +31(0)70340-1991
> > aklen...@epo.org
> > www.epo.org
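To try the same idea from Python rather than the cartridge, here is a minimal sketch using the RDKit's Python wrappers. It assumes an RDKit build of 2016.09 or later, which provides `Chem.AdjustQueryProperties` and `Chem.AdjustQueryParameters`. It also assumes that their defaults behave like the cartridge's `mol_adjust_query_properties()` described above: dummies become any-atom queries, and the other ring atoms keep the substitution they have in the query. The helpers `adjusted_query` and `matches` are hypothetical names for illustration. The test structures are the queries and hits quoted in this thread, plus toluene as a simple positive case.

```python
from rdkit import Chem


def adjusted_query(smiles):
    """Parse a query SMILES and apply the RDKit's default query adjustments.

    Assumes an RDKit release that provides AdjustQueryProperties (2016.09 or
    later). With the default AdjustQueryParameters, unlabeled dummy atoms ('*')
    become any-atom queries and non-dummy ring atoms are restricted to the
    explicit degree (number of bonded atoms) they have in the query.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse query SMILES: {smiles!r}")
    return Chem.AdjustQueryProperties(mol, Chem.AdjustQueryParameters())


def matches(query_smiles, target_smiles):
    """True if the target contains the adjusted query as a substructure."""
    target = Chem.MolFromSmiles(target_smiles)
    if target is None:
        raise ValueError(f"could not parse target SMILES: {target_smiles!r}")
    return target.HasSubstructMatch(adjusted_query(query_smiles))
```

For example, `matches("C([*])1=CC=CC=C1", "C(C)1=CC=C(C)C=C1")` should be False. The para carbon of the query ring is constrained to two ring neighbours, so p-xylene is rejected even though the query contains no explicit hydrogens.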
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss