Alex,

I'm glad that looks right.
Unfortunately, those changes are in the 2016.09 version of the RDKit, which
was just finalized today.
We haven't completed the anaconda builds for that yet.

-greg
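
As a rough illustration of the version gap, the sketch below compares release strings of the form YYYY.MM.patch. The helper names `parse_release` and `has_adjust_query` are hypothetical, and the code assumes purely numeric components (a suffix such as a beta tag would need extra handling).

```python
def parse_release(version):
    """Turn a release string such as '2016.03.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_adjust_query(version, minimum="2016.09"):
    """Return True if `version` is at least the `minimum` year.month release.

    Only the year and month components are compared; the patch level
    is ignored.
    """
    return parse_release(version)[:2] >= parse_release(minimum)[:2]


if __name__ == "__main__":
    # The installed conda build mentioned in the thread vs. a 2016.09 build.
    for installed in ("2016.03.4", "2016.09.1"):
        print(installed, has_adjust_query(installed))
```

In Python the installed release string can be read from `rdkit.__version__`.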


On Wed, Nov 23, 2016 at 3:29 PM, Alexander Klenner-Bajaja <aklen...@epo.org>
wrote:

> Dear Greg,
>
>
>
> Thank you very much. Judging by the results, that function is exactly what
> I was looking for, but I can't find it in my updated anaconda
> installation.
>
>
>
> “conda update rdkit” tells me I have the latest version, 2016.03.4, and
> postgres tells me I have version 3.4 of the RDKit extension.
>
>
>
> If I understand your blog post correctly, it should be in the 2016.03
> version? What am I missing?
>
>
>
>
>
> Best,
>
>
>
> Alex
>
>
>
>
>
>
>
> *From:* Greg Landrum [mailto:greg.land...@gmail.com]
> *Sent:* Wednesday, November 23, 2016 11:42 AM
> *To:* Alexander Klenner-Bajaja
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] smarts vs smiles database queries and
> explicit hydrogens
>
>
>
> Hi Alex,
>
>
>
> The new version of the cartridge has some capabilities that, I think,
> address this.
>
>
>
> There's a blog post about this:
> http://rdkit.blogspot.com/2016/07/tuning-substructure-queries-ii.html
>
> but the short version is that you can do the kind of queries it seems like
> you want to do quite simply:
>
>
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1ncccn1') limit 3;
>  molregno |                                           m
> ----------+---------------------------------------------------------------------------------------
>    601707 | CCCc1nc(-c2ccc(F)cc2)oc1C(=O)NC(CC)CN1CCN(c2ncccn2)CC1
>    289103 | CC1C(=N)/C(=N/Nc2ccc(S(=O)(=O)Nc3ncccn3)cc2)C(=O)C(C)C1=O
>    607646 | CCNC(=O)[C@@H]1OC(n2cnc3c(NC(=O)Nc4ccc(S(=O)(=O)Nc5ncccn5)cc4)ncnc32)[C@@H](O)[C@H]1O
> (3 rows)
>
>
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1nc(*)ccn1') limit 3;
>  molregno |                           m
> ----------+-------------------------------------------------------
>    158659 | CCNc1nccc(-c2c(-c3ccc(F)cc3)ncn2C2CCN(C)CC2)n1
>    158743 | Nc1nccc(-c2c(-c3ccc(F)cc3)ncn2C2CCN(Cc3ccccc3)CC2)n1
>    158843 | CC1(C)CC(n2cnc(-c3ccc(F)cc3)c2-c2ccnc(N)n2)CC(C)(C)N1
> (3 rows)
>
>
>
> chembl_21=# select * from rdk.mols where m@>mol_adjust_query_properties('*c1nc(*)cc(*)n1') limit 3;
>  molregno |                                    m
> ----------+--------------------------------------------------------------------------
>    726443 | CN=C(S)NNc1nc(C)cc(C)n1
>    561136 | C[C@H](Nc1cc(NC2CCCCCC2)nc(C(F)(F)F)n1)[C@@H](Cc1ccc(Cl)cc1)c1cccc(Br)c1
>    205784 | CCN(CC)C(=O)CSc1nc(N)cc(Cl)n1
> (3 rows)
>
>
>
> There's more detail in the blog post, but the default behavior is to
> convert dummies into generic query atoms and to constrain the substitution
> at any other *ring* position.
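
The queries above paste the SMILES directly into the SQL text. From client code, a minimal sketch along the following lines would pass it as a bound parameter instead. The helper name `build_adjusted_query` and the `%s` placeholders (psycopg2-style) are illustrative assumptions; the table `rdk.mols` and column `m` are taken from the examples above. The optional JSON settings argument to `mol_adjust_query_properties` is assumed from the blog post and should be checked against the installed cartridge version.

```python
import json


def build_adjusted_query(smiles, options=None, limit=3):
    """Return (sql, args) for a cartridge substructure search.

    The SMILES is kept out of the SQL text and handed over as a bound
    parameter, so user input cannot change the structure of the statement.
    """
    if options:
        # ASSUMPTION: the second argument is a JSON string of settings
        # (for example {"adjustDegree": false}) as described in the blog post.
        sql = (
            "select * from rdk.mols "
            "where m@>mol_adjust_query_properties(mol_from_smiles(%s), %s) "
            "limit %s"
        )
        return sql, (smiles, json.dumps(options, sort_keys=True), limit)

    # Default behaviour: dummies become generic query atoms and substitution
    # at the other ring positions is constrained.
    sql = (
        "select * from rdk.mols "
        "where m@>mol_adjust_query_properties(mol_from_smiles(%s)) "
        "limit %s"
    )
    return sql, (smiles, limit)


if __name__ == "__main__":
    sql, args = build_adjusted_query("*c1ncccn1")
    print(sql)
    print(args)
```

With a DB-API driver the returned pair would be passed on as `cursor.execute(sql, args)`.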
>
>
>
> Best Regards,
>
> -greg
>
>
>
>
>
> On Wed, Nov 23, 2016 at 9:20 AM, Alexander Klenner-Bajaja <
> aklen...@epo.org> wrote:
>
> Hi all,
>
>
>
> I am currently exploring the possibilities of the RDKit database cartridge
> for substructure search. I installed everything following the tutorial
> at http://www.rdkit.org/docs/Install.html
>
>
>
> Very nice tutorial; it worked perfectly.
>
>
>
> Since we are exploring solutions for browser-based GUI searches, I created
> a test page using Ketcher (http://lifescience.opensource.epam.com/ketcher/)
> which communicates with the database through PHP.
>
>
>
> Ketcher returns a SMILES representation of the drawn molecule. The
> molecules in the database (text-mined from patents) are stored as
> canonical SMILES generated by the RDKit KNIME node.
>
>
>
> When doing substructure searches, the results make sense as long as we
> query for well-defined compounds. With R-groups (R1, ...), however, things
> get a little odd.
>
>
>
> I found a very old discussion of this on the mailing list from 2009. As I
> understood it, a “*” in a SMILES string is interpreted as a dummy atom,
> and the same dummy atom has to be present in the search space to produce a
> hit, whereas in a SMARTS representation of the same string “any atom” is
> matched at that position.
>
>
>
> I ended up with a very cumbersome query. I am sure there are more
> elegant ways of doing this using ::qmol notation, but as I said I am
> currently exploring :)
>
>
>
> That’s the query (in PHP) in question for PostgreSQL:
>
>
>
> *$search_result = pg_query($dbconn, "select m from pat.mols where
> m@>mol_from_smarts(mol_to_smiles(mol_from_smiles('".$_POST['smiles'].
> "'))) LIMIT 20;"); *
>
>
>
> Extracting rdkit functionality leaves me with:
>
>
>
> *m@>mol_from_smarts(mol_to_smiles(mol_from_smiles('".$_POST['smiles'].
> "')))*
>
> and adding a smiles string to make it more readable:
>
>
>
> *m@>mol_from_smarts(mol_to_smiles(mol_from_smiles('* C([*])1=CC=CC=C1*')))
> (This is how Ketcher creates the smiles string, using explicit double
> bonds)*
>
>
>
> This query does actually work and returns correct structures (I visually
> inspected a few examples).
>
>
>
> The same query without all the molecule conversion functions does not
> return anything:
>
>
>
> *m@>'* C([*])1=CC=CC=C1*'*
>
>
>
> I guess the reason for this is that the default interpretation is SMILES,
> so it is looking for actual dummy atoms in the database (there are none).
>
>
>
> That’s my first question: Is this assumption correct?
>
>
>
> My next issue is a query with explicit hydrogens:
>
>
>
> Using
>
>
>
> *“C([*])1=C([H])C([H])=C([H])C([H])=C1[H]” *
>
>
>
> as a query, with all the molecule conversions shown above to turn it into
> SMARTS, returns among others:
>
>
>
> *“C(C)1=CC=C(C)C=C1”*
>
>
>
> This is correct for implicit hydrogens but not for explicit ones, so my
> guess is that the explicit hydrogens are lost.
>
>
>
> Can I force the cartridge at query time to honor explicit hydrogens, so
> that only molecules with different substituents at the “*” position
> are found?
>
>
>
> I could not find a pre-defined function for that.
>
>
>
> Thank you very much for any hints or solutions,
>
>
>
> Best regards,
>
>
>
> Alex
>
>
>
>
>
>
>
> Best regards / Mit freundlichen Grüßen / Sincères salutations
>
>
>
> Dr. Alexander Garvin Klenner-Bajaja
>
> Administrator Requirements Engineering-Solution Design | Dir. 2.8.3.3
>
> European Patent Office
>
> Patentlaan 3-9 | 2288 EE Rijswijk | The Netherlands
> Tel. +31(0)70340-1991
>
> aklen...@epo.org
>
> www.epo.org
>
>
>
> ------------------------------------------------------------
> ------------------
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
>
