Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-22 Thread Thomas Evangelidis
OK, I am almost there!

First, I tried the AllChem.ConstrainedEmbed(qmol, core) function to
generate conformers, where core was a mol object created from the MCS with
3D coordinates copied from template's MCS. But is seems that this functions
works only when core is an intact molecule, because I get this error:

[15:48:22] Explicit valence for atom # 8 C, 5, is greater than permitted
> Traceback (most recent call last):
>   File "/usr/local/bin/align_lig.py", line 392, in 
> embed_conformers2(qmolname, refmolname, writer)
>   File "/usr/local/bin/align_lig.py", line 295, in embed_conformers2
> Chem.AllChem.EmbedMultipleConfs(patt)
> ValueError: Sanitization error: Explicit valence for atom # 8 C, 5, is
> greater than permitted
>


So I ended up working with distance restraints during optimization. At the
end of the email is the relevant part of the code. The problem is that most
of the geometries are wrong, namely distorted rings, out-of-plane
hydrogens, etc. At first, my code selected as the most representative
conformer of each query compound that one with the lowest RMSD and highest
shape similarity value to the template. But in most of the cases the
selected conformer had wrong geometry. How can I rule out such conformers?
One thought was to measure the energy and select the lowest energy
conformer, but this does not necessarily mean that it will be the one
resembling the template ligand the most.

Lastly and most importantly (!!!), RDKit fails to generate conformers for
many ligands. The error I get is:

[16:29:30] Could not triangle bounds smooth molecule.
> WARNING: No conformations generated for molecule erk36
>

What does this mean? Am I using an old version of RDKit (2015.03.1, the one
in Ubuntu repositories) ?


Thanks in advance,
Thomas


### THE RELEVANT CODE 


def embed_conformers1(qmolname, refmolname, writer):
>
> global qmolname2qmol_dict, refmolname2refmol_dict,
> qmolname_refmolname_fullscaffMCS_multidict, rmsthreshold, shapethreshold,
> nconf
> global forcetol, RMSD_cutoff
>
> qmol = qmolname2qmol_dict[qmolname]
> refmolname = qmolname_bestRefmolname_dict[qmolname] # the best
> reference ligand
> qmol.SetProp('refmolname', refmolname)  # save in its properties the
> name of the template ligand
> refmol = refmolname2refmol_dict[refmolname]
> refconf = refmol.GetConformer() # the original reference conformer
> qconf = qmol.GetConformer() # the original query conformer
>
> # nsuccess = 0# number of query compound conformations that passed
> the RMSD and SHAPE SIM threshold criteria
>
> #
> # : matchValences=True,completeRingsOnly=True,bondCompare="bondtypes"
> mcs = qmolname_refmolname_fullscaffMCS_multidict[qmolname][refmolname]
> patt  = Chem.MolFromSmarts(mcs.smartsString)
> refatoms = refmol.GetSubstructMatch(patt)  # reference ligand atoms of
> the MCS
> qatoms  = qmol.GetSubstructMatch(patt)# query compound atoms of
> the MCS
>
> aliatoms = []   # the atomMap for the restrained conformer generation
> (include hydrogens)
> coordmap = {}   # the coordinates of the restrained atoms in conformer
> generation
>
> # AVENUE4.1: copy the coordinates of the MCS of the reference to the
> query 1st conformer, generate N conformers using distance restraints and
> optimize
> # their geometry using distance restraints again.
> for k in range(0, len(refatoms) ):
> pt3 = refconf.GetAtomPosition(refatoms[k])
> qconf.SetAtomPosition(qatoms[k],pt3)
> coordmap[qatoms[k]] = pt3
> aliatoms.append( [qatoms[k], refatoms[k]] )
>
> qmol.AddConformer(qconf, assignId=0)# replace the original query
> conformer with the one which has a MCS with the reference ligand coordinates
> # Now generate N query conformer and save their conformer IDs
> newconfIDs = AllChem.EmbedMultipleConfs(qmol, coordMap=coordmap,
> forceTol=forcetol, numConfs=args.N, pruneRmsThresh=RMSD_cutoff)
>
> if(len(newconfIDs) == 0):
> print "WARNING: No conformations generated for molecule %s"
> %(qmolname)
> else:
> bestshape_sim = -1
> bestrms = 1e10
> bestconfID= -1
>
> confID_isvalid_dict = []# confID -> 1 if the conformer is not
> too similar to the other conformers, or 0 otherwise
> # loop over all generated conformations
> confIDEnergy_list = []
> for qconfID in range(0, len(newconfIDs)):
> # print "DEBUG: optimizing confomrer", qconfID
>
> confID_isvalid_dict.append(1)
> # Optimize current conformation
> # from ConstrainedEmbed() , minimization using constraints
> defined by MCS-matching atoms
> ff = AllChem.UFFGetMoleculeForceField(qmol, confId=qconfID)
> # Add distance constraints for minimization
> for k in range(0, len(refatoms) ):
> pt3 = refconf.GetAtomPosition(refatoms[k])
>   

Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-21 Thread Thomas Evangelidis
Hi Greg,

Many thanks for the awesome post! I am still having trouble though to
select the right reference ligand for each query molecule. In particular I
have 9 crystal ligands, so these are the possible scenarios given one query
compound:

1) Measure the overall fingerprint similarity between the query and each
crystal ligands. Use as reference the crystal ligand with the highest
fingerprint similarity to the query.

2) Do the same but considering only the scaffolds of the query and the
crystal ligands.

3) Find the MCS between the query and each crystal ligand and use as
reference the crystal ligand with the biggest MCS (excluding hydrogens).

4) Like 3) but considering only the scaffolds of the query and the crystal
ligands.

If someone had experience the same problem and could recommend which way
works best or suggest any better way I would be very grateful.

Thomas





On 21 February 2017 at 07:35, Greg Landrum  wrote:

>
>
> On Mon, Feb 20, 2017 at 6:17 PM, Thomas Evangelidis 
> wrote:
>
>>
>> Thank you for your useful hints. All the compounds that I want to align
>> are supposed to belong to the same analogue series so they should shave a
>> common substructure with substantial size.
>>
>
> In that case, using an MCS based alignment should work reasonably well,
> particularly if you do the MCS of the entire series instead of doing it
> pairwise. The approach there would be to find the MCS and then use the
> RDKit's RMSD-based alignment (AllChem.AlignMol) where you provide the
> atomMap argument to specify the atom--atom mapping for alignment.
>
> Here's a short example of how to do that:
>
> # generate 3d structures (in case you don't have them already):
> mhs = [Chem.AddHs(x) for x in mols]
> [AllChem.EmbedMolecule(x,AllChem.ETKDG()) for x in mhs]
> mols = [Chem.RemoveHs(x) for x in mhs]
>
> # Find the MCS:
> from rdkit.Chem import rdFMCS
> mcs = rdFMCS.FindMCS(mols,threshold=0.8,completeRingsOnly=True,
> ringMatchesRingOnly=True)
>
> # align everything to the first molecule
> patt = Chem.MolFromSmarts(mcs.smartsString)
> refMol = mols[0]
> refMatch = refMol.GetSubstructMatch(patt)
> rmsVs = []
> for probeMol in mols[1:]:
> mv = probeMol.GetSubstructMatch(patt)
> rms = AllChem.AlignMol(probeMol,refMol,atomMap=list(zip(mv,refMatch)))
> rmsVs.append(rms)
>
>
> What I want to emulate is the "core restrained docking" with glide, where
>> you specify the common core of the query and the reference ligand using a
>> SMARTS pattern and then glide docks the query compound to the binding
>> pocket but takes care to overlay the core atoms of the query to the core
>> atoms of the reference compound. Since RDKit does not do docking, I just
>> generate 30 conformers of each query compound and select the best one by
>> measuring the RMSD between the core of the query and the core of the
>> reference after the alignment. Of course the conformations of the core
>> atoms between the query and the reference are never identical hence the bad
>> alignment. Is there any smarter way to emulate the "core restrained
>> docking" with RDKit?
>>
>
> The docking part is not doable in any straightforward way at the moment
> since it's hard to take information about the protein into account. There's
> an idea for a student summer project to solve this problem floating around,
> let's see if that gets funded and we find the right student.
>
> If the goal is to generate a set of conformations where cores are aligned
> with each other, this blog post may be interesting:
> http://rdkit.blogspot.ch/2013/12/using-allchemconstrainedembed.html
>
> -greg
>
>


-- 

==

Thomas Evangelidis

Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr

  teva...@gmail.com


website: https://sites.google.com/site/thomasevangelidishomepage/
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Greg Landrum
On Mon, Feb 20, 2017 at 6:17 PM, Thomas Evangelidis 
wrote:

>
> Thank you for your useful hints. All the compounds that I want to align
> are supposed to belong to the same analogue series so they should shave a
> common substructure with substantial size.
>

In that case, using an MCS based alignment should work reasonably well,
particularly if you do the MCS of the entire series instead of doing it
pairwise. The approach there would be to find the MCS and then use the
RDKit's RMSD-based alignment (AllChem.AlignMol) where you provide the
atomMap argument to specify the atom--atom mapping for alignment.

Here's a short example of how to do that:

# generate 3d structures (in case you don't have them already):
mhs = [Chem.AddHs(x) for x in mols]
[AllChem.EmbedMolecule(x,AllChem.ETKDG()) for x in mhs]
mols = [Chem.RemoveHs(x) for x in mhs]

# Find the MCS:
from rdkit.Chem import rdFMCS
mcs =
rdFMCS.FindMCS(mols,threshold=0.8,completeRingsOnly=True,ringMatchesRingOnly=True)

# align everything to the first molecule
patt = Chem.MolFromSmarts(mcs.smartsString)
refMol = mols[0]
refMatch = refMol.GetSubstructMatch(patt)
rmsVs = []
for probeMol in mols[1:]:
mv = probeMol.GetSubstructMatch(patt)
rms = AllChem.AlignMol(probeMol,refMol,atomMap=list(zip(mv,refMatch)))
rmsVs.append(rms)


What I want to emulate is the "core restrained docking" with glide, where
> you specify the common core of the query and the reference ligand using a
> SMARTS pattern and then glide docks the query compound to the binding
> pocket but takes care to overlay the core atoms of the query to the core
> atoms of the reference compound. Since RDKit does not do docking, I just
> generate 30 conformers of each query compound and select the best one by
> measuring the RMSD between the core of the query and the core of the
> reference after the alignment. Of course the conformations of the core
> atoms between the query and the reference are never identical hence the bad
> alignment. Is there any smarter way to emulate the "core restrained
> docking" with RDKit?
>

The docking part is not doable in any straightforward way at the moment
since it's hard to take information about the protein into account. There's
an idea for a student summer project to solve this problem floating around,
let's see if that gets funded and we find the right student.

If the goal is to generate a set of conformations where cores are aligned
with each other, this blog post may be interesting:
http://rdkit.blogspot.ch/2013/12/using-allchemconstrainedembed.html

-greg
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Ling Chan
Hello Thomas,

This publication could be of interest to you. I have not read the whole
paper so I don't know how relevant it is.

Ling

Graph-Based Molecular Alignment (GMA)
Marialke, Korner, Tietze, Apostolakis
J. Chem. Inf. Model. 47 (2007) 591-601


On Mon, Feb 20, 2017 at 3:32 PM, Thomas Evangelidis 
wrote:

> @Peter
> I am working exactly on the scenario you described.
>
> @Brian
> I have found this thread which is pretty similar to my case and to what
> you suggested, so now I am adapting my code accordingly.
>
> https://sourceforge.net/p/rdkit/mailman/message/35034093/
>
> What you have published sounds interesting. I also have multiple
> co-crystal active sites. Could you please send me the link?
> One question: what was more successful in your experience when you had a
> crystal ligand and a series of analogues that you wanted to place into the
> binding pocket? 1) to superpose the MCS between the scaffold of the query
> and the crystal ligand, or the MCS of the whole molecules?
>
>
>
>
> On 20 February 2017 at 21:03, Peter S. Shenkin  wrote:
>
>> With Glide, IIRC, this facility is designed for the use case where the
>> coordinates of a docked ligand are known (typically from an X-ray
>> structure) and the docked ligand shares a SMARTS with the ligands in an
>> input file. The SMARTS-matching atoms of each incoming ligand are
>> superposed upon the corresponding atoms of the docked ligand and the
>> resulting pose is used as an initial guess for the docking.
>>
>> Some notes:
>>
>> 0. Greg questions whether there is really a common core in your example,
>> and if there's not, it doesn't appear as if the procedure is directly
>> applicable. But if it is applicable, read on.
>>
>> 1. If the SMARTS matches in multiple ways, all are tried, and the best
>> docking score among them wins (though there may be a way of requesting the
>> N best scores, or even all of them). So if the SMARTS specifies a phenyl
>> ring, for example, 12 initial poses will be tried. (If it contains two
>> phenyl rings, 24 will be tried)
>>
>> 2. GLIDE itself does conformer generation, but I'm not sure how it works
>> in this procedure. If the SMARTS specifies a rigid core, you probably don't
>> need to pre-generate conformers, but if the core is flexible, you are
>> probably best off generating them, which of course you are permitted to do.
>>
>> 3. If you have GLIDE, then  you probably have LigPrep as well. The
>> advantage of using LigPrep for your conformation generation would be that
>> the strain energy would be written into the output file, and then, when
>> used as the input to Glide, it would be taken into account when computing
>> the docking score. And it uses the same (or a very similar) force-field
>> that Glide itself uses.
>>
>> 4. I may have some details slightly incorrect, so you might want to
>> address your question to Schrödinger tech support.
>>
>> On Mon, Feb 20, 2017 at 2:15 PM, Brian Kelley 
>> wrote:
>>
>>> I don't know the exact glide procedure, but I did write such a system
>>> for OpenEye (POSIT).  The issue you are facing is that the RMSD portion is
>>> just a constraint used for docking, it isn't used as the "score", in fact,
>>> it can't tell if the conformation interpenetrates the active site or which
>>> orientation is better.
>>>
>>> I believe RDKit can generate conformations with a template, see
>>> AllChem.ConstrainedEmbed, this would solve half of your problem in creating
>>> conformations that match your template.  You still have the problem with
>>> scoring against your active site.  POSIT scored against the shape tanimoto
>>> of the active ligands (if any) to try to fill the same space as the known
>>> ligands. See rdkit.Chem.rdShapeHelpers.ShapeTanimotoDist
>>>
>>> This might not be what you want, but we had good success with similar
>>> methods and virtual screening, especially when using multiple co-crystal
>>> active sites.   I can send you a reference link if this interests you
>>>
>>> Cheers,
>>>  Brian
>>>
>>> On Mon, Feb 20, 2017 at 12:17 PM, Thomas Evangelidis 
>>> wrote:
>>>
 ​
 Greg and Brian,

 Thank you for your useful hints. All the compounds that I want to align
 are supposed to belong to the same analogue series so they should shave a
 common substructure with substantial size.

 What I want to emulate is the "core restrained docking" with glide,
 where you specify the common core of the query and the reference ligand
 using a SMARTS pattern and then glide docks the query compound to the
 binding pocket but takes care to overlay the core atoms of the query to the
 core atoms of the reference compound. Since RDKit does not do docking,
 I just generate 30 conformers of each query compound and select the best
 one by measuring the RMSD between the core of the query and the core
 of the reference after the alignment. Of course the 

Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Thomas Evangelidis
@Peter
I am working exactly on the scenario you described.

@Brian
I have found this thread which is pretty similar to my case and to what you
suggested, so now I am adapting my code accordingly.

https://sourceforge.net/p/rdkit/mailman/message/35034093/

What you have published sounds interesting. I also have multiple co-crystal
active sites. Could you please send me the link?
One question: what was more successful in your experience when you had a
crystal ligand and a series of analogues that you wanted to place into the
binding pocket? 1) to superpose the MCS between the scaffold of the query
and the crystal ligand, or the MCS of the whole molecules?




On 20 February 2017 at 21:03, Peter S. Shenkin  wrote:

> With Glide, IIRC, this facility is designed for the use case where the
> coordinates of a docked ligand are known (typically from an X-ray
> structure) and the docked ligand shares a SMARTS with the ligands in an
> input file. The SMARTS-matching atoms of each incoming ligand are
> superposed upon the corresponding atoms of the docked ligand and the
> resulting pose is used as an initial guess for the docking.
>
> Some notes:
>
> 0. Greg questions whether there is really a common core in your example,
> and if there's not, it doesn't appear as if the procedure is directly
> applicable. But if it is applicable, read on.
>
> 1. If the SMARTS matches in multiple ways, all are tried, and the best
> docking score among them wins (though there may be a way of requesting the
> N best scores, or even all of them). So if the SMARTS specifies a phenyl
> ring, for example, 12 initial poses will be tried. (If it contains two
> phenyl rings, 24 will be tried)
>
> 2. GLIDE itself does conformer generation, but I'm not sure how it works
> in this procedure. If the SMARTS specifies a rigid core, you probably don't
> need to pre-generate conformers, but if the core is flexible, you are
> probably best off generating them, which of course you are permitted to do.
>
> 3. If you have GLIDE, then  you probably have LigPrep as well. The
> advantage of using LigPrep for your conformation generation would be that
> the strain energy would be written into the output file, and then, when
> used as the input to Glide, it would be taken into account when computing
> the docking score. And it uses the same (or a very similar) force-field
> that Glide itself uses.
>
> 4. I may have some details slightly incorrect, so you might want to
> address your question to Schrödinger tech support.
>
> On Mon, Feb 20, 2017 at 2:15 PM, Brian Kelley 
> wrote:
>
>> I don't know the exact glide procedure, but I did write such a system for
>> OpenEye (POSIT).  The issue you are facing is that the RMSD portion is just
>> a constraint used for docking, it isn't used as the "score", in fact, it
>> can't tell if the conformation interpenetrates the active site or which
>> orientation is better.
>>
>> I believe RDKit can generate conformations with a template, see
>> AllChem.ConstrainedEmbed, this would solve half of your problem in creating
>> conformations that match your template.  You still have the problem with
>> scoring against your active site.  POSIT scored against the shape tanimoto
>> of the active ligands (if any) to try to fill the same space as the known
>> ligands. See rdkit.Chem.rdShapeHelpers.ShapeTanimotoDist
>>
>> This might not be what you want, but we had good success with similar
>> methods and virtual screening, especially when using multiple co-crystal
>> active sites.   I can send you a reference link if this interests you
>>
>> Cheers,
>>  Brian
>>
>> On Mon, Feb 20, 2017 at 12:17 PM, Thomas Evangelidis 
>> wrote:
>>
>>> ​
>>> Greg and Brian,
>>>
>>> Thank you for your useful hints. All the compounds that I want to align
>>> are supposed to belong to the same analogue series so they should shave a
>>> common substructure with substantial size.
>>>
>>> What I want to emulate is the "core restrained docking" with glide,
>>> where you specify the common core of the query and the reference ligand
>>> using a SMARTS pattern and then glide docks the query compound to the
>>> binding pocket but takes care to overlay the core atoms of the query to the
>>> core atoms of the reference compound. Since RDKit does not do docking,
>>> I just generate 30 conformers of each query compound and select the best
>>> one by measuring the RMSD between the core of the query and the core of
>>> the reference after the alignment. Of course the conformations of the core
>>> atoms between the query and the reference are never identical hence the bad
>>> alignment. Is there any smarter way to emulate the "core restrained
>>> docking" with RDKit?
>>>
>>> I will provide you with more info soon (example sdf, results, etc.).
>>>
>>>
>>> ​
>>>
>>>
>>
>> 
>> --
>> Check out the vibrant tech community on one of the 

Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Peter S. Shenkin
With Glide, IIRC, this facility is designed for the use case where the
coordinates of a docked ligand are known (typically from an X-ray
structure) and the docked ligand shares a SMARTS with the ligands in an
input file. The SMARTS-matching atoms of each incoming ligand are
superposed upon the corresponding atoms of the docked ligand and the
resulting pose is used as an initial guess for the docking.

Some notes:

0. Greg questions whether there is really a common core in your example,
and if there's not, it doesn't appear as if the procedure is directly
applicable. But if it is applicable, read on.

1. If the SMARTS matches in multiple ways, all are tried, and the best
docking score among them wins (though there may be a way of requesting the
N best scores, or even all of them). So if the SMARTS specifies a phenyl
ring, for example, 12 initial poses will be tried. (If it contains two
phenyl rings, 24 will be tried)

2. GLIDE itself does conformer generation, but I'm not sure how it works in
this procedure. If the SMARTS specifies a rigid core, you probably don't
need to pre-generate conformers, but if the core is flexible, you are
probably best off generating them, which of course you are permitted to do.

3. If you have GLIDE, then  you probably have LigPrep as well. The
advantage of using LigPrep for your conformation generation would be that
the strain energy would be written into the output file, and then, when
used as the input to Glide, it would be taken into account when computing
the docking score. And it uses the same (or a very similar) force-field
that Glide itself uses.

4. I may have some details slightly incorrect, so you might want to address
your question to Schrödinger tech support.

On Mon, Feb 20, 2017 at 2:15 PM, Brian Kelley  wrote:

> I don't know the exact glide procedure, but I did write such a system for
> OpenEye (POSIT).  The issue you are facing is that the RMSD portion is just
> a constraint used for docking, it isn't used as the "score", in fact, it
> can't tell if the conformation interpenetrates the active site or which
> orientation is better.
>
> I believe RDKit can generate conformations with a template, see
> AllChem.ConstrainedEmbed, this would solve half of your problem in creating
> conformations that match your template.  You still have the problem with
> scoring against your active site.  POSIT scored against the shape tanimoto
> of the active ligands (if any) to try to fill the same space as the known
> ligands. See rdkit.Chem.rdShapeHelpers.ShapeTanimotoDist
>
> This might not be what you want, but we had good success with similar
> methods and virtual screening, especially when using multiple co-crystal
> active sites.   I can send you a reference link if this interests you
>
> Cheers,
>  Brian
>
> On Mon, Feb 20, 2017 at 12:17 PM, Thomas Evangelidis 
> wrote:
>
>> ​
>> Greg and Brian,
>>
>> Thank you for your useful hints. All the compounds that I want to align
>> are supposed to belong to the same analogue series so they should shave a
>> common substructure with substantial size.
>>
>> What I want to emulate is the "core restrained docking" with glide, where
>> you specify the common core of the query and the reference ligand using a
>> SMARTS pattern and then glide docks the query compound to the binding
>> pocket but takes care to overlay the core atoms of the query to the core
>> atoms of the reference compound. Since RDKit does not do docking, I just
>> generate 30 conformers of each query compound and select the best one by
>> measuring the RMSD between the core of the query and the core of the
>> reference after the alignment. Of course the conformations of the core
>> atoms between the query and the reference are never identical hence the bad
>> alignment. Is there any smarter way to emulate the "core restrained
>> docking" with RDKit?
>>
>> I will provide you with more info soon (example sdf, results, etc.).
>>
>>
>> ​
>>
>>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Brian Kelley
I don't know the exact glide procedure, but I did write such a system for
OpenEye (POSIT).  The issue you are facing is that the RMSD portion is just
a constraint used for docking, it isn't used as the "score", in fact, it
can't tell if the conformation interpenetrates the active site or which
orientation is better.

I believe RDKit can generate conformations with a template, see
AllChem.ConstrainedEmbed, this would solve half of your problem in creating
conformations that match your template.  You still have the problem with
scoring against your active site.  POSIT scored against the shape tanimoto
of the active ligands (if any) to try to fill the same space as the known
ligands. See rdkit.Chem.rdShapeHelpers.ShapeTanimotoDist

This might not be what you want, but we had good success with similar
methods and virtual screening, especially when using multiple co-crystal
active sites.   I can send you a reference link if this interests you

Cheers,
 Brian

On Mon, Feb 20, 2017 at 12:17 PM, Thomas Evangelidis 
wrote:

> ​
> Greg and Brian,
>
> Thank you for your useful hints. All the compounds that I want to align
> are supposed to belong to the same analogue series so they should shave a
> common substructure with substantial size.
>
> What I want to emulate is the "core restrained docking" with glide, where
> you specify the common core of the query and the reference ligand using a
> SMARTS pattern and then glide docks the query compound to the binding
> pocket but takes care to overlay the core atoms of the query to the core
> atoms of the reference compound. Since RDKit does not do docking, I just
> generate 30 conformers of each query compound and select the best one by
> measuring the RMSD between the core of the query and the core of the
> reference after the alignment. Of course the conformations of the core
> atoms between the query and the reference are never identical hence the bad
> alignment. Is there any smarter way to emulate the "core restrained
> docking" with RDKit?
>
> I will provide you with more info soon (example sdf, results, etc.).
>
>
> ​
>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Thomas Evangelidis
​
Greg and Brian,

Thank you for your useful hints. All the compounds that I want to align are
supposed to belong to the same analogue series so they should shave a
common substructure with substantial size.

What I want to emulate is the "core restrained docking" with glide, where
you specify the common core of the query and the reference ligand using a
SMARTS pattern and then glide docks the query compound to the binding
pocket but takes care to overlay the core atoms of the query to the core
atoms of the reference compound. Since RDKit does not do docking, I just
generate 30 conformers of each query compound and select the best one by
measuring the RMSD between the core of the query and the core of the
reference after the alignment. Of course the conformations of the core
atoms between the query and the reference are never identical hence the bad
alignment. Is there any smarter way to emulate the "core restrained
docking" with RDKit?

I will provide you with more info soon (example sdf, results, etc.).


​
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Brian Kelley
I believe (Greg can correct me) to align the bemis-murcko scaffold, you
could

(1) extract the original atom pairs and send them to the RMSD algorithm
(2) take a bemis scaffold and convert it to a substructure query for use in
the RMSD algorithm.  In either case the RMSD is the rmsd of the scaffold
atoms, not the rest of the molecule.  Below is a little snippet that I
believes does this.

from rdkit.Chem import AllChem
from rdkit.Chem import rdqueries, rdMolAlign

mol = Chem.MolFromMolBlock( mol block )
print Chem.MolToSmiles(m)

murcko = AllChem.MurckoDecompose(mol)
print "murko",  Chem.MolToSmiles(murcko)

# bemis scaffolds match everything
q = rdqueries.AtomNumGreaterQueryAtom(0)
bemis = Chem.RWMol(murcko)
for atom in bemis.GetAtoms():
   bemis.ReplaceAtom(atom.GetIdx(), q)

rmsd = Chem.rdMolAlign.AlignMol( bemis, m )
print rmsd





On Mon, Feb 20, 2017 at 11:21 AM, Greg Landrum 
wrote:

> HI Thomas,
>
> To be sure we're talking about the same thing: rdMolAlign.GetO3A() is an
> implementation of the Open3DAlign algorithm. This is an unsupervised
> approach that uses a clever algorithm to come up with an atom-atom mapping
> between the two molecules you give it. It's not always going to pick the
> same atoms to align that you would.
>
> To answer the original question: if the two molecules you want to align do
> not share the same scaffold (or at least have a lot in common in the core
> of the molecule), it's unlikely that an MCS-based alignment is going to
> help.
>
> To answer your direct question here, the scaffold finding code in
> rdkit.Chem should be preserving coordinates. Here's a simple demonstration
> of that:
>
> In [3]: m =Chem.AddHs(Chem.MolFromSmiles('CC1CO1'))
>
> In [4]: AllChem.EmbedMolecule(m,AllChem.ETKDG())
> Out[4]: 0
>
> In [5]: nh = Chem.RemoveHs(m)
>
> In [6]: print(Chem.MolToMolBlock(nh))
>
>  RDKit  3D
>
>   4  4  0  0  0  0  0  0  0  0999 V2000
>-1.18500.2738   -0.1814 C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.0992   -0.4987   -0.2391 C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.32290.12410.2290 C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.6874   -0.75581.1165 O   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   2  3  1  0
>   3  4  1  0
>   4  2  1  0
> M  END
>
>
> In [7]: from rdkit.Chem import MurckoDecompose
>
> In [8]: scaff = Chem.MurckoDecompose(nh)
>
> In [10]: print(Chem.MolToMolBlock(scaff))
>
>  RDKit  3D
>
>   3  3  0  0  0  0  0  0  0  0999 V2000
> 0.0992   -0.4987   -0.2391 C   0  0  0  0  0  0  0  0  0  0  0  0
> 1.32290.12410.2290 C   0  0  0  0  0  0  0  0  0  0  0  0
> 0.6874   -0.75581.1165 O   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
>   2  3  1  0
>   3  1  1  0
> M  END
>
>
> You could help me provide a better answer here by providing a couple of
> example SDFs that you'd like to align, ideally together with a bit of
> sample code showing what you have tried that produces alignments you are
> unhappy with.
>
>
> -greg
>
>
>
>
> On Mon, Feb 20, 2017 at 1:54 PM, Thomas Evangelidis 
> wrote:
>
>> As a follow up question on this topic, I would like to ask if
>> MurckoScaffold.GetScaffoldForMol(mol) returns the scaffold of mol with
>> different coordinates?
>> I am asking this because when I use the transformation matrix of the
>> alignment of the cores of the probe and the reference molecules, in order
>> to align the whole probe to the reference molecule, the two molecules don't
>> seem to be aligned (they are in distance). Basically I do this:
>>
>> qcore = MurckoScaffold.GetScaffoldForMol(qmol)
>> refcore = MurckoScaffold.GetScaffoldForMol(refmol)
>> pyO3A = rdMolAlign.GetO3A(qcore, refcore, prbCid=qconfID, refCid=0,
>> reflect=True)
>> AllChem.TransformMol(qmol, bestRMSDTrans[1], confId=bestconfID,
>> keepConfs=False)
>>
>> and then I write the qmol in an sdf file. But when I visualize it the
>> qmol is far from the refmol!
>>
>>
>>
>>
>>
>> On 20 February 2017 at 02:33, Thomas Evangelidis 
>> wrote:
>>
>>> Dear all,
>>>
>>> I want to align 250 compounds that binding to the same pocket to one of
>>> the 9 available crystal ligands. I chose the reference ligand based on the
>>> Morgan2 similarity to the probe molecule. Then I align the 2 compounds
>>> using:
>>>
>>> pyO3A = rdMolAlign.GetO3A(qmol, refmol, prbCid=qconfID, refCid=0,
>>> reflect=True)
>>> RMSD = pyO3A.Align()
>>>
>>> ​and keep only the conformer of the probe with the lowest RMSD to the
>>> reference compound. However, the alignment looks terrible when I
>>> visualize it, so I would like to ask if there is any way to align the
>>> maximum common substructure only. I tried to align only the core of both
>>> molecules as defined by MurckoScaffold.GetScaffoldForMol(mol)​, but
>>> still the alignment looks bad. I have seen in the documentation how to find
>>> the maximum common substructure with rdFMCS.FindMCS but before I engage
>>> into 

Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Greg Landrum
HI Thomas,

To be sure we're talking about the same thing: rdMolAlign.GetO3A() is an
implementation of the Open3DAlign algorithm. This is an unsupervised
approach that uses a clever algorithm to come up with an atom-atom mapping
between the two molecules you give it. It's not always going to pick the
same atoms to align that you would.

To answer the original question: if the two molecules you want to align do
not share the same scaffold (or at least have a lot in common in the core
of the molecule), it's unlikely that an MCS-based alignment is going to
help.

To answer your direct question here, the scaffold finding code in
rdkit.Chem should be preserving coordinates. Here's a simple demonstration
of that:

In [3]: m =Chem.AddHs(Chem.MolFromSmiles('CC1CO1'))

In [4]: AllChem.EmbedMolecule(m,AllChem.ETKDG())
Out[4]: 0

In [5]: nh = Chem.RemoveHs(m)

In [6]: print(Chem.MolToMolBlock(nh))

 RDKit  3D

  4  4  0  0  0  0  0  0  0  0999 V2000
   -1.18500.2738   -0.1814 C   0  0  0  0  0  0  0  0  0  0  0  0
0.0992   -0.4987   -0.2391 C   0  0  0  0  0  0  0  0  0  0  0  0
1.32290.12410.2290 C   0  0  0  0  0  0  0  0  0  0  0  0
0.6874   -0.75581.1165 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  4  1  0
  4  2  1  0
M  END


In [7]: from rdkit.Chem import MurckoDecompose

In [8]: scaff = Chem.MurckoDecompose(nh)

In [10]: print(Chem.MolToMolBlock(scaff))

 RDKit  3D

  3  3  0  0  0  0  0  0  0  0999 V2000
0.0992   -0.4987   -0.2391 C   0  0  0  0  0  0  0  0  0  0  0  0
1.32290.12410.2290 C   0  0  0  0  0  0  0  0  0  0  0  0
0.6874   -0.75581.1165 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
  3  1  1  0
M  END


You could help me provide a better answer here by providing a couple of
example SDFs that you'd like to align, ideally together with a bit of
sample code showing what you have tried that produces alignments you are
unhappy with.


-greg




On Mon, Feb 20, 2017 at 1:54 PM, Thomas Evangelidis 
wrote:

> As a follow up question on this topic, I would like to ask if
> MurckoScaffold.GetScaffoldForMol(mol) returns the scaffold of mol with
> different coordinates?
> I am asking this because when I use the transformation matrix of the
> alignment of the cores of the probe and the reference molecules, in order
> to align the whole probe to the reference molecule, the two molecules don't
> seem to be aligned (they are in distance). Basically I do this:
>
> qcore = MurckoScaffold.GetScaffoldForMol(qmol)
> refcore = MurckoScaffold.GetScaffoldForMol(refmol)
> pyO3A = rdMolAlign.GetO3A(qcore, refcore, prbCid=qconfID, refCid=0,
> reflect=True)
> AllChem.TransformMol(qmol, bestRMSDTrans[1], confId=bestconfID,
> keepConfs=False)
>
> and then I write the qmol in an sdf file. But when I visualize it the qmol
> is far from the refmol!
>
>
>
>
>
> On 20 February 2017 at 02:33, Thomas Evangelidis 
> wrote:
>
>> Dear all,
>>
>> I want to align 250 compounds that binding to the same pocket to one of
>> the 9 available crystal ligands. I chose the reference ligand based on the
>> Morgan2 similarity to the probe molecule. Then I align the 2 compounds
>> using:
>>
>> pyO3A = rdMolAlign.GetO3A(qmol, refmol, prbCid=qconfID, refCid=0,
>> reflect=True)
>> RMSD = pyO3A.Align()
>>
>> ​and keep only the conformer of the probe with the lowest RMSD to the
>> reference compound. However, the alignment looks terrible when I
>> visualize it, so I would like to ask if there is any way to align the
>> maximum common substructure only. I tried to align only the core of both
>> molecules as defined by MurckoScaffold.GetScaffoldForMol(mol)​, but
>> still the alignment looks bad. I have seen in the documentation how to find
>> the maximum common substructure with rdFMCS.FindMCS but before I engage
>> into programming it I would like to know if there is any automatic way to
>> find it on the fly while aligning the 2 molecules.
>>
>>
>>
>> --
>>
>> ==
>>
>> Thomas Evangelidis
>>
>> Research Specialist
>> CEITEC - Central European Institute of Technology
>> Masaryk University
>> Kamenice 5/A35/1S081,
>> 62500 Brno, Czech Republic
>>
>> email: tev...@pharm.uoa.gr
>>
>>   teva...@gmail.com
>>
>>
>> website: https://sites.google.com/site/thomasevangelidishomepage/
>>
>>
>
>
> --
>
> ==
>
> Thomas Evangelidis
>
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: tev...@pharm.uoa.gr
>
>   teva...@gmail.com
>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>
> 
> --
> Check out the vibrant tech community on one of the world's most
> 

Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Francois BERENGER
At least for the MCS calculation, there is an R package
for chemistry:

https://bioconductor.org/packages/release/bioc/vignettes/fmcsR/inst/doc/fmcsR.html

On 02/19/2017 07:33 PM, Thomas Evangelidis wrote:
> Dear all,
>
> I want to align 250 compounds that binding to the same pocket to one of
> the 9 available crystal ligands. I chose the reference ligand based on
> the Morgan2 similarity to the probe molecule. Then I align the 2
> compounds using:
>
> pyO3A = rdMolAlign.GetO3A(qmol, refmol, prbCid=qconfID, refCid=0,
> reflect=True)
> RMSD = pyO3A.Align()
>
> ​and keep only the conformer of the probe with the lowest RMSD to the
> reference compound. However, the alignment looks terrible when I
> visualize it, so I would like to ask if there is any way to align the
> maximum common substructure only. I tried to align only the core of both
> molecules as defined by MurckoScaffold.GetScaffoldForMol(mol)​, but
> still the alignment looks bad. I have seen in the documentation how to
> find the maximum common substructure with rdFMCS.FindMCS but before I
> engage into programming it I would like to know if there is any
> automatic way to find it on the fly while aligning the 2 molecules.
>
>
>
> --
>
> ==
>
> Thomas Evangelidis
>
> Research Specialist
>
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: tev...@pharm.uoa.gr 
>
>   teva...@gmail.com 
>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
> 
>
>

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] aligning maximum common substructure of 2 molecules

2017-02-20 Thread Thomas Evangelidis
As a follow up question on this topic, I would like to ask if
MurckoScaffold.GetScaffoldForMol(mol) returns the scaffold of mol with
different coordinates?
I am asking this because when I use the transformation matrix of the
alignment of the cores of the probe and the reference molecules, in order
to align the whole probe to the reference molecule, the two molecules don't
seem to be aligned (they are in distance). Basically I do this:

qcore = MurckoScaffold.GetScaffoldForMol(qmol)
refcore = MurckoScaffold.GetScaffoldForMol(refmol)
pyO3A = rdMolAlign.GetO3A(qcore, refcore, prbCid=qconfID, refCid=0,
reflect=True)
AllChem.TransformMol(qmol, bestRMSDTrans[1], confId=bestconfID,
keepConfs=False)

and then I write the qmol in an sdf file. But when I visualize it the qmol
is far from the refmol!





On 20 February 2017 at 02:33, Thomas Evangelidis  wrote:

> Dear all,
>
> I want to align 250 compounds that binding to the same pocket to one of
> the 9 available crystal ligands. I chose the reference ligand based on the
> Morgan2 similarity to the probe molecule. Then I align the 2 compounds
> using:
>
> pyO3A = rdMolAlign.GetO3A(qmol, refmol, prbCid=qconfID, refCid=0,
> reflect=True)
> RMSD = pyO3A.Align()
>
> ​and keep only the conformer of the probe with the lowest RMSD to the
> reference compound. However, the alignment looks terrible when I visualize
> it, so I would like to ask if there is any way to align the maximum common
> substructure only. I tried to align only the core of both molecules as
> defined by MurckoScaffold.GetScaffoldForMol(mol)​, but still the
> alignment looks bad. I have seen in the documentation how to find the
> maximum common substructure with rdFMCS.FindMCS but before I engage into
> programming it I would like to know if there is any automatic way to find
> it on the fly while aligning the 2 molecules.
>
>
>
> --
>
> ==
>
> Thomas Evangelidis
>
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: tev...@pharm.uoa.gr
>
>   teva...@gmail.com
>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>


-- 

==

Thomas Evangelidis

Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic

email: tev...@pharm.uoa.gr

  teva...@gmail.com


website: https://sites.google.com/site/thomasevangelidishomepage/
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss