[Rdkit-discuss] sampling of ring conformation for docking

2024-05-13 Thread Pavel Polishchuk

Hello,

  I use RDKit to embed initial conformations for docking. The issue is 
with saturated rings. I can use a single random conformer but its 
geometry may be unsuitable and the whole molecule will fail to dock. I 
can use several starting conformers for docking and to avoid docking of 
very similar conformers I can select a few diverse conformers based on 
RMSD between rings only. However, the issue occurs if a molecule has 
several such saturated rings. The current workaround is to compute RMSD 
between corresponding rings individually, then average RMSD values and 
select a diverse set of conformers. It may work to some extend.
  However I'm curious whether a better solution possible? Can we sample 
rings individually and embed a molecule using pre-generated conformers 
of some parts (rings)? I know about the restricted conformer enumeration 
function, but it will work if we supply only a single connected part as 
fixed. It should not work if we have two disconnected parts (rings) with 
3D coordinates, because we do not know their relative position to 
generate 3D coordinates for the rest of atoms in a molecule.

  Maybe someone will have some ideas/suggestions?

Kind regards,
Pavel


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] 6th Advanced in silico Drug Design workshop in Olomouc

2022-12-09 Thread Pavel Polishchuk

Dear colleagues,

  after covid "holidays" we are cordially invite you on the 6th Drug 
Design workshop which will be held 30 January - 3 February 2023 in 
Olomouc (Czech Republic). It is focused on practical applications of 
different chemoinformatic tools and approaches for drug development. 
This might be interesting for bachelor, master and PhD students to 
broaden their experience and sharpen skills. Basic Python skills would 
be desirable for machine learning and de novo tutorials.


  During the workshop, participants will be introduced with 
pharmacophore modeling (LigandScout), molecular docking (AutoDock Vina), 
QSAR modeling (sklearn and RDKit), high-throughput molecular dynamics 
(GROMACS) and de novo methods (CReM). A competition will be organized at 
the last day of the workshop where participants will be able to apply 
acquired knowledge and skills to solve real chemoinformatic tasks and 
win prizes.


  https://www.kfc.upol.cz/6add

  Please feel free to share this information with those who can be 
interested in participation in such event. Thank you!


Kind regards,
Pavel



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Comparison of the docking conformations with the X-ray pose

2022-03-30 Thread Pavel Polishchuk

Hi Enrico,

  you may look at this script 
https://github.com/DrrDom/rdkit-scripts/blob/master/rmsd_rdkit.py
  It takes PDBQT as input and reference files and calc rmsd for the 
largest common fragment if structures do not match each other. If there 
are multiple models in the input file it will calc rmsd for all of them. 
Do not forget supply SMILES files complementary to PDBQT, because PDB 
does not contain bond orders. There is a built-in help message in the 
script, but if you will have question regarding it you may ask me.
  At least you may use it as a source to create your own script 
suitable for your needs.


Kind regards,
Pavel.


On 30/03/2022 18:16, Enrico Martinez wrote:

Dear RDKIT users!
I am dealing with the analysis of the results of the docking poses
calculated via VINA and saved into the multi-model pdb.  I need to find
a possibility to compare each docking pose with the pose in the X-ray
structure (which has similar but not
the identical ligand!) in order to find automatically the model (=
docking solution) with the positions of the
ligand similar to the X-ray structure?

Assuming that the both pdbs (docking poses, and X-ray structure) have
been superimposed (based on the protein atoms) how could I
automatically find the model (in the ensemble of 100 docking
solutions) where the ligand may resemble the X-ray pose (e.g. taking a
part of the ligand shared between the both structures as the reference
for comparison) ?
I would be grateful for any suggestions
With kind regards,
Enrico


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss




___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] 2dpharmacophores

2021-08-12 Thread Pavel Polishchuk

Hi Chris,

Error reported a bin (0,2,0) which does not satisfy the triangle rule. I 
cannot check it, but I suppose that the error may disappear if you set 
trianglePruneBins=False in SigFactory, but this would be a workaround 
not a proper solution I think.


Pavel.

On 12/08/2021 10:25, Chris Swain via Rdkit-discuss wrote:

Hi,

I’m trying to generate 2Dpharmacophore fingerprints as described here

https://www.rdkit.org/docs/GettingStartedInPython.html#d-pharmacophore-fingerprints 



For the majority of molecules this works fine but for a few I get this 
error, any idea what the issue is?


Chris

qmol = 
Chem.MolFromSmiles('COc1ccc(C(C)C)cc1CN[C@H]1C2CCN(CC2)[C@H]1C(c1c1)c1c1')

fpquery = Generate.Gen2DFingerprint(qmol,sigFactory)
fpquery

---
ValueError Traceback (most recent call last)
~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/SigFactory.py  
inGetBitIdx(self, featIndices, dists, sortIndices)
248  print('\tbins:',  repr(self._bins),  type(self._bins))
--> 249bin_=  self._findBinIdx(dists,  self._bins,  self._scaffolds[len(dists)])
250  except  ValueError:

~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/SigFactory.py  
in_findBinIdx(self, dists, bins, scaffolds)
167  whichBins[i]  =  where
--> 168res=  scaffolds.index(tuple(whichBins))
169  if  _verbose:

ValueError: (0, 2, 0) is not in list

During handling of the above exception, another exception occurred:

IndexError Traceback (most recent call last)
  in
1  qmol=  
Chem.MolFromSmiles('COc1ccc(C(C)C)cc1CN[C@H]1C2CCN(CC2)[C@H]1C(c1c1)c1c1')
> 2fpquery=  Generate.Gen2DFingerprint(qmol,sigFactory)
3  fpquery

~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/Generate.py  
inGen2DFingerprint(mol, sigFactory, perms, dMat, bitInfo)
160  for  matchin  matchesToMap:
161if  sigFactory.shortestPathsOnly:
--> 162idx=  _ShortestPathsMatch(match,  perm,  sig,  dMat,  sigFactory)
163  if  idxis  not  None  and  bitInfois  not  None:
164l=  bitInfo.get(idx,  [])

~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/Generate.py  
in_ShortestPathsMatch(match, featureSet, sig, dMat, sigFactory)
71  dist[i]  =  d
72  
---> 73idx=  sigFactory.GetBitIdx(featureSet,  dist,  sortIndices=False)

74if  _verbose:
75  print('\t',  dist,  minD,  maxD,  idx)

~/miniconda3/lib/python3.7/site-packages/rdkit/Chem/Pharm2D/SigFactory.py  
inGetBitIdx(self, featIndices, dists, sortIndices)
252  fams=  [fams[x]  for  xin  featIndices]
253  raise IndexError('distance bin not found: feats: %s; dists=%s; 
bins=%s; scaffolds: %s' %
--> 254(fams, dists, self._bins, self._scaffolds)) 255  
256  return  startIdx+  offset+  bin_


IndexError: distance bin not found: feats: ['Acceptor', 'Aromatic', 
'Hydrophobe']; dists=[1, 5, 1]; bins=[(0, 2), (2, 5), (5, 8)]; scaffolds: [0, 
[(0,), (1,), (2,)], 0, [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2), 
(0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1), 
(1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 1), (2, 0, 2), (2, 1, 0), 
(2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2)], 0]



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule

2021-04-01 Thread Pavel Polishchuk

Hi Ling,

  this can be a workaround if RDKit does not have a built-in function 
to extract a submolecule by atom ids. You may assign atom property 
labels to these atoms and then looping over atoms in EditableMol remove 
those ones which do not have this property assigned.


Kind regards,
Pavel.

On 02/04/2021 02:30, Ling Chan wrote:
Yes, Kangway, that was what I first tried, as mentioned in the first 
post. I did not have any problem with obtaining the primary fragments 
(applying all cuts) . Just that I have not yet figured out how to 
obtain the secondary fragments, either from recombining the 
primary fragments, or from fragmenting from the initial molecule (by 
not applying all cuts).


Chuang, Kangway > 於 2021年4月1日週四 下午5:20寫道:


Hi Ling,

I think I've run into something similar before, have you tried
using FragmentOnBonds followed by Chem.GetMolFrags? GetMolFrags
lets you toggle a few things (e.g. (bool)asMols=False [,
(bool)sanitizeFrags=True) to provide some workarounds with
sanitization.

Best,
Kangway

*From:* Ling Chan mailto:lingtrek...@gmail.com>>
*Sent:* Thursday, April 1, 2021 5:08 PM
*To:* Mark Mackey mailto:m...@cresset-group.com>>
*Cc:* RDKit mailto:rdkit-discuss@lists.sourceforge.net>>
*Subject:* Re: [Rdkit-discuss] rejoining pairs of fragments after
fragmenting a molecule
>> /I did manage to achieve what I want by going
through Chem.MolFragmentToSmiles and then convert the Smiles back
to a Mol. But is there a neater way?/

Oops, I wrote too soon. Actually I did not achieve what I want.
The conversion from the smiles from Chem.MolFragmentToSmiles
sometimes crashes, because of Sanitization problem.


Ling Chan mailto:lingtrek...@gmail.com>> 於
2021年4月1日週四 下午4:11寫道:

Thank you Mark for your suggestion. It sounds good and I gave
it a try. However, this leads to another question that may
sound dumb.

I have the atom indices of a fragment. For example, the
fragment comes from atoms [3,4,5,9,10,11,14] of the original
molecule. How can I extract this fragment from the molecule? 
I tried

(1) using EditableMol and deleting atoms one by one using
RemoveAtom. But this does not work since the atom numbering
changes after each deletion.
(2) going through FragmentOnBonds. But the output of
FragmentOnBonds have the atom indices reshuffled so I cannot
directly use my index list to fish out my fragment.

I did manage to achieve what I want by going
through Chem.MolFragmentToSmiles and then convert the Smiles
back to a Mol. But is there a neater way? Basically, is there
a Chem.MolFragmentToMol function?

Thank you again.

Ling


Mark Mackey mailto:m...@cresset-group.com>> 於 2021年4月1日週四
上午1:58寫道:

Hi Ling,

Having done something similar (but not in RDKit), I would
suggest a different algorithm. I think that fragmenting
the molecule first and then stitching the bits together is
always going to be very complicated. Instead, just
fragment the molecule in the ways that you want:

- Find the set B of all breakable bonds according to your
rules. I’m assuming here that B contains only acyclic bonds.

- To get all of the pairwise pieces, for each element b of
B break all bonds in B _except_ b. Keep the fragment
containing b, and clean up.

- To get all of the triplets, for each tuple (b1, b2) in
B, break all bonds in B except b1 and b2. Keep the
fragment containing b1 only if it also contains b2.

Regards,

Mark

**

*-- *

*Mark Mackey*

*Chief Scientific Officer*

*Cresset*

New Cambridge House, Bassingbourn Road, Litlington,
Cambridgeshire, SG8 0SS, UK

tel: +44 (0)1223 858890mobile: +44 (0)7595
099165fax: +44 (0)1223 853667

email:_m...@cresset-group.com
_web:www.cresset-group.com


skype: mark_cresset

*From:*Ling Chan mailto:lingtrek...@gmail.com>>
*Sent:* 31 March 2021 20:56
*To:* RDKit mailto:rdkit-discuss@lists.sourceforge.net>>
*Subject:* [Rdkit-discuss] rejoining pairs of fragments
after fragmenting a molecule

Dear Colleagues,

I am trying to do something that I think is quite simple,
but I have not figured out a simple way. 

Re: [Rdkit-discuss] ConstrainedEmbed issue

2020-07-07 Thread Pavel Polishchuk

Thank you, Sereina.
I understand importance of addition of hydrogens to get a reasonable 3D 
coordinates. But the situation may be not that simple.


1. Addition of hydrogen is only required for custom coordinates supplied 
from an external file. If coordinates of a template is generated with 
rdkit embedding it works without addition of explicit hydrogens.


2. I found an opposite example where addition of hydrogens breaks 
constrained embedding if custom coordinates of a template is used. And 
again if I generate coordinates of a template by rdkit everything is OK 
without addition of Hs.


These suggest that there is some issue with custom coordinates usage for 
constrained embedding.


I provided the code and output below.

Code:

data = [('1.mol', 'C[C@@H]1C1=O', 'C[C@@H]1CC[C@H](O)CC1=O'),
    ('2.mol', '[C@@H](CCC)NC(=O)c1ccc(F)cc1', 
'[C@H](CCC[C@@H](CCC)NC(=O)c1ccc(F)cc1)NC(=O)c1ccco1')]


for i, (mol_fname, smi_template, smi_child) in enumerate(data):

    print('iteration', i)

    mode = 'read template mol file, no AddHs'
    print(mode)
    mol_template = Chem.MolFromMolFile(mol_fname)
    mol_child = Chem.MolFromSmiles(smi_child)
    try:
    mol = AllChem.ConstrainedEmbed(mol_child, mol_template)
    print(mol.GetProp('EmbedRMS'))
    except ValueError as e:
    print(e)

    mode = 'read template mol file, AddHs'
    print(mode)
    mol_template = Chem.MolFromMolFile(mol_fname)
    mol_child = Chem.MolFromSmiles(smi_child)
    try:
    mol = AllChem.ConstrainedEmbed(Chem.AddHs(mol_child), mol_template)
    print(mol.GetProp('EmbedRMS'))
    except ValueError as e:
    print(e)

    mode = 'embed template mol in rdkit, no AddHs'
    print(mode)
    mol_template = Chem.MolFromSmiles(smi_template)
    AllChem.EmbedMolecule(mol_template)
    mol_child = Chem.MolFromSmiles(smi_child)
    try:
    mol = AllChem.ConstrainedEmbed(mol_child, mol_template)
    print(mol.GetProp('EmbedRMS'))
    except ValueError as e:
    print(e)

Output:

iteration 0
read template mol file, no AddHs
Could not embed molecule.
read template mol file, AddHs
0.05014807519735495
embed template mol in rdkit, no AddHs
0.12358989886023371

iteration 1
read template mol file, no AddHs
0.057937898735270194
read template mol file, AddHs
Could not embed molecule. # <-- here rdkit spends a lot of time but 
fails

embed template mol in rdkit, no AddHs
0.1012757033705761


Pavel.



On 07/07/2020 21:41, Sunhwan Jo wrote:

Makes sense :)


On Jul 7, 2020, at 12:35 PM, Sereina Riniker 
mailto:sereina.rini...@gmail.com>> wrote:


Dear Pavel and Sunhwan,

Please note that hydrogens should always be added for the embedding 
algorithm to work properly (i.e. it’s not a walk around but what 
should be done).
See also Section “Working with 3D Molecules” in 
https://www.rdkit.org/docs/GettingStartedInPython.html


Best regards,
Sereina



On 7 Jul 2020, at 21:26, Sunhwan Jo <mailto:sunhw...@gmail.com>> wrote:



The reason constraint embed didn’t work is the molecule simply can’t 
be embedded using the rdkit’s algorithm.



In [25]: mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')

In [26]: AllChem.EmbedMolecule(mol_child)
Out[26]: -1


See more discussion here:
https://github.com/rdkit/rdkit/issues/2996


The SMILES you posted looks valid to me and doesn’t look that 
complicated, but the anyway I think
somehow the RDKit’s algorithm tripped up and couldn’t finish 
embedding without some help. Hope

someone with more in-depth insight can help here.


Anyway, for a walk around, adding H seems to do the trick:


In [39]: mol = AllChem.AddHs(mol_child)

In [40]: AllChem.EmbedMolecule(mol)
Out[40]: 0 # worked

In [41]: AllChem.ConstrainedEmbed(mol, mol_parent)
Out[41]:  # also worked




Sunhwan




On Jul 7, 2020, at 12:36 AM, Pavel Polishchuk 
mailto:pavel_polishc...@ukr.net>> wrote:


Hi all,

  I have an issue with ConstrainedEmbed and I cannot figure out 
what exactly causes this.
  I have a molecule C[C@@H]1C1=O with 3D coordinates in 1.mol 
file (attached). And I want to generate coordinates for another 
structure with this core -

C[C@@H]1CC[C@H](O)CC1=O.

  This is usual way which causes issue with embedding and the 
corresponding error.


mol_parent = Chem.MolFromMolFile('1.mol')
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
try:
    mol = AllChem.ConstrainedEmbed(mol_child, mol_parent)
except ValueError as e:
    print(e)

  If I add explicit hydrogens the issue disappears.

mol_parent = Chem.MolFromMolFile('1.mol')
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
mol = AllChem.ConstrainedEmbed(Chem.AddHs(mol_child), mol_parent)

  If I do not use pre-defined coordinates - everything works well.

mol_parent = Chem.MolFromSmiles('C[C@@H]1C1=O')
AllChem.EmbedMolecule(mol_parent)
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
mol = AllChem.ConstrainedEmbed(mol_child, mol_parent)

  Does ugly coordin

[Rdkit-discuss] ConstrainedEmbed issue

2020-07-07 Thread Pavel Polishchuk

Hi all,

  I have an issue with ConstrainedEmbed and I cannot figure out what 
exactly causes this.
  I have a molecule C[C@@H]1C1=O with 3D coordinates in 1.mol file 
(attached). And I want to generate coordinates for another structure 
with this core -

C[C@@H]1CC[C@H](O)CC1=O.

  This is usual way which causes issue with embedding and the 
corresponding error.


mol_parent = Chem.MolFromMolFile('1.mol')
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
try:
    mol = AllChem.ConstrainedEmbed(mol_child, mol_parent)
except ValueError as e:
    print(e)

  If I add explicit hydrogens the issue disappears.

mol_parent = Chem.MolFromMolFile('1.mol')
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
mol = AllChem.ConstrainedEmbed(Chem.AddHs(mol_child), mol_parent)

  If I do not use pre-defined coordinates - everything works well.

mol_parent = Chem.MolFromSmiles('C[C@@H]1C1=O')
AllChem.EmbedMolecule(mol_parent)
mol_child = Chem.MolFromSmiles('C[C@@H]1CC[C@H](O)CC1=O')
mol = AllChem.ConstrainedEmbed(mol_child, mol_parent)

  Does ugly coordinates in 1.mol file cause the embedding issue? Or the 
issue is caused by some implicit properties of a molecule? How to solve 
this properly?


Kind regards,
Pavel.


1.mol
Description: MOL mdl chemical test
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] conformations for cycles with multiple stereocenters

2020-02-21 Thread Pavel Polishchuk

Hello,

  I want to generate conformers for a stereoisomeric sugar moiety. The 
code below works (loads proccesor) but returns none of them.
  But if I remove all stereoconfiguration info in input SMILES the code 
generates conformers. Playing with this issue I noticed that conformers 
are generated if only one stereocenter is defined. If I define 
configuration of two centers the code stops working.

  I'm puzzled. Is there a solution or workaround?
  I use rdkit 2019.09.1

  mol = Chem.MolFromSmiles('OC[C@H]1O[C@H](O)C[C@@H](O)[C@H]1O')
  AllChem.EmbedMultipleConfs(mol, 50, Chem.rdDistGeom.ETKDGv2())
  mol.GetConformers()

Kind regards,
Pavel.


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] 5th in silico drug design workshop in Olomouc

2019-11-21 Thread Pavel Polishchuk

Sorry for the mistake, the dates are 3-7 of *February* 2020.
Pavel.

On 20/11/2019 09:39, Pavel Polishchuk wrote:

Dear colleagues,

  we would like to invite you on the 5th Drug Design workshop which 
will be held 3-7 January 2020 in Olomouc (Czech Republic). It is 
focused on practical applications of different chemoinformatic tools 
and approaches for drug development. This might be interesting for 
bachelor, master and PhD students to broaden their experience and 
sharpen skills. During the workshop, students will learn pharmacophore 
and QSAR modeling, molecular docking, PDBe services. A competition 
will be organized at the last day of the workshop where participants 
will be able to apply acquired knowledge to solve a real 
chemoinformatic task and win prizes.

  This year we will organize a poster session for participants.

https://fch.upol.cz/en/5add/

  Please feel free to share this information to those who can be 
interested in participation in such event. Thank you!


Kind regards,
Pavel.

--
Dr. Pavel Polishchuk
senior researcher
Institute of Molecular and Translational Medicine
Faculty of Medicine and Dentistry
Palacky University
Hněvotínská 1333/5
779 00 Olomouc
Czech Republic
+420 585632298



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] 5th in silico drug design workshop in Olomouc

2019-11-20 Thread Pavel Polishchuk

Dear colleagues,

  we would like to invite you on the 5th Drug Design workshop which 
will be held 3-7 January 2020 in Olomouc (Czech Republic). It is focused 
on practical applications of different chemoinformatic tools and 
approaches for drug development. This might be interesting for bachelor, 
master and PhD students to broaden their experience and sharpen skills. 
During the workshop, students will learn pharmacophore and QSAR 
modeling, molecular docking, PDBe services. A competition will be 
organized at the last day of the workshop where participants will be 
able to apply acquired knowledge to solve a real chemoinformatic task 
and win prizes.

  This year we will organize a poster session for participants.

https://fch.upol.cz/en/5add/

  Please feel free to share this information to those who can be 
interested in participation in such event. Thank you!


Kind regards,
Pavel.

--
Dr. Pavel Polishchuk
senior researcher
Institute of Molecular and Translational Medicine
Faculty of Medicine and Dentistry
Palacky University
Hněvotínská 1333/5
779 00 Olomouc
Czech Republic
+420 585632298



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] canonical SMILES of a fragment

2017-08-02 Thread Pavel Polishchuk

Thanks Greg!

  I found an alternative solution which is also no so straightforward. 
I set an isotope label to aromatic atoms, generate isomeric SMILES and 
make regex replacement.


  But your suggestion to set remove hydrogens is important, since this 
can cause other ambiguity.



import re

m = RWMol()

for i in range(3):
a = Atom(6)
a.SetNoImplicit(True)  # remove implicit Hs
m.AddAtom(a)
a = Atom(0)
m.AddAtom(a)

m.GetAtomWithIdx(0).SetIsAromatic(True)  # set aromatic
m.GetAtomWithIdx(0).SetIsotope(42)   # set isotope

m.GetAtomWithIdx(3).SetAtomMapNum(1)

m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

s = Chem.MolToSmiles(m, isomericSmiles=True)

re.sub('\[[0-9]+([a-z]+)H?[0-9]?\]', '\\1', s)  # remove isotope in 
output SMILES


OUTPUT: 'CC(c)[*:1]'

Pavel.




On 08/02/2017 06:24 AM, Greg Landrum wrote:

Hi Pavel,

It is, unfortunately, not that easy.
The canonicalization algorithm does not use atomic aromaticity when 
determining atom ordering, so as far as it is concerned there is no 
difference between atoms 0 and 2 in either of your examples. What does 
get used is the number of hydrogens, so you need to use that in order 
to get the results you are looking for.[1] For technical reasons, you 
also need to tell the RDKit that the atoms should not have implicit Hs 
attached. Here's a gist that works for me: 
https://gist.github.com/greglandrum/f4e2f2f2ad311560d8ab36874d503843


Two notes:
 1) I don't set the number of Hs on atom 1 in that gist, but I would 
suggest doing that too.
 2) If atoms 0 and 2 have the same number of Hs attached, this still 
is not going to work if you're building things from fragments. The 
canonicalization code was not really designed to be used in situations 
like this.


-greg
[1] The details of the canonicalization algorithm, including the 
contents of the atom invariants, are described here: 
http://dx.doi.org/10.1021/acs.jcim.5b00543



On Tue, Aug 1, 2017 at 2:53 PM, Pavel Polishchuk 
<pavel_polishc...@ukr.net <mailto:pavel_polishc...@ukr.net>> wrote:


Hi all,

  canonicalization of fragment SMILES does not work properly.
Below there are two examples of identical fragments. The only
difference is the order of atoms (indices). However, it seems that
RDKit canonicalization does not take into account atom types.

  Does someone have an idea how to solve this issue with small losses?

#1 ===

m = RWMol()

for i in range(3):
a = Atom(6)
m.AddAtom(a)
a = Atom(0)
m.AddAtom(a)

m.GetAtomWithIdx(0).SetIsAromatic(True)  # set atom 0 as aromatic
m.GetAtomWithIdx(3).SetAtomMapNum(1)


m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m)

OUTPUT: 'cC(C)[*:1]'

#2 ===

m2 = RWMol()

for i in range(3):
a = Atom(6)
m2.AddAtom(a)
a = Atom(0)
m2.AddAtom(a)

m2.GetAtomWithIdx(2).SetIsAromatic(True) # set atom 2 as aromatic
m2.GetAtomWithIdx(3).SetAtomMapNum(1)


m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m2)

OUTPUT: 'CC(c)[*:1]'


Pavel.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
<mailto:Rdkit-discuss@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
<https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>




--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] canonical SMILES of a fragment

2017-08-01 Thread Pavel Polishchuk

Hi all,

  canonicalization of fragment SMILES does not work properly. Below 
there are two examples of identical fragments. The only difference is 
the order of atoms (indices). However, it seems that RDKit 
canonicalization does not take into account atom types.


  Does someone have an idea how to solve this issue with small losses?

#1 ===

m = RWMol()

for i in range(3):
a = Atom(6)
m.AddAtom(a)
a = Atom(0)
m.AddAtom(a)

m.GetAtomWithIdx(0).SetIsAromatic(True)  # set atom 0 as aromatic
m.GetAtomWithIdx(3).SetAtomMapNum(1)


m.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m)

OUTPUT: 'cC(C)[*:1]'

#2 ===

m2 = RWMol()

for i in range(3):
a = Atom(6)
m2.AddAtom(a)
a = Atom(0)
m2.AddAtom(a)

m2.GetAtomWithIdx(2).SetIsAromatic(True) # set atom 2 as aromatic
m2.GetAtomWithIdx(3).SetAtomMapNum(1)


m2.AddBond(0, 1, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 2, Chem.rdchem.BondType.SINGLE)
m2.AddBond(1, 3, Chem.rdchem.BondType.SINGLE)

Chem.MolToSmiles(m2)

OUTPUT: 'CC(c)[*:1]'


Pavel.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Using RDKit in PyCharm and Anaconda on Windows

2017-06-01 Thread Pavel Polishchuk
I had some issues to run rdkit from Python console in PyCharm (4.5.5) on 
Linux. After recent installation of PyCharm 2017.1.3 it started to work. 
Maybe updating PyCharm will help on Win as well.


Pavel.


On 05/30/2017 10:10 PM, West, Richard wrote:

We're having trouble getting RDKit to work in a PyCharm project using an 
Anaconda interpreter (Python 2.7), on Windows 8.1.
Has anyone had success with this and can guide us?
The trouble is we get an

   ImportError: DLL load failed: The specified module could not be found.

when trying to import rdkit (or rdBase).

We have tried many variations of the following, but here is a basic recipe of 
what does/doesn't work:
1. Make a new conda environment (called 'eg1'), install rdkit ('conda install 
-c rdkit rdkit')
2. From a cmd.exe prompt, use this environment ('activate eg1') load python 
('python') and import rdkit ('import rdkit') it works fine.
3. From PyCharm, create a Project Interpreter (pointing to 
'C:\Anaconda2\envs\eg1\python.exe'), and use this to run a script or create a new Python 
Console in which you 'import rdkit', leading to the "DLL load failed" message.
4. We have tried manually adding a bunch of things to the "Interpreter Paths" 
in PyCharm, but without success (perhaps we just didn't add the right thing).




Update: just before I hit "send" on this request for help, we stumbled across 
this posting of the same problem, and solution, from Christian Ribeaud:
https://intellij-support.jetbrains.com/hc/en-us/community/posts/115000244450-DLL-load-failed

It seems that if we open cmd.exe, activate the environment, and then launch 
PyCharm exe from there, it works.
I'm sharing this here because it took us a while to find the other post, but also to ask: 
is there a "better" way?

Cheers,
Richard


--
Richard H. West, Ph.D.
Assistant Professor, Department of Chemical Engineering,
Northeastern University, 360 Huntington Ave, Boston, MA 02115
http://northeastern.edu/comochengPhone: 617-373-5163


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] RemoveHs and [H][*:1] mol

2017-05-30 Thread Pavel Polishchuk

Hello,

  probably this is a message mainly to developers.
  I discovered some strange behavior of removeHs function applied for 
'[H][*:1]' molecule.


  If I create that mol from smiles, RemoveHs does not remove a single H 
from the mol:


mmm = Chem.MolFromSmiles('[H][*:1]')
Chem.MolToSmiles(Chem.RemoveHs(mmm))

output:
'[H][*:1]'

  If I apply RemoveHs to fragments obtained after MMPA cuts it removes 
H and keeps only '[*:1]'


mmm = Chem.MolFromSmiles('c1c1C')
mmm = Chem.AddHs(mmm)
fr = rdMMPA.FragmentMol(mmm, pattern="[*]!@!=!#[!#1]", maxCuts=1, 
resultsAsMols=True, maxCutBonds=30)

for f in fr:
ff = Chem.GetMolFrags(f[1], asMols=True)
print(Chem.MolToSmiles(ff[0]), Chem.MolToSmiles(Chem.RemoveHs(ff[0])))
print(Chem.MolToSmiles(ff[1]), Chem.MolToSmiles(Chem.RemoveHs(ff[1])))

output:
[H]c1c([H])c([H])c([*:1])c([H])c1[H] c1ccc([*:1])cc1
[H]C([H])([H])[*:1] C[*:1]
[H]c1c([H])c([H])c([*:1])c(C([H])([H])[H])c1[H] Cc1c1[*:1]
[H][*:1] [*:1]
[H]c1c([H])c(C([H])([H])[H])c([H])c([*:1])c1[H] Cc1([*:1])c1
[H][*:1] [*:1]
[H]c1c([H])c([*:1])c([H])c([H])c1C([H])([H])[H] Cc1ccc([*:1])cc1
[H][*:1] [*:1]
[H]c1c([H])c([H])c(C([H])([H])[*:1])c([H])c1[H] c1ccc(C[*:1])cc1
[H][*:1] [*:1]

  If this is a bug I can create an issue on github

Pavel.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Redundant hydrogen atoms

2017-05-28 Thread Pavel Polishchuk
I do not know the valence model of RDkit. Therefore below there is an 
ugly but working solution: set to the S+ atom NoImplicit.


mol = Chem.MolFromSmiles('CC(C)(C)SC')

rxn = 
AllChem.ReactionFromSmarts('[CH0:1][S:2][CH3:3]>>[C:1][SH0+:2].[CH3-:3]')

ps = rxn.RunReactants((mol,))

print(Chem.MolToSmiles(ps[0][0]))
print(Chem.MolToSmiles(ps[0][1]))

for a in ps[0][0].GetAtoms():
if a.GetAtomicNum() == 16 and a.GetFormalCharge() == 1:
a.SetNoImplicit(True)

print(Chem.MolToSmiles(ps[0][0]))
print(Chem.MolToSmiles(ps[0][1]))

Pavel.


On 05/28/2017 08:26 AM, Nitzan Tzanani wrote:


mol = Chem.MolFromSmiles('CC(C)(C)SC')
  rxn = 
AllChem.ReactionFromSmarts('[CH0:1][S:2][CH3:3]>>[C:1][SH0+:2].[CH3-:3]')

  ps = rxn.RunReactants((mol,))

  print Chem.MolToSmiles(ps[0][0])
  print Chem.MolToSmiles(ps[0][1])


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] canonical smiles for fragments with map numbers

2017-05-27 Thread Pavel Polishchuk

Thank you, Brian!

Actually what I expected as output:

S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
and so on

You gave me the right direction. I can store old-new maps in a dict and 
after relabeling and producing of canonical smiles it would be easy to 
relabel attachment points back.

Thank you again!

Pavel.

On 05/27/2017 03:03 PM, Brian Kelley wrote:
Pavel, this isn't exactly trivial so I went ahead and made an 
example.  The basics are that atomMaps are canonicalized, i.e. their 
value is used in the generation of smiles.


To solve this problem:
1) backup the atom maps and remove them
2) canonicalize *without* atom maps but figure out the order in which 
the atoms in the molecule are output
3) using the atom output order, relabel the atom maps based on output 
order.


That's a mouthful, but here's some code that should do the trick:

from rdkit import Chem

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
   "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
   "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
   "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
   "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
   "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]


def CanonicalizeMaps(m, *a, **kw):
# atom maps are canonicalized, so rename them
#  figure out where they would have gone
#  and relabel from 1...N based on output order
atomMap = "molAtomMapNumber"
backupAtomMap = "oldMolAtomMapNumber"
for atom in m.GetAtoms():
if atom.HasProp(atomMap):
atomNum = atom.GetProp(atomMap)
atom.SetProp(backupAtomMap, atomNum)
atom.ClearProp(atomMap)

# canonicalize
smi = Chem.MolToSmiles(m, *a, **kw)
# where did the atoms end up in the output string?
atoms = [(pos, atom_idx) for atom_idx, pos in enumerate(
eval(m.GetProp("_smilesAtomOutputOrder")))]
atommap = 1
atoms.sort()

# set the new atommap based on output position
for pos, atom_idx in atoms:
atom = m.GetAtomWithIdx(atom_idx)
if atom.HasProp(backupAtomMap):
atom.SetProp(atomMap, str(atommap))
atommap +=1
return Chem.MolToSmiles(m)
for s in smi:
m = Chem.MolFromSmiles(s)
print CanonicalizeMaps(m,True)



Output:

S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]

Now, if you want the atomMaps in 1...2...3 output order, we could do 
that as well, but it is even trickier.


Enjoy,
 Brian

On Sat, May 27, 2017 at 8:36 AM, Pavel Polishchuk 
<pavel_polishc...@ukr.net <mailto:pavel_polishc...@ukr.net>> wrote:


Hi,

  I cannot solve an issue and would like to ask for an advice.
  If there are different map numbers for attachment points for the
same fragment different canonical smiles are generated.
  I observed such behavior only for fragments with 3 attachment
points. Below is an example.
  I'm looking for a solution/workaround how to produce the "same"
smiles strings irrespectively of mapping that after removal of map
numbers smiles will become identical.
  Any advice would be appreciated.

smi = ["ClC1=C([*:1])C(=S)C([*:2])=C([*:3])N1",
   "ClC1=C([*:1])C(=S)C([*:3])=C([*:2])N1",
   "ClC1=C([*:2])C(=S)C([*:1])=C([*:3])N1",
   "ClC1=C([*:2])C(=S)C([*:3])=C([*:1])N1",
   "ClC1=C([*:3])C(=S)C([*:1])=C([*:2])N1",
   "ClC1=C([*:3])C(=S)C([*:2])=C([*:1])N1"]

for s in smi:
print(Chem.MolToSmiles(Chem.MolFromSmiles(s)))

output:
S=c1c([*:1])c(Cl)[nH]c([*:3])c1[*:2]
S=c1c([*:1])c(Cl)[nH]c([*:2])c1[*:3]
S=c1c([*:1])c([*:3])[nH]c(Cl)c1[*:2]
S=c1c([*:2])c(Cl)[nH]c([*:1])c1[*:3]
S=c1c([*:1])c([*:2])[nH]c(Cl)c1[*:3]
S=c1c([*:2])c([*:1])[nH]c(Cl)c1[*:3]

Kind regards,
Pavel.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
<mailto:Rdkit-discuss@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
<https://lists.sourceforge.net/lists/listinfo/rdkit-discuss>




--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] custom fingerprints in PostgreSQL

2017-04-26 Thread Pavel Polishchuk
Hello,

   is it possible to store custom fingerprints in psql DB and use them 
for similarity search? And how to do this?
   I foundtwo commands bfp_to_binary_text(bfp) and 
bfp_from_binary_text(bytea)in RDKit cartridge but cannot understand how 
to use them.
   I want to store pharmacophore fingerprints. There is no a built-in 
command in RDKit cartridge to calculate them so I have to calculate them 
in a Pythonscript. Then I need to store them in psql DB and create 
similarity search index but I could not find a solution yet.
   It might be of general interest how to store and use arbitrary 
fingerprints in DB.

An example of pharmacophore FP generation:

from rdkit import Chem
from rdkit import RDConfig
from rdkit.Chem.Pharm2D.SigFactory import SigFactory
from rdkit.Chem.Pharm2D import Generate

import os

fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
factory = ChemicalFeatures.BuildFeatureFactory(fdefName)
sigFactory = SigFactory(factory, minPointCount=2, maxPointCount=3, 
trianglePruneBins=False)
sigFactory.SetBins([(0,2),(2,5),(5,8)])
sigFactory.Init()

mol = Chem.MolFromSmiles('Cc1nc(CN(C)c2ncnc3ccc(-c4ccc5c(c4)OCO5)cc23)cs1')
fp = Generate.Gen2DFingerprint(mol, sigFactory)

Pavel.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Pharm2D.SigFactory and psql issues

2017-04-24 Thread Pavel Polishchuk
Hello,

   I found two issues which are reproducible in 2016.09 and 2017.3 conda 
builds.

   0. The first issue is related to PostgreSQL. If I try to install it 
from conda it asks me to downgrade rdkit to 2016.3. After downgrading of 
rdkit psql works well.

   1. Generation of 2D pharmacophore fingerprints using default 
parameters returns errors for some molecules. However if I add 
trianglePruneBins=False option to initialization procedure of a 
signature factory it starts working. It seems that Gen2DFingerprint 
ignores trianglePruneBins parameter. Below is a code example:

from rdkit import Chem
from rdkit import RDConfig
from rdkit.Chem.Pharm2D.SigFactory import SigFactory
from rdkit.Chem.Pharm2D import Generate

import os

fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
factory = ChemicalFeatures.BuildFeatureFactory(fdefName)
sigFactory = SigFactory(factory, minPointCount=2, maxPointCount=3, 
trianglePruneBins=True)   # replace to False and the error will 
disappear(True is a default value)
sigFactory.SetBins([(0,2),(2,5),(5,8)])
sigFactory.Init()

mol = Chem.MolFromSmiles('Cc1nc(CN(C)c2ncnc3ccc(-c4ccc5c(c4)OCO5)cc23)cs1')
fp = Generate.Gen2DFingerprint(mol, sigFactory)

---
ValueErrorTraceback (most recent call last)
/home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/SigFactory.py
 
in GetBitIdx(self, featIndices, dists, sortIndices)
 249 print('\tbins:', repr(self._bins), type(self._bins))
--> 250   bin_ = self._findBinIdx(dists, self._bins, 
self._scaffolds[len(dists)])
 251 except ValueError:

/home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/SigFactory.py
 
in _findBinIdx(self, dists, bins, scaffolds)
 169   whichBins[i] = where
--> 170 res = scaffolds.index(tuple(whichBins))
 171 if _verbose:

ValueError: (2, 0, 0) is not in list

During handling of the above exception, another exception occurred:

IndexErrorTraceback (most recent call last)
 in ()
   1 # from rdkit.Chem.Pharm2D import Generate
   2 mol = 
Chem.MolFromSmiles('Cc1nc(CN(C)c2ncnc3ccc(-c4ccc5c(c4)OCO5)cc23)cs1')
> 3 fp = Generate.Gen2DFingerprint(mol, sigFactory)

/home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/Generate.py
 
in Gen2DFingerprint(mol, sigFactory, perms, dMat, bitInfo)
 160 for match in matchesToMap:
 161   if sigFactory.shortestPathsOnly:
--> 162 idx = _ShortestPathsMatch(match, perm, sig, dMat, 
sigFactory)
 163 if idx is not None and bitInfo is not None:
 164   l = bitInfo.get(idx, [])

/home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/Generate.py
 
in _ShortestPathsMatch(match, featureSet, sig, dMat, sigFactory)
  71 dist[i] = d
  72
---> 73   idx = sigFactory.GetBitIdx(featureSet, dist, sortIndices=False)
  74   if _verbose:
  75 print('\t', dist, minD, maxD, idx)

/home/pavlo/anaconda3/envs/rdk_2017_3/lib/python3.6/site-packages/rdkit/Chem/Pharm2D/SigFactory.py
 
in GetBitIdx(self, featIndices, dists, sortIndices)
 253   fams = [fams[x] for x in featIndices]
 254   raise IndexError('distance bin not found: feats: %s; 
dists=%s; bins=%s; scaffolds: %s' %
--> 255(fams, dists, self._bins, self._scaffolds))
 256
 257 return startIdx + offset + bin_

IndexError: distance bin not found: feats: ['Acceptor', 'Aromatic', 
'Aromatic']; dists=[5, 1, 1]; bins=[(0, 2), (2, 5), (5, 8)]; scaffolds: 
[0, [(0,), (1,), (2,)], 0, [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), 
(0, 1, 2), (0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 
0), (1, 1, 1), (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 1), 
(2, 0, 2), (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 
2)], 0]


Kind regards,
Pavel.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] DeleteSubstructs vs ReplaceSubstructs

2017-03-31 Thread Pavel Polishchuk

Hi Maxim,

  if you change your query to SMARTS it would be possible to delete 
what you want


  m=Chem.MolFromSmiles('C1(C2=NC=CC=C2)=CC=CC(C)=C1')
  ss = Chem.MolFromSmarts('c1c1C')
  frag = AllChem.DeleteSubstructs(m, ss)
  print(Chem.MolToSmiles(frag))

Pavel.


On 03/31/2017 07:41 AM, Popov, Maxim (Ext) wrote:


Dear RDKit users,

I am trying to remove a common substructure from a number of molecules 
(with AllChem.DeleteSubstructs). My problem is best illustrated by 
this short code:


fromrdkit importChem

fromrdkit.Chem importAllChem

m=Chem.MolFromSmiles(/'C1(C2=NC=CC=C2)=CC=CC(C)=C1'/)

ss = Chem.MolFromSmiles(/'C1=CC=CC(C)=C1'/)

hyd=Chem.MolFromSmiles(/'[H]'/)

print(/"Substituting substructure with hydrogen"/)

frags = AllChem.ReplaceSubstructs(m, ss,hyd)

forfrag infrags:

print(Chem.MolToSmiles(frag))

print(/"\nDeleting substructure"/)

frag = AllChem.DeleteSubstructs(m, ss)

print(Chem.MolToSmiles(frag))

I create a toluene connected to pyridine and try to remove toluene.

When replacing toluene substructure with hydrogen 
(AllChem.ReplaceSubstructs), I receive two sets of results: pyridine 
(with explicit hydrogen) and single carbon plus single hydrogen plus 
aromatic open chain (what is left from pyridine after removing one 
ring atom).


When deleting the toluene substructure (AllChem.DeleteSubstructs), I 
receive just the open chain of ex-pyridine (corresponding to second 
set of the ReplaceSubstructs results).


Is there a way of directing DeleteSubstructs method to a specific 
variant (in this case, leaving pyridine as a ring seems to be logical).


Best regards,

Maxim



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] CanonicalRankAtoms

2017-03-28 Thread Pavel Polishchuk
Hi,

   I experimented a little bit with CanonicalRankAtoms and observed some 
unexpected results.

   I have two mols (actually sets of fragments):
C[*].n[*].C[*].N[*]
CC[*].CC[*].cn([*])c.CN([*])C

   In the first case, pairs of carbons and nitrogens are recognized as 
symmetrical
   [0, 2, 0, 2]
   However, nitrogens are aliphatic and aromatic and I expected they 
will be different.

   In the second case, nitrogen already different, while carbons are 
still identical. That is expected.
[0, 0, 3, 2]

   Is it a bug or a feature?
   Is there a way to solve this issue by RDKit machinery and how?


   Below is a reproducible example:

b1 = 
b'\xef\xbe\xad\xde\x00\x00\x00\x00\x07\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x04\x00\x00\x00\x80\x01\x06\x00`\x00\x00\x00\x01\x03\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x07@h\x00\x00\x00\x03\x01\x02\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06\x00`\x00\x00\x00\x01\x03\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x07\x00`\x00\x00\x00\x01\x02\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x0b\x00\x01\x00\x02\x03\x00\x04\x05\x00\x06\x07\x00\x14\x00\x17\x00\x00\x00\x00\x16'
q1 = Chem.Mol(b1)

b2 = 
b'\xef\xbe\xad\xde\x00\x00\x00\x00\x07\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x0e\x00\x00\x00\n\x00\x00\x00\x80\x01\x06\x00`\x00\x00\x00\x01\x03\x06\x00`\x00\x00\x00\x02\x02\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06\x00`\x00\x00\x00\x01\x03\x06\x00`\x00\x00\x00\x02\x02\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06@h\x00\x00\x00\x03\x02\x02\x07@(\x00\x00\x00\x03\x03\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06\x00`\x00\x00\x00\x01\x03\x07\x00
 
\x00\x00\x00\x03\x00$(\x00\x00\x00\x00\x01,\x01\x00\x00\x00*\x06@h\x00\x00\x00\x03\x02\x02\x06\x00`\x00\x00\x00\x01\x03\x0b\x00\x01\x00\x01\x02\x00\x03\x04\x00\x04\x05\x00\x06\x07h\x0c\x07\x08\x00\t\n\x00\n\x0b\x00\x07\x0ch\x0c\n\r\x00\x14\x00\x17\x00\x00\x00\x00\x16'
q2 = Chem.Mol(b2)

r1 = [r for a, r in zip(q1.GetAtoms(), Chem.CanonicalRankAtoms(q1, 
False, False, False)) if a.GetSymbol() == "*"]
r2 = [r for a, r in zip(q2.GetAtoms(), Chem.CanonicalRankAtoms(q2, 
False, False, False)) if a.GetSymbol() == "*"]

print(Chem.MolToSmiles(q1, canonical=False))  # to preserve the order of 
atoms and ranks
print(r1)
print(Chem.MolToSmiles(q2, canonical=False))
print(r2)

the output:

C[*].n[*].C[*].N[*]
[0, 2, 0, 2]

CC[*].CC[*].cn([*])c.CN([*])C
[0, 0, 3, 2]


Kind regards,
Pavel.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] FindAtomEnvironmentOfRadiusN

2017-03-27 Thread Pavel Polishchuk

I guessed why it occurred. I'm interested is it expected behavior?
So, if a radius is greater than the number of available bonds in all 
directions from the rooted atom the function will return empty list as 
it considers that such environment does not exist. Is this a correct 
expectation?


Pavel.

On 03/27/2017 03:53 PM, Peter Gedeck wrote:

Hello,

The atom numbers start with 0. From the middle atom, there are no 
environments with radius 2. You will get a result if you use the first 
(=0) or the last (=2) atom. Try this:


m = Chem.MolFromSmiles("NCO")
i = Chem.FindAtomEnvironmentOfRadiusN(m, 1, 0)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))
i = Chem.FindAtomEnvironmentOfRadiusN(m, 2, 0)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))
i = Chem.FindAtomEnvironmentOfRadiusN(m, 3, 0)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))

and you will get:

'CN'
'NCO'
''

Is this more intuitive to you?

Best,

Peter


On Mon, Mar 27, 2017 at 9:35 AM Pavel Polishchuk 
<pavel_polishc...@ukr.net <mailto:pavel_polishc...@ukr.net>> wrote:


Dear RDKitters,

   I found the issue with FindAtomEnvironmentOfRadiusN but this
can be a
feature. However, I did not findthis information in help and did not
expect such behavior.
   If I apply FindAtomEnvironmentOfRadiusN function to a small
molecule
and specify the radius greater than the size of the molecule the
function returns empty list of bond indices (and empty mol).

m = Chem.MolFromSmiles("NCO")

i = Chem.FindAtomEnvironmentOfRadiusN(m, 1, 1)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))

returns "NCO"

i = Chem.FindAtomEnvironmentOfRadiusN(m, 2, 1)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))

returns ""

   In the latter case I expected the same output "NCO". Were my
expectations mistaken?

Kind regards,
Pavel.


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
<mailto:Rdkit-discuss@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] FindAtomEnvironmentOfRadiusN

2017-03-27 Thread Pavel Polishchuk
Dear RDKitters,

   I found the issue with FindAtomEnvironmentOfRadiusN but this can be a 
feature. However, I did not findthis information in help and did not 
expect such behavior.
   If I apply FindAtomEnvironmentOfRadiusN function to a small molecule 
and specify the radius greater than the size of the molecule the 
function returns empty list of bond indices (and empty mol).

m = Chem.MolFromSmiles("NCO")

i = Chem.FindAtomEnvironmentOfRadiusN(m, 1, 1)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))

returns "NCO"

i = Chem.FindAtomEnvironmentOfRadiusN(m, 2, 1)
Chem.MolToSmiles(Chem.PathToSubmol(m, i))

returns ""

   In the latter case I expected the same output "NCO". Were my 
expectations mistaken?

Kind regards,
Pavel.

--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] delete a substructure

2017-03-08 Thread Pavel Polishchuk
You might find this link useful - 
http://www.rdkit.org/docs/GettingStartedInPython.html#chemical-transformations


However, the issue in your case is SMARTS definitions. If one SMARTS 
completely covers another one it would be difficult to understand is it 
artifact or not.I think it might be reasonable to revise SMARTS to avoid 
such overlapping or create a list of rules (maybe hierarchical) which 
will define valid and not valid overlappings.


Pavel.


On 03/08/2017 06:32 PM, Chenyang Shi wrote:

Dear Hongbin,

I tried your method on a molecule, 4-Methylsalicylic acid 
(CC1=CC(=C(C=C1)C(=O)O)O). I looped through all groups defined in 
Joback method (using SMARTS), and used m.GetSubstructMatches to print 
out all atom positions. The result is summarized in the table.


We can see there are duplicated counts--coming from COOH group. As 
suggested by Hongbin, we can remove duplicated atoms by looking at 
their positions--in this case, ((9),), ((7,8,),), ((7,),), and ((8,),) 
are subsets of ((7,8,9)) from -COOH. Indeed we can get rid of these 
duplicates. However, I also noticed that Atom (3,) from =C< (ring) 
group is also a part of -OH (phenol) ((10,3),). If we apply the same 
algorithm to remove duplicates, the =C<(ring) group will be only 
counted twice instead of three times.


Greg, you mentioned as an alternative I can delete substructure using 
chemical reaction method. It would be greatly appreciated if you could 
show me (point me to) a simple example code, perhaps on a simple 
molecule? I find myself at a loss when browsing the manual. I would 
like to try also in that direction.


Thanks,
Chenyang


Inline image 1


On Mon, Mar 6, 2017 at 1:52 AM, Greg Landrum > wrote:


The solution that Hongbin proposes to the double-counting problem
is a good one. Just be sure to sort your substructure queries in
the right order so that the more complex ones come first.

Another thing you might think about is making your queries more
specific. For example, as you pointed out "[OH]" is very general
and matches parts of carboxylic acids and a number of other
functional groups. The RDKit has a set of fairly well tested
(though certainly not perfect) functional group definitions in
$RDBASE/Data/Functional_Group_Hierarchy.txt. The alcohol
definition from there looks like this:
[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]


-greg


On Mon, Mar 6, 2017 at 7:20 AM, 杨弘宾 > wrote:

Hi, Chenyang,
You don't need to delete the substructure from the molecule.
Just check whehter the mapped atoms have been matched. For
example:

m = Chem.MolFromSmiles('CC(=O)O')
OH = Chem.MolFromSmarts('[OH]')
COOH = Chem.MolFromSmarts('C(O)=O')

m.GetSubstructMatches(OH)
>>((3,),)
m.GetSubstructMatchs(COOH)
>>((1, 3, 2),)

Since atom "3" has been already matched, it should be ignored.
So you can create a "set" to record the matched atoms to avoid
repetitive count.


Hongbin Yang 杨弘宾

*From:* Chenyang Shi 
*Date:* 2017-03-06 14:04
*To:* Greg Landrum 
*CC:* RDKit Discuss

*Subject:* Re: [Rdkit-discuss] delete a substructure
Hi Greg,

Thanks for a prompt reply. I did try
"GetSubstructMatches()" and it returns correct numbers of
substructures for CH3COOH. The potential problem with this
approach is that if the molecule is getting complicated,
it will possibly generate duplicate numbers for certain
functional groups. For example, --OH (alcohol) group will
be likely also counted in --COOH. A safer way, in my mind,
is to remove the substructure that has been counted.

Greg, you mentioned "chemical reaction functionality", can
you show me a demo script with that using CH3COOH as an
example. I will definitely delve into the manual to learn
more. But reading your code will be a good start.

Thanks,
Chenyang


On Sun, Mar 5, 2017 at 10:15 PM, Greg Landrum
>
wrote:

Hi Chenyang,

If you're really interested in counting the number of
times the substructure appears, you can do that much
quicker with `GetSubstructMatches()`:

In [2]: m = Chem.MolFromSmiles('CC(C)CCO')
In [3]:
len(m.GetSubstructMatches(Chem.MolFromSmarts('[CH3;X4]')))
Out[3]: 2

   

Re: [Rdkit-discuss] Question about generating configurational isomerism

2017-01-27 Thread Pavel Polishchuk

Hi Jacob,

  you need to call AssignStereochemistry with force=True parameter
  Chem.AssignStereochemistry(mol, force=True)

Pavel.


On 01/28/2017 05:43 AM, Jacob Durrant wrote:
I'm trying to set the configuration of a molecule with a double bond, 
but it doesn't seem to be working. Here's my code:


===
from rdkit.Chem import AllChem
from rdkit import Chem
from rdkit.Chem.rdchem import BondStereo

# Make a molecule with a double bond, no stereo specified (cis or trans)
smi = "NC(O)=C(C)S"
mol = Chem.MolFromSmiles(smi)

# Get that double bond
double_bonds = [b.GetIdx() for b in mol.GetBonds() if 
b.GetBondTypeAsDouble() == 2 and b.GetStereo() is BondStereo.STEREONONE]

double_bond_index = double_bonds[0]
double_bond = mol.GetBondWithIdx(double_bond_index)

# It's stereo is STEREONONE at this point
print double_bond.GetStereo()

# Get the bonds connected to the first and second atoms
first_atom = double_bond.GetBeginAtom()
bonds_connected_to_first_atom = first_atom.GetBonds()

second_atom = double_bond.GetEndAtom()
bonds_connected_to_second_atom = second_atom.GetBonds()

# Get the indecies of those bonds, removing the central double bond.
bonds_connected_to_first_atom_indecies = [b.GetIdx() for b in 
first_atom.GetBonds()]

bonds_connected_to_first_atom_indecies.remove(double_bond_index)

bonds_connected_to_second_atom_indecies = [b.GetIdx() for b in 
second_atom.GetBonds()]

bonds_connected_to_second_atom_indecies.remove(double_bond_index)

# Set the directions of the for bonds connected to this double bond.
mol.GetBondWithIdx(bonds_connected_to_first_atom_indecies[0]).SetBondDir(Chem.BondDir.ENDUPRIGHT)
mol.GetBondWithIdx(bonds_connected_to_first_atom_indecies[1]).SetBondDir(Chem.BondDir.ENDDOWNRIGHT)

mol.GetBondWithIdx(bonds_connected_to_second_atom_indecies[0]).SetBondDir(Chem.BondDir.ENDUPRIGHT)
mol.GetBondWithIdx(bonds_connected_to_second_atom_indecies[1]).SetBondDir(Chem.BondDir.ENDDOWNRIGHT)

# I assume this should set the stereochemistry. But it doesn't appear to.
# It's stereo is STEREONONE at this point
print double_bond.GetStereo()
Chem.AssignStereochemistry(mol)
# Still STEREONONE
print double_bond.GetStereo()

# And when I print out the smiles, no stereochemistry is given.
print Chem.MolToSmiles(mol, isomericSmiles=True, canonical=True)
# CC(S)=C(N)O
===

Any suggestions? Thanks!

~Jacob
--

Sent from my mobile.



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] multiline legend in MolsToGridImage

2016-12-20 Thread Pavel Polishchuk

Thank you, that helped!
I noticed that I need to substantially increase the height of the image 
to make the second line visible. If I'll want to draw more lined I 
should increase the height further.


Pavel.

On 12/20/2016 03:14 PM, Peter Gedeck wrote:

Hello,

I thought we had removed all of these by now. I'll open an issue and 
fix the code.


Best,

Peter


On Tue, Dec 20, 2016 at 6:01 AM David Hall <li...@cowsandmilk.net 
<mailto:li...@cowsandmilk.net>> wrote:


The replace and split methods were removed from the string module
in python3. You can replace the code as follows:

s = s.replace('\r\n', '\n')
s = s.replace('\n\r', '\n')
s = s.replace('\r', '\n')
lines = s.split('\n')


On Tue, Dec 20, 2016 at 2:45 AM, Pavel Polishchuk
<pavel_polishc...@ukr.net <mailto:pavel_polishc...@ukr.net>> wrote:

Hi,

I try to print multiline legends for molecules in a grid and
use the following code from notebook

  from rdkit import Chem
  from rdkit.Chem import Draw
  # from rdkit.Chem.Draw import IPythonConsole

  mols = [Chem.MolFromSmiles(s) for s in ["CCC", "C"]]
  legends = ["1\nCCC", "2\nC"]
  Draw.MolsToGridImage(mols, molsPerRow=5,
subImgSize=(200,200), legends=legends)

  it returns error: AttributeError: module 'string' has no
attribute 'replace'(full stack is below)

  If I uncomment IPythonConsole import then it returns the
picture with labels, but with spaces instead of new line symbols.

  What can I do in this situation? RDKit 2016.03.2

Kind regards,
Pavel.



---
AttributeError Traceback (most recent call 
last)
  in() 1 mols = [Chem.MolFromSmiles(s) for 
s in ["CCC",
"C"]]2 legends = ["1\nCCC", "2\nC"]> 3p =
Draw.MolsToGridImage(mols, molsPerRow=5, subImgSize=(200,200),

legends=legends)/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/__init__.py
in MolsToGridImage(mols, molsPerRow, subImgSize, legends,
highlightAtomLists, useSVG, **kwargs) 387 else:388 return
_MolsToGridImage(mols,molsPerRow=molsPerRow,subImgSize=subImgSize,
--> 389legends=legends, highlightAtomLists=highlightAtomLists,
**kwargs) 390 391 def ReactionToImage(rxn,

subImgSize=(200,200),**kwargs):/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/__init__.py
in _MolsToGridImage(mols, molsPerRow, subImgSize, legends,
highlightAtomLists, **kwargs) 334
highlights=highlightAtomLists[i]335 if mol is not None:-->
336img =
_moltoimg(mol,subImgSize,highlights,legends[i],**kwargs)337
res.paste(img,(col*subImgSize[0],row*subImgSize[1]))338 return

res/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/__init__.py
in _moltoimg(mol, sz, highlights, legend, **kwargs) 302 if not
hasattr(rdMolDraw2D,'MolDraw2DCairo'):303 img =
MolToImage(mol,sz,legend=legend,highlightAtoms=highlights, -->
304**kwargs) 305 else:306 nmol =

rdMolDraw2D.PrepareMolForDrawing(mol,kekulize=kwargs.get('kekulize',True))/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/__init__.py
in MolToImage(mol, size, kekulize, wedgeBonds, fitImage,
options, canvas, **kwargs) 132 #
color=(0,0,1),fill=False,stroke=True)133
font=Font(face='sans',size=12)-->
134canvas.addCanvasText(legend,pos,font)135 136 if

kwargs.get('returnCanvas',False):/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/Chem/Draw/spingCanvas.py
in addCanvasText(self, text, pos, font, color, **kwargs) 72
labelP = pos[0]-txtWidth/2+offset,pos[1]+txtHeight/273 color =
convertColor(color)--->
74self.canvas.drawString(text,labelP[0],labelP[1],font,color=color)75
return (bw,bh,offset)76

/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/sping/PIL/pidPIL.py
in drawString(self, s, x, y, font, color, angle, **kwargs) 313
if '\n' in s or '\r' in s:314 self.drawMultiLineString(s, x,y,
font, color, angle, --> 315**kwargs) 316 return317 if not
font: font =

self.defaultFont/home/pavel/anaconda3/envs/my-rdkit-env/lib/python3.5/site-packages/rdkit/sping/pid.py
in drawMultiLineString(self, s, x, y, font, color, angle,
**kwargs) 393 dy = h * math.cos(angle*math.pi/180.0)394 dx = h
* math.sin(angle*math.pi/180.0)--> 395s = string.replace(s,
'\r\n', '\n')396 s = string.replac

Re: [Rdkit-discuss] Generating all stereochem possibilities from smile

2016-12-09 Thread Pavel Polishchuk
I just want to share my script, which I use for enumeration of 
stereoisomers. Enumeration of double bonds was added quite recently and 
thus I didn't test it extensively.

I put it on github: https://github.com/DrrDom/rdk
It seems to work well on quite complex queries like 
CC1CCC(CC1)C1CC=C(C\C(=C/I)C(=CF)C1C1C[C@@H](C)CC=C1)C1CC[C@H](O)C(Br)C1


to generate all possible stereoisomers:
gen_stereo_rdkit3.py -i input.smi -o output.smi -t -d

to discard compounds with more than 4 undefined centers/double bonds:
gen_stereo_rdkit3.py -i input.smi -o output.smi -t -d -u 4

Try it on your own risk :)
If you will find mistakes let me know.

Pavel.


On 12/10/2016 03:44 AM, Brian Cole wrote:
So I may need a little guidance on this one from someone with a little 
more historical knowledge of RDKit.


I found the findPotentialStereoBonds function inside Chirality.cpp 
that appears to do what we want, detect double bonds with unspecified 
E/Z stereo chemistry. That's at least what the comment in the file 
leads me to believe:


// Find bonds than can be cis/trans in a molecule and mark them as "any"
...
void findPotentialStereoBonds(ROMol , bool cleanIt) {


But when you look at the code, it's not setting bonds to 
Bond::STEREOANY, it's setting them to Bond::STEREONONE here:


...
// proceed only if we either want to clean the stereocode on 
this bond

// or if none is set on it yet
if (cleanIt || dblBond->getStereo() == Bond::STEREONONE) {
  dblBond->setStereo(Bond::STEREONONE);
...

Seems odd to me check that the stereo is Bond::STEREONONE, and then 
set it to that value.


While findPotentialStereoBonds isn't exposed in Python, there is some 
Java docs written for it in the Java wrappers:
%javamethodmodifiers RDKit::MolOps::findPotentialStereoBonds( 
  ROMol & mol,boolcleanIt = false   ) 
  "

/**

finds bonds that could be cis/trans in a molecule and mark them as 
Bond::STEREONONE




So that documentation is actually correct. It does find bonds that 
could be cis/trans and mark them as Bond::STEREONONE. That just isn't 
very useful since all bonds are marked Bond::STEREONONE by default.


The only direct test case of findPotentialStereoBonds I can find is 
the following:


MolOps::findPotentialStereoBonds(*m, false);
TEST_ASSERT(m->getNumAtoms() == 15);
TEST_ASSERT(m->getBondWithIdx(1)->getBondType() == Bond::DOUBLE);
TEST_ASSERT(m->getBondWithIdx(1)->getStereoAtoms().size() == 2);
TEST_ASSERT(m->getBondWithIdx(3)->getBondType() == Bond::DOUBLE);
TEST_ASSERT(m->getBondWithIdx(3)->getStereoAtoms().size() == 2);

That doesn't even test the getStereo() return value. Though it does 
test getStereoAtoms() return value. So it looks like the proper thing 
to do to detect unspecified bond stereo is to check for a non-empty 
getStereoAtoms() vector?


So ideally, I would like to write an bond stereo enumerator that tests 
if a double bond has unspecified stereo (testing the size of 
getStereoAtoms() would work) and then sets the stereo, that is:


bond.SetStereo(Bond::STEREOE);
or
bond.SetStereo(Bond::STEREOZ);

And then hopefully the Chem.MolToSmiles will "do the right thing". 
However, this commented out code in the wrapper has given me pause:


// this is no longer exposed because it requires that stereo atoms
// be set. This is a task that is tricky and "dangerous".
//.def("SetStereo",::setStereo,
// "Set the CIP-classification of the bond as a 
BondStereo\n")


So apparently I shouldn't expose an "easy" way to do this. What is the 
trickiness and dangerousness of this API? And could we make an easy 
way to enumerate bond stereo?


Thanks!



On Fri, Dec 9, 2016 at 5:44 PM, Brian Cole > wrote:


This has me quite curious now, how do we detect unspecified bond
stereo chemistry in RDKit?

m = Chem.MolFromSmiles("FC=CF")
assert m.HasProp("_StereochemDone")
for bond in m.GetBonds():
print(bond.GetBondDir(), bond.GetStereo())

Yields:

(rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE)
(rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE)
(rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE)

Trying to force the issue:

Chem.AssignStereochemistry(m, cleanIt=True, force=True,
flagPossibleStereoCenters=True)

assert m.HasProp("_StereochemDone") for bond in m.GetBonds():
print(bond.GetBondDir(), bond.GetStereo())

Still yields:

(rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE)
(rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE)
(rdkit.Chem.rdchem.BondDir.NONE, rdkit.Chem.rdchem.BondStereo.STEREONONE)

I can see the setStereo(Bond::STEREOANY) in the code here:

if ((*bondIt)->getBondDir() == Bond::EITHERDOUBLE) {