[Rdkit-discuss] Try to reproduce a code working in January

2019-11-19 Thread Guillaume GODIN
Dear community,

I am trying to reproduce this code

https://iwatobipen.wordpress.com/2019/01/18/generate-possible-molecules-from-a-dataset-chemoinformatics-rdkit/

but I get an error in pandas / RDKit during the generation step:

frame = frame[["ROMol", "Smiles", "Core", "R1", "R2", "R3"]]
frame['Core']=frame['Core'].apply(Chem.RemoveHs)
frame.head(2)



RDKit ERROR: [05:02:02]
RDKit ERROR:
RDKit ERROR: 
RDKit ERROR: Pre-condition Violation
RDKit ERROR: getExplicitValence() called without call to calcExplicitValence()
RDKit ERROR: Violation occurred on line 161 in file /opt/conda/conda-bld/rdkit_1561471048963/work/Code/GraphMol/Atom.cpp
RDKit ERROR: Failed Expression: d_explicitValence > -1
RDKit ERROR: 
RDKit ERROR:
RDKit ERROR: [05:05:04] Explicit valence for atom # 6 N, 5, is greater than permitted

---
ValueError                                Traceback (most recent call last)
 in 
  1 frame = frame[["ROMol", "Smiles", "Core", "R1", "R2", "R3"]]
> 2 frame['Core']=frame['Core'].apply(Chem.RemoveHs)
  3 frame.head(2)

~/miniconda/envs/py37/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3589 else:
   3590 values = self.astype(object).values
-> 3591 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3592
   3593 if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

ValueError: Sanitization error: Explicit valence for atom # 6 N, 5, is greater than permitted



Any idea why ?
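One way to narrow this down (a diagnostic sketch, not a fix) is to stop a single bad core from aborting the whole `apply`, and then inspect the rows that fail. The helper `tolerant` is hypothetical; it relies on RDKit reporting sanitization failures such as the valence error above as `ValueError` (or a subclass of it), as the traceback shows.

```python
def tolerant(func):
    """Hypothetical helper: wrap func so that a ValueError (the type RDKit used
    for the sanitization error in the traceback above) returns None instead of
    raising. pandas' Series.apply can then finish, leaving None in the rows
    whose value could not be processed."""
    def wrapper(value):
        try:
            return func(value)
        except ValueError:
            return None
    return wrapper
```

For example, `frame['Core'] = frame['Core'].apply(tolerant(Chem.RemoveHs))` followed by `frame[frame['Core'].isna()]` would list the cores that fail sanitization, such as the one containing the five-valent N reported above.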

BR

Guillaume



***
DISCLAIMER  
This email and any files transmitted with it, including replies and forwarded 
copies (which may contain alterations) subsequently transmitted from Firmenich, 
are confidential and solely for the use of the intended recipient. The contents 
do not represent the opinion of Firmenich except to the extent that it relates 
to their official business.  
***
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Hydrogens involved in "stereochemistry" are not removed by RemoveHs()

2019-11-19 Thread Greg Landrum
Hi Ivan,

I agree that there is a bug here, but I think the problem is actually that
the double bond is being assigned stereochemistry at all in this case.

In [2]: m = Chem.MolFromSmiles('[H]/C=C/F')



In [3]: m.Debug()


Atoms:
0 1 H chg: 0  deg: 1 exp: 1 imp: 0 hyb: 1 arom?: 0 chi: 0
1 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
2 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
3 9 F chg: 0  deg: 1 exp: 1 imp: 0 hyb: 4 arom?: 0 chi: 0
Bonds:
0 0->1 order: 1 dir: 4 conj?: 0 aromatic?: 0
1 1->2 order: 2 stereo: 3 stereoAts: (0 3) conj?: 0 aromatic?: 0
2 2->3 order: 1 dir: 4 conj?: 0 aromatic?: 0


Given that the two substituents on the first C are the same, the double
bond shouldn't be marked as STEREOE at all.
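Greg's point can be illustrated with a small check: an end of a C=C bond that carries two hydrogens (here one explicit [H] plus one implicit H) has two identical substituents, so that bond cannot be E/Z stereogenic. The helper below is a hypothetical sketch, not the fix that went into RDKit; it parses with removeHs = False so the explicit H survives as atom 0, and it marks any such bond STEREONONE. Whether a later RemoveHs then drops the H may depend on the RDKit version.

```python
from rdkit import Chem

def clear_stereo_on_ch2_ends(mol):
    """Hypothetical helper: find double bonds where one end carries two
    hydrogens and set their stereo to STEREONONE, since such a bond cannot
    have E/Z isomers. Returns the (bond index, atom index) pairs found."""
    hits = []
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue
        for atom in (bond.GetBeginAtom(), bond.GetEndAtom()):
            # includeNeighbors=True also counts explicit [H] atoms bonded to this atom
            if atom.GetTotalNumHs(includeNeighbors=True) >= 2:
                hits.append((bond.GetIdx(), atom.GetIdx()))
                bond.SetStereo(Chem.BondStereo.STEREONONE)
                break
    return hits

params = Chem.SmilesParserParams()
params.removeHs = False  # keep the explicit [H] as an atom, as in Greg's Debug() output
m = Chem.MolFromSmiles('[H]/C=C/F', params)
print(clear_stereo_on_ch2_ends(m))
```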

I'll get this fixed.
-greg



On Wed, Nov 6, 2019 at 4:34 PM Ivan Tubert-Brohman <
ivan.tubert-broh...@schrodinger.com> wrote:

> Hi,
>
> For reasons too complicated to get into here, I ended up with a molecule
> containing a =CH2 in which one of the hydrogens was explicit and had E/Z
> stereo info. For example, consider [H]/C=C/F.
>
> I was surprised that RemoveHs() refused to remove the hydrogen, although
> later I found that that's the documented behavior, and generally it makes
> sense as a way to prevent the loss of stereochemical information.
>
> For example, compare these two:
>
> In [7]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]/C=C/F')))
> Out[7]: '[H]/C=C/F'
>
> In [8]: Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles('[H]C=C/F')))
> Out[8]: 'C=CF'
>
> A chemist would say that these two are obviously the same molecule, and
> arguably the second representation is better, because a double bond ending
> in =CH2 can't have geometric isomers. Maybe it's unreasonable to expect
> RDKit to make that kind of inference, but still I wonder, what would be a
> good automated way to get from [H]/C=C/F to C=CF?
>
> One idea is to add a "=CH2 cleanup" step, perhaps implemented by applying
> this reaction:
>
> [H][C:1]=[*:2]>>[CH2:1]=[*:2]
>
> but perhaps there's a better way?
>
> Best,
> Ivan
>
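In more recent RDKit releases (this API postdates the thread, so treat its availability in a given version as an assumption), RemoveHs accepts a RemoveHsParameters object whose removeDefiningBondStereo flag asks it to drop hydrogens even when they define double-bond stereo. A minimal sketch for the [H]/C=C/F case follows. Note that the flag applies to every such H in the molecule, so it can discard genuine E/Z information elsewhere, which is why a targeted cleanup such as the =CH2 reaction Ivan proposes may still be preferable.

```python
from rdkit import Chem

def remove_all_hs(mol):
    """Remove explicit hydrogens, including ones RemoveHs would normally keep
    because they define double-bond stereochemistry."""
    params = Chem.RemoveHsParameters()
    params.removeDefiningBondStereo = True
    return Chem.RemoveHs(mol, params)

parser = Chem.SmilesParserParams()
parser.removeHs = False  # keep [H] as an explicit atom for the demonstration
mol = Chem.MolFromSmiles('[H]/C=C/F', parser)
cleaned = remove_all_hs(mol)
print(Chem.MolToSmiles(cleaned))
```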


Re: [Rdkit-discuss] The "confID" for "MMFFOptimizeMoleculeConfs"

2019-11-19 Thread topgunhaides .
Hi Paolo,

Thanks for helping me! Appreciate it.

Best,
Leon


On Tue, Nov 19, 2019 at 5:01 PM Paolo Tosco 
wrote:

> Hi Leon,
>
> you are right, that's a documentation bug: The confId parameter is
> actually ignored, as you have already found out.
>
> Thanks for reporting this, cheers
> p.


Re: [Rdkit-discuss] The "confID" for "MMFFOptimizeMoleculeConfs"

2019-11-19 Thread Paolo Tosco

Hi Leon,

you are right, that's a documentation bug: The confId parameter is 
actually ignored, as you have already found out.


Thanks for reporting this, cheers
p.
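Since confId is ignored by MMFFOptimizeMoleculeConfs, one workaround (a sketch, not something proposed in this thread) is to build an MMFF force field for a single conformer and minimize only that one. MMFFGetMoleculeProperties, MMFFGetMoleculeForceField and the force field's Minimize/CalcEnergy methods are standard RDKit calls; the helper name, the fixed randomSeed=42 (instead of -1, for reproducibility) and the choice to return (status, energy), mirroring the tuples MMFFOptimizeMoleculeConfs returns, are illustrative.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def mmff_optimize_one(mol, conf_id, variant='MMFF94s', max_iters=1000):
    """Optimize only conformer conf_id and return (status, energy), where
    status is 0 if the minimizer converged and 1 if it ran out of iterations."""
    props = AllChem.MMFFGetMoleculeProperties(mol, mmffVariant=variant)
    ff = AllChem.MMFFGetMoleculeForceField(mol, props, confId=conf_id)
    # The force field works on that conformer's coordinates, so Minimize updates them in place
    status = ff.Minimize(maxIts=max_iters)
    return status, ff.CalcEnergy()

mh = Chem.AddHs(Chem.MolFromSmiles('OCCCN'))
cids = list(AllChem.EmbedMultipleConfs(mh, numConfs=3, randomSeed=42))
print(mmff_optimize_one(mh, cids[0]))
```

If the energy is not needed, AllChem.MMFFOptimizeMolecule(mh, mmffVariant='MMFF94s', maxIters=1000, confId=cid) also optimizes a single conformer, but returns only the convergence flag.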


[Rdkit-discuss] The "confID" for "MMFFOptimizeMoleculeConfs"

2019-11-19 Thread topgunhaides .
Hi guys,

Does the "confID" argument actually work for "MMFFOptimizeMoleculeConfs"?
Try the following code:


from rdkit import Chem
from rdkit.Chem import AllChem

mh = Chem.AddHs(Chem.MolFromSmiles('OCCCN'))
cids = AllChem.EmbedMultipleConfs(mh, numConfs=3, maxAttempts=1000,
                                  pruneRmsThresh=0.5, numThreads=0,
                                  randomSeed=-1)

# try to optimize one conformer at a time in the loop:
for cid in cids:
    mmffopt_1 = AllChem.MMFFOptimizeMoleculeConfs(mh, confId=cid,
                                                  maxIters=1000,
                                                  mmffVariant='MMFF94s',
                                                  numThreads=0)
    print(mmffopt_1)

# just optimize one specific conformer (ID = 0):
mmffopt_2 = AllChem.MMFFOptimizeMoleculeConfs(mh, confId=0, maxIters=1000,
                                              mmffVariant='MMFF94s',
                                              numThreads=0)
print(mmffopt_2)

# Or optimize all conformers:
mmffopt_3 = AllChem.MMFFOptimizeMoleculeConfs(mh, confId=-1, maxIters=1000,
                                              mmffVariant='MMFF94s',
                                              numThreads=0)
print(mmffopt_3)


In the documentation for MMFFOptimizeMoleculeConfs: "confId : indicates which
conformer to optimize". However, in all three cases it still optimizes all
conformers and gives me the "whole" thing:

[(0, 1.0966514172064503), (0, -1.5120724826923375), (0, 0.6847373779429624)]
[(0, 1.0966514171119535), (0, -1.512072483200475), (0, 0.6847373779078172)]
[(0, 1.0966514168939838), (0, -1.5120724834832924), (0, 0.6847373779001575)]
[(0, 1.0966514168498929), (0, -1.512072483655178), (0, 0.6847371291858746)]
[(0, 1.096651416829605), (0, -1.5120724837465005), (0, 0.6847371291858746)]

Thank you.

Best,
Leon


[Rdkit-discuss] assign all bond directions in SMILES

2019-11-19 Thread Andrew Dalke
Hi all,

  Is there any way to include all bond directions (E/Z stereochemistry) in the
output SMILES string?

For example, here's a structure:

>>> mol = Chem.MolFromSmiles(r"F/C(Cl)=C(O)/N")
>>> Chem.MolToSmiles(mol)
'N/C(O)=C(/F)Cl'

It's a minimal definition, in that I could have specified the directions for 
all of the bonds:

>>> mol = Chem.MolFromSmiles(r"F/C(/Cl)=C(\O)/N")
>>> Chem.MolToSmiles(mol)
'N/C(O)=C(/F)Cl'

Note that RDKit figured out which bond directions were minimal.

The underlying code checks for conflicting assignments: 

>>> mol = Chem.MolFromSmiles(r"F/C(/Cl)=C(/O)/N")
[18:25:25] Conflicting single bond directions around double bond at index 2.
[18:25:25]   BondStereo set to STEREONONE and single bond directions set to NONE.
>>> Chem.MolToSmiles(mol)
'NC(O)=C(F)Cl'

What I want is some way to go from

  N/C(O)=C(/F)Cl

to a fully specified

  F/C(/Cl)=C(\O)/N
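The rule that both detects the conflict above and fills in the missing marks can be sketched at the string level. This illustrates the SMILES convention only; it is not an RDKit API, and it assumes each substituent is a plain neighbor atom (no ring-closure digits). A mark on a substituent written before the double-bond atom (F/C) is read as written, a mark on one written after it (C(/Cl)) is read with the opposite sense (X/C and C(\X) put X on the same side), and the two substituents on one end must end up on opposite sides.

```python
FLIP = {'/': '\\', '\\': '/'}

def side(mark, written_before):
    """Normalize a bond-direction mark relative to the double-bond atom:
    marks on substituents written after the atom are read with the opposite sense."""
    return mark if written_before else FLIP[mark]

def consistent(sub_a, sub_b):
    """Each substituent is (mark, written_before). Two substituents on the same
    end of a double bond must lie on opposite sides."""
    return side(*sub_a) != side(*sub_b)

def complete(known, other_written_before):
    """Return the mark the second substituent needs so that the end stays consistent."""
    wanted = FLIP[side(*known)]
    return wanted if other_written_before else FLIP[wanted]

# Left end of F/C(/Cl)=...: F is written before C with '/', Cl after it
print(complete(('/', True), other_written_before=False))   # prints /
# Right end of ...=C(\O)/N: N is written after C with '/', and O is after C too
print(complete(('/', False), other_written_before=False))  # prints \ (as in C(\O)/N)
# The conflicting right end of F/C(/Cl)=C(/O)/N
print(consistent(('/', False), ('/', False)))               # prints False
```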


Andrew
da...@dalkescientific.com






Re: [Rdkit-discuss] Folding count vectors

2019-11-19 Thread Benjamin Datko
Hello Francois,

I am trying to replicate some of the functionality of
CreateDifferenceFingerprintForReaction [Ref 1] to understand how the code
works. The function CreateDifferenceFingerprintForReaction supports three
fingerprint representations of the molecules: AtomPair, Morgan, and
TopologicalTorsion [Ref 2]. All three are count vectors [Ref 3], and the
function allows a variable fingerprint size for the output.

I was following this post [Ref 4], which describes how to create reaction
difference fingerprints using different fingerprint representations. Using
the code from the post I can create reaction difference fingerprints using
either Morgan or AtomPair, but comparing the output from the post [Ref 4]
with CreateDifferenceFingerprintForReaction gives fingerprints of different
sizes, with different values and different densities. I assume this is due
to folding the count vector down to the default fingerprint size of 2048.

Example code snippet:

# The defs below are from the post
# https://sourceforge.net/p/rdkit/mailman/message/35240736/
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs
import copy

def _createFP(mol, maxSize, fpType='AP'):
    mol.UpdatePropertyCache(False)
    if fpType == 'AP':
        return AllChem.GetAtomPairFingerprint(mol, minLength=1,
                                              maxLength=maxSize)
    else:
        Chem.GetSSSR(mol)
        rinfo = mol.GetRingInfo()
        return AllChem.GetMorganFingerprint(mol, radius=maxSize)

def getSumFps(fps):
    summedFP = copy.deepcopy(fps[0])
    for fp in fps[1:]:
        summedFP += fp
    return summedFP

def buildReactionFP(rxn, maxSize=3, fpType='AP'):
    reactants = rxn.GetReactants()
    products = rxn.GetProducts()
    rFP = getSumFps([_createFP(mol, maxSize, fpType=fpType) for mol in reactants])
    pFP = getSumFps([_createFP(mol, maxSize, fpType=fpType) for mol in products])
    return pFP - rFP

>>> rxn1 = AllChem.ReactionFromSmarts('[C:1]C1C1>>[N:1]C1C1', useSmiles=True)
>>> rxfp1 = buildReactionFP(rxn1,maxSize=2)

>>> rxfp1.GetNonzeroElements()
{558114: -2, 574497: -1, 1066050: 2, 1066081: 1}

>>> rxfp1.GetLength()
8388608


# Same reaction now using CreateDifferenceFingerprintForReaction
>>> rxn1_fp = AllChem.CreateDifferenceFingerprintForReaction(rxn1)

>>> rxn1_fp.GetNonzeroElements()
{1048: 10,
 1310: -20,
 1325: 20,
 1372: -10,
 1390: 20,
 1692: -10,
 1757: -20,
 1772: 10}

>>> print(rxn1_fp.GetLength(),rxfp1.GetLength())
2048 8388608
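A common way to fold a sparse count vector is to map each index modulo the target length and add the counts that land in the same bin (the count analogue of OR-folding bits). The pure-Python sketch below works on the dictionary returned by GetNonzeroElements(). Applied to rxfp1 above, it gives bins 1057, 1058, 1090 and 1121, not the bins reported by CreateDifferenceFingerprintForReaction. That suggests (an assumption, not checked against the RDKit source) that the reaction fingerprint hashes features directly into 2048 bins rather than folding the unfolded vector; the values being multiples of 10 also hint at a per-molecule weight, possibly the nonAgentWeight in ReactionFingerprintParams [Ref 2].

```python
def fold_counts(nonzero, fold_size=2048):
    """Fold a sparse count vector given as {index: count} (e.g. from
    GetNonzeroElements()) into fold_size bins by summing the counts that share
    index % fold_size. Bins whose counts cancel to zero are dropped, which can
    happen for difference fingerprints with negative entries."""
    folded = {}
    for idx, count in nonzero.items():
        b = idx % fold_size
        folded[b] = folded.get(b, 0) + count
    return {b: c for b, c in sorted(folded.items()) if c != 0}

print(fold_counts({558114: -2, 574497: -1, 1066050: 2, 1066081: 1}))
```

Summing keeps the total count of features, whereas OR-folding would only keep presence; the price is that unrelated features sharing a bin become indistinguishable.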

References
1. https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html#rdkit.Chem.rdChemReactions.CreateDifferenceFingerprintForReaction
2. https://www.rdkit.org/docs/cppapi/structRDKit_1_1ReactionFingerprintParams.html
3. https://www.rdkit.org/docs/GettingStartedInPython.html#morgan-fingerprints-circular-fingerprints
4. https://sourceforge.net/p/rdkit/mailman/message/35240736/
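If the goal is simply fixed-length count fingerprints, RDKit also provides hashed variants that take the length directly, which covers topological torsions as well (no "AsBitVect" step needed). The sketch below uses the rdMolDescriptors functions; the molecule is an arbitrary example, and newer releases steer users toward the rdFingerprintGenerator module instead, so check what your version recommends.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

mol = Chem.MolFromSmiles('CCOC(C)=O')  # arbitrary example molecule (ethyl acetate)

# Count fingerprints hashed directly into 2048 bins
ap = rdMolDescriptors.GetHashedAtomPairFingerprint(mol, nBits=2048, minLength=1, maxLength=3)
tt = rdMolDescriptors.GetHashedTopologicalTorsionFingerprint(mol, nBits=2048)
mg = rdMolDescriptors.GetHashedMorganFingerprint(mol, 2, nBits=2048)

for name, fp in (('AP', ap), ('TT', tt), ('Morgan', mg)):
    # length of the vector and number of occupied bins
    print(name, fp.GetLength(), len(fp.GetNonzeroElements()))
```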

v/r,

Ben

On Mon, Nov 18, 2019 at 10:13 PM Francois Berenger  wrote:

> On 19/11/2019 03:34, Benjamin Datko wrote:
> > Hello all,
> >
> > I am curious about how to fold a count vector fingerprint. I understand
> > that when folding bit vectors, the most common way is to split the vector
> > in half and apply a bitwise OR operation. I think this is how the
> > function rdkit.DataStructs.FoldFingerprint works in RDKit; correct me
> > if I am wrong.
> >
> > How does RDKit fold count vectors such as AtomPair, Morgan, and
> > topological torsion, or what is the appropriate way to do it?
>
> Can you give us some context? Why do you want to do that?
>
> Maybe, you can use the following in order to create
> shorter "fingerprints" for which the Tanimoto distance is
> still computable (despite becoming approximate then):
>
> ---
> Shrivastava, A. (2016).
> Simple and efficient weighted minwise hashing.
> In Advances in Neural Information Processing Systems (pp. 1498-1506).
>
>
> https://papers.nips.cc/paper/6472-simple-and-efficient-weighted-minwise-hashing.pdf
> ---
>
> Regards,
> F.
>
> > I thought about turning the fingerprint into a bit vector using the
> > respective "AsBitVect" method and then folding it using
> > rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't
> > have an "AsBitVect" method
> > [https://www.rdkit.org/docs/GettingStartedInPython.html].
> >
> > For an explicit example using AtomPair fingerprint we can see the
> > fingerprint is extremely sparse. Could this AtomPair fingerprint be
> > folded to increase the density?
> >
> > >>> from rdkit import Chem
> > >>> from rdkit.Chem import AllChem
> > >>> mol = Chem.MolFromSmiles('CC1C1')
> > >>> ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1, maxLength=3)
> > >>> number_of_nonzero_elements = len(ap_fp.GetNonzeroElements().values())
> > >>> print((ap_fp.GetLength(),number_of_nonzero_elements))
> > (8388608,9)
> >
> > Very Respectfully,
> >
> > Ben
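Francois's suggestion can be illustrated with a much simpler technique than Shrivastava's algorithm: expand each (feature, count) into count distinct tokens and apply ordinary MinHash; the fraction of matching signature slots then estimates the count-based Tanimoto (sum of minima over sum of maxima). This is a swapped-in, slower baseline that only suits small non-negative integer counts, not the method from the cited paper; the hash construction and signature length are arbitrary choices.

```python
import hashlib

def _hash(seed, token):
    """Deterministic 64-bit hash of (seed, token); each seed plays the role of one permutation."""
    data = f'{seed}:{token}'.encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), 'big')

def minhash_counts(counts, num_hashes=256):
    """MinHash signature of a non-negative count vector {feature: count}."""
    # Feature f with count c becomes the tokens (f, 0) ... (f, c - 1)
    tokens = [(f, i) for f, c in counts.items() for i in range(c)]
    if not tokens:
        raise ValueError('empty count vector')
    return [min(_hash(s, t) for t in tokens) for s in range(num_hashes)]

def estimated_tanimoto(sig_a, sig_b):
    """Fraction of signature slots on which the two minima agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def exact_tanimoto(a, b):
    """Count-based Tanimoto: sum of per-feature minima over sum of maxima."""
    keys = set(a) | set(b)
    lo = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    hi = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return lo / hi

a = {1: 2, 2: 1}
b = {1: 1, 3: 1}
print(exact_tanimoto(a, b))  # 0.25
print(estimated_tanimoto(minhash_counts(a), minhash_counts(b)))
```

The signature has a fixed length regardless of how sparse the original vector is, which is what makes the similarity computable on shorter "fingerprints"; the estimate becomes tighter as num_hashes grows.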

Re: [Rdkit-discuss] Anaconda installation without hard dependency on Intel MKl (windows)

2019-11-19 Thread Thomas Strunz
Hi all,

In the last couple of days there has been increased focus on this issue
(MKL crippling performance on AMD Ryzen) on certain tech/social media sites;
MATLAB, for example, is also affected. Some of you might have seen it, but
there seems to be a very simple workaround to get MKL to run properly on AMD
Ryzen.

One simply needs to create a system environment variable

MKL_DEBUG_CPU_TYPE=5


Then anything using MKL will use the AVX2 code path (if applicable) and run
much faster, even faster than with OpenBLAS.

Again, I have not done extensive testing, but in my opinion this would be the
simplest workaround.
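If a system-wide variable is not convenient, the same variable can be set from Python, under the assumption that MKL reads it only when it is first loaded, i.e. it has to be set before importing numpy, scipy or anything else linked against MKL. Reports also suggest that newer MKL releases (2020 onwards) no longer honor this variable, so treat the workaround as version-dependent.

```python
import os

# Assumed to be read once when MKL is loaded, so set it before the first
# import of any MKL-linked package (numpy, scipy, ...); setting it later
# in the same process is not expected to have any effect.
os.environ['MKL_DEBUG_CPU_TYPE'] = '5'

# import numpy as np  # import MKL-linked libraries only after this point
```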

Best Regards,

Thomas
