Re: [Rdkit-discuss] question on complexity of cannonization

2023-06-15 Thread Francois Berenger

On 16/06/2023 03:49, S Joshua Swamidass wrote:

Incidentally,

I came across this O(log N) canonization algorithm for planar graphs:
https://arxiv.org/pdf/0809.2319.pdf

I wonder if this algorithm can be adapted for chemistry? Molecules are
usually planar, but I believe they occasionally are "nearly" planar,
by which I mean planar graphs plus a few edges that break the
planarity.


Dear Joshua,

Some natural products are notoriously complex 3D molecules.

What exactly do you mean by planar?

Many chemical groups are 3D: methyl, adamantane, etc.


And what (generally speaking) is the algorithm used by rdkit? Do we
know its complexity?

On Thu, Jun 15, 2023 at 1:38 PM S Joshua Swamidass
 wrote:


Andrew,

Thanks! According to wikipedia (and my recollections of algorithms
class)...
"The problem is not known to be solvable in polynomial time [1] nor
to be NP-complete [2], and therefore may be in the computational
complexity class [3] NP-intermediate [4]."

https://en.wikipedia.org/wiki/Graph_isomorphism_problem

Your reference though is really helpful. The key phrase seems to be
"bounded valence" which is certainly true of molecular graphs. Each
atom can only bond to some fairly low number of other atoms, i.e.
bounded valence. That's probably the reason why we do have a
polynomial time algorithm...

Thank you!

Joshua

On Thu, Jun 15, 2023 at 1:21 PM Andrew Dalke
 wrote:


On Jun 15, 2023, at 18:20, S Joshua Swamidass
 wrote:

It's well known that the graph-isomorphism problem is NP


While P is contained in NP, I don't think that's the NP you mean.

I suspect you may be thinking of subgraph isomorphism, which is
NP-hard. Graph isomorphism may be quasi-polynomial time, if
Babai's (unpublished) claim is correct.

Also, "Isomorphism of graphs of bounded valence can be tested in
polynomial time" -
https://www.sciencedirect.com/science/article/pii/002282900095
.


So here is my question. What are the cases where it is very
difficult to canonize a graph?

As I recall, handling chirality and other non-local properties is
difficult. I have not worked on this problem.

Cheers,

Andrew
da...@dalkescientific.com
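
To make "canonization" concrete: a canonical form maps every renumbering of the same graph to one identical representation. The following is a minimal brute-force sketch and is definitely not what RDKit does; it tries all n! atom orderings, which is exactly the cost the polynomial-time results discussed above avoid. The function name canonical_form and the edge-list encoding are my own assumptions for illustration.

```python
from itertools import permutations

def canonical_form(n_atoms, bonds):
    """Return the lexicographically smallest sorted edge list over all relabelings.

    Brute force: tries all n! atom orderings, so only usable for tiny graphs.
    """
    best = None
    for perm in permutations(range(n_atoms)):
        # perm[i] is the new label given to atom i
        relabeled = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds)
        if best is None or relabeled < best:
            best = relabeled
    return best

# the same branched 4-atom skeleton, numbered in two different ways
g1 = [(0, 1), (0, 2), (0, 3)]
g2 = [(3, 0), (3, 1), (2, 3)]
# a linear 4-atom chain is a different graph
chain = [(0, 1), (1, 2), (2, 3)]

print(canonical_form(4, g1) == canonical_form(4, g2))     # same graph
print(canonical_form(4, g1) == canonical_form(4, chain))  # different graphs
```

Two inputs are isomorphic exactly when their canonical forms are equal, which is why canonization is at least as hard as isomorphism testing.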



Links:
--
[1] https://en.wikipedia.org/wiki/Polynomial_time
[2] https://en.wikipedia.org/wiki/NP-complete
[3] https://en.wikipedia.org/wiki/Complexity_class
[4] https://en.wikipedia.org/wiki/NP-intermediate
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] Virtual hydrogens for metals (smiles and smarts)

2023-05-18 Thread Francois Berenger

For mere humans, working with SMARTS is always a pain.
But, some nice people are trying to make your life easier:

https://smarts.plus/

[2] K. Schomburg, H.-C. Ehrlich, K. Stierand, M. Rarey; From Structure
Diagrams to Visual Chemical Patterns, J. Chem. Inf. Model., 2010, 50
(9), pp 1529-1535


On 19/05/2023 07:17, Jarod Younker wrote:

I’ve got the following two smiles strings:

[Zr]C
[ZrH]

The smarts string [Zr][CH3] matches [Zr]C. What’s the smarts for
[ZrH]?  I’ve tried [Zr][H].  It does not match.

Sent from my iPhone
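
One way to look at the question above: in [ZrH] the hydrogen is a property of the Zr atom (an H count), not a separate atom in the graph, so a pattern that asks for an H atom has nothing to match. A minimal sketch, assuming the molecules are parsed from the two SMILES given above and that the H-count primitive H1 is what is wanted:

```python
from rdkit import Chem

zr_methyl = Chem.MolFromSmiles('[Zr]C')
zr_hydride = Chem.MolFromSmiles('[ZrH]')

# H1 is an atom property ("one attached hydrogen"), not a separate atom
pattern = Chem.MolFromSmarts('[Zr;H1]')
# this pattern needs a second atom bonded to Zr
explicit_h = Chem.MolFromSmarts('[Zr][H]')

print(zr_hydride.HasSubstructMatch(pattern))
print(zr_methyl.HasSubstructMatch(pattern))
print(zr_hydride.HasSubstructMatch(explicit_h))
```

If the hydrogens are first turned into real atoms with Chem.AddHs, a pattern such as [Zr][#1] would be expected to match instead.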


[Rdkit-discuss] How to decompose the UFF (or MMFF94) scoring of a small molecule?

2023-05-18 Thread Francois Berenger

Dear list,

I asked this question in rdkit's github discussions:

https://github.com/rdkit/rdkit/discussions/6377

But, apparently that's not more responsive than the ML, so here is my 
question:

---
I have a ligand, I would like to score its current conformer using 
rdkit's UFF implementation.


Later, I would like to change some rotatable bond (single bond out of
ring) and update the conformer's energy by just re-evaluating the part
of the energy that is supposed to have changed (i.e. the dihedral
component).
Bond lengths, bond angles and partial charges being constant.

I suspect it should be faster than rescoring the conformer from scratch.

How to do this with rdkit?
---

Thanks a lot,
Francois.
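
This does not answer the decomposition question, but a from-scratch rescoring gives a baseline against which any incremental, torsion-only update could be checked. A minimal sketch, assuming butane as a stand-in ligand and a helper name (uff_energy) of my own; the force field is simply rebuilt after each rotation:

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolTransforms

def uff_energy(mol):
    """Total UFF energy of the molecule's first conformer."""
    ff = AllChem.UFFGetMoleculeForceField(mol)
    return ff.CalcEnergy()

# butane with explicit hydrogens and one embedded 3D conformer
mol = Chem.AddHs(Chem.MolFromSmiles('CCCC'))
AllChem.EmbedMolecule(mol, randomSeed=42)
conf = mol.GetConformer()

# rotate around the central C-C bond (heavy atoms 0-1-2-3), then rescore from scratch
rdMolTransforms.SetDihedralDeg(conf, 0, 1, 2, 3, 180.0)
e_anti = uff_energy(mol)
rdMolTransforms.SetDihedralDeg(conf, 0, 1, 2, 3, 0.0)
e_syn = uff_energy(mol)

print("anti: %.3f  syn: %.3f  delta: %.3f" % (e_anti, e_syn, e_syn - e_anti))
```

Note that the energy difference here contains both the torsion term and the changed non-bonded (van der Waals) contacts, so re-evaluating only the dihedral component would not reproduce it exactly.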




[Rdkit-discuss] UFF and MMFF94 strongly disagree on the internal energy of one conformer

2023-05-11 Thread Francois Berenger

Dear list,

I am a little bit worried.
If I start from the conformer given at the end of this message,
MMFF94 gives me a minimized conformer energy of -122.528
while UFF gives me 69.098 (I assume both are in kcal/mol).

I am a little bit annoyed by the fact that the UFF energy
is not even negative and that the two FFs disagree by so much.

Are the two FFs using different units?

Do the two FFs compute something different?

I used the following python code for conformer minimization:
---
import sys
from rdkit.Chem import AllChem

def minimize_conformer(ff, mol):
    # the ligand is supposed to be already properly protonated
    # for the given pH and in 3D
    assert mol.GetNumConformers() == 1
    conv_ene = []
    if ff == "uff":
        conv_ene = AllChem.UFFOptimizeMoleculeConfs(mol)
    elif ff == "mmff":
        conv_ene = AllChem.MMFFOptimizeMoleculeConfs(mol)
    else:
        print("minimize_conformer: unsupported FF: %s" % ff,
              file=sys.stderr)
        assert False
    # one (not_converged, energy) pair per conformer
    not_converged, ene = conv_ene[0]
    assert not_converged == 0
    return (mol, ene)
---

--- mol2
@<TRIPOS>MOLECULE
caffeine
   24 25 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 C1   5.1112   0.7768  -0.9264 C.2   1 <0>  0.0365
      2 C2   3.6441   2.3853  -1.2367 C.2   1 <0> -0.2366
      3 C3   3.0449   1.2362  -0.7981 C.2   1 <0>  0.2902
      4 C4   2.9523   3.5995  -1.5265 C.2   1 <0>  0.7150
      5 C5   0.8629   2.2904  -0.8430 C.2   1 <0>  0.6900
      6 C6   6.0477   2.9539  -1.7310 C.3   1 <0>  0.2556
      7 C7   1.0831  -0.0921  -0.1245 C.3   1 <0>  0.3001
      8 C8   0.7590   4.6603  -1.5747 C.3   1 <0>  0.3001
      9 N1   3.9557   0.2327  -0.6044 N.2   1 <0> -0.5653
     10 N2   4.9734   2.0838  -1.3171 N.pl3 1 <0>  0.0476
     11 N3   1.6743   1.1555  -0.5931 N.am  1 <0> -0.4231
     12 N4   1.5557   3.4679  -1.3036 N.am  1 <0> -0.4201
     13 O1   3.5018   4.6256  -1.9190 O.2   1 <0> -0.5700
     14 O2  -0.3616   2.2696  -0.6756 O.2   1 <0> -0.5700
     15 H1   6.0715   0.2803  -0.8973 H     1 <0>  0.1500
     16 H2   6.1690   2.8492  -2.8124 H     1 <0>  0.
     17 H3   5.7997   3.9871  -1.4724 H     1 <0>  0.
     18 H4   6.9664   2.6656  -1.2124 H     1 <0>  0.
     19 H5   1.0630  -0.0986   0.9753 H     1 <0>  0.
     20 H6   0.0567  -0.1790  -0.5104 H     1 <0>  0.
     21 H7   1.6835  -0.9404  -0.4850 H     1 <0>  0.
     22 H8   0.4310   4.6518  -2.6246 H     1 <0>  0.
     23 H9  -0.1216   4.6705  -0.9156 H     1 <0>  0.
     24 H10  1.3672   5.5577  -1.3880 H     1 <0>  0.

@<TRIPOS>BOND
     1     1     9 2
     2     1    10 1
     3     2     3 2
     4     2     4 1
     5     2    10 1
     6     3     9 1
     7     3    11 1
     8     4    12 am
     9     4    13 2
    10     5    11 am
    11     5    12 am
    12     5    14 2
    13     6    10 1
    14     7    11 1
    15     8    12 1
    16     1    15 1
    17     6    16 1
    18     6    17 1
    19     6    18 1
    20     7    19 1
    21     7    20 1
    22     7    21 1
    23     8    22 1
    24     8    23 1
    25     8    24 1
---

Regards,
F.




Re: [Rdkit-discuss] how to get indexes and atoms with H from smiles

2023-05-09 Thread Francois Berenger

Hello,

Maybe you can use this:
Chem.MolToSmiles(mol, allHsExplicit=True)

This will place each heavy atom between '[' and ']' and give you the
number of hydrogens for each.
It gets easier to work with SMILES strings after this (you no longer
need a full-blown SMILES parser).

Regards,
F.
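
The suggestion above can be sketched as follows, assuming benzamide as the example input and a helper name (atoms_with_hs) of my own. Once every atom is bracketed, a one-line regular expression is enough to pull out the atom tokens:

```python
import re
from rdkit import Chem

def atoms_with_hs(smiles):
    """Return the bracketed atom tokens of the all-explicit-H SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    explicit = Chem.MolToSmiles(mol, allHsExplicit=True)
    # every atom is now written as symbol + H count, e.g. [cH] or [NH2]
    return re.findall(r'\[([^\]]+)\]', explicit)

# benzamide, chosen here only as an example input
tokens = atoms_with_hs('NC(=O)c1ccccc1')
for i, token in enumerate(tokens):
    print(i, token)
```

The index printed here is the position in the output SMILES, which is not necessarily the RDKit atom index; if the two must agree, looping over mol.GetAtoms() and combining GetSymbol() with GetTotalNumHs() is probably the safer route.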

On 09/05/2023 14:55, Haijun Feng wrote:

[1]

Hi All,

I am trying to add atom numbers in smiles as below,

from rdkit import Chem
mol=Chem.MolFromSmiles('c1c(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
smi=Chem.MolToSmiles(mol)
print(smi)

the output is: [cH:0]1[cH:1][cH:2][cH:3][cH:4][c:5]1C:6=[O:8]

then I want to split the smiles into atoms, I did it like this:

from rdkit import Chem
mol=Chem.MolFromSmiles('c1c(C(N)=O)1')
for i, atom in enumerate(mol.GetAtoms()):
  atom.SetProp('molAtomMapNumber',str(i))
  print(i,atom.GetSymbol())

the output is:

0 C
1 C
2 C
3 C
4 C
5 C
6 C
7 N
8 O

But what I do want is something like this (with fragments instead of
atoms):

0 cH
1 CH
...
7 NH2
8 O

Can anyone help me figure out how to get each atom with H from the
smiles as above. Thanks so much!

best,

Hal

Links:
--
[1] https://stackoverflow.com/posts/76197437/timeline


[Rdkit-discuss] Listing all UFF torsion parameters present in a molecule (V_jk, n_jk, phi0_jk)

2023-04-26 Thread Francois Berenger

Dear rdkiters,

Is it possible to list all the UFF torsion-angle parameters
around single bonds out of rings (rotatable bonds) for a given molecule?


From what I found in the rdkit doc, it is (only?) possible to extract
the Vjk value for four consecutive atoms indexed i j k l.

But, Vjk is just one parameter (the torsion barrier in kcal/mol): for
each torsion angle UFF also defines the multiplicity of the barrier
(n_jk, an integer) and phi0 (the angle in degrees at which the barrier
is 0), if I understand correctly.


I am reading carefully the DREIDING and UFF papers, but I am not (yet?)
sure I will be able to get that correctly.
So, since rdkit has a UFF implementation, I wonder if it would not
be safer to have rdkit just list all those torsion parameters for me
for the molecule at hand.

If rdkit cannot do that, I might post a tentative solution later so
that another pair of eyes can tell me if I got this right.

Regards,
F.
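
For reference, this is how the three parameters mentioned above combine, as far as I read the UFF paper: E = 1/2 * V * (1 - cos(n*phi0) * cos(n*phi)). A minimal sketch in plain Python; the function name and the numeric values (a threefold sp3-sp3 style term with V = 2 kcal/mol) are illustrative assumptions, not parameters extracted from rdkit:

```python
import math

def uff_torsion_energy(phi_deg, v_barrier, n, phi0_deg):
    """Torsion term E = 1/2 * V * (1 - cos(n*phi0) * cos(n*phi)), in the units of V."""
    phi = math.radians(phi_deg)
    phi0 = math.radians(phi0_deg)
    return 0.5 * v_barrier * (1.0 - math.cos(n * phi0) * math.cos(n * phi))

# illustrative values: threefold barrier, phi0 = 180 degrees, V = 2 kcal/mol
for phi in (0.0, 60.0, 120.0, 180.0):
    print(phi, round(uff_torsion_energy(phi, 2.0, 3, 180.0), 6))
```

With these values the term is zero at the staggered angles (60 and 180 degrees) and equal to V at the eclipsed ones (0 and 120 degrees), which is a quick sanity check for any hand-written parameter table.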




[Rdkit-discuss] problem when reading in a .sdf file w/ hydrogens already present and removeHs=False

2023-04-03 Thread Francois Berenger

Dear list,

This code:
---
#!/usr/bin/env python3

import argparse, sys
from rdkit import Chem

def debug_mol(m):
    # print atom index, atomic number and total H count for each atom
    for a in m.GetAtoms():
        i = a.GetIdx()
        anum = a.GetAtomicNum()
        numHs = a.GetTotalNumHs()
        print('%d %d %d' % (i, anum, numHs))

if __name__ == '__main__':
    # CLI options parsing
    parser = argparse.ArgumentParser(description = "test strange rdkit behavior")
    parser.add_argument("-i", metavar = "input.sdf", dest = "input_fn",
                        help = "3D conformer input file ")
    # parse CLI ---
    if len(sys.argv) == 1:
        # user has no clue of what to do -> usage
        parser.print_help(sys.stderr)
        sys.exit(1)
    args = parser.parse_args()
    input_fn = args.input_fn
    # parse CLI end ---
    for mol in Chem.SDMolSupplier(input_fn, removeHs=False):
        debug_mol(mol)
---

On this file:
---
caffeine
 OpenBabel10171811233D

 24 25  0  0  0  0  0  0  0  0999 V2000
   -1.4537    2.7848    0.2699 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0108    1.4083    0.1062 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.3015    1.1323    0.0489 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1081    2.0920    0.1407 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.8161   -0.1286   -0.1033 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0929   -1.1771   -0.2031 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6111   -2.3242   -0.3462 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.9386   -2.0269   -0.3392 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0299   -0.6962   -0.1913 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.2729    0.0261   -0.1349 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.4004   -0.8770   -0.1432 N   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3540   -1.9596   -0.2459 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8697    0.3771    0.0073 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.0974    0.6510    0.0627 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6884    3.3191    0.8569 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5024    3.2204   -0.7549 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4690    2.8350    0.7286 H   0  0  0  0  0  0  0  0  0  0  0  0
    2.7299   -2.7636   -0.4379 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.4783    0.4186        0. H   0  0  0  0  0  0  0  0  0  0  0  0
    4.1200   -0.5981   -0.4606 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.2700    0.9110   -0.8337 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8812   -2.8834    0.1466 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6277   -2.0396   -1.3222 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2286   -1.7014    0.3855 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1 15  1  0  0  0  0
  1 16  1  0  0  0  0
  1 17  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  3  5  1  0  0  0  0
  5  6  2  0  0  0  0
  6  7  1  0  0  0  0
  6 11  1  0  0  0  0
  7  8  2  0  0  0  0
  8  9  1  0  0  0  0
  8 18  1  0  0  0  0
  9 10  1  0  0  0  0
  9  5  1  0  0  0  0
 10 19  1  0  0  0  0
 10 20  1  0  0  0  0
 10 21  1  0  0  0  0
 11 12  1  0  0  0  0
 11 13  1  0  0  0  0
 12 22  1  0  0  0  0
 12 23  1  0  0  0  0
 12 24  1  0  0  0  0
 13 14  2  0  0  0  0
 13  2  1  0  0  0  0
M  END

---

Tells me that a.GetTotalNumHs() is always 0:
---
0 6 0
1 7 0
2 6 0
3 8 0
4 6 0
5 6 0
6 7 0
7 6 0
8 7 0
9 6 0
10 7 0
11 6 0
12 6 0
13 8 0
14 1 0
15 1 0
16 1 0
17 1 0
18 1 0
19 1 0
20 1 0
21 1 0
22 1 0
23 1 0
---

This is wrong: e.g. atom at index 0 (Carbon) should have 3 hydrogens.
The involved bonds are 1 15, 1 16 and 1 17 in the sdf file.
The total of Hs attached to heavy atoms should be 10.

The rdkit I am using:
---
# pip3 list rdkit | grep rdkit
rdkit        2022.9.5
rdkit-pypi   2022.9.3
---

Should I file a bug on github, or am I doing something stupid?

If I leave the removeHs flag at its default value (of True), then the
result becomes correct!

Regards,
F.
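
A possible explanation, offered as a sketch rather than a verdict: when hydrogens are kept as real atoms in the graph, GetTotalNumHs() by default only reports the hydrogens stored on the atom itself, and the neighboring H atoms are counted only when includeNeighbors=True is passed. Methane with explicit hydrogens stands in here for the SD file above:

```python
from rdkit import Chem

# methane with its four hydrogens as real atoms in the graph,
# comparable to an SD file read with removeHs=False
mol = Chem.AddHs(Chem.MolFromSmiles('C'))
carbon = mol.GetAtomWithIdx(0)

# hydrogens stored on the atom itself (none are left after AddHs)
print(carbon.GetTotalNumHs())
# also counts neighboring hydrogen atoms in the graph
print(carbon.GetTotalNumHs(includeNeighbors=True))
```

In the debug_mol function above, switching to a.GetTotalNumHs(includeNeighbors=True) would then be expected to report 3 for the methyl carbons.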




[Rdkit-discuss] UFF rdkit implementation: QEq partial charges are missing?

2023-03-16 Thread Francois Berenger

Dear list,

Am I right that rdkit has a UFF implementation, but the
recommended partial charges for UFF cannot be computed by rdkit?
Those charges are called QEq in the UFF seminal paper[1].

I think openbabel can compute them, but I am surprised if this is not
included in rdkit.

Does it mean that the UFF rdkit implementation doesn't include
electrostatic interactions?

Will hardcore computational chemists send me to hell if I use UFF in
combination with the MMFF94 partial charges?

Should we include QEq charges in rdkit? (I honestly don't know yet if
this is hard to implement)

Thanks a lot,
Francois.

[1] UFF, a full periodic table force field for molecular mechanics and
molecular dynamics simulations
https://pubs.acs.org/doi/abs/10.1021/ja00051a040




[Rdkit-discuss] MMFF94 scoring of a protein-ligand complex

2023-01-29 Thread Francois Berenger

Dear list,

Is it possible with the MMFF94 implementation in RDKit
to score a protein-ligand complex?

From what I currently understand, the implementation only
allows working with a single isolated small molecule for
energy calculation and conformer optimization.

If there are some examples out there to process a protein-ligand
complex, that would be really nice.

Regards,
F.




Re: [Rdkit-discuss] Changes in morgan fingerprint code?

2023-01-16 Thread Francois Berenger

Dear Eric,

Sure, if fingerprints are not stable over time, some people who check
things very carefully (as you did) will have some surprises.
This being said, you should probably be using InChI keys, if you want a
hash for each molecule.

Regards,
F.
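
The InChI key suggestion could look like the sketch below. It assumes an RDKit build with InChI support; the helper name stable_key is my own. The first SMILES is the molecule from Eric's example, the second is the same molecule written in a different atom order:

```python
from rdkit import Chem

def stable_key(smiles):
    """InChIKey of a molecule given as SMILES (needs an RDKit build with InChI support)."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToInchiKey(mol)

# the same molecule written in two different ways
key_a = stable_key('Oc1cc(O)c(O)c(O)c1')
key_b = stable_key('c1c(O)cc(O)c(O)c1O')
print(key_a, key_a == key_b)
```

Unlike a checksum of a fingerprint, the key does not depend on the fingerprinting code, although it does inherit InChI's own normalization choices (e.g. for tautomers).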

On 13/01/2023 06:37, Eric Jonas wrote:

Hello! I use the crc of morgan fingerprints as a quick-and-dirty way
to keep track of different molecules, but now I realize it might have
been too quick and dirty! In particular, there appears to have been a
change in the morgan code sometime between 2021.09.02 and 2022.03.05.
The following code produces different output under these versions:

import rdkit.Chem
import pickle
from rdkit import Chem

import rdkit.Chem.rdMolDescriptors
import zlib

def get_morgan4_crc32(m):
    mf = Chem.rdMolDescriptors.GetHashedMorganFingerprint(m, 4)
    morgan4_crc32 = zlib.crc32(mf.ToBinary())
    return morgan4_crc32

mol = Chem.AddHs(Chem.MolFromSmiles('Oc1cc(O)c(O)c(O)c1'))
print(get_morgan4_crc32(mol))

2021.09.2 : 1567135676

2022.03.5 : 204854560

I tried looking at the release notes but I didn't seem to see any
breaking changes (I might have missed them!) and I tried looking at
"blame" for the relevant source but didn't see any
seemingly-substantive changes within the relevant timeframe.

So am I doing something crazy here, or did something change
deliberately, or is it possible this is a bug?

...E


Re: [Rdkit-discuss] new RDKit FAQ

2022-12-18 Thread Francois Berenger

On 17/12/2022 22:21, Greg Landrum wrote:

Dear all,

After thinking about it but not doing anything far, far too many
times, I've finally managed to start an RDKit FAQ. For the moment I'm
using the github wiki for this since it's easy and quite visible:
https://github.com/rdkit/rdkit/wiki/FrequentlyAskedQuestions

If you have ideas for things that should be on the FAQ, ideally with
answers, please feel free to reply here or, even better, post to the
relevant topic in the github discussions:
https://github.com/rdkit/rdkit/discussions/categories/faq


I _love_ the getting started with RDKit in Python document.
I have gone back to it many times, and it almost always (like 95% of the
time) had the example/answer I was looking for.

https://www.rdkit.org/docs/GettingStartedInPython.html



Best regards,
-greg


Re: [Rdkit-discuss] use cases for weighted sampling of a compound library

2022-12-12 Thread Francois Berenger

On 12/12/2022 01:26, Christopher Mayer-Bacon wrote:

Hello all,

I’m starting a project that explores the sampling of a large
compound library.  My question is not so much about how to do
something, but rather the specific use cases for weighted sampling
from a compound library.

Given a large compound library and a smaller, reference library, I
want to take random samples from the large library such that the
samples resemble the reference library in some way.  At the moment
I’m focused on element composition (% of carbon atoms, % of oxygen
atoms, etc.), but I’m open to using other features in the future.


- what if the smaller reference library has no overlap with the large
compound library? I.e. no overlap between the chemical space
sampled by each library.

Another strategy could be to generate new molecules using your
"smaller reference library" as a training set.

Cf. https://doi.org/10.1186/s13321-021-00566-4 for a simple method
with an open-source implementation. ;)


I have an idea of how to perform this sampling; my question for this
community concerns a possible use case.  What would be the benefit of
sampling from a compound library such that the samples resemble
another library in some way?  I can think of a use case for my
specific research niche (adaptive properties of the canonical amino
acid alphabet), but I can’t think of another potential use case.  I
know the RDKit community has a wide variety of backgrounds and
expertise, hence why I wanted to pose this question to you all.

-Chris

--

-Christopher Mayer-Bacon (_he/him/his_)
PhD student
Department of Biological Sciences
University of Maryland, Baltimore County


Re: [Rdkit-discuss] Understanding and visualizing counts fingerprints using GetHashedMorganFingerprint

2022-11-06 Thread Francois Berenger

On 03/11/2022 00:52, Brianna Greenstein wrote:

Hi, I had some questions about Morgan fingerprint counts. I used
AllChem.GetHashedMorganFingerprint(mol, 2, nBits=2048) to get counts
as descriptors for ML models. I am looking at the feature importance
and some of these bits came up as important. I had a few questions on
understanding these hashed fingerprints.


As far as I can tell, the Hashed version is still a count vector,
folded to nBits entries; the bitstring variant is
AllChem.GetMorganFingerprintAsBitVect.

If you want the unfolded counted version of this fingerprint, use
something like AllChem.GetMorganFingerprint(mol,2).


I am not competent to answer your other questions below.
Last time I wanted to investigate which fp bit means what in a molecule,
I ended up reimplementing a counted circular fingerprint myself.
I do like rdkit, but its API is quite convoluted sometimes.
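
A quick way to see what the hashed fingerprint actually stores is to print its non-zero entries for a very symmetric molecule. A minimal sketch, assuming benzene as the test molecule; bitInfo is used to list which (center atom, radius) environments landed on each index. These function-style fingerprint calls may emit deprecation warnings in recent RDKit releases:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1ccccc1')  # benzene: six equivalent atoms

bit_info = {}
fp = AllChem.GetHashedMorganFingerprint(mol, 2, nBits=2048, bitInfo=bit_info)

# GetNonzeroElements() maps each populated index to its stored value
counts = fp.GetNonzeroElements()
for idx, count in sorted(counts.items()):
    # bit_info[idx] lists the (center atom, radius) pairs that hit this index
    print(idx, count, bit_info[idx])
```

Values larger than 1 indicate counts rather than on/off bits, and the bit_info entries give a starting point for mapping an "important" index back to atoms in a given molecule.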



* Are the structures the bits represent the same for
GetHashedMorganFingerprint and GetMorganFingerprintAsBitVect?

* How can I visualize what a specific bit in the hashed morgan
fingerprint looks like? Can I use DrawMorganBit to visualize it the
same way I would for the normal fingerprint?
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] GIL Lock in BulkTanimotoSimilarity

2022-10-25 Thread Francois Berenger

On 24/10/2022 19:47, David Cosgrove wrote:

For the record, I have attempted this, but got only a marginal
speed-up (130% of CPU used, with any number of threads above 2).  The
procedure I used was to extract the fingerprint pointers into a
std::vector, create a std::vector for the results, unlock the GIL to
do the bulk tanimoto calculation, then re-lock the GIL to copy the
results from the std::vector into the python:list for output.  I guess
the extra overhead to create and populate the additional std::vectors
destroyed any potential speedup.  This was on a vector of 200K
fingerprints, which suggests that the Tanimoto calculation is a small
part of the overall time.  It doesn't seem worth pursuing further.


There is probably code on github doing this in parallel already.
Think about it: any clustering algorithm using a distance matrix.
I guess many people want to initialize the Gram matrix in parallel.

I wouldn't be surprised if, for example, chemfp has such code.
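
For readers who want to see the shape of the thread-based approach discussed in this thread, here is a toy sketch in plain Python. Fingerprints are stored as Python ints, all threads share one copy of the list, and each task fills one row of the similarity matrix. The names tanimoto and similarity_rows are my own; because this inner loop is pure Python, the GIL prevents any real parallel speed-up here, which is exactly the limitation described for BulkTanimotoSimilarity:

```python
from concurrent.futures import ThreadPoolExecutor

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints (bit sets)."""
    union = bin(a | b).count('1')
    return bin(a & b).count('1') / union if union else 1.0

def similarity_rows(fps, n_threads=4):
    """One row of similarities per fingerprint; all threads share the same fps list."""
    def row(i):
        return [tanimoto(fps[i], other) for other in fps]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(row, range(len(fps))))

# toy 8-bit fingerprints, purely illustrative
fps = [0b11110000, 0b11000000, 0b00001111]
matrix = similarity_rows(fps)
for r in matrix:
    print([round(x, 3) for x in r])
```

The same row-per-task structure would only pay off if the per-row call released the GIL, or if the rows were computed in native code.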


Dave

On Sat, Oct 22, 2022 at 11:28 AM David Cosgrove
 wrote:


Hi Greg,
Thanks for the pointer. I’ll take a look. If it could go in the
next patch release that would be really useful.
Dave

On Sat, 22 Oct 2022 at 10:52, Greg Landrum 
wrote:

Hi Dave,

We have multiple examples of this in the code, here’s one:



https://github.com/rdkit/rdkit/blob/b208da471f8edc88e07c77ed7d7868649ac75100/Code/GraphMol/ForceFieldHelpers/Wrap/rdForceFields.cpp#L40


I’m not sure how this would interact with the call to
Python::extract that’s in the bulk functions though

It might be better to handle the multithreading on the C++ side by
adding an optional nThreads argument to  the bulk similarity
functions. (Though this would have to wait for the next release
since it’s a feature addition… we can declare releasing the GIL
as a bug fix)

-greg

On Sat, 22 Oct 2022 at 09:48, David Cosgrove
 wrote:

Hi,

I'm doing a lot of tanimoto similarity calculations on large
datasets using BulkTanimotoSimilarity.  It is an obvious candidate
for parallelisation, so I am using concurrent.futures to do so.  If
I use ProcessPoolExecutor, I get good speed-up but each process
needs a copy of the fingerprint set and for the sizes I'm dealing
with that uses too much memory.  With ThreadPoolExecutor I only need
1 copy of the fingerprints, but the GIL means it only runs on 1
thread at a time so there's no gain.  Would it be possible to amend
the C++ BulkTanimotoSimilarity to free the GIL whilst it's doing the
calculation, and recapture it afterwards?  I understand things like
numpy do this for some of their functions.  I'm happy to attempt it
myself if someone who knows about these things can advise that it
could be done, it would help, and they could provide a few pointers.

Thanks,
Dave

--

David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk




[Rdkit-discuss] What is the recommended 3D-sensitive file format to use with RDKit?

2022-06-16 Thread Francois Berenger

Hi all,

I assume it's ".sdf".

But, do we have good support for ".xyz" also?

In addition, what about RDKit's support of ".mol2" these days?

Regards,
F.


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] pharmacophore

2022-03-29 Thread Francois Berenger

On 30/03/2022 03:49, Patrick Walters wrote:

One way to compare interactions (pharmacophores) in a binding site is
to use interaction fingerprints.  I've had a good experience with
ProLIF.
https://github.com/chemosim-lab/ProLIF


Additionally, I know of these open-source ones:
- https://pharmit.csb.pitt.edu/
- https://github.com/gertthijs/pharao
- https://github.com/DrrDom/pmapper


On Tue, Mar 29, 2022 at 6:26 AM Muhammad Akram
 wrote:


Hello Everybody,

I am looking if there is a way to extract a pharmacophore from
co-crystallized ligand using RDKit.

Thank you so much in advance.

Kind Regards,

Mu


Re: [Rdkit-discuss] How to get the electronegativity of an atom?

2022-03-08 Thread Francois Berenger

On 08/03/2022 17:23, Francois Berenger wrote:

Dear rdkit experts,

I am looking to access the electronegativity value
of a given atom in a molecule.

Funnily, I don't know _at_ _all_ how to do this.

I guess that there should be a way using the atomic number
to get this value from a table inside of rdkit
but my code searches on github were somewhat in vain (we do have a
constant table somewhere with the Allen scale values * 1000, but that
doesn't look very practical to use).

Before I do the horrible (hard-coding into my program
the whole Allen scale copy-pasted from wikipedia, yes, nothing less),
I prefer to ask rdkit experts.


A less horrible option might be to extract the
bo:electronegativityPauling values from the elements.xml file of the
latest version of the Blue Obelisk project.

But, I would be quite surprised if rdkit doesn't hold this value already
somewhere...



If we have a choice between the Pauling scale or the Allen
scale, I would be interested to know about that.
If we can directly access the difference of ElNeg for two
bonded atoms of a molecule, I might be happy living with that.

Thanks a lot,
F.
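
In case nothing ready-made turns up, the "horrible" option is at least short. A minimal sketch with a hand-copied subset of Pauling values keyed by atomic number; the table name PAULING and the two helper functions are my own, and the table would need extending for other elements:

```python
# Pauling electronegativities for a few common elements, keyed by atomic number
# (hand-copied subset: H, C, N, O, F, P, S, Cl, Br, I)
PAULING = {1: 2.20, 6: 2.55, 7: 3.04, 8: 3.44, 9: 3.98,
           15: 2.19, 16: 2.58, 17: 3.16, 35: 2.96, 53: 2.66}

def electronegativity(atomic_num):
    """Pauling electronegativity for an atomic number present in the table."""
    return PAULING[atomic_num]

def bond_polarity(anum_a, anum_b):
    """Absolute electronegativity difference between two bonded atoms."""
    return abs(PAULING[anum_a] - PAULING[anum_b])

# a C-O bond, using atomic numbers 6 and 8
print(round(bond_polarity(6, 8), 2))
```

With rdkit, the atomic numbers would come from bond.GetBeginAtom().GetAtomicNum() and bond.GetEndAtom().GetAtomicNum().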





[Rdkit-discuss] How to get the electronegativity of an atom?

2022-03-08 Thread Francois Berenger

Dear rdkit experts,

I am looking to access the electronegativity value
of a given atom in a molecule.

Funnily, I don't know _at_ _all_ how to do this.

I guess that there should be a way using the atomic number
to get this value from a table inside of rdkit
but my code searches on github were somewhat in vain (we do have a
constant table somewhere with the Allen scale values * 1000, but that
doesn't look very practical to use).

Before I do the horrible (hard-coding into my program
the whole Allen scale copy-pasted from wikipedia, yes, nothing less),
I prefer to ask rdkit experts.

If we have a choice between the Pauling scale or the Allen
scale, I would be interested to know about that.
If we can directly access the difference of ElNeg for two
bonded atoms of a molecule, I might be happy living with that.

Thanks a lot,
F.




Re: [Rdkit-discuss] Dask + Rdkit Use Cases

2022-01-25 Thread Francois Berenger

On 25/01/2022 01:57, Oren Herschander wrote:

Hi Everyone,
I'm working on a research project about how Dask and other python
tools for distributed/parallel computing are used in Life Sciences.

I'm on the lookout for use cases, stories, and overall thoughts that
combine rdkit or other similar software with Dask. Emails are great,
but if you have the time for a quick conversation those are a lot of
fun too! :)


In case you don't already know, Patrick Walters gave Dask a try to
parallelize some Python chemoinformatics and tells all about it here:

https://patwalters.github.io/practicalcheminformatics/jupyter/dask/parallel/2021/03/28/dask-cheminformatics.html


Thank you so much! I really appreciate the help!

Cheers,
Oren

 [1]

Oren Herschander

Information Forager @ https://coiled.io/



Links:
--
[1] https://coiled.io


Re: [Rdkit-discuss] State of the art for shape alignment

2021-11-11 Thread Francois Berenger

On 12/11/2021 01:58, Paolo Tosco wrote:

Hi Tim,

Open3DAlign is not shape-based, it is atom-based. The score is
proportional to the # of matched atoms, weighted by similarity. It
will work well for homologous series of compounds with reasonable
scaffold similarity, and will in general perform badly with scaffolds
that are very dissimilar in size and branching.
Shape-it and Align-it worked reasonably well in the past in my hands.
A Google or Google Scholar search will return many well-established
and more recent methods with different licenses, including but not
limited to:
* ShaEP
* Cresset tools
* ROCS
* LS-Align
* OptiPharm
* WEGA


I came upon this one the other day on github:

Comparison of electrostatic potential and shape
https://github.com/hesther/espsim

I did not try it, but it could be one more candidate.

Regards,
F.



Cheers,
p.

On Thu, Nov 11, 2021 at 4:55 PM Tim Dudgeon 
wrote:


I'm looking into the current status of techniques that use RDKit to
perform 3D alignments based on shape (e.g. not using AlignMol()) and
struggling to find what the best tools are and the status of each.

The Open3D align tools are relatively straightforward to use, but in
my hands do not seem to give good alignments, e.g. with this
reference mol:

[image not preserved in the archive]

I get this alignment:

[image not preserved in the archive]

I also stumbled across shape-it e.g.


https://iwatobipen.wordpress.com/2021/03/17/comparison-between-native-implemented-shape-align-method-of-rdkit-and-rdkit-shape-it-chemoinformatics-rdkit-shape-based-align/

But I find this very slow and most alignments seem to fail in my
hands.

Are there any guides to what tools work well here?

Tim



Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Francois Berenger

On 27/09/2021 19:22, Lewis Martin wrote:

Very interesting - thank you Francois! PDB re-do does the trick:

import requests
from rdkit import Chem

def getPDB(code):
    out = requests.get(f'https://pdb-redo.eu/db/{code}/{code}_final.pdb')
    return out.content

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

I think this solves it for me, but if anyone knows how to infer
correct bonding information without relying on distances, I'd love to
hear it too! So far I've noticed that Parmed and PDBFixer infer
correct bonds, but they don't determine bond orders, so it's difficult
to port the molecule into RDKit.


I just remember one paper; it might give you an entry point into the
scientific literature:

Determination of molecular topology and atomic hybridization states from 
heavy atom coordinates

Elaine C. Meng, Richard A. Lewis
https://doi.org/10.1002/jcc.540120716
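
To make the failure mode discussed in this thread concrete, here is a
minimal sketch of generic distance-based bond perception. It is not
RDKit's actual implementation: the covalent radii are approximate
values, and the 0.4 A tolerance, the perceive_bonds helper and the toy
coordinates are all assumptions chosen for illustration. It only shows
how an atom modelled too close to a carbonyl oxygen picks up an extra
neighbour.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms (illustrative values)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}

def perceive_bonds(atoms, tolerance=0.4):
    """Return index pairs (i, j) judged bonded purely from distance.

    atoms: list of (element, (x, y, z)) tuples.
    Two atoms count as bonded when their distance is below the sum of
    their covalent radii plus a tolerance (hypothetical default).
    """
    bonds = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tolerance
        if math.dist(xyz_i, xyz_j) < cutoff:
            bonds.append((i, j))
    return bonds

# Toy carbonyl: C and O about 1.23 A apart
carbonyl = [("C", (0.0, 0.0, 0.0)), ("O", (1.23, 0.0, 0.0))]
# A side-chain carbon at a sane distance from the oxygen (3.4 A) ...
ok = carbonyl + [("C", (1.23, 3.4, 0.0))]
# ... and one modelled far too close to it (about 1.5 A)
clash = carbonyl + [("C", (1.83, 1.4, 0.0))]

print(perceive_bonds(ok))     # only the carbonyl bond
print(perceive_bonds(clash))  # the oxygen gains a second, nonphysical bond
```

Since the rule sees only distances, the extra bond can be avoided only
by fixing the coordinates or by taking the topology from elsewhere
(e.g. residue templates).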

Regards,
F.


Cheers
Lewis

On Mon, Sep 27, 2021 at 5:55 PM Francois Berenger 
wrote:


Hi Lewis,

Just an idea: you might try to load your PDB in UCSF Chimera, then
save it as a mol2 or sdf file.
Then, try to read this sdf file from rdkit.

Another idea: try to get your pdb file through the pdbredo service.
https://pdb-redo.eu/
They might have fixed a few things; maybe this PDB will read better
in
rdkit.

Regards,
F.

On 26/09/2021 17:02, Lewis Martin wrote:

Hi RDKit,
While parsing proteins from the PDB with RDKit, I've come across
situations where the distance-based bond determination leads to
'incorrect' bonds between atoms that are erroneously too close
together. PDB files have no bond information, so it's not really
'incorrect' (rather the model coordinates are off), but the bonds are
nonphysical - and it means the Mol objects won't sanitize.

Here's an example:

import requests
from io import BytesIO
import gzip
from rdkit import Chem

def getPDB(code):
    out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
    binary_stream = BytesIO(out.content)
    return gzip.open(binary_stream).read()

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

Error is:

RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is
greater than permitted

This is caused by the threonine 72 sidechain being too close to the
TYR71 backbone carbonyl oxygen (this can be visualized at
https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction=09B ,
TYR71 is near the ligand).

Does anyone know how to avoid this to create a Chem.Mol? I've tried
using Parmed and PDBFixer, since they use residue templates to
generate the correct bonding topology, but they don't write CONECT
records or SDFs, so the bonds are still lost to RDKit.

Thanks for your time!
Lewis
PS - why not just use PDBFixer? I'm trying to calculate atom
invariants using RDKit's morgan fingerprinter implementation, so
ultimately I want a sanitized Mol object



Re: [Rdkit-discuss] Parsing a PDB file with atoms that are too close, causing bad bond

2021-09-27 Thread Francois Berenger

Hi Lewis,

Just an idea: you might try to load your PDB in UCSF Chimera, then
save it as a mol2 or sdf file.
Then, try to read this sdf file from rdkit.

Another idea: try to get your pdb file through the pdbredo service.
https://pdb-redo.eu/
They might have fixed a few things; maybe this PDB will read better in 
rdkit.


Regards,
F.

On 26/09/2021 17:02, Lewis Martin wrote:

Hi RDKit,
While parsing proteins from the PDB with RDKit, I've come across
situations where the distance-based bond determination leads to
'incorrect' bonds between atoms that are erroneously too close
together. PDB files have no bond information, so it's not really
'incorrect' (rather the model coordinates are off), but the bonds are
nonphysical - and it means the Mol objects won't sanitize.

Here's an example:

import requests
from io import BytesIO
import gzip
from rdkit import Chem

def getPDB(code):
    out = requests.get(f'https://files.rcsb.org/download/{code}.pdb1.gz')
    binary_stream = BytesIO(out.content)
    return gzip.open(binary_stream).read()

pdb_string = getPDB('3udn')
Chem.MolFromPDBBlock(pdb_string)

Error is:

RDKit ERROR: [22:38:21] Explicit valence for atom # 573 O, 3, is
greater than permitted

This is caused by the threonine 72 sidechain being too close to the
TYR71 backbone carbonyl oxygen (this can be visualized at
https://www.rcsb.org/3d-view/3UDN?preset=ligandInteraction=09B ,
TYR71 is near the ligand).

Does anyone know how to avoid this to create a Chem.Mol? I've tried
using Parmed and PDBFixer, since they use residue templates to
generate the correct bonding topology, but they don't write CONECT
records or SDFs, so the bonds are still lost to RDKit.

Thanks for your time!
Lewis
PS - why not just use PDBFixer? I'm trying to calculate atom
invariants using RDKit's morgan fingerprinter implementation, so
ultimately I want a sanitized Mol object



Re: [Rdkit-discuss] Are Partial Charge Calculations Dependent on Conformers?

2021-07-06 Thread Francois Berenger

On 01/07/2021 13:15, Hao wrote:

Thanks Greg!

That helps a lot, it was purely out of curiosity and understanding.
I'm working with some legacy code that requires conformer generation
before calculating partial charges. Now that I know it's unnecessary,
I can speed up this process by quite a bit. It's good to know that
there aren't really other RDKit descriptors that rely on conformers.
In the future, I'll be using QM to generate partial charges as you
have suggested.


Be careful, with some charge models the partial charges are
dependent on the conformer. For example, with AM1BCC (but this charge
model is not available in rdkit).


Best,
Hao

On Wed, Jun 30, 2021 at 11:07 PM Greg Landrum 
wrote:


Hi Hao,

The reference for how the Gasteiger charges are calculated is in the
documentation for the function:


https://www.rdkit.org/docs/source/rdkit.Chem.rdPartialCharges.html#rdkit.Chem.rdPartialCharges.ComputeGasteigerCharges


It does not use atomic coordinates.

The MMFF charges are described in the MMFF94 papers (googling for
MMFF94 will turn these up). They also do not use atomic coordinates.

If you really need partial charges which are dependent on the 3D
conformer (and I wonder why you do), the only option in the RDKit
would be to use its implementation with the YAeHMOP package to do a
semi-empirical QM calculation:


from rdkit import Chem
from rdkit.Chem import rdDistGeom
from rdkit.Chem import rdEHTTools
m = Chem.AddHs(Chem.MolFromSmiles('OCCN'))
rdDistGeom.EmbedMolecule(m)
ok,res = rdEHTTools.RunMol(m)
res.GetAtomicCharges()


Note that I call AddHs() there before generating the 3D coordinates.
Recent versions of the RDKit generate a warning if you don't do
this. That's not one which you should ignore: you generally need the
Hs there in order to get good conformations.

There are many other methods out there which derive charges from
quantum mechanical calculations, but those all require using
external software.

Why do you want partial charges which are dependent on conformer?

-greg

On Thu, Jul 1, 2021 at 3:53 AM Hao  wrote:


Hi RDKit community,

I am not familiar with how partial charges are calculated and I
couldn't seem to find anything in my searches.

If you run the code below, you'll see that the partial charges are
always the same, even though the embedded mol is different - which
leads me to believe these partial charge calculations are not
dependent on conformers (which I always thought they were?)

Can someone with more knowledge than me confirm my hypothesis?
Also does rdkit have any partial charge calculators that are
dependent on conformers?

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('C[C@@](CC1=CC(O)=C(O)C=C1)(NN)C(O)=O')
AllChem.EmbedMolecule(mol, AllChem.ETKDG())
AllChem.ComputeGasteigerCharges(mol)
contribs = [float(mol.GetAtomWithIdx(i).GetProp('_GasteigerCharge'))
            for i in range(mol.GetNumAtoms())]
fps = AllChem.MMFFGetMoleculeProperties(mol)
mmff_partial_charges = [fps.GetMMFFPartialCharge(x)
                        for x in range(mol.GetNumAtoms())]
print(mmff_partial_charges)

print(contribs)

Thanks,
Hao


Re: [Rdkit-discuss] Shape Tanimoto distance question

2021-06-29 Thread Francois Berenger

On 29/06/2021 12:26, Greg Landrum wrote:

Hi Leon,

You can convert the tanimoto distance to similarity, but the formula
is:
Similarity = 1 - Distance


In other words:

Tanimoto_distance = 1.0 - Tanimoto_score

Worth noting: the Tanimoto distance is a metric; hence it is pretty
useful in computational geometry data structures to prune the search
space.
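
As a small worked example of the conversion, here is a sketch that uses
plain Python sets as a stand-in for the overlapping volumes (the helper
names and the toy sets are assumptions, not RDKit functions). It
contrasts the correct conversion with the 1 / (1 + distance) formula
from the question and checks the triangle inequality on one triplet.

```python
def tanimoto_similarity(a, b):
    """Tanimoto (Jaccard) similarity between two sets of features."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def tanimoto_distance(a, b):
    """Tanimoto distance, defined as 1 - similarity."""
    return 1.0 - tanimoto_similarity(a, b)

x = {1, 2, 3, 4}
y = {3, 4, 5, 6}
z = {5, 6, 7, 8}

d_xy = tanimoto_distance(x, y)
d_yz = tanimoto_distance(y, z)
d_xz = tanimoto_distance(x, z)

sim_correct = 1.0 - d_xy        # recovers the similarity
sim_wrong = 1.0 / (1.0 + d_xy)  # the formula from the question

print(round(sim_correct, 3))    # 0.333
print(round(sim_wrong, 3))      # 0.6
# Metric property on this triplet: going through y is never shorter
print(d_xz <= d_xy + d_yz)      # True
```

The two formulas agree only at distance 0, so using the wrong one
overestimates the similarity everywhere else.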


Best,
-greg

On Tue, Jun 29, 2021 at 3:21 AM topgunhaides 
wrote:


Hi guys,

A quick question:

RDKit computes the "Shape Tanimoto distance" by calling the
"ShapeTanimotoDist".

I assume that similarities and distances can be interconverted using
the following equation?

Shape Tanimoto similarity = 1 / (1 + Shape Tanimoto distance)

Correct?  Thank you!

Best,
Leon


Re: [Rdkit-discuss] install on macosx with Python 3.8

2021-06-24 Thread Francois Berenger

On 25/06/2021 02:57, Michal Krompiec wrote:

Hello,
Is it possible to install RDKit on MacOSX in a Python 3.8 environment?
There is no conda binary for 3.8, so I tried homebrew. But the
following gives me an error message (brew doesn't like the
--with-python3 argument):

brew install rdkit --with-python3 --without-numpy

So I did just "brew install rdkit", but then rdkit is unimportable in
Python ("No module named 'rdkit'"). What am I doing wrong?


You are not using the python interpreter for which rdkit
was installed by brew.

Check what the brew installer of rdkit is doing, especially
look which python version it installs rdkit for.

Alternatively, fire up each and every python interpreter
installed on your computer, and try 'import rdkit'
until you find the one for which it works.
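
A quicker variant of that check is a tiny script run with each
candidate interpreter in turn. This is only a sketch: the function
names are made up for illustration, and it only tests whether the
module can be located, not whether it actually loads without errors.

```python
import importlib.util
import sys

def can_import(module_name):
    """True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None

def report(module_name="rdkit"):
    """One line saying which interpreter ran and whether the module was found."""
    status = "found" if can_import(module_name) else "NOT found"
    return f"{sys.executable}: {module_name} {status}"

print(report())
```

Saved under some name (say check_rdkit.py, a hypothetical file name),
it can be run as "python3 check_rdkit.py", "python3.8 check_rdkit.py"
and so on; the interpreter path it prints tells you which installation
brew populated.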

Regards,
F.


I'm using brew 3.2.0 on MacOS 11.4

Thanks in advance,

Michal Krompiec


Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-22 Thread Francois Berenger

Dear JP,

To confuse you even more, you can also have a look at the ChEMBL 
open-source molecular standardizer:


https://github.com/chembl/ChEMBL_Structure_Pipeline/blob/master/chembl_structure_pipeline/standardizer.py

No need to thank me. :D

On 18/06/2021 03:12, JP Ebejer wrote:

Dear all,

I am trying to standardize(/normalize?) some molecules from different
sources, to generate a set of descriptors for them.  I have done this
a number of times, and each time I find the process slightly
confusing.  I have the following questions please, if you don't mind:

1.  What is the relation between molvs and rdkit (I remember there was
an integration project between the two a while back).  When I call
rdMolStandardize does rdkit code or molvs code get called?  The github
repo for molvs hasn't been updated in a while (2 yrs), but
rdMolStandardize has.
2.  What is the difference between standardization and normalization
of a molecule?  Does one automatically imply the other or should these
two processes be both run on a molecule?
3.  Specifically, what is the difference between
rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
rdMolStandardize.Normalize(mol)?  Should I call any of these three
manually after I run "standardization/cleaning operations" such as
uncharging, reionizing, etc?
4.  I understand what uncharge does, but what does reionizer do?
5.  Is there a way to chain operations together
(standardize+ChooseLargestFragment+uncharge+normalize; I am not sure the
order makes sense here), other than creating a class instance for each,
calling the method, returning a new mol and using this mol in the next
operation?

Apologies for the many questions.  Have I missed the documentation
about this?  I have found some excellent examples here:
https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
(thanks!).  This is not exactly a cleaning pipeline, but still quite
helpful to understand these methods.

Many thanks,
JP


[Rdkit-discuss] Are the path-based fingerprints formally described in the scientific literature?

2021-05-19 Thread Francois Berenger

Dear list,

The other day, I was looking for a paper describing them
but the only thing I found was a reference to some Daylight
product.

I know there is a paper (maybe several in fact) for ECFP for example.
Weren't the path-based FPs formally described somewhere?

Thanks a lot,
F.




[Rdkit-discuss] How to prevent a SMILES from starting with a specific atom?

2021-05-11 Thread Francois Berenger

Hello,

I have some molecules with unspecified atoms ('*' in SMILES notation).

I would like that when such a molecule is written out, the resulting
SMILES never starts by one of those atoms (since the molecule
also has plenty of "normal" atoms).

Is it possible to do that with rdkit?

Or, more generally, flag a given atom in a molecule
and ask rdkit to not start the corresponding SMILES with
this atom, any unflagged atom being fine.

Thanks a lot!
F.




Re: [Rdkit-discuss] Do we have an exact implementation of Bemis-Murcko scaffolds in rdkit?

2021-04-26 Thread Francois Berenger

On 27/04/2021 10:12, Francois Berenger wrote:

On 26/04/2021 23:35, Greg Landrum wrote:

Hi Francois,

The implementation which is there does, I believe, the right thing.
However... first you need to find the Murcko Scaffold, then you can
convert that scaffold to the generic form:


In [5]: m = Chem.MolFromSmiles('CCc1ccc(O)cc1C(=O)C1CC1')
In [6]: scaff = MurckoScaffold.GetScaffoldForMol(m)
In [7]: Chem.MolToSmiles(scaff)
Out[7]: 'O=C(c1ccccc1)C1CC1'
In [8]: framework = MurckoScaffold.MakeScaffoldGeneric(scaff)
In [9]: print(Chem.MolToSmiles(framework))
CC(C1CCCCC1)C1CC1


Ok, maybe this two-step process is a little bit better, but still
not exactly what I would expect in some cases.

I'll say if I program something which I prefer.


Hello,

I ended up with this:
---
from rdkit import Chem

def find_terminal_atoms(mol):
    # a terminal atom has exactly one bond
    res = []
    for a in mol.GetAtoms():
        if len(a.GetBonds()) == 1:
            res.append(a)
    return res

# Bemis, G. W., & Murcko, M. A. (1996).
# "The properties of known drugs. 1. Molecular frameworks."
# Journal of medicinal chemistry, 39(15), 2887-2893.
def BemisMurckoFramework(mol):
    # keep only Heavy Atoms (HA)
    only_HA = Chem.RemoveHs(mol)
    # switch all HA to Carbon
    rw_mol = Chem.RWMol(only_HA)
    for i in range(rw_mol.GetNumAtoms()):
        rw_mol.ReplaceAtom(i, Chem.Atom(6))
    # switch all non single bonds to single;
    # collect the atom index pairs first, then edit the molecule
    non_single_bonds = []
    for b in rw_mol.GetBonds():
        if b.GetBondType() != Chem.BondType.SINGLE:
            non_single_bonds.append((b.GetBeginAtomIdx(), b.GetEndAtomIdx()))
    for (j, k) in non_single_bonds:
        rw_mol.RemoveBond(j, k)
        rw_mol.AddBond(j, k, Chem.BondType.SINGLE)
    # as long as there are terminal atoms, remove them
    terminal_atoms = find_terminal_atoms(rw_mol)
    while terminal_atoms != []:
        # remove by index, highest first, so the remaining indices stay valid
        indexes = sorted((a.GetIdx() for a in terminal_atoms), reverse=True)
        for i in indexes:
            rw_mol.RemoveAtom(i)  # this also removes the bonds of the atom
        terminal_atoms = find_terminal_atoms(rw_mol)
    return rw_mol.GetMol()
---

I don't claim this is very efficient Python code. I am not very good at 
snake charming.
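
For readers who want the pruning step in isolation, here is a sketch of
the same idea on a plain undirected graph, without RDKit. The function
name and the toy graph are made up for illustration. It repeatedly
strips the nodes that have exactly one neighbour, so side chains
disappear while rings and the linkers between them survive.

```python
def prune_terminal_nodes(adjacency):
    """Iteratively strip degree-1 nodes from an undirected graph.

    adjacency: dict mapping node -> set of neighbour nodes.
    Returns a new dict; the input is left untouched.
    """
    graph = {n: set(nbrs) for n, nbrs in adjacency.items()}
    terminal = [n for n, nbrs in graph.items() if len(nbrs) == 1]
    while terminal:
        for n in terminal:
            # drop the node and erase it from its neighbours' sets
            for nbr in graph.pop(n):
                graph[nbr].discard(n)
        terminal = [n for n, nbrs in graph.items() if len(nbrs) == 1]
    return graph

def graph_from_edges(edges):
    """Build the adjacency dict from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

# six-membered ring 0..5, two-atom side chain 6-7 on node 0,
# linker 8 from node 3 to a three-membered ring 9-10-11
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7),
         (3, 8), (8, 9), (9, 10), (10, 11), (11, 9)]
framework = prune_terminal_nodes(graph_from_edges(edges))
print(sorted(framework))  # side chain nodes 6 and 7 are gone
```

Note that a purely acyclic chain with an odd number of nodes collapses
to a single isolated node rather than to nothing, because a node with
zero neighbours is no longer "terminal" in this sense; the RDKit
function above stops in the same way.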


Regards,
F.


Best,
-greg

On Mon, Apr 26, 2021 at 11:15 AM Francois Berenger 
wrote:


Hello,

I am trying MurckoScaffold.MakeScaffoldGeneric(mol),
but this keeps the side chains.

While my understanding of BM scaffolds is that only rings
and ring linkers should be kept.

The fact that the rdkit implementation keeps the
side chains makes Murcko scaffolds a much less powerful filter
to enforce molecular diversity.

And I don't even see any option to force the standard/vanilla
behavior.
Or, am I missing something?

Regards,
F.



[Rdkit-discuss] Do we have an exact implementation of Bemis-Murcko scaffolds in rdkit?

2021-04-26 Thread Francois Berenger

Hello,

I am trying MurckoScaffold.MakeScaffoldGeneric(mol),
but this keeps the side chains.

While my understanding of BM scaffolds is that only rings
and ring linkers should be kept.

The fact that the rdkit implementation keeps the
side chains makes Murcko scaffolds a much less powerful filter
to enforce molecular diversity.

And I don't even see any option to force the standard/vanilla behavior.
Or, am I missing something?

Regards,
F.




Re: [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule

2021-04-04 Thread Francois Berenger

On 03/04/2021 00:03, Andrew Dalke wrote:

Hi Ling,


On Apr 2, 2021, at 16:23, Ling Chan  wrote:

Thank you Francois, I took a look at your code and borrowed parts of 
it to rejoin two molecules. It seems like my problem is solved. I 
eventually arrived at something like example 4 in

https://www.programcreek.com/python/example/123334/rdkit.Chem.CombineMols
(which I discovered a bit late).

Still, I am not sure if the code is safe. In particular, I wonder if 
the following conditions are always valid.
	• Chem.CombineMols simply concatenates the atomic indices from the 
input molecules.
	• The Chem.EditableMol constructor preserves atom ordering from the 
input.
	• RemoveAtom in EditableMol results in all indices above the deleted 
atom decreasing by one, i.e. atom ordering is preserved.
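
The bookkeeping implied by these three conditions can be sketched with
plain Python lists standing in for the atom arrays. This is a model of
the assumed behaviour, not a check against RDKit itself, and the helper
names are made up. It shows why dummy atoms should be deleted from the
highest index down once the two fragments have been concatenated.

```python
def combine(first, second):
    """Concatenate two atom lists; atoms of `second` are shifted by len(first)."""
    return list(first) + list(second), len(first)

def remove_atoms(atoms, indices):
    """Delete atoms highest index first, so pending indices stay valid."""
    atoms = list(atoms)
    for idx in sorted(indices, reverse=True):
        del atoms[idx]
    return atoms

def remove_atoms_naive(atoms, indices):
    """Delete in ascending order: later indices are stale after each deletion."""
    atoms = list(atoms)
    for idx in sorted(indices):
        del atoms[idx]
    return atoms

frag_a = ["*", "C", "C"]   # dummy atom at index 0
frag_b = ["*", "O"]        # dummy atom at index 0

combined, offset = combine(frag_a, frag_b)
dummies = [0, offset + 0]  # both dummy atoms, in combined numbering

print(remove_atoms(combined, dummies))        # ['C', 'C', 'O']
print(remove_atoms_naive(combined, dummies))  # ['C', 'C', '*'], wrong atom removed
```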


I've found that it's very hard to work with molecular graphs and
preserve stereochemistry.

Consider F/C=C/Cl breaking on the first bond, and the code I pointed 
you to.


FragmentOnBonds() using '9' as the labels gives:  [9*]/C=C/Cl.[9*]F

My "smiles_weld" code converts that to: CC\%99=C/Cl.F%99 which can be
re-canonicalized to the original: F/C=C/Cl .

Or, with F[C@H](Cl)Br again, breaking on the first bond.

FragmentOnBonds() gives [9*]F.[9*][C@H](Cl)Br

smiles_weld converts that to F%99.[C@@H]%99(Cl)Br which is
re-canonicalized as  F[C@H](Cl)Br

Handling this correctly in the molecule API requires paying careful
attention to the bond direction, and bond attachment order around the
atom, which changes with RemoveAtom() calls. I didn't see
stereochemistry support in Francois's "bind_molecules()" nor in the
connect_mols() at


After discussing with an organic chemist, we decided that fragmenting
(on the computer) molecules on bonds which are involved in
stereochemistry (stereo bond or linked to a stereo center) is not
desirable.
I.e. if the molecules being fragmented have stereochemistry assigned, we
don't cut around it.



https://github.com/molecularsets/moses/blob/master/moses/baselines/combinatorial.py
(one of the examples from the programcreek.com link you gave).

If you don't need to support or preserve stereochemistry, then of
course there's no problem.

Cheers,

Andrew
da...@dalkescientific.com






Re: [Rdkit-discuss] rejoining pairs of fragments after fragmenting a molecule

2021-03-31 Thread Francois Berenger

On 01/04/2021 04:55, Ling Chan wrote:

Dear Colleagues,

I am trying to do something that I think is quite simple, but I have
not figured out a simple way. Don't know if I am missing something. I
am sure that ultimately I can figure it out, but I wonder if there is
a good way.

I fragmented a molecule with some rules, using FragmentOnBonds. I did
get a list of primary fragments.


I have an ad hoc fragmenting scheme and fragment assembly
implemented here:

https://github.com/UnixJunkie/molenc/blob/master/bin/molenc_smisur.py

Sorry, but this is non-trivial code.

Look for the function bind_molecules to connect two fragments.

The rdkit python doc might have some simpler examples, using
well-known/published fragmenting schemes (BRICS or Recap):
http://www.rdkit.org/docs/GettingStartedInPython.html

Regards,
F.


I wish to recombine pairs (and triplets, but no bigger) of these
primary fragments, but only if the resulting fragment is part of the
original molecule. I.e. I want to undo some of the cuttings.
(FragmentOnSomeBonds does not help, since you cannot ensure that the
resulting fragments consist only of pairs of primary fragments.)

What is the best way to do this? The following is what I am trying.

I see that you can mark the original cut points using the dummyLabels
argument in FragmentOnBonds. So I converted the primary fragments to
smiles. I looked for the two sides of the original cut point and
substituted the two dummyLabels with [2H] and [3H]. I then tried to
rejoin the fragments using a reaction string
"[*:1][2H].[*:2][3H]>>[*:1][*:2]". Unfortunately the
ReactionFromSmarts function does not accept this string. So I'll have
to use Smarts search to look for [2H] and [3H], then create an
editable molecule from the two primary fragments, look for neighbours
of [2H] and [3H], add a bond, then delete the atoms [2H] and [3H],
then sanitize.

Thank you for your ideas.

Ling


Re: [Rdkit-discuss] The latest RDKit (2020.09.5) is now available on homebrew/linuxbrew

2021-03-31 Thread Francois Berenger

Dear all,

I strongly advise current users of the previous rdkit brew formula to
try installing this new one, so that we can detect and correct
potential problems.

If it works at least as well as the previous formula (which was
accessible after brew tap rdkit/rdkit), then we might stop maintaining
the old brew formula.

Regards,
F.

On 20/03/2021 01:28, Yoshitaka Moriwaki wrote:

Dear RDKit users,

I'm pleased to inform macOS and Linux users that the latest stable
rdkit (2020.09.5) is now available on homebrew-core and
linuxbrew-core.

The biggest change is that the rdkit formula has been moved to the
official and default repository to make use of the test bot, which
verifies the compatibility of the formula with macOS and Linux and
generates binary packages as 'bottles'. This allows us to install RDKit
quickly and safely.

The install procedure is simple if you've already installed 
Homebrew/Linuxbrew:


```
# brew untap rdkit/rdkit # if required
brew install rdkit
```

The latest formula includes several options such as avalon, cairo,
inchi, freesasa, yaehmop, and PgSQL. See also
https://github.com/Homebrew/homebrew-core/blob/master/Formula/rdkit.rb
.

Since homebrew/linuxbrewed-rdkit may conflict with conda-rdkit, please
decide which one to use at your own risk.

I thank Francois Berenger, Eddie Cao, and the contributors who
maintained the rdkit formula.

Sincerely,

Yoshitaka Moriwaki




Re: [Rdkit-discuss] non-element elements

2021-02-03 Thread Francois Berenger

On 04/02/2021 00:35, Brian Peterson wrote:

Hello RDKit people,

Is it possible to modify the properties of elements in the periodic
table or to create new ones?  Use case: Suppose one had some molecules
defined in terms of functional groups or united atoms or some other
entities that are not pure elemental atoms. Could one map these things
on to unused elements (e.g. my_functional_group --> U) and fix up the
properties of U so that it had the appropriate valence etc. and could
be present both in a molecule and in SMARTS patterns so that one could
do substructure matches within RDKit?


Maybe you can use the isotope number to encode some special meaning
for an atom.

Cf. http://www.rdkit.org/docs/GettingStartedInPython.html

"Other fragmentation approaches"
[...] attachment points are labelled (using isotopes) [...]

Those are preserved in the output SMILES.


Thanks,
Brian


Re: [Rdkit-discuss] How to construct a simple molecule with a Z stereo double bond using RWMol?

2021-01-14 Thread Francois Berenger

On 14/01/2021 18:36, Greg Landrum wrote:

Hi Francois,

I would do this by setting the stereo to either STEREOCIS or
STEREOTRANS and then calling Chem.AssignStereochemistry():

In [6]: rwmol = Chem.RWMol()
   ...: # create the atoms
   ...: a0 = Chem.Atom(6)
   ...: a1 = Chem.Atom(7)
   ...: a2 = Chem.Atom(6)
   ...: a3 = Chem.Atom(16)
   ...: # add the atoms
   ...: rwmol.AddAtom(a0)
   ...: rwmol.AddAtom(a1)
   ...: rwmol.AddAtom(a2)
   ...: rwmol.AddAtom(a3)
   ...: # add the bonds
   ...: rwmol.AddBond(0, 1, rdkit.Chem.rdchem.BondType.SINGLE)
   ...: rwmol.AddBond(1, 2, rdkit.Chem.rdchem.BondType.DOUBLE)
   ...: rwmol.AddBond(2, 3, rdkit.Chem.rdchem.BondType.SINGLE)
Out[6]: 3

In [7]: db = rwmol.GetBondWithIdx(1)

In [8]: db.SetStereoAtoms(0,3)

In [9]: db.SetStereo(Chem.BondStereo.STEREOCIS)

In [10]: Chem.MolToSmiles(rwmol)
Out[10]: 'CN=CS'

In [11]: Chem.AssignStereochemistry(rwmol)

In [12]: Chem.MolToSmiles(rwmol)
Out[12]: 'C/N=C\\S'


Here is the fun part:

Chem.SanitizeMol(rwmol)
print(Chem.MolToSmiles(rwmol)) # --> CN=CS

"Sanitization" of the rwmol got rid of the stereo info that
we just inserted.

Is this a "feature" of SanitizeMol?

I was being a good kid, I thought that one must always sanitize
a RWMol prior to extracting the final resulting molecule (in the end
I want a SMILES).

Regards,
F.


On Thu, Jan 14, 2021 at 9:46 AM Francois Berenger 
wrote:


Hello,

Please tell me if you understand why the code below
is not working and if you know how to change it so that it does.

Thanks a lot! :)
F.

---
#!/usr/bin/env python3

# try to construct a molecule with a Z stereo double bond using RWMol

import rdkit
from rdkit import Chem

wanted_smi = 'C/N=C\\S'

rwmol = Chem.RWMol()
# create the atoms
a0 = Chem.Atom(6)
a1 = Chem.Atom(7)
a2 = Chem.Atom(6)
a3 = Chem.Atom(16)
# add the atoms
rwmol.AddAtom(a0)
rwmol.AddAtom(a1)
rwmol.AddAtom(a2)
rwmol.AddAtom(a3)
# add the bonds
rwmol.AddBond(0, 1, rdkit.Chem.rdchem.BondType.SINGLE)
rwmol.AddBond(1, 2, rdkit.Chem.rdchem.BondType.DOUBLE)
rwmol.AddBond(2, 3, rdkit.Chem.rdchem.BondType.SINGLE)
# let's see what we have so far
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; so far so good
# try to specify a Z stereo bond
db = rwmol.GetBondWithIdx(1)
assert(db.GetBondType() == rdkit.Chem.rdchem.BondType.DOUBLE) # just checking
db.SetStereo(rdkit.Chem.rdchem.BondStereo.STEREOZ)
db.SetStereoAtoms(0, 3)
# let's see what we have now
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; not good enough
Chem.SanitizeMol(rwmol) # just checking
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; not getting better
---



Re: [Rdkit-discuss] How to construct a simple molecule with a Z stereo double bond using RWMol?

2021-01-14 Thread Francois Berenger

Hi Greg,

Thanks a lot for the working example!
Indeed, the 'Chem.AssignStereochemistry(rwmol)' call missing from my
code was the key.

I did not know about this function.

Regards,
F.

On 14/01/2021 18:36, Greg Landrum wrote:

Hi Francois,

I would do this by setting the stereo to either STEREOCIS or
STEREOTRANS and then calling Chem.AssignStereochemistry():

In [6]: rwmol = Chem.RWMol()
   ...: # create the atoms
   ...: a0 = Chem.Atom(6)
   ...: a1 = Chem.Atom(7)
   ...: a2 = Chem.Atom(6)
   ...: a3 = Chem.Atom(16)
   ...: # add the atoms
   ...: rwmol.AddAtom(a0)
   ...: rwmol.AddAtom(a1)
   ...: rwmol.AddAtom(a2)
   ...: rwmol.AddAtom(a3)
   ...: # add the bonds
   ...: rwmol.AddBond(0, 1, rdkit.Chem.rdchem.BondType.SINGLE)
   ...: rwmol.AddBond(1, 2, rdkit.Chem.rdchem.BondType.DOUBLE)
   ...: rwmol.AddBond(2, 3, rdkit.Chem.rdchem.BondType.SINGLE)
Out[6]: 3

In [7]: db = rwmol.GetBondWithIdx(1)

In [8]: db.SetStereoAtoms(0,3)

In [9]: db.SetStereo(Chem.BondStereo.STEREOCIS)

In [10]: Chem.MolToSmiles(rwmol)
Out[10]: 'CN=CS'

In [11]: Chem.AssignStereochemistry(rwmol)

In [12]: Chem.MolToSmiles(rwmol)
Out[12]: 'C/N=C\\S'

On Thu, Jan 14, 2021 at 9:46 AM Francois Berenger 
wrote:


Hello,

Please tell me if you understand why the code below
is not working and if you know how to change it so that it does.

Thanks a lot! :)
F.

---
#!/usr/bin/env python3

# try to construct a molecule with a Z stereo double bond using RWMol

import rdkit
from rdkit import Chem

wanted_smi = 'C/N=C\\S'

rwmol = Chem.RWMol()
# create the atoms
a0 = Chem.Atom(6)
a1 = Chem.Atom(7)
a2 = Chem.Atom(6)
a3 = Chem.Atom(16)
# add the atoms
rwmol.AddAtom(a0)
rwmol.AddAtom(a1)
rwmol.AddAtom(a2)
rwmol.AddAtom(a3)
# add the bonds
rwmol.AddBond(0, 1, rdkit.Chem.rdchem.BondType.SINGLE)
rwmol.AddBond(1, 2, rdkit.Chem.rdchem.BondType.DOUBLE)
rwmol.AddBond(2, 3, rdkit.Chem.rdchem.BondType.SINGLE)
# let's see what we have so far
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; so far so good
# try to specify a Z stereo bond
db = rwmol.GetBondWithIdx(1)
assert(db.GetBondType() == rdkit.Chem.rdchem.BondType.DOUBLE) # just checking
db.SetStereo(rdkit.Chem.rdchem.BondStereo.STEREOZ)
db.SetStereoAtoms(0, 3)
# let's see what we have now
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; not good enough
Chem.SanitizeMol(rwmol) # just checking
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; not getting better
---

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





[Rdkit-discuss] How to construct a simple molecule with a Z stereo double bond using RWMol?

2021-01-14 Thread Francois Berenger

Hello,

Please tell me if you understand why the code below
is not working and if you know how to change it so that it does.

Thanks a lot! :)
F.

---
#!/usr/bin/env python3

# try to construct a molecule with a Z stereo double bond using RWMol

import rdkit
from rdkit import Chem

wanted_smi = 'C/N=C\\S'

rwmol = Chem.RWMol()
# create the atoms
a0 = Chem.Atom(6)
a1 = Chem.Atom(7)
a2 = Chem.Atom(6)
a3 = Chem.Atom(16)
# add the atoms
rwmol.AddAtom(a0)
rwmol.AddAtom(a1)
rwmol.AddAtom(a2)
rwmol.AddAtom(a3)
# add the bonds
rwmol.AddBond(0, 1, rdkit.Chem.rdchem.BondType.SINGLE)
rwmol.AddBond(1, 2, rdkit.Chem.rdchem.BondType.DOUBLE)
rwmol.AddBond(2, 3, rdkit.Chem.rdchem.BondType.SINGLE)
# let's see what we have so far
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; so far so good
# try to specify a Z stereo bond
db = rwmol.GetBondWithIdx(1)
assert(db.GetBondType() == rdkit.Chem.rdchem.BondType.DOUBLE) # just checking

db.SetStereo(rdkit.Chem.rdchem.BondStereo.STEREOZ)
db.SetStereoAtoms(0, 3)
# let's see what we have now
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; not good enough
Chem.SanitizeMol(rwmol) # just checking
print(Chem.MolToSmiles(rwmol)) # --> 'CN=CS'; not getting better
---


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMARTS pattern replacement inside a ring; without breaking the ring open...

2021-01-12 Thread Francois Berenger

On 12/01/2021 15:10, Fiorella Ruggiu wrote:

Hi Francois,

not sure if you have solved this yet. I believe it won't be possible
to use AllChem.ReplaceSubstructs without breaking the rings or
enumerating them. You can however use reactions for this problem.
Here's an example based on yours:

mol = Chem.MolFromSmiles('O=c1[nH]1')

rxn =
AllChem.ReactionFromSmarts('[c:1](=[O:2])[nH:3]>>[c:1]([O:2])[nH0:3]')

ps = rxn.RunReactants([Chem.MolFromSmiles('O=c1[nH]1')])

Chem.MolToSmiles(ps[0][0])

'Oc1n1'


I was fighting other fires. Thanks a lot for this example!


Hope this helps!

Best,

Fio

On Thu, Jan 7, 2021 at 10:33 PM Francois Berenger 
wrote:


Dear list,

I have been trying to replace this SMARTS pattern in a ring:

'c(=O)[nH]'

By this SMILES fragment:

'c(O)n'

My trials using a single SMARTS pattern search then replace
break open the ring, which is not what I want.

My non-working trial code:
---
mol = Chem.MolFromSmiles('O=c1[nH]1')
pat = Chem.MolFromSmarts('c(=O)[nH]')
rep = Chem.MolFromSmarts('c(O)n')
res = AllChem.ReplaceSubstructs(mol,pat,rep)
Chem.MolToSmiles(res[0])
'c(n)O'
---

The example molecule is just an example; the ring might be smaller
and/or have more heteroatoms.

Should I use a chemical reaction for this?

Am I forced to describe full rings in both SMARTS patterns?!
I don't want to have to enumerate all the possibilities...

I can make it ~work~ using two replacements:
first 'c(=O)' to 'c(O)'
then
'[nH]' to 'n'
But this is less precise than what I really want
and I believe it will change molecules or places I don't want to
change.

Thanks a lot and happy new year!
F.

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





[Rdkit-discuss] SMARTS pattern replacement inside a ring; without breaking the ring open...

2021-01-07 Thread Francois Berenger

Dear list,

I have been trying to replace this SMARTS pattern in a ring:

'c(=O)[nH]'

By this SMILES fragment:

'c(O)n'

My trials using a single SMARTS pattern search then replace
break open the ring, which is not what I want.

My non-working trial code:
---
mol = Chem.MolFromSmiles('O=c1[nH]1')
pat = Chem.MolFromSmarts('c(=O)[nH]')
rep = Chem.MolFromSmarts('c(O)n')
res = AllChem.ReplaceSubstructs(mol,pat,rep)
Chem.MolToSmiles(res[0])
'c(n)O'
---

The example molecule is just an example; the ring might be smaller
and/or have more heteroatoms.

Should I use a chemical reaction for this?

Am I forced to describe full rings in both SMARTS patterns?!
I don't want to have to enumerate all the possibilities...

I can make it ~work~ using two replacements:
first 'c(=O)' to 'c(O)'
then
'[nH]' to 'n'
But this is less precise than what I really want
and I believe it will change molecules or places I don't want to change.

Thanks a lot and happy new year!
F.


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] How many bonds of a Type in a molecule

2020-12-08 Thread Francois Berenger

On 08/12/2020 23:01, José Emilio Sánchez Aparicio wrote:

Dear all,

 I need to find how many bonds of a certain type are in a molecule.
For example, for DOUBLE bonds, I would do:

 bond_number = 0
 for bond in mol.GetBonds():
     if bond.GetBondType() == Chem.BondType.DOUBLE:
         bond_number += 1

 However, searching for faster manners to do this, I found
"rdqueries". For example, to find how many atoms of Carbon there are
in a molecule, you could do:

 q = rdqueries.AtomNumEqualsQueryAtom(6)
 carbon_number = len(mol.GetAtomsMatchingQuery(q))

 I'm wondering if some of you know the equivalent in "rdqueries" to
find the number of bonds that match a type.


Such a function would probably do what your code is already doing.

It is not very useful to optimize such code fragment, unless
it is really taking most of your program runtime (which I doubt). ;)
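
For reference, a minimal sketch of the same loop written more compactly. It assumes RDKit is installed; the acrolein SMILES is only a hypothetical test molecule, and the Counter variant is an illustration, not an rdqueries equivalent.

```python
from collections import Counter

from rdkit import Chem

# Hypothetical test molecule (acrolein): two double bonds, one single bond.
mol = Chem.MolFromSmiles('C=CC=O')

# Count the bonds of one given type with a generator expression.
n_double = sum(1 for b in mol.GetBonds()
               if b.GetBondType() == Chem.BondType.DOUBLE)

# Or tally every bond type in a single pass over the bonds.
counts = Counter(b.GetBondType() for b in mol.GetBonds())

print(n_double)
print(counts[Chem.BondType.SINGLE])
```

Both variants still visit every bond once, so they are about readability rather than speed.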


 Many thanks in advance. Best,

 José-Emilio Sánchez
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] GPU Implementation of shape-based 3D overlap on rdkit?

2020-11-08 Thread Francois Berenger

On 04/11/2020 04:26, Lewis Martin wrote:

I've had an initial go at something like this using JAX. I chose JAX
since it has a shallow learning curve, essentially being numpy on a
GPU. This is great for vectorized calculations, but less so for
applications that involve a lot of control flow (i.e. if/else
statements), which, as I understand it, most point cloud registration
algorithms use, such as iterative closest point or anything available
in open3d.

No guarantee I'll make any progress of course, but would someone mind
recommending a paper explaining a nice subshape alignment algorithm?


Grant, J.A.; Gallardo, M.A.; Pickup, B.T. (1996) ‘A fast method of 
molecular shape comparison: a simple application of a Gaussian 
description of molecular shape’, J. Comp. Chem. 17, 1653-1666 
[wiley/19961115]


From the abstract:
"A Gaussian description of molecular shape is used to compare the shapes 
of two molecules by analytically optimizing their volume intersection."


The Shape-it open-source program might have some code also.

Regards,
F.


Thanks :)
Lewis

On Wed, 4 Nov 2020 at 3:52 am, Andy Jennings
 wrote:


Hi Greg,

Thanks for the response and background. Here's hoping someone is
smart enough to code this up and generous enough to donate it back
to the community.

Best,
Andy

On Mon, Nov 2, 2020 at 8:52 PM Greg Landrum 
wrote:

Hi Andy,

At the moment the RDKit doesn't have either high-quality shape-based
alignment code[1] or GPU support.

I think having good shape-based alignment available would be a
really useful complement to the Open3DAlign code that's already
there, but it's certainly not a small project.

-greg
[1] The python implementation of the subshape alignment algorithm is
essentially just a proof-of-concept and not performant enough for
real usage.

On Mon, Nov 2, 2020 at 7:16 PM Andy Jennings
 wrote:

Hi,

I see that back in 2014 there was some discussion of using CUDA
inside of RDKit and how it may be possible to produce a
FastROCS-like open source alternative. I was curious if anyone had
made such a breakthrough. Since GPU availability is now so common,
and datasets are becoming so large, I figured that more and more
people would be thinking RDKit + GPU = :-)

Thanks in advance.
Andy



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] unique chemical representation

2020-09-13 Thread Francois Berenger

On 12/09/2020 00:27, Mike Mazanetz wrote:

Dear Forum,

I'm curious as to how the community standardizes molecules to generate
unique chemical representations.

Please let me know what are people's preferred means to treat:

* Tautomers
* Protomers
* Resonance structures
* Salts when the salt is larger than the ligand


Here is how ChEMBL does it:

https://github.com/chembl/ChEMBL_Structure_Pipeline

Not sure they handle all the cases you listed, though.

Regards,
F.


Particularly when converting between chemical representations SDF to
smiles, SMARTS to smiles, and one flavour of smiles to another.

And are there any caveats to consider, such as the correct assignment
of heterocyclic nitrogens as aromatic ?

I look forward to hearing your thoughts.

Regards,

mike
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] h-bond geometry

2020-09-08 Thread Francois Berenger

On 09/09/2020 01:33, Tim Dudgeon wrote:

Hi All,
thanks for the suggestions.

Greg, that's part of what's needed but there's also some more complex
logic needed. For instance, if the atom the H is attached to is
rotatable (e.g. an OH group) then it is more complex than if it is
fixed (e.g. an N in a ring).
I was wondering whether anyone had already encoded these types of
rules.

BTW I also found https://oddt.readthedocs.io/en/latest/ which seems to
handle a whole range of interaction types nicely, and can use RDKit as
its underlying toolkit (as well as OBabel).


More precisely:
https://oddt.readthedocs.io/en/latest/rst/oddt.html?module-oddt.interactions#module-oddt.interactions


Tim

On Tue, Sep 8, 2020 at 1:24 PM Greg Landrum 
wrote:


Hi Tim,

Assuming that you already have the indices of the atoms that you're
interested in looking at, it's pretty easy to calculate the angle
between three arbitrary atoms. Here's an example:

In [3]: m = Chem.AddHs(Chem.MolFromSmiles('COCO'))

In [4]: AllChem.EmbedMolecule(m)
Out[4]: 0

In [5]: conf = m.GetConformer()

In [6]: ps = [conf.GetAtomPosition(x) for x in
range(conf.GetNumAtoms())]

The atom0 - atom1 - atom2 angle:
In [7]: (ps[1]-ps[0]).AngleTo(ps[1]-ps[2])
Out[7]: 1.8295300825582068

Those happened to be bonded, but that's not necessary, here's the
atom1 - atom6 - atom3 angle:
In [15]: (ps[6]-ps[1]).AngleTo(ps[6]-ps[3])
Out[15]: 0.4862648980647286
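
Note that the angles above are in radians. As a rough sketch of how such an angle could feed a geometric H-bond check, here is a standard-library-only version working on plain (x, y, z) tuples. The function names and the cutoffs (2.5 A for the H...acceptor distance, 120 degrees for the donor-H...acceptor angle) are assumptions for illustration, not values from this thread, and the rotatable-donor logic is not handled.

```python
import math

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))

def looks_like_hbond(donor, hydrogen, acceptor,
                     max_dist=2.5, min_angle=120.0):
    """Hypothetical check: short H...A distance and a fairly linear D-H...A.

    max_dist (Angstrom) and min_angle (degrees) are assumed cutoffs.
    """
    dist = math.dist(hydrogen, acceptor)
    return dist <= max_dist and angle_deg(donor, hydrogen, acceptor) >= min_angle

# The first angle reported above, converted from radians to degrees.
print(math.degrees(1.8295300825582068))

# Linear arrangement: accepted. Bent (90 degree) arrangement: rejected.
print(looks_like_hbond((0, 0, 0), (1, 0, 0), (2.9, 0, 0)))
print(looks_like_hbond((0, 0, 0), (1, 0, 0), (1, 1.9, 0)))
```

With RDKit, the three points would come from conf.GetAtomPosition() as in the session above.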

Is that what you're looking for?

-greg

On Mon, Sep 7, 2020 at 3:06 PM Tim Dudgeon 
wrote:


Hi RDKitters,
I was wondering whether anyone has any RDKit code that checks on
the geometry of a H-bond.
e.g. once a donor and acceptor are located within a reasonable
distance of each other to check on the angles involved to
establish if that is a reasonable H-bond.
Tim

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



Re: [Rdkit-discuss] Smallest possible size of 100*1e6 morgan fingerprints for storage and memory

2020-09-08 Thread Francois Berenger

On 09/09/2020 09:35, Lewis Martin wrote:

Hi RDKit,

Looking for advice on an rdkit-adjacent problem please. Ultimately I'd
like to fit an approximate-nearest neighbors index on a dataset of 100
million ligands, featurized by morgan fingerprint. The text file of
the smiles is ~6gb but this blows out when loaded with
pandas.read_csv() or f.readlines() due to weird memory allocation
issues.

It would take 45hrs to process the file in serial (i.e. read line,
create mol, fingerprint, convert to np.arr or sparse arrays) in a
streaming manner so now I'd like to parallelize the job with joblib,
which would multiply the memory requirements by the number of
processes running at a time.

So: what is the smallest possible representation for a binary
fingerprint? Using `sys.getsizeof` on a
rdkit.DataStructs.cDataStructs.ExplicitBitVect object tells me it is
96 bytes, but I'm not sure whether to believe that since, like
csr_matrix, the size depends on accurately returning the object's
data. Here's an example demonstrating this:

from sys import getsizeof
from scipy import sparse
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
smi = 'COCCN(CCO)C(=O)/C=C\\c1cccnc1'
mol = Chem.MolFromSmiles(smi)
gen_mo = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=512)


Obviously, if you ask for fpSize = 512, the smallest uncompressed
representation of the fingerprint will be 512 bits (64 bytes).

100M of such fingerprints, if there is not any overhead added by the
programming language, would need about 6.4GB of RAM.
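
A quick check of that arithmetic for the 100 million ligands from the original question, plus a minimal sketch of packing on-bit indices into exactly 64 bytes. It uses only the standard library; pack_bits and is_set are hypothetical helper names, not RDKit functions, and the on-bit list is made up.

```python
FP_BITS = 512          # fpSize requested from the fingerprint generator
N_MOLS = 100_000_000   # dataset size from the original question

bytes_per_fp = FP_BITS // 8               # 8 bits per byte
total_gb = bytes_per_fp * N_MOLS / 1e9    # decimal gigabytes

def pack_bits(on_bits, n_bits=FP_BITS):
    """Pack a list of on-bit indices into a fixed-size bytes object."""
    value = 0
    for i in on_bits:
        value |= 1 << i
    return value.to_bytes(n_bits // 8, "little")

def is_set(packed, i):
    """Test bit i of a fingerprint packed by pack_bits."""
    return bool((packed[i // 8] >> (i % 8)) & 1)

packed = pack_bits([1, 5, 511])  # made-up on-bits for illustration

print(bytes_per_fp, "bytes per fingerprint")
print(total_gb, "GB for", N_MOLS, "fingerprints")
print(len(packed), is_set(packed, 5), is_set(packed, 6))
```

Python adds its own per-object overhead to each bytes object, so reaching this lower bound in practice means storing all fingerprints in one contiguous buffer or array.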

But the really fun things will start when you want to search quickly
through so many molecules. :)
There are many published methods, some open-source software (like
Dalke's chemfp) and even some commercial ones
which claim they are lightning fast (even reaching real-time search
speed!).


e.g.
https://chemaxon.com/products/madfast
https://www.nextmovesoftware.com/arthor.html

Regards,
F.


fp = gen_mo.GetFingerprint(mol)
sparse_fp = sparse.csr_matrix(fp)

print('ExplicitBitVect object size:', getsizeof(fp))
print('Sparse matrix size (naive):', getsizeof(sparse_fp))
print('Sparse matrix size (real):',
      sparse_fp.data.nbytes + sparse_fp.indices.nbytes + sparse_fp.indptr.nbytes)
print('fp.ToBinary size:', getsizeof(fp.ToBinary()))
print('fp.ToBase64 size:', getsizeof(fp.ToBase64()))

ExplicitBitVect object size: 96
Sparse matrix size (naive): 64
Sparse matrix size (real): 476
fp.ToBinary size: 85
fp.ToBase64 size: 121

Note that even the smallest of these multiplied by 100 million would
be about 8gb, still larger than the text file storing the smiles codes
- not sure if that is to be expected or not?

Thanks for your time!
Lewis
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] RDkit: While converting sdf file to fingerprint, facing several error

2020-08-06 Thread Francois Berenger

On 07/08/2020 03:15, dmaziuk via Rdkit-discuss wrote:

On 8/6/2020 7:14 AM, Pitanti Chalowa wrote:
...


DTXCID601285170
   Mrv1805 05101813452D


Does it have to have a blank line after '' ?


No: having the molecule's name/identifier in there is quite standard.


Dima



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] RDKit installation problem

2020-08-02 Thread Francois Berenger

Dear Sebastian,

Since last week, you should also be able to install rdkit on Linux
via linuxbrew:

---
sudo apt install linuxbrew-wrapper
brew tap rdkit/rdkit
brew update
brew install rdkit

# to test it
/home/linuxbrew/.linuxbrew/bin/python3
import rdkit
---

Thanks to Nuri Jung on github (@jnooree) for proposing a fix
to the brew rdkit install formula.

Regards,
F.

On 02/08/2020 03:03, Sebastián J. Castro wrote:

I have try the installation suggested at
http://www.rdkit.org/docs/Install.html:

$ conda create -c rdkit -n my-rdkit-env rdkit

But I get 2017 version instead of 2020 (last released).

I don't know how to install it. Can you help me?

I have Ubuntu 20.04 LTS

Thank you

Best regards!

--

Dr. Sebastián J. Castro
Departamento de Ciencias Farmacéuticas
Facultad de Ciencias Químicas
Universidad Nacional de Córdoba
UNITEFA-CONICET
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





[Rdkit-discuss] Is there someone who manages to compile rdkit Release_2020_03_4 from sources on a Mac?

2020-07-07 Thread Francois Berenger

Dear rdkiters,

I am trying to repair the brew formula.

Currently, I get this when I try to compile Release_2020_03_4:

$ cd build
$ cmake -DPYTHON_EXECUTABLE=/usr/local/bin/python-3.7 
-DRDK_INSTALL_INTREE=OFF -DRDK_BUILD_INCHI_SUPPORT=ON 
-DRDK_BUILD_AVALON_SUPPORT=ON -DRDK_BUILD_PYTHON_WRAPPERS=ON 
-DCMAKE_INSTALL_PREFIX=/usr ../


-->

Not sure this one is a show stopper:
---
-- Found InChI software locally
Python Install directory
CMake Error at CMakeLists.txt:290 (install):
  install DIRECTORY given no DESTINATION!
---

This one stops the show for sure:
---
CMake Error at Code/GraphMol/FilterCatalog/CMakeLists.txt:10 (if):
  if given arguments:

"VERSION_GREATER" "2.6"

  Unknown arguments specified
---

Thanks for any help,
F.

PS: I do think cmake is the worst build system I ever encountered in my 
life...



___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] How to add a caption/legend to a 2D SVG depiction of a molecule?

2020-06-22 Thread Francois Berenger

On 22/06/2020 14:47, Francois Berenger wrote:

Dear RDKiters,

My current code looks like this:
---
AllChem.Compute2DCoords(mol) # generate 2D conformer
mol.SetProp("_Name", name)
d = rdMolDraw2D.MolDraw2DSVG(200, 200)
d.DrawMolecule(mol)


Answer to myself:
  d.DrawMolecule(mol, legend = name)


d.FinishDrawing()
out_fn = '%d.svg' % i
with open(out_fn, 'w') as out:
    out.write(d.GetDrawingText())
---

How can I add some text into the generated SVG?

For example, the molecule name might be pretty useful to me.
But, I might want to annotate the molecule with whichever text I might need
in the future.

I could not find this in the docs.
I could not find it with some searches in rdkit's code...

Thanks a lot,
F.


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





[Rdkit-discuss] How to add a caption/legend to a 2D SVG depiction of a molecule?

2020-06-21 Thread Francois Berenger

Dear RDKiters,

My current code looks like this:
---
AllChem.Compute2DCoords(mol) # generate 2D conformer
mol.SetProp("_Name", name)
d = rdMolDraw2D.MolDraw2DSVG(200, 200)
d.DrawMolecule(mol)
d.FinishDrawing()
out_fn = '%d.svg' % i
with open(out_fn, 'w') as out:
    out.write(d.GetDrawingText())
---

How can I add some text into the generated SVG?

For example, the molecule name might be pretty useful to me.
But, I might want to annotate the molecule with whichever text I might need
in the future.

I could not find this in the docs.
I could not find it with some searches in rdkit's code...

Thanks a lot,
F.


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] How to calculate Tanimoto similarity score between reactions

2020-06-10 Thread Francois Berenger

On 10/06/2020 13:11, 丁邵珍 wrote:

Hi, I want to calculate Tanimoto similarity score of two reactions
('CCCO>>CCC=O', 'CC(O)C>>CC(=O)C'), I found all methods of  Tanimoto
similarity score calculation are for compounds. Could you please tell
me how to calculate the Tanimoto similarity score of reactions? I am
looking forward to your reply.


I don't know how to do it in rdkit, but if you need some inspiration,
here is how chemaxon does it:

https://docs.chemaxon.com/display/docs/Reaction_fingerprint_RF.html
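
As a possible starting point inside RDKit itself: the rdChemReactions module provides reaction fingerprint functions, and a bit-vector fingerprint can be compared with the usual Tanimoto function. This is only a sketch under the assumption that the default structural reaction fingerprint is acceptable for the task; reaction_fp is a hypothetical helper name, and no particular similarity value is claimed here.

```python
from rdkit import DataStructs
from rdkit.Chem import rdChemReactions

def reaction_fp(rxn_smiles):
    """Structural fingerprint (bit vector) of a reaction given as SMILES."""
    rxn = rdChemReactions.ReactionFromSmarts(rxn_smiles, useSmiles=True)
    return rdChemReactions.CreateStructuralFingerprintForReaction(rxn)

# The two reactions from the question.
fp1 = reaction_fp('CCCO>>CCC=O')
fp2 = reaction_fp('CC(O)C>>CC(=O)C')

sim = DataStructs.TanimotoSimilarity(fp1, fp2)
print(round(sim, 3))
```

rdChemReactions.CreateDifferenceFingerprintForReaction is the count-based alternative; which of the two is more meaningful depends on the application.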


Yours,
shaozhen
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] Removing solvent and ions from dataset

2020-06-08 Thread Francois Berenger

On 06/06/2020 17:33, Max Pinheiro Jr wrote:

Hi RDkit team,

I am working on a chemically diverse dataset of smiles strings and I
need to do some preprocessing to clean a bit the data before starting
the modeling part. So I was looking for some tools or built-in
functions in RDkit to make such preprocessing by removing, for
instance, solvent (water) molecules and ions. I found the
"SaltRemover" module that may solve my problem with removing ions from
the database, but I could not find an equivalent module for the case
of solvent molecules. Does anyone know a specific tool in RDkit (or
any other python program) to make such preprocessing in the smile
strings? If so, could you please provide just a simple example of how
to do it? I will be really thankful for any help you may provide.


I have used this program several times:

https://github.com/flatkinson/standardiser

You can try this:
```
pip3 install chemo-standardizer
standardiser -i input.smi -o output_std.smi
```

I believe it uses rdkit under the hood.
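
If a full standardizer is more than needed, a minimal sketch of the simplest approach in plain RDKit is to keep only the largest fragment of each SMILES, which drops water and small counter-ions in one step. largest_fragment is a hypothetical helper name, the input SMILES is made up, and "largest" is assumed here to mean most heavy atoms, so this fails whenever the solvent or salt is bigger than the compound of interest.

```python
from rdkit import Chem

def largest_fragment(smiles):
    """Return the SMILES of the fragment with the most heavy atoms."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # unparsable input
    frags = Chem.GetMolFrags(mol, asMols=True)
    if not frags:
        return ''    # empty molecule
    biggest = max(frags, key=lambda m: m.GetNumHeavyAtoms())
    return Chem.MolToSmiles(biggest)

# Made-up example: ethanol with water, sodium and chloride.
print(largest_fragment('CCO.O.[Na+].[Cl-]'))  # --> 'CCO'
```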

Regards,
F.


Max Pinheiro Jr
-
Université Aix-Marseille, France
Institut de Chimie Radicalaire
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss





Re: [Rdkit-discuss] Compilation problems on Linux

2020-04-16 Thread Francois Berenger

Hi Max,

Not sure if it will help, but on Debian and Ubuntu you need the following
system packages to be installed in order to compile rdkit:

curl
wget
libboost-all-dev
cmake
git
g++
libeigen3-dev
python3
libpython3-all-dev
python3-numpy
python3-pip
python3-pil
python3-six
python3-pandas

What Linux distro are you using?

Doesn't your distribution provide Python 3-ready packages for rdkit?

Ideally, this is what you would want, especially if you install rdkit on 
all nodes of a computing cluster.


Regards,
F.

On 16/04/2020 01:33, Max Pinheiro Jr wrote:

Hi Paolo,

Thank you for your quite fast answer! Yes, I compiled Boost 1.67 using
the same gcc version, 8.1. I have seen this GLIBCXX possible solution
that you have commented before, and I also tried that but didn't work
anyway, I got the same problem with the Boost library and the
compilation can't finish. I am wondering if may exist any other
solution. I can also provide some other specific information if this
would help to map the problem and find a solution.

Thank you again!

Max Pinheiro Jr

On Wed, Apr 15, 2020 at 18:25, Paolo Tosco
 wrote:


Hi Max,

you mention you are using gcc-8.1 and Boost 1.67. Did you compile
Boost with the same compiler or was it compiled with an earlier
version of gcc/g++?

If Boost was compiled with an earlier version of gcc/g++, you will
need to add to /home/mpinheiro/codes/rdkit-2020.09/CMakeLists.txt
the following line:

add_definitions("-D_GLIBCXX_USE_CXX11_ABI=0")

or the linker will fail during the compilation; see
https://github.com/rdkit/rdkit/issues/2013#issuecomment-553563418.

HTH, cheers
p.
On 15/04/2020 17:15, Max Pinheiro Jr wrote:


Dear all,

I have exhaustively tried to compile rdkit (latest git version) on
a Linux cluster but the compilation process was always failing at
the same point with an error message related to the boost library.
After searching in the forum, the only way I could surpass the
problem and finally get the program compiled was setting the flag
"RDK_USE_BOOST_SERIALIZATION" to OFF. However, when I do a simple
test trying to import the Chem module I get the following error:

from rdkit import Chem
Traceback (most recent call last):
  File "", line 1, in
  File "/home/mpinheiro/codes/rdkit-2020.09/rdkit/Chem/__init__.py", line 20, in
    from rdkit.Chem import rdchem
SystemError: initialization of rdchem raised unreported exception

I am using gcc-8.1, cmake-3.11.2 and the version 1.67 of boost
library to build RDKit. The compilation instructions I have used
are the following:

cmake -DPy_ENABLE_SHARED=1 \
-DRDK_INSTALL_INTREE=ON \
-DRDK_BUILD_CPP_TESTS=ON \
-DRDK_INSTALL_STATIC_LIBS=ON \
-DRDK_BUILD_AVALON_SUPPORT=ON \
-DRDK_BUILD_CAIRO_SUPPORT=ON \
-DRDK_BUILD_INCHI_SUPPORT=ON \
-DRDK_BUILD_PYTHON_WRAPPERS=ON \
-DRDK_BUILD_SWIG_CSHARP_WRAPPER=ON \
-DPYTHON_EXECUTABLE=/home/mpinheiro/.pyenv/versions/3.8.2/bin/python \
-DPYTHON_LIBRARY=/home/mpinheiro/.pyenv/versions/3.8.2/lib/libpython3.8.a \
-DPYTHON_INCLUDE_DIR=/home/mpinheiro/.pyenv/versions/3.8.2/include/python3.8 \
-DPYTHON_NUMPY_INCLUDE_PATH="$(python -c 'import numpy ; print(numpy.get_include())')" \
-DBOOST_ROOT=/home/mpinheiro/codes/boost-1.67/ \
-DBOOST_INCLUDEDIR=/home/mpinheiro/codes/boost-1.67/include/boost \
-DBOOST_LIBRARYDIR=/home/mpinheiro/codes/boost-1.67/lib ..

make -j 4 > make.log
make install

I have also checked the links created in the rdBase.so file as
shown below and everything seems to be fine:

linux-vdso.so.1 =>  (0x2aaab000)
libRDKitRDBoost.so.1 => /home/mpinheiro/codes/rdkit-2020.09/lib/libRDKitRDBoost.so.1 (0x2adb1000)
libboost_python38.so.1.67.0 => /home/mpinheiro/codes/boost-1.67/lib/libboost_python38.so.1.67.0 (0x2afb5000)
libRDKitRDGeneral.so.1 => /home/mpinheiro/codes/rdkit-2020.09/lib/libRDKitRDGeneral.so.1 (0x2b1fb000)
libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x2b423000)
libstdc++.so.6 => /trinity/shared/apps/custom/x86_64/gcc-8.1.0/lib64/libstdc++.so.6 (0x2b64)
libm.so.6 => /usr/lib64/libm.so.6 (0x2b9c4000)
libgcc_s.so.1 => /trinity/shared/apps/custom/x86_64/gcc-8.1.0/lib64/libgcc_s.so.1 (0x2bcc6000)
libc.so.6 => /usr/lib64/libc.so.6 (0x2bedf000)
librt.so.1 => /usr/lib64/librt.so.1 (0x2c2a2000)
libdl.so.2 => /usr/lib64/libdl.so.2 (0x2c4aa000)
libutil.so.1 => /usr/lib64/libutil.so.1 (0x2c6af000)
/lib64/ld-linux-x86-64.so.2 (0x4000)

As I said, I have tried many different tricks and suggestions that
I was able to find in the forum but none of them effectively
solved my problem to get the code working. So I would like to ask
you if someone has faced a similar problem and may already have
some tips on how to fix it. I will really appreciate 

Re: [Rdkit-discuss] The RDKit and GSoC 2020

2020-03-22 Thread Francois Berenger

On 20/03/2020 08:43, JW Feng wrote:

iwatobipen blog was where I found instructions for installing RDKit on
Colab.  It works but I found waiting for miniconda to install to be
too annoying. A one line apt-get command to install RDKit is easier
and faster  (~10 seconds) but it only works with Python 2.  Running
following command in a Python 3 environment results in the error
below. Getting apt-get to install RDKit correctly for Python 3 is a
good solution.

!apt-get install python-rdkit librdkit1 rdkit-data

from rdkit import Chem...

---

ModuleNotFoundError   Traceback (most recent call
last)

 [4] in ()
  1 get_ipython().system('apt-get install python-rdkit librdkit1
rdkit-data')
> 2 from rdkit import Chem

ModuleNotFoundError: No module named 'rdkit'


Try python2 instead of python3.

Your Linux distribution is probably shipping an old version of rdkit,
which only works with Python 2.

You can also push for your distribution to ship a recent version
of rdkit.

Alternatively, there is Scripts/create_deb_packages.sh
in the source tree that can create up-to-date packages on Ubuntu/Debian.

You need to remove your currently installed rdkit packages prior to
installing those, though.

Regards,
F.


---

Best,

JW

On Mon, Mar 16, 2020 at 4:48 AM Taka Seri  wrote:


Dear Steve, Greg and All,

Recently I moved from Colab to Binder to make cloud env with python.
However I'll try to make my code more compact and share it.
Thanks for following my blog post. ;)
https://iwatobipen.wordpress.com/

Best regards,

Taka (twitter account / iwatobipen)

On Mon, Mar 16, 2020 at 16:03 Greg Landrum wrote:

Thanks Steve,

That's really helpful. Given that we're unlikely to end up with a
decent pip-installable RDkit, I guess the snippet approach would be
the best way to go. I will try to make some time for this (or
convince iwatobipen to do it) in the reasonably near future.

Best,
-greg

On Sun, Mar 15, 2020 at 5:58 PM Steven Kearnes 
wrote:

re: rdkit+colab

In talking with folks outside of Google about rdkit+colab, I haven't
been able to establish that it's worth the trouble of making rdkit a
default dependency. It seems that a rather compact incantation [1]
does the job fairly well. This could be compressed even further, or
even turned into a colab snippet [2] for easier use.

Also, since colab doesn't play well with conda (as far as
pre-installed deps are concerned), we would at least need a
pip-installable rdkit to consider making this work.

Thanks,
Steve

On Mon, Mar 9, 2020 at 4:43 PM JW Feng  wrote:

Are you sure depictions in GSheet wouldn't be a good GSoC project?
I will ask around to find volunteers to connect with you on GSheets
and Colab.

On Fri, Mar 6, 2020 at 8:14 PM Greg Landrum 
wrote:

Hi JW,

I don't think it's a great GSoC project for a couple of reasons, but
I'd love to have RDKit integration in Google Sheets and am willing
to do some work to make that happen. I can poke around a bit to see
about how we could use the new RDKit-JS wrappers, but having access
to someone with experience writing Sheets add-ins would help. If you
know someone internally meeting that description, please put them in
touch with me.

I think making the code easily available in Colab can only be done
by someone inside google. I'm happy to help however I can with that
if you (or anyone else) can identify the right person.

Best,
-greg

On Sat, Mar 7, 2020 at 2:22 AM JW Feng  wrote:

Project suggestion:

Project 1: Implement 2D structure depiction in Google Spreadsheets.
My colleagues at Google think this is very doable.  Being able to
depict structures in Google Spreadsheets will dramatically increase
collaboration between scientists.  Imagine being able to provide
comments for a structure, design idea, or virtual screening hit in a
live Google Spreadsheet.  While there are commercial (Vortex,
Spotfire, MarvinView, Stardrop ...) and open source (Datawarrior)
packages that can read CSV files containing smiles and depict
structures, none comes close to GSheets for collaboration and ease
of use.

* Cells in columns named SMILES, or with SMILES as a substring in
the header, will be depicted in 2D using RDKit
* Cells with depicted structures move with other columns when
sorting, filtering, etc.
* Optional: depictions update when SMILES string is edited
* Bonus: calculate properties using formulas.  Ex:
Descriptors.MolWt(A1) calculates MW of SMILES in A1

Project 2:

* Make it easy to use RDKit in Google Colab [3]
* No need to install RDKit, from rdkit import Chem just works out
of the box

Best,

JW

On Sun, Feb 23, 2020 at 11:48 PM Greg Landrum
 wrote:

Dear all,

I'm happy to share that the RDKit will once again be part of Google
Summer of Code in 2020. This is a program where Google funds
students to work on open-source projects for a couple of months over
the summer. We've 

Re: [Rdkit-discuss] AdditionalOutput from FingerprintGenerator

2020-03-17 Thread Francois Berenger

On 17/03/2020 17:14, Chris Earnshaw wrote:

A quick comment on the cosine metric. Unlike Tanimoto it obeys the
triangle inequality, so in cases where it's used essentially as a
distance metric (e.g. some clustering applications) the results are
probably more mathematically correct.


The Tanimoto _distance_ is a valid metric, under certain conditions
(like vectors of only positive values).

For bitstrings, the formula is:
d = 1 - |AnB|/|AuB|

For float or integer vectors:

d = 1 - sum_i(min(a_i, b_i))/sum_i(max(a_i, b_i))

For the mathematical details, cf.

A proof of the triangle inequality for the Tanimoto distance
https://link.springer.com/article/10.1023%2FA%3A1019154432472

and

A note on the triangle inequality for the Jaccard distance
https://www.sciencedirect.com/science/article/pii/S0167865518309188

If you are used to the Tanimoto score, there is no reason not to switch
to the Tanimoto distance if a true metric is required by the underlying
algorithm/method.
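
The two formulas above can be sketched in plain Python. This is only an
illustration, not RDKit code: the function names are made up for the
example, bitstrings are represented as sets of on-bit indices, and the
choice to return 0 for two empty fingerprints is an assumption.

```python
def tanimoto_distance_bits(a, b):
    """d = 1 - |A n B| / |A u B| for two sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def tanimoto_distance_counts(x, y):
    """d = 1 - sum_i min(x_i, y_i) / sum_i max(x_i, y_i), non-negative vectors only."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("the metric property needs non-negative values")
    denom = sum(max(p, q) for p, q in zip(x, y))
    if denom == 0:
        return 0.0  # assumption: two all-zero vectors count as identical
    return 1.0 - sum(min(p, q) for p, q in zip(x, y)) / denom


# Three toy bitstrings, given by their on-bit indices.
a, b, c = {1, 5, 9}, {5, 9, 12}, {9, 12, 20}
d_ab = tanimoto_distance_bits(a, b)
d_bc = tanimoto_distance_bits(b, c)
d_ac = tanimoto_distance_bits(a, c)

# Two toy count vectors.
d_counts = tanimoto_distance_counts([2, 0, 1], [1, 1, 1])
```

On this toy data the triangle inequality d(a,c) <= d(a,b) + d(b,c) holds,
as the cited proofs say it must for any such triple.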

Regards,
F.


I used it a lot in that context.
Whether it makes any real difference in practical terms is of course
questionable as the fingerprints themselves are only very approximate
descriptors.

All the best,
Chris

On Tue, 17 Mar 2020 at 07:28, Greg Landrum 
wrote:


Hi Jason,

On Mon, Mar 16, 2020 at 1:26 PM Jason Biggs 
wrote:


Thank you again Greg.  If you have time to get this in the
upcoming release great, do not rush on my account.


I spent some time looking at this today and it's not going to be
a quick one: it'll require some thought and refactoring to make the
bit info work in all circumstances. That means that it won't make it
into the upcoming release.


I have another couple of questions regarding fingerprints in
general and the fingerprint generators in particular.

* To what degree do people use the different fingerprint types?
Is it more common to use the RDKit fingerprint, for example, as a
bit vector, and the Morgan fingerprint as a counts vector?  Does
it depend on the application or is it more how a particular
fingerprint was historically used?


It's hard to be really sure, but I would guess that the Morgan
fingerprints are the most used. I'm also going to guess, and this is
based on even more of a gut feeling, that people are using bit
vectors, not count vectors. As for why... good question. Probably
because the Morgan fingerprints tend to work well in general (though
there definitely is no "best" fingerprint
https://link.springer.com/article/10.1186/1758-2946-5-26,
http://pubs.acs.org/doi/abs/10.1021/ci400466r), give results that
look "chemically similar" and there's lots of sample code around.


* I notice there is a wider variety of distance measures
available for bit vectors than for count vectors. Is this because
these measures, the McConnaughey similarity for example, aren't
extendable to multisets in the same way that Tversky similarity
can? Or is it just that there hasn't been any demand for
non-bitvector versions of the measures in BitOps.h?


Aeons ago when I wrote that code I wanted to be sure to have as many
metrics as possible available. Since then it's become clear
that Tanimoto/Dice (you can prove that these rank results exactly
the same) and Tversky (because it allows you to do asymmetric
measures) cover most every need. I've also seen people use cosine
similarity for comparing molecules of different sizes (though
asymmetric Tversky lets you do the same).
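
A small plain-Python sketch of the two claims above, using sets of on-bit
indices (the helper names and the toy data are made up for this example).
With c common bits, Tanimoto is T = c/(|A|+|B|-c) and Dice is
D = 2c/(|A|+|B|), so D = 2T/(1+T), which increases with T; that is why the
two rank results identically. The Tversky form used here,
c/(alpha*|A-B| + beta*|B-A| + c), is the usual textbook definition.

```python
def tanimoto(a, b):
    c = len(a & b)
    denom = len(a) + len(b) - c
    return c / denom if denom else 0.0


def dice(a, b):
    denom = len(a) + len(b)
    return 2 * len(a & b) / denom if denom else 0.0


def tversky(a, b, alpha, beta):
    c = len(a & b)
    denom = alpha * len(a - b) + beta * len(b - a) + c
    return c / denom if denom else 0.0


query = {1, 2, 3, 4}
library = {
    "m1": {1, 2, 3, 4, 5},
    "m2": {1, 2},
    "m3": {3, 4, 5, 6, 7, 8},
    "m4": {9, 10},
}

# Rank the toy library against the query with both measures.
rank_tanimoto = sorted(library, key=lambda n: tanimoto(query, library[n]), reverse=True)
rank_dice = sorted(library, key=lambda n: dice(query, library[n]), reverse=True)

# Asymmetric Tversky: with alpha=1, beta=0 only bits missing from the
# second set are penalised, so a small set fully contained in a big one
# scores 1.0, while the reverse comparison does not.
small, big = {1, 2}, {1, 2, 3, 4}
small_in_big = tversky(small, big, 1.0, 0.0)
big_in_small = tversky(big, small, 1.0, 0.0)
```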


* Would it be useful to people for the FingerprintGenerator class
to return the list of atom invariants (or environments) used?  Or
is that what the BitInfo is used for?


There are generators available for the atom and bond invariants of
each of the fingerprint types. The fingerprint generators don't have
a method available that allows you to retrieve the atom/bond
invariant generators that they are using, but we could add this if
it would be useful.

-greg

Best,
Jason

On Fri, Mar 13, 2020 at 11:13 PM Greg Landrum
 wrote:

Unfortunately it looks like the additional outputs for the Morgan and
RDKit fingerprints are parts that weren't finished:


https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/MorganGenerator.cpp#L143




https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/Fingerprints/RDKitFPGenerator.cpp#L99


I will take a look and see if it's possible to get these into the
next release. In the meantime, if you want that info it looks like
you'll need to use the older fingerprinting functions.

-greg

On Fri, Mar 13, 2020 at 11:10 PM Jason Biggs 
wrote:

Thank you Greg.

I am working in C++.  I can poke around with this if I knew which
members of the AdditionalOutput struct are used by which fingerprint
generators.  I just wanted to make sure there wasn't an explanation
somewhere I missed.

I can see that with the AtomPairs fingerprints I can do the
following

//mol is an *ROMol and fpg is a *FingerprintGenerator

RDKit::AdditionalOutput ao;

std::vector>
atomtobits(mol->getNumAtoms());

Re: [Rdkit-discuss] RDkit/Anaconda: Fingerprints for a database

2020-03-12 Thread Francois Berenger

I have some Python code that might help in there:

https://github.com/UnixJunkie/consent/blob/master/bin/lbvs_consent_ecfp4.py
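
Regarding the display question quoted below: the "..." most likely comes
from NumPy, which abbreviates the printed form of any array with more than
1000 elements by default (the `threshold` print option). That would explain
why nBits=1000 prints in full and nBits=1024 does not; the fingerprint
itself is complete in both cases. A minimal sketch, using a plain NumPy
array as a stand-in for the converted fingerprint (the on-bit positions are
made up):

```python
import numpy as np

# Stand-in for np.array(fp): 1024 bits with a few hypothetical on bits.
bits = np.zeros(1024, dtype=np.uint8)
bits[[33, 80, 294]] = 1

# Default printing abbreviates arrays larger than the print threshold.
short = repr(bits)

# Raising the threshold (here only inside the "with" block) prints everything.
with np.printoptions(threshold=bits.size):
    full = repr(bits)

# Alternatives that do not depend on print options.
bit_string = "".join(str(b) for b in bits.tolist())
on_bits = np.flatnonzero(bits).tolist()
```

`np.set_printoptions(threshold=...)` changes the setting for the whole
session. RDKit bit vectors also offer `ToBitString()` and `GetOnBits()`,
which return the same information without going through NumPy.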

Regards,
F.

On 12/03/2020 21:21, Francesco Coppola wrote:

Hello everyone,
Before exposing my new problem, I wanted to thank everyone who helped
me in the previous discussion. Really thank you, I never expected so
much collaboration. I followed the advice, I started studying
something on Python too (I started online courses). But I would like
to explain what I would like to do and ask you if it is possible with
RDkit.

Basically I now want to understand how to get fingerprints, as bits of
1 and 0, from the SMILES contained in a file (.txt, .smi, .sdf; the
format doesn't matter). For the moment I can do it with a single
SMILES, but I can't get the complete sequence, since the maximum number
of bits that I can display is 1000. Is it possible to change that? Let
me explain:

(base) C:\Users\HP>conda activate py37_rdkit
(py37_rdkit) C:\Users\HP>python

Python 3.7.6 (default, Jan  8 2020, 20:23:39) [MSC v.1916 64 bit
(AMD64)] :: Anaconda, Inc. on win32

Type "help", "copyright", "credits" or "license" for more information.


>>> import rdkit
>>> from rdkit import Chem
>>> from rdkit.Chem import Draw
>>> from rdkit.Chem import Descriptors
>>> from rdkit.Chem import AllChem
>>> from rdkit import DataStructs
>>> from __future__ import print_function
>>> import numpy as np
>>> info = {}
>>> mol = Chem.MolFromSmiles('CCC')
>>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024,
...                                            bitInfo=info)
>>> vector = np.array(fp)
>>> vector
array([0, 0, 0, ..., 0, 0, 0])





Is there a way to view all the bits? The only way I know is to lower
the value of nBits to 1000 (which, however, I would not want to do).
And in fact:


>>> mol = Chem.MolFromSmiles('CCC')
>>> fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1000,
...                                            bitInfo=info)
>>> vector = np.array(fp)
>>> vector
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0,

   0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

Re: [Rdkit-discuss] Markush Enumeration.

2020-02-24 Thread Francois Berenger

On 21/02/2020 22:45, Paolo Tosco wrote:

Hi Jitender,

you could do that quite easily using reaction SMARTS; see for example
this thread:

https://sourceforge.net/p/rdkit/mailman/message/35730514/

You could selectively replace a specific R attachment point by
isotopically labeling it.

Cheers,
p.
On 21/02/2020 09:55, Jitender Verma wrote:


Dear RDkit users,

I have a Markush structure with attachment points as R1, R2, and so
on. How can I use RDkit to enumerate all the structures using
specific R-groups from a database or library I have?


Maybe this will help:

https://practicalcheminformatics.blogspot.com/2018/05/free-wilson-analysis.html

Interesting blog by the way, with working code.

Regards,
F.


I am a new user of RDkit.

Thanks in anticipation

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



Re: [Rdkit-discuss] Compiling rdkit-Release_2019_09_3 with python 3.7.5 and gnu gcc, g++ on MacOS 10.15

2020-01-20 Thread Francois Berenger

On 21/01/2020 01:29, Zoltan Takacs wrote:

Hi,

Thanks,

I repeated the compilation procedure on an Ubuntu machine with boost
1.62.0 and everything went smashingly. This indeed seems to be some
cmake boost mismatch on my mac. I will use an older version of boost
instead.


This should do the trick; maybe you don't need to downgrade boost:

-DBoost_NO_BOOST_CMAKE=ON

The solution advised by Greg (-DBoost_NO_BOOST_CMAKE=TRUE) might have
the same effect, though I did not try it.


Best,
Zoltan

On 20 Jan 2020, at 17:25, Greg Landrum  wrote:

Hi Zoltan,

I use the system compiler (clang++) on the Mac, so I don't have direct
experience here.

One problem is likely that the cmake argument you want is
-DBoost_NO_SYSTEM_PATHS=ON
those variable names are case sensitive.

The other point is that cmake didn't officially support boost 1.72
until cmake v3.16.2
I'm not sure that there's a version out of cmake that directly
supports boost 1.72.
If you're using an older (but still reasonably up-to-date) version of
cmake you might also try:
-DBoost_NO_BOOST_CMAKE=TRUE
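
Since a wrongly cased variable is silently ignored until the
"Manually-specified variables were not used" warning at the end, a small
pre-flight check can catch it earlier. The following is only a sketch in
Python: the helper name is hypothetical, and the set of known spellings
contains just the two FindBoost variables mentioned in this thread.

```python
# Spellings as used by CMake's FindBoost module; matching is case sensitive.
KNOWN_BOOST_VARS = {"Boost_NO_SYSTEM_PATHS", "Boost_NO_BOOST_CMAKE"}


def find_miscased_defines(args):
    """Return (given, expected) pairs for -D variables that differ only in case."""
    by_lower = {name.lower(): name for name in KNOWN_BOOST_VARS}
    problems = []
    take_next = False
    for arg in args:
        if take_next:
            define, take_next = arg, False      # "-D NAME=VALUE" form
        elif arg == "-D":
            take_next = True
            continue
        elif arg.startswith("-D"):
            define = arg[2:]                    # "-DNAME=VALUE" form
        else:
            continue
        name = define.split("=", 1)[0].split(":", 1)[0]
        expected = by_lower.get(name.lower())
        if expected is not None and expected != name:
            problems.append((name, expected))
    return problems


# The relevant part of the command line from the original question.
cmake_args = [
    "-D", "BOOST_ROOT=/usr/local/Cellar/boost/1.72.0/",
    "-D", "BOOST_NO_SYSTEM_PATHS=ON",
    "-DBoost_NO_BOOST_CMAKE=ON",
]
problems = find_miscased_defines(cmake_args)
```

Here `problems` flags BOOST_NO_SYSTEM_PATHS and suggests
Boost_NO_SYSTEM_PATHS, the same correction given above.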

-greg

On Sat, Jan 18, 2020 at 2:10 PM Zoltan Takacs 
wrote:


Dear RDKiters,

I would like to compile RDKit from source on a macOS 10.15 computer
with gnu c/c++ compilers and a newer version of boost (1.72.0)
instead of 1.56.0.

I run cmake with the following setting:

cmake \
  -D PYTHON_LIBRARY=/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/ \
  -D PYTHON_INCLUDE_DIR=/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/include/python3.7m/ \
  -D PYTHON_EXECUTABLE=/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/bin/python3 \
  -D BOOST_ROOT=/usr/local/Cellar/boost/1.72.0/ \
  -D BOOST_NO_SYSTEM_PATHS=ON \
  -D CMAKE_C_COMPILER=/usr/local/bin/gcc \
  -D CMAKE_CXX_COMPILER=/usr/local/bin/g++ ..

This ends up throwing me loads of errors but it starts like this:

CATCH: /Users/all/data/rdkit-Release_2019_09_3/External/catch/catch/single_include
-- Found PythonInterp: /usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/bin/python3 (found version "3.7.5")
-- Found PythonLibs: /usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib (found version "3.7.5")
-- Boost 1.56.0 found.
-- Found Boost components:
python3
PYTHON Py_ENABLE_SHARED: 0
PYTHON USING LINK LINE: -bundle -undefined dynamic_lookup -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk
-- Found Eigen3: /usr/local/include/eigen3 (Required is at least version "2.91.0")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE
CMake Warning at /usr/local/lib/cmake/boost_serialization-1.72.0/libboost_serialization-variant-shared.cmake:64 (message):
  Target Boost::serialization already has an imported location
  '/usr/local/lib/libboost_serialization-mt.dylib', which will be overwritten
  with '/usr/local/lib/libboost_serialization.dylib'
Call Stack (most recent call first):
  /usr/local/lib/cmake/boost_serialization-1.72.0/boost_serialization-config.cmake:57 (include)
  /usr/local/lib/cmake/Boost-1.72.0/BoostConfig.cmake:120 (find_package)
  /usr/local/lib/cmake/Boost-1.72.0/BoostConfig.cmake:185 (boost_find_component)
  /usr/local/Cellar/cmake/3.13.2/share/cmake/Modules/FindBoost.cmake:264 (find_package)
  CMakeLists.txt:361 (find_package)

It does not seem to use the specified boost libs of version 1.72.0
but it uses the 1.56.0. After this there are more error messages
thrown which are of the type:

CMake Error at Code/cmake/Modules/RDKitUtils.cmake:55 (add_library):
  Target "MolStandardize_static" links to target "Boost::iostreams" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  Code/GraphMol/MolStandardize/CMakeLists.txt:4 (rdkit_library)

and then it says that the following cmake option was not used:

-- Generating done
CMake Warning:
Manually-specified variables were not used by the project:

BOOST_NO_SYSTEM_PATHS

What are the correct settings for cmake to be able to use a newer
version of BOOST and PYTHON3?

Thanks,
Zoltan



[Rdkit-discuss] if you are a RDKit user in Japan and you are free between Mar. 19-20 2020

2020-01-14 Thread Francois Berenger

Dear RDKit users,

You might consider joining "The 8th French-Japanese Workshop
on Computational Methods in Chemistry" (FJCMC2020).

Date:  Mar. 19-20, 2020
Venue: 100th Anniversary Hall of Engineering Faculty,
   Kurokami South Campus, Kumamoto University.

Website: https://www.chem.kumamoto-u.ac.jp/~frjp2020/index.html

Here is an excerpt of a message from the local organizer, Dr. Sugimoto:
---
For the lecture titles, see

https://www.chem.kumamoto-u.ac.jp/~frjp2020/invited-speakers.html

Now we have 33 invited speakers including 6 young researchers.

Among them, we have 6 female and 25 male speakers. Although the balance
is still not good, please understand that I had originally invited
6 more female researchers.
---

The lecture hall has more than 200 seats, so there is certainly room for
interested people.

Best regards,
Francois.




Re: [Rdkit-discuss] Exhaustive fragmentation of molecules

2020-01-08 Thread Francois Berenger

On 08/01/2020 20:47, Paolo Tosco wrote:

Dear Puck,

You may break a bond by creating a Chem.RWMol out of your Chem.Mol,
and then calling the RemoveBond() method on your Chem.RWMol, or you
may use dedicated functions in the rdmolops module. Individual
fragments can then be obtained by calling rdmolops.GetMolFrags().
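
The logic described above (break one bond at a time, then collect the
pieces) can be sketched without RDKit on a toy molecular graph. This is a
library-free illustration only: the function names are made up, atoms are
plain integer indices, and bonds are index pairs. In RDKit the two steps
would correspond to RWMol.RemoveBond() and rdmolops.GetMolFrags() as
mentioned above.

```python
def connected_components(adjacency):
    """Return the connected components of a graph as sorted tuples of atom indices."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(tuple(sorted(component)))
    return sorted(components)


def fragments_after_each_bond_break(num_atoms, bonds):
    """Map each bond to the fragments obtained when only that bond is removed."""
    results = {}
    for broken in bonds:
        adjacency = {i: set() for i in range(num_atoms)}
        for a, b in bonds:
            if (a, b) != broken:
                adjacency[a].add(b)
                adjacency[b].add(a)
        results[broken] = connected_components(adjacency)
    return results


# Toy graph shaped like methylcyclopropane: atom 0 is the methyl carbon,
# atoms 1-3 form the ring.
bonds = [(0, 1), (1, 2), (2, 3), (1, 3)]
fragments = fragments_after_each_bond_break(4, bonds)
```

Breaking the exocyclic bond (0, 1) gives two fragments, while breaking a
ring bond such as (1, 2) leaves a single ring-opened fragment, which is
worth keeping in mind when enumerating "all possible fragments".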

I have put together a gist here:

https://gist.github.com/ptosco/3fb93b7c09dac15b6d355eb0ad29f532

 to show examples of the above; I hope this will help get you started
on your task.

Cheers,
p.
On 08/01/2020 10:39, Puck van Gerwen wrote:


Dear rdkit community,

I am looking to start from a mol object (loaded from an .xyz file)
and return all possible fragments (as mol objects) generated from
breaking one bond (any bond order). I don't want any pre-encoded
rules about which bonds to break as in BRICS. I saw some discussions
on the forum about using EditableMol or other mol types. Would you
be able to point me to the best way to do this?


Dear Puck,

You might find some interesting code in there:

https://github.com/liutairan/eMolFrag

It uses rdkit I think.

Regards,
F.


Thanks very much.

--

Puck van Gerwen
Doktorandin
Gruppe von Anatole von Lilienfeld
Universität Basel



Re: [Rdkit-discuss] passing options to javac when building from source

2020-01-05 Thread Francois Berenger

Hi Tim,

How do you compile rdkit for Java?

Last time I tried on a Mac, it did not work:
https://github.com/rdkit/homebrew-rdkit/issues/38

Thanks a lot,
F.

On 26/12/2019 23:22, Tim Dudgeon wrote:

When building the Java wrappers from source (the
-DRDK_BUILD_SWIG_WRAPPERS=ON option), is it possible to specify options
to pass on to javac?

Specifically I'm wanting to use the '-source 8' option as most distros
now come with java11 (and make it difficult to install an earlier one)
but I want to build a version of org.RDKit.jar that is compatible with
older Java versions.

Thanks

Tim





Re: [Rdkit-discuss] Need help with setting up the RDkit in Ubuntu

2019-12-08 Thread Francois Berenger

On 09/12/2019 12:24, ITS RDC wrote:

Hi Greg and all RDkit users,

 This is my first time posting in this mailing list because it's my
first time to use RDkit. This is also the first time I work with
Ubuntu OS (I have always been a Windows person) and I already
installed all relevant packages (Anaconda, Spyder3 and RDkit) for both
Windows and Linux. I was able to set up the environment in Windows but
not in Linux. I need help with setting up the RDBASE path and
environment because I could not find any forum discussing the setting
up of the RDBASE environment in Linux. Thank you!


After a fresh checkout of rdkit, you can build deb packages for it and
install them:


---
git clone https://github.com/rdkit/rdkit.git
cd rdkit
./Scripts/create_deb_packages.sh
sudo dpkg -i build/*.deb
---

Be careful to uninstall any previous installation of rdkit on the system
or they might conflict and you will get strange installation errors.


 --

 Joanna Michelle Chua, RPh
 Isotope Techniques Section
 Nuclear Services Division
 Philippine Nuclear Research Institute


Re: [Rdkit-discuss] Folding count vectors

2019-11-20 Thread Francois Berenger

On 20/11/2019 02:00, Benjamin Datko wrote:

Hello Francois,

I am trying to replicate some of the functionality of
CreateDifferenceFingerprintForReaction [Ref 1] for my own
understanding of how the code works. The function
CreateDifferenceFingerprintForReaction allows three different
fingerprint representations of the molecules: AtomPair, Morgan, and
TopologicalTorsion [Ref 2]. All three are count vectors [Ref 3], and
the function allows for variable fingerprint size output.


Personally, I wouldn't try to fold a count vector.
They are sparse vectors, so they don't take a lot of memory.
Also, they lose less information than binary fingerprints.

But maybe Greg has a workaround, if you are really forced to do
this.
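
If folding is unavoidable, one plausible scheme (an assumption made for
illustration, not necessarily what RDKit does internally) is to map each
index of the sparse count vector to index modulo the target size and to
sum the counts that collide. A minimal plain-Python sketch, with the
sparse vector given as a dict like the one returned by
GetNonzeroElements():

```python
def fold_counts(nonzero, folded_size):
    """Fold a sparse count vector {index: count} down to folded_size slots."""
    folded = {}
    for index, count in nonzero.items():
        slot = index % folded_size
        folded[slot] = folded.get(slot, 0) + count
    # Signed counts (difference fingerprints) can cancel out; drop the zeros.
    return {slot: count for slot, count in folded.items() if count != 0}


# The unfolded difference fingerprint quoted below in this message.
unfolded = {558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
folded = fold_counts(unfolded, 2048)

# A collision: indices 3 and 11 share slot 3 when folding to 8 slots.
collided = fold_counts({3: 2, 11: 5}, 8)
```

The folded indices will not match the output of
CreateDifferenceFingerprintForReaction, which applies its own parameters
and weights; the sketch only shows where the information loss comes from,
since colliding features become indistinguishable after folding.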



I was following this post [Ref 4] describing how to create reaction
difference fingerprints using different fingerprint representations.
Using the code from the post I can create reaction difference
fingerprints using either Morgan or AtomPair, but comparing the output
of that code to CreateDifferenceFingerprintForReaction gives
fingerprints of different sizes, with different values and different
densities. I am assuming this is due to folding the count vector down
to the default fingerprint size of 2048.


Example code snippet:

# The defs below are from the post
# https://sourceforge.net/p/rdkit/mailman/message/35240736/

from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit import DataStructs
import copy

def _createFP(mol, maxSize, fpType='AP'):
    # Reaction templates are not sanitized, so refresh the property cache
    # (non-strict) before fingerprinting.
    mol.UpdatePropertyCache(False)
    if fpType == 'AP':
        return AllChem.GetAtomPairFingerprint(mol, minLength=1,
                                              maxLength=maxSize)
    else:
        # Ring perception has to be run before Morgan fingerprinting.
        Chem.GetSSSR(mol)
        rinfo = mol.GetRingInfo()
        return AllChem.GetMorganFingerprint(mol, radius=maxSize)

def getSumFps(fps):
    # Sum the count vectors of all molecules on one side of the reaction.
    summedFP = copy.deepcopy(fps[0])
    for fp in fps[1:]:
        summedFP += fp
    return summedFP

def buildReactionFP(rxn, maxSize=3, fpType='AP'):
    reactants = rxn.GetReactants()
    products = rxn.GetProducts()
    rFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
                     for mol in reactants])
    pFP = getSumFps([_createFP(mol, maxSize, fpType=fpType)
                     for mol in products])
    # Difference fingerprint: products minus reactants.
    return pFP - rFP


>>> rxn1 = AllChem.ReactionFromSmarts('[C:1]C1C1>>[N:1]C1C1',
...                                   useSmiles=True)
>>> rxfp1 = buildReactionFP(rxn1, maxSize=2)
>>> rxfp1.GetNonzeroElements()
{558114: -2, 574497: -1, 1066050: 2, 1066081: 1}
>>> rxfp1.GetLength()
8388608

# Same reaction now using CreateDifferenceFingerprintForReaction
>>> rxn1_fp = AllChem.CreateDifferenceFingerprintForReaction(rxn1)
>>> rxn1_fp.GetNonzeroElements()
{1048: 10,
 1310: -20,
 1325: 20,
 1372: -10,
 1390: 20,
 1692: -10,
 1757: -20,
 1772: 10}
>>> print(rxn1_fp.GetLength(), rxfp1.GetLength())
2048 8388608

References
1. https://www.rdkit.org/docs/source/rdkit.Chem.rdChemReactions.html#rdkit.Chem.rdChemReactions.CreateDifferenceFingerprintForReaction
2. https://www.rdkit.org/docs/cppapi/structRDKit_1_1ReactionFingerprintParams.html
3. https://www.rdkit.org/docs/GettingStartedInPython.html#morgan-fingerprints-circular-fingerprints
4. https://sourceforge.net/p/rdkit/mailman/message/35240736/

v/r,

Ben

On Mon, Nov 18, 2019 at 10:13 PM Francois Berenger 
wrote:


On 19/11/2019 03:34, Benjamin Datko wrote:

Hello all,

I am curious about how to fold a count vector fingerprint. I understand
that when folding bit vectors the most common way is to split the vector
in half and apply a bitwise OR operation. I think this is how the
function rdkit.DataStructs.FoldFingerprint works in RDKit; correct me
if I am wrong.

How does RDKit fold count vectors such as AtomPair, Morgan, and
Topological Torsion, or what is the appropriate way to do so?


Can you give us some context? Why do you want to do that?

Maybe, you can use the following in order to create
shorter "fingerprints" for which the Tanimoto distance is
still computable (despite becoming approximate then):

---
Shrivastava, A. (2016).
Simple and efficient weighted minwise hashing.
In Advances in Neural Information Processing Systems (pp. 1498-1506).

https://papers.nips.cc/paper/6472-simple-and-efficient-weighted-minwise-hashing.pdf

---

Regards,
F.


I thought about turning the fingerprint into a bit vector using their
respective "AsBitVect" methods and then folding with
rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't
have an "AsBitVect" method
[https://www.rdkit.org/docs/GettingStartedInPython.html].

For an explicit example using the AtomPair fingerprint, we can see that
the fingerprint is extremely sparse. Could this AtomPair fingerprint be
folded to increase the density?


>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> mol = Chem.MolFromSmiles('CC1C1')
>>> ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1,
...                                        maxLength=3)
>>> number_of_nonzero_elements = len(ap_fp.GetNonzeroElements().values())
>>> print((ap_fp.GetLength(), number_of_nonzero_elements))


Re: [Rdkit-discuss] Folding count vectors

2019-11-18 Thread Francois Berenger

On 19/11/2019 03:34, Benjamin Datko wrote:

Hello all,

I am curious about how to fold a count vector fingerprint. I understand
that when folding bit vectors the most common way is to split the vector
in half and apply a bitwise OR operation. I think this is how the
function rdkit.DataStructs.FoldFingerprint works in RDKit; correct me
if I am wrong.
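
The halving scheme described in the paragraph above can be written out in
a few lines of plain Python. This is a sketch of that description only
(the function name is made up, and a bit vector is represented as a list
of 0/1 values); it is not taken from the RDKit source.

```python
def fold_bits(bits, fold_factor=2):
    """Fold a 0/1 list by OR-ing together its fold_factor equal-sized chunks."""
    n = len(bits)
    if fold_factor < 1 or n % fold_factor != 0:
        raise ValueError("length must be a multiple of fold_factor")
    new_size = n // fold_factor
    folded = [0] * new_size
    for i, bit in enumerate(bits):
        if bit:
            # Bit i of every chunk lands on position i % new_size.
            folded[i % new_size] = 1
    return folded


bits = [1, 0, 0, 0, 0, 1, 0, 1]
halved = fold_bits(bits)          # OR of [1, 0, 0, 0] and [0, 1, 0, 1]
quartered = fold_bits(bits, 4)    # OR of four chunks of length 2
```

Because OR is idempotent, folding a bit vector never needs to keep track
of how many features collided, which is exactly what makes the count
vector case harder.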

How does RDKit fold count vectors such as AtomPair, Morgan, and
Topological Torsion, or what is the appropriate way to do so?


Can you give us some context? Why do you want to do that?

Maybe, you can use the following in order to create
shorter "fingerprints" for which the Tanimoto distance is
still computable (despite becoming approximate then):

---
Shrivastava, A. (2016).
Simple and efficient weighted minwise hashing.
In Advances in Neural Information Processing Systems (pp. 1498-1506).

https://papers.nips.cc/paper/6472-simple-and-efficient-weighted-minwise-hashing.pdf
---

Regards,
F.


I thought about turning the fingerprint into a bit vector using their
respective "AsBitVect" methods and then folding with
rdkit.DataStructs.FoldFingerprint, but topological torsion doesn't
have an "AsBitVect" method
[https://www.rdkit.org/docs/GettingStartedInPython.html].

For an explicit example using the AtomPair fingerprint, we can see that
the fingerprint is extremely sparse. Could this AtomPair fingerprint be
folded to increase the density?


>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> mol = Chem.MolFromSmiles('CC1C1')
>>> ap_fp = AllChem.GetAtomPairFingerprint(mol, minLength=1,
...                                        maxLength=3)
>>> number_of_nonzero_elements = len(ap_fp.GetNonzeroElements().values())
>>> print((ap_fp.GetLength(), number_of_nonzero_elements))
(8388608,9)

Very Respectfully,

Ben


Re: [Rdkit-discuss] 2019.09.1 RDKit Release

2019-10-30 Thread Francois Berenger

On 25/10/2019 16:20, Greg Landrum wrote:

Dear all,

I'm pleased to announce that the next version of the RDKit - 2019.09 -
is released. The release notes are below.

The release files are on the github release page:
https://github.com/rdkit/rdkit/releases/tag/Release_2019_09_1

Binaries have been uploaded to anaconda.org [1]
(https://anaconda.org/rdkit).
The available conda binaries for this release are:
Linux 64bit: python 3.6, 3.7
Mac OS 64bit: python 3.6, 3.7
Windows 64bit: python 3.6, 3.7

Conda builds of the PostgreSQL cartridge are also available:
Linux 64bit: postgresql 9.6, 10, 11
Mac OS 64bit: postgresql 9.6, 10, 11

I believe that conda-forge will also switch to the new version in the
near future.

The online version of the documentation at rdkit.org [2]
(http://rdkit.org/docs/index.html) has been updated.

Some things that will be finished over the next couple of days:
- The conda build scripts will be updated to reflect the new version
- The homebrew script


Dear all,

The brew recipe has been updated.

If it doesn't work for you, please submit an issue here:

https://github.com/rdkit/homebrew-rdkit/issues

Regards,
Francois.


Thanks to everyone who submitted code, bug reports, and suggestions
for this release!

Please let me know if you find any problems with the release or have
suggestions for the next one, which is scheduled for March/April 2020.

Best Regards,
-greg

# Release_2019.09.1
(Changes relative to Release_2019.03.1)

## Important
- The atomic van der Waals radii used by the RDKit were
corrected/updated in #2154.
  This leads to different results when generating conformations,
molecular volumes,
  and molecular shapes.

## Backwards incompatible changes
- See the note about atomic van der Waals radii above.
- As part of the enhancements to the MolDraw2D class, we changed the
type of
  DrawColour from a tuple to be an actual struct. We also added a 4th
element to
  capture alpha values. This should have no effect on Python code (the
alpha
  value is optional when providing color tuples), but will require
changes to C++
  and Java/C# code that is using DrawColour.
- When reading Mol blocks, atoms with the symbol "R" are now converted
into
  queries that match any atom when doing a substructure search
(analogous to "*"
  in SMARTS). The previous behavior was to only match other dummy
atoms
- When loading SDF files using PandasTools.LoadSDF(), we now default
to
  producing isomeric smiles in pandas tables.  To reproduce the
original
  behavior, use isomericSmiles=False in the call to the function.
- The SMARTS generated by the RDKit no longer contains redundant
wildcard
  queries. This means the SMARTS strings generated by this release
will generally
  be different from that in previous releases, although the results
produced by
  the queries should not change.
- The RGroupDecomposition code now removes Hs from output R groups by
default.
  To restore the old behavior create an RGroupDecompositionParameters
object and
  set removeHydrogensPostMatch to false.
- The default values for some of the new fingerprint generators have
been changed so
  that they more closely resemble the original fingerprinting code. In
  particular most fingerprinters no longer do count simulation by
default and
  the RDKit fingerprint now sets two bits per feature by default.
- The SMARTS generated for MCS results using the ringMatchesRingOnly
or
  completeRingsOnly options now includes ring-membership queries.

## Highlights:
- The substructure matching code is now about 30% faster. This also
improves the
  speed of reaction matching and the FMCS code. (#2500)
- A minimal JavaScript wrapper has been added as part of the core
release. (#2444)
- It's now possible to get information about why molecule sanitization
failed. (#2587)
- A flexible new molecular hashing scheme has been added. (#2636)

## Acknowledgements:
Patricia Bento, Francois Berenger, Jason Biggs, David Cosgrove, Andrew
Dalke,
Thomas Duigou, Eloy Felix, Guillaume Godin, Lester Hedges, Anne
Hersey,
Christoph Hillisch, Christopher Ing, Jan Holst Jensen, Gareth Jones,
Eisuke
Kawashima, Brian Kelley, Alan Kerstjens, Karl Leswing, Pat Lorton,
John
Mayfield, Mike Mazanetz, Dan Nealschneider, Noel O'Boyle, Stephen
Roughley,
Roger Sayle, Ricardo Rodriguez Schmidt, Paula Schmiel, Peter St. John,
Marvin
Steijaert, Matt Swain, Amol Thakkar, Paolo Tosco, Yi-Shu Tu, Ricardo
Vianello,
Marc Wittke, '7FeiW', 'c56pony', 'sirbiscuit'

## Bug Fixes:
  - MCS returning partial rings with completeRingsOnly=True
 (github issue #945 from greglandrum)
  - Alternating canonical SMILES for fused ring with N
 (github issue #1028 from greglandrum)
  - Atom index out of range error
 (github issue #1868 from A-Thakkar)
  - Incorrect cis/trans stereo symbol for conjugated ring
 (github issue #2023 from baoilleach)
  - Hardcoded max length of SMARTs string cut of input query for
FragCatlog
 (github issue #2163 from 7FeiW)
  - VSA_EState {1, ..., 10}

[Rdkit-discuss] I updated the rdkit brew install recipe

2019-09-30 Thread Francois Berenger

Dear rdkit users,

Recently, I updated the brew install recipe for rdkit on Mac.
The biggest change is that the versions of boost and boost-python are now
pinned, so the brew install recipe should be much more reproducible
than before.

Here is a fail-safe way to install rdkit with it (with Python wrappers, 
and InChI support):


```
brew tap rdkit/rdkit
brew update
which python3 || brew install python3
pip3 install numpy
brew unlink boost || echo boost_not_installed
brew unlink boost-python3 || echo boost-python3_not_installed
brew install rdkit --with-python3 --with-inchi
```

It should install a tagged version of rdkit (Release_2019_03_2).

Of course, since this is open-source, your help in maintaining this
formula is very welcome.

Cf.
https://github.com/rdkit/homebrew-rdkit

If it doesn't work for you, you can open an issue there:
https://github.com/rdkit/homebrew-rdkit/issues

Regards,
F.


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] aromatic bonds and graph edit distance

2019-08-21 Thread Francois Berenger

On 21/08/2019 17:34, Andrew Dalke wrote:

On Aug 21, 2019, at 03:42, Francois Berenger  wrote:

Unless rdkit has something, I think graph edit distance is the kind
of thing for which you have to rely on a good graph library.


Do you know of any (non-chemical) graph library which can handle edits
involving the breaking of aromatic bonds in a chemically correct way?
I do not.

Also, maybe the string edit distance between the two canonical smiles 
is a good enough proxy.


This attempt of mine now, to experiment with graph edit distance, came
out of a conversation I had last week with someone using string edit
distance. I expressed doubt on how "good" the "good enough" was, but
was unable to give any concrete details.

I earlier wrote:

For chain bonds, and non-aromatic bonds, it's easy to delete the bond
and add the correct number of hydrogens to either side.


Similarly, for many chain edits, the string edit distance is a decent
proxy, as you say.

However, has the goodness ever been characterized? Along with a
description of how to minimize the problems with string edit distance?
Some of the obvious ones are:

1) Chirality and stereochemistry

L-alanine and D-alanine have string edit distances of 4 and 5,
respectively, to alanine with unspecified chirality.

  N[C@H](C)C(=O)O
  N[C@@H](C)C(=O)O
  NC(C)C(=O)O

This does not seem reasonable. A similar issue occurs with double bond
stereochemistry, like F/C=C/F vs. FC=CF.

2) Isotopes

Same issue: CN vs. [14CH3]N.

3) Overlapping element symbols

c1ccccc1C and c1ccccc1Cl have an edit distance of 1
c1ccccc1C and c1ccccc1Br have an edit distance of 2

There is no chemical sense for those to have different distances.

I can think of ways to mitigate some of the effects of #1-3.


If you want to push this hack further, it seems that some string
tokenization would be useful. Then the string edit distance is run
on lists of tokens instead of the original strings (maybe that's what
you call a substitution matrix).
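
A minimal sketch of the token-based variant suggested above, assuming a deliberately simplified SMILES tokenizer. The regular expression and the names `tokenize` and `edit_distance` are illustrative only and are not part of RDKit. Bracket atoms and the two-letter halogens become single tokens, so replacing one atom costs 1 no matter how many characters its symbol takes.

```python
import re

# Simplified tokenizer (an assumption, not a full SMILES grammar):
# bracket atoms, Br/Cl, two-digit ring closures, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]


# character-level vs. token-level comparison of toluene and bromobenzene
char_dist = edit_distance("c1ccccc1C", "c1ccccc1Br")
token_dist = edit_distance(tokenize("c1ccccc1C"), tokenize("c1ccccc1Br"))
```

Under these assumptions a chiral bracket atom such as `[C@H]` also counts as a single substitution against `C`. This only addresses the token-length issues, not the sensitivity to canonicalization order or ring formation.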


In particular, a substitution matrix (or conversion to pharmacophore
reduced graphs) can improve #3.

4) Sensitivity to canonicalization order

Depending on the canonicalization method, the following two structures
either have a string edit distance of 1 or 4, while the graph edit
distance is 1.


>>> Chem.CanonSmiles("PCCN")
'NCCP'
>>> Chem.CanonSmiles("CCN")
'CCN'


5) difficulty in handling ring formation in a meaningful way


>>> Chem.CanonSmiles("C1=CC=CC=C1")
'c1ccccc1'
>>> Chem.CanonSmiles("C=CC=CC=C")
'C=CC=CC=C'

There are no shared string symbols, so the string edit distance is 9,
yet the bond edit distance is only 1.


Yes, hacks don't bring you very far, usually. :)

Regards,
F.




Re: [Rdkit-discuss] aromatic bonds and graph edit distance

2019-08-20 Thread Francois Berenger

On 21/08/2019 05:06, Andrew Dalke wrote:

Hi all,

  Someone asked me recently about finding the graph edit distance of
two small (<= 14 atom) fragments.

I figured this was something that could be brute forced. Following
SmallWorld's example at
https://cisrg.shef.ac.uk/shef2016/talks/oral13.pdf , given a fragment,
incrementally delete terminals (except the "*" connection point atom),
and ring bonds.


Unless rdkit has something, I think graph edit distance is the kind
of thing for which you have to rely on a good graph library.

Also, maybe the string edit distance between the two canonical smiles is 
a good enough proxy.



For chain bonds, and non-aromatic bonds, it's easy to delete the bond
and add the correct number of hydrogens to either side.

But, what should I do when I cut an aromatic bond?

For something like the first "co" in "c1cocn1", I want the result to
be C=CN=CO. That's because the "o" can only be "-O-" in Kekule form.

For something like "c1cnncn1", breaking on the "nn", I think I would
like to get both 'N=CC=NC=N' and 'NC=CN=CN' because the "nn" can be a
single or a double bond, depending on the Kekule representation, as
in:


>>> Chem.CanonSmiles("C-1=N-N=C-C=N-1")
'c1cnncn1'
>>> Chem.CanonSmiles("C-1=N.N=C-C=N-1")
'N=CC=NC=N'

>>> Chem.CanonSmiles("C=1-N=N-C=C-N=1")
'c1cnncn1'
>>> Chem.CanonSmiles("C=1-N-[HH].[HH]N-C=C-N=1")
'NC=CN=CN'

Problem is, I don't know how to figure out if a given aromatic bond
must be a "-" or "=", or can be both.

(Well, I could brute-force enumerate all 2**n possible aromatic bond
assignments, then canonicalize, and see if both assignments are
possible for a given bond.)
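
The brute-force idea in the parenthesis above can be sketched without any toolkit, under strong simplifying assumptions: a single aromatic ring system is described by hand as a list of bonds, and each atom is flagged by hand as either needing exactly one double bond in a Kekule form (aromatic c, pyridine-like n) or none (o, s, [nH]). The function name `kekule_orders` and the `needs_double` flags are hypothetical; a real implementation would derive them from valences.

```python
from itertools import product


def kekule_orders(n_atoms, bonds, needs_double):
    """Enumerate all 2**n single/double assignments of the aromatic bonds.

    bonds        -- list of (i, j) atom index pairs
    needs_double -- needs_double[i] is True if atom i must carry exactly
                    one double bond, False if it must carry none
    Returns {bond: set of bond orders seen in valid assignments}.
    """
    seen = {bond: set() for bond in bonds}
    for assignment in product((1, 2), repeat=len(bonds)):
        doubles = [0] * n_atoms
        for (i, j), order in zip(bonds, assignment):
            if order == 2:
                doubles[i] += 1
                doubles[j] += 1
        # keep only assignments where every atom gets what it needs
        if all(d == (1 if need else 0)
               for d, need in zip(doubles, needs_double)):
            for bond, order in zip(bonds, assignment):
                seen[bond].add(order)
    return seen


# c1cocn1: atoms 0=c 1=c 2=o 3=c 4=n; the oxygen takes no double bond
oxazole = kekule_orders(5, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)],
                        [True, True, False, True, True])

# c1cnncn1: atoms 0=c 1=c 2=n 3=n 4=c 5=n; every atom takes one double bond
triazine = kekule_orders(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)],
                         [True] * 6)
```

With these inputs the c-o bonds of the five-membered ring only ever appear as single bonds, while the n-n bond of the six-membered ring appears with both orders, which matches the two cases described above. The enumeration is exponential in the number of aromatic bonds, so it is only reasonable for small fragments.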

As a non-chemist, I also ask if I'm even on a chemically meaningful 
track.



Andrew
da...@dalkescientific.com






[Rdkit-discuss] How to turn off labels and bonds coloring when calling Draw.SimilarityMaps.GetSimilarityMapFromWeights(mol, weights)?

2019-07-11 Thread Francois Berenger

Dear rdkiters,

I am playing with rdkit.Chem.Draw:
---
sim_map = Draw.SimilarityMaps.\
  GetSimilarityMapFromWeights(mol, weights)
---

I don't like that, in the created figure, the map colors overlap
with the atom and bond colors.
It makes the map less readable.
I would prefer all labels and bonds to be black, so that only the map
brings color.

Also, since atoms have labels, I feel that coloring them
(and their bonds) is some unnecessary duplication of information.

Can the default coloring scheme for atom labels and bonds be turned off?

From my reading of rdkit code, it seems that the elemDict in
Draw.DrawingOptions should be emptied.

I don't know how to pass such an option to GetSimilarityMapFromWeights,
or whether this is even possible.

Thanks a lot,
F.




Re: [Rdkit-discuss] any paper on fingerprint pre-selection/feature selection?

2019-06-18 Thread Francois Berenger

On 18/06/2019 05:41, Mario Lovrić wrote:

Dear all,

Is there any paper discussing some sort of pre-selection/feature
selection with fingerprints?


I know of this one at least:
---
Bender, A., Mussa, H. Y., Glen, R. C., & Reiling, S. (2004). Molecular 
similarity searching using atom environments, information-based feature 
selection, and a naive Bayesian classifier. Journal of chemical 
information and computer sciences, 44(1), 170-178.

---

Not very recent, but to the point given your question.


Any rule of thumb? E.g. don't keep fingerprints if less than 5% hits?
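
The rule of thumb asked about above could be sketched as a simple frequency filter. This is only an illustration: fingerprints are assumed to be given as sets of on-bit indices, `frequent_bits` and `min_fraction` are hypothetical names, and the 5% cutoff is the question's own example rather than a recommended value.

```python
def frequent_bits(fingerprints, min_fraction=0.05):
    """Return the sorted bit indices set in at least `min_fraction`
    of the fingerprints (each fingerprint is a set of on-bit indices)."""
    counts = {}
    for fp in fingerprints:
        for bit in fp:
            counts[bit] = counts.get(bit, 0) + 1
    threshold = min_fraction * len(fingerprints)
    return sorted(bit for bit, count in counts.items() if count >= threshold)


def project(fp, kept_bits):
    """Restrict one fingerprint to the retained bits."""
    return fp & set(kept_bits)
```

A frequency cutoff like this ignores the class labels entirely; the information-based selection in the paper cited above takes them into account.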

Thanks

--

Mario Lovrić


Re: [Rdkit-discuss] Open-source business models and the RDKit

2019-03-27 Thread Francois Berenger

On 27/03/2019 16:24, Francois Berenger wrote:

On 27/03/2019 01:46, Greg Landrum wrote:

And now that I've included two other messages, here's (part of) my
take on this.

The viability of open-source business models is something I'm deeply
interested in (I pay rent these days thanks to income from two
open-source companies) and, like Andrew, something I've put a fair
amount of thought into. Capturing all of that here is probably
impossible, so here are a few points that I think are important.

- We need to be really careful about drawing conclusions from projects
like Linux, Eclipse, etc. Andrew hit on this already, but the
potential base of potential donors/contributors to these projects is
several orders of magnitude larger than the potential base for
something like the RDKit, OpenBabel, or Chemfp.
- Geoff pointed out the possibility of setting up a not-for-profit
organization that can take donations and then disburse them. I'm not
going to do this; dealing with that kind of paperwork is something I
dislike and am terrible at. Going via OpenCollective (which Geoff
pointed to) is a possibility, but they would end up taking >10% of
each donation for overhead, credit card fees, etc. That seems steep,
but 80+% of something is still better than 100% of nothing.


I looked at the cost structure in here:

https://opencollective.com/pricing

I understand they would take 13.6% in total (the scenario in which
they manage the money + accounting, etc.).
That's something, for sure, but not crazy.


If rdkit were accepted by the Software Freedom Conservancy, I understand
the management fee would be 10%:

https://sfconservancy.org/projects/apply/

https://sfconservancy.org/projects/services/


- It's worth pointing out that it is already possible for companies
that want to directly support the RDKit to do so: getting an RDKit
support contract from my company (T5 Informatics GmbH) very directly
supports my work on the RDKit and the infrastructure needed to do
that. Given that the support contract may seem too expensive for small
orgs, I could also easily set something up for companies who want to
show support (and perhaps be listed as sponsors) at a lower price
point. I doubt there's any demand for that, but I'd be happy to be
wrong there.
- Another mechanism that's always available to companies is to just
pay an open-source developer to work on their open-source project.
This can take the form of funding development of a particular feature,
creating documentation, etc.
- That last bullet point likely works for academics too: think about
adding some support for open-source development to your next grant
proposal. I would assume that there are ways to engineer this.

For individuals to financially contribute is trickier... there's a
voice in the back of my head that's saying that it will never be
financially worth it to set something like this up for communities as
small as ours,[1] but I have to think about that one for a while.


As an open-source project, I feel rdkit is quite successful.
So, the user community is not so small.
Some people who cannot contribute time could contribute money to the
project (especially if it is tax-deductible, I guess).

Regards,
F.


I'm sure there's more to come, but I want to go ahead and hit "send"
-greg
[1] one-time donations would feel great, but they don't help when
making long-term plans unless you can assume that more will
continuously come in...


Re: [Rdkit-discuss] Open-source business models and the RDKit

2019-03-27 Thread Francois Berenger

On 27/03/2019 01:46, Greg Landrum wrote:

And now that I've included two other messages, here's (part of) my
take on this.

The viability of open-source business models is something I'm deeply
interested in (I pay rent these days thanks to income from two
open-source companies) and, like Andrew, something I've put a fair
amount of thought into. Capturing all of that here is probably
impossible, so here are a few points that I think are important.

- We need to be really careful about drawing conclusions from projects
like Linux, Eclipse, etc. Andrew hit on this already, but the
potential base of potential donors/contributors to these projects is
several orders of magnitude larger than the potential base for
something like the RDKit, OpenBabel, or Chemfp.
- Geoff pointed out the possibility of setting up a not-for-profit
organization that can take donations and then disburse them. I'm not
going to do this; dealing with that kind of paperwork is something I
dislike and am terrible at. Going via OpenCollective (which Geoff
pointed to) is a possibility, but they would end up taking >10% of
each donation for overhead, credit card fees, etc. That seems steep,
but 80+% of something is still better than 100% of nothing.


I looked at the cost structure in here:

https://opencollective.com/pricing

I understand they would take 13.6% in total (the scenario in which they 
manage the money + accounting, etc.).

That's something, for sure, but not crazy.


- It's worth pointing out that it is already possible for companies
that want to directly support the RDKit to do so: getting an RDKit
support contract from my company (T5 Informatics GmbH) very directly
supports my work on the RDKit and the infrastructure needed to do
that. Given that the support contract may seem too expensive for small
orgs, I could also easily set something up for companies who want to
show support (and perhaps be listed as sponsors) at a lower price
point. I doubt there's any demand for that, but I'd be happy to be
wrong there.
- Another mechanism that's always available to companies is to just
pay an open-source developer to work on their open-source project.
This can take the form of funding development of a particular feature,
creating documentation, etc.
- That last bullet point likely works for academics too: think about
adding some support for open-source development to your next grant
proposal. I would assume that there are ways to engineer this.

For individuals to financially contribute is trickier... there's a
voice in the back of my head that's saying that it will never be
financially worth it to set something like this up for communities as
small as ours,[1] but I have to think about that one for a while.


As an open-source project, I feel rdkit is quite successful.
So, the user community is not so small.
Some people who cannot contribute time could contribute money to the
project (especially if it is tax-deductible, I guess).

Regards,
F.


I'm sure there's more to come, but I want to go ahead and hit "send"
-greg
[1] one-time donations would feel great, but they don't help when
making long-term plans unless you can assume that more will
continuously come in...


Re: [Rdkit-discuss] chemfp preprint

2019-03-24 Thread Francois Berenger

On 23/03/2019 04:39, Andrew Dalke wrote:

Hi RDKit users,

  This week I submitted a paper about chemfp for publication. I also
submitted a preprint on ChemRxiv, which was just accepted.

For those interested, it's at
https://chemrxiv.org/articles/The_Chemfp_Project/7877846 .

It's a rather long paper as it covers many aspects about the chemfp
project, including the FPS and FPB formats, search algorithms, details
about the different ways to compute a popcount, and memory bandwidth
and latency bottlenecks. On a non-technical level I also describe some
of the difficulties I ran into trying to run chemfp as "commercial
free software."


The part about funding free software is quite interesting (I just 
skimmed through this part of the paper, sorry).


Sometimes, I wish there were an rdkit consortium/NPO (so that donations
are tax-deductible), so that rdkit could be massively funded by all its
commercial users, and even accept individual donations.


When you think about Linux, several developers are paid either by the
Linux Foundation (I think) or by large companies using Linux to work on
the Linux kernel full-time.
I guess it gives them a lot of manpower to push their open-source
project forward and maintain it in the long run.


Let me know of any corrections or improvements, or any other feedback
you might have.

Cheers,

Andrew
da...@dalkescientific.com






[Rdkit-discuss] Are there Ubuntu packages for rdkit for python-3.6 somewhere?

2019-03-13 Thread Francois Berenger

Hello,

I know where to find packages for python-2.7.
No idea for python 3 though.

Thanks,
F.




Re: [Rdkit-discuss] AM1-BCC charges for small molecules

2019-03-11 Thread Francois Berenger

On 12/03/2019 03:55, James T. Metz via Rdkit-discuss wrote:

RDkit Discussion Group,

I am interested in generating and assigning AM1-BCC charges to
small molecules,


You can do it with Chimera.

Cf. 
http://www.cgl.ucsf.edu/chimera/current/docs/ContributedSoftware/addcharge/addcharge.html


Though you would have to create a script to do it efficiently for 
several molecules.


---
#!/usr/bin/python2

# convert a single-molecule 3D .sdf with hydrogens to a .mol2 with
# AM1-BCC partial charges

from chimera import runCommand, openModels
from AddCharge import estimateNetCharge
import os, sys

# the input file name is passed via an env. var so that chimera doesn't
# try to read that file itself
sdf = os.environ['INPUT_FILE']

runCommand("open " + sdf)
molecule = openModels.list()[0]
net_charge = estimateNetCharge(molecule.atoms)
runCommand("addcharge nonstd #0 " + str(net_charge) + " method am1")
runCommand("write format mol2 #0 " + sdf[:-4] + ".mol2")
runCommand("close all")
---

I also had a shell script on top of that, to process a single molecule.

---
#!/bin/bash

# convert a .sdf to a .mol2 with partial charges assigned by chimera's
# AM1-BCC FF

# trick to pass an input file to the python script without having chimera
# try to interpret that file
export INPUT_FILE="$1"

chimera --nogui --script ~/bin/chimeraAM1-BCC.py
---
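
To cover the "several molecules" case, the single-molecule wrapper above could be driven from a small loop. This is a sketch under the same assumptions as the scripts above (a `chimera` executable on the PATH and the Python script saved as `~/bin/chimeraAM1-BCC.py`); the helper names `chimera_job` and `run_batch` are made up for illustration, and it has not been tested against a Chimera installation.

```python
import os
import subprocess


def chimera_job(sdf_path, script="~/bin/chimeraAM1-BCC.py"):
    """Build the command line and environment for one molecule.

    Nothing is executed here; the input file is passed through the
    INPUT_FILE environment variable, as in the shell wrapper above.
    """
    env = dict(os.environ, INPUT_FILE=sdf_path)
    cmd = ["chimera", "--nogui", "--script", os.path.expanduser(script)]
    return cmd, env


def run_batch(sdf_paths):
    """Run chimera once per single-molecule .sdf file, sequentially."""
    for path in sdf_paths:
        cmd, env = chimera_job(path)
        # raises CalledProcessError if chimera exits with an error
        subprocess.check_call(cmd, env=env)
```

Starting Chimera once per molecule is slow; it keeps the sketch close to the original scripts rather than being the most efficient design.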

Please ask the chimera ML if you need more help.

I don't guarantee those scripts still work.

Regards,
F.


preferably in batch mode.  I understand this topic has been discussed
previously, but
has there been RDkit code written to do this?  Since this relies on
the results of AM1
calculations, has anyone perhaps written RDkit code to calculate and
assign the
charges if I have already generated a MOPAC output file by some other
means?

I greatly appreciate all the capabilities of RDkit, and not to be
off-topic, but if
someone is aware of a non-RDkit way to generate AM1-BCC charges, that
might work
for me.  Hence, please let me know.  Thank you.

Regards,

Jim Metz


Re: [Rdkit-discuss] Scaffold Tree implementation

2019-02-14 Thread Francois Berenger

On 14/02/2019 19:44, Colin Bournez wrote:

Dear all,

I would like to know if there is the possibility to use the Scaffold
Tree algorithm in RDKit?
In the documentation, there is an existing page :
http://rdkit.org/docs/source/rdkit.Chem.Scaffolds.ScaffoldTree.html
[1]

But it is empty...


Maybe this open source software can do what you need:

http://scaffoldhunter.sourceforge.net/

Regards,
F.


I also saw a message on the rdkit list from 2015, but nothing since then.
Are there any news?

Cheers,

--
 Signature
*

 BOURNEZ COLIN [2]  [3]
Chemoinformatics PhD Student

  INSTITUTE OF ORGANIC AND ANALYTICAL CHEMISTRY (ICOA UMR7311)
 Université d'Orléans - Pôle de Chimie
 Rue de Chartres - BP 6759
 45067 Orléans Cedex 2 - France
 +33 (0)2 38 49 45 77 [4]
 SBC Tool Platform [5] - SBC Team [6] [7]

 [8]

  [9]

 [10]



Links:
--
[1] http://rdkit.org/docs/source/rdkit.Chem.Scaffolds.ScaffoldTree.html
[2] https://www.linkedin.com/in/colin-bournez-b9a1b2b7/
[3] http://www.icoa.fr/
[4] tel:+33%202%2038%2049%2045%2077
[5] http://sbc.icoa.fr/
[6] http://www.icoa.fr/bonnet
[7] http://www.icoa.fr/fr/rss.xml
[8]
https://www.facebook.com/pages/Institut-de-Chimie-Organique-et-Analytique-ICOA-umr7311/222060911297163
[9] https://twitter.com/ICOA_UMR7311
[10]
https://www.linkedin.com/company/institut-de-chimie-organique-et-analytique---icoa-umr7311/



Re: [Rdkit-discuss] Pharmacophore atom typing for torsion or atom pair FP

2019-01-31 Thread Francois Berenger

Hi,

I have a related question:
how to output the type of an atom in a molecule,
if possible in a human-readable format; i.e. a human
readable/understandable string rather than some (obscure) integer.

I am interested in looking at the atom types used by the ECFP
and the FCFP fingerprints.

Thanks a lot,
Francois.

On 31/01/2019 08:49, Lewis Martin wrote:

Thanks so much Greg!

If I catch your drift, you are talking about the new fingerprint
generators from the google summer of code. I took a look myself since
I was curious.

Here's a notebook demonstrating how I think it works:
https://github.com/ljmartin/snippets/blob/master/snippet_fp_with_invariants.ipynb
[3]
This downloads some bioactivity data from chembl and then compares
standard AP or TT fingerprints with same using the atom invariants
associated with the MorganFP "Feature" atom typing, which is actually
the feature types from the Gobbi/Poppinger paper.  As expected, the
invariant versions have higher similarity! It's not CATS but this
seems equivalent for my purposes - thanks!

Hopefully it's close to the mark - looking forward to seeing other
examples too.
cheers
lewis

On Thu, Jan 31, 2019 at 12:03 AM Greg Landrum 
wrote:


Hi Lewis,

This is a great chance to demonstrate some of the things that can be
done with the new fingerprint generation code. It's going to take me
a bit to put this together (it's all new enough that I'm still not
quite "fluent"), but I will try to get an example put together over
the next couple of days.

-greg

On Wed, Jan 30, 2019 at 4:59 AM Lewis Martin
 wrote:


Hi rdkitters,
I'd like to compare the similarity of torsion/atom pair FPs using
standard atomic numbering with those using pharmacophore types,
like the 'CATS' atom typing developed by Gisbert Schneider, and
hoped someone has some advice here. _CATS_ is a pharmacophore atom
typing system with these types: H-bond donor, H-bond acceptor,
positive, negative, lipophilic, and CATS2 has 'aromatic'. These
are described in: _“Scaffold‐Hopping” by Topological
Pharmacophore Search: A Contribution to Virtual Screening. _It
seems pretty close to the Gobbi 2D pharmacophore typing, or the
features used in FCFP.

I've no problem detecting the atom types - I borrowed code from the
open source PyBioMed - but I'm stuck at the next step. How to
change the atoms into their pharmacophore types to then make a
torsion or atom pair fingerprint using RDKit? What I've tried so
far is to just set the atomic number to some series of 5 atoms not
normally seen in drug like molecules, like 40-44. This is silly
but it seems to work. The only issue is trouble kekulizing the
molecules for display. Is there a better way?

Here's a snippet to demonstrate what I mean, it's adapted from
PyBioMed and any errors are probably mine:




https://github.com/ljmartin/snippets/blob/master/atom_typing_snippet.ipynb

[1]

Thanks for your time!
lewis

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss [2]



Links:
--
[1] 
https://github.com/ljmartin/snippets/blob/master/atom_typing_snippet.ipynb

[2] https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[3]
https://github.com/ljmartin/snippets/blob/master/snippet_fp_with_invariants.ipynb



Re: [Rdkit-discuss] Dividing inputstream over threads

2019-01-14 Thread Francois Berenger

On 15/01/2019 09:53, Andreas Luttens wrote:

Hi!

I have developed a small script that calculates molecular properties
for molecules that are stored in a SMILES file. The properties should
be stored in an SQL database, which works fine, but I would like to
speed up the process a bit. I was thinking of implementing some
parallelization for calculating the properties and storing them via
separate connections to my SQL database. I have done this before in
Python with OpenEye and it seems to do the trick. I would however
want my code to be usable by people who do not hold a license for
OpenEye, which is why I am trying RDKit. I would like my code to be in C++
as well.


In C++, you could use OpenMP and the parallel for pragma.


I was wondering how I would tackle this problem. Does the RDKit have a
similar functionality as an "oemolithread" to chunk up the incoming
stream? I haven't found something like this when I first scrolled
through documentation. If it is not implemented, how would I divide
the work on incoming molecules over N threads?

All help is very appreciated. Thanks in advance.

Best regards,

Andreas Luttens


Re: [Rdkit-discuss] How to encode atomic contributions to logP (hydrophobicity) in MOL2 formatted charge slot?

2018-11-13 Thread Francois Berenger

On 14/11/2018 02:42, James T. Metz via Rdkit-discuss wrote:

RDkit Discussion Group,

 Given a set of small molecules as a SDF file, I would like to
generate a MOL2

file where the atomic contributions to logP (hydrophobicity) from each
atom including
hydrogens have been calculated and are now encoded in the partial
atomic charge
"slot" in a MOL2 file. Is this possible using RDkit?

 I have found code that calculates the atomic contributions for the
Crippen

logP model and then generates a colorized 2D plot:


from rdkit.Chem import rdMolDescriptors
from rdkit.Chem.Draw import SimilarityMaps
contribs = rdMolDescriptors._CalcCrippenContribs(mol)
fig = SimilarityMaps.GetSimilarityMapFromWeights(mol, [x for x,y in contribs],
                                                 colorMap='jet', contourLines=10)

 However, I would like to encode the atomic contributions as partial
atomic charges so

that this information can be written out in a MOL2 file for each atom.


 Does anyone have PYTHON/RDkit code to do this? Thank you.


Hello,

I have this script that does part of the job:

---
#!/usr/bin/env python

from __future__ import print_function

import sys

import rdkit
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

if len(sys.argv) != 3:
    print("usage: %s input.sdf output.pl" % sys.argv[0])
    sys.exit(1)

sdf_input = sys.argv[1]
pl_output = sys.argv[2]

output = open(pl_output, 'w')

ok_count = 0
failed_count = 0

for mol in Chem.SDMolSupplier(sdf_input, removeHs = False):
    if mol: # not null test
        name = mol.GetProp("_Name")
        # keep only the logP term of each per-atom (logP, MR) contribution
        cLogPcontribs = [x for x,y in rdMolDescriptors._CalcCrippenContribs(mol)]
        output.write("COMPND %s\n" % name)
        conf = mol.GetConformer()
        # atoms = mol.GetAtoms()
        for i, contrib in enumerate(cLogPcontribs):
            pos = conf.GetAtomPosition(i)
            # one line per atom: x y z logP_contribution
            output.write("%f %f %f %f\n" % (pos.x, pos.y, pos.z, contrib))
        output.write("END\n")
        ok_count += 1
    else:
        failed_count += 1

output.close()
print ("OK: %d - failed: %d\n" % (ok_count, failed_count))
---
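
To get all the way to the MOL2 "charge slot" asked about in the question, one possible route is to patch the charge column of a MOL2 file that already exists (for instance one written by another converter from the same SDF, with the same atom order). The sketch below is plain Python and does not use RDKit; the function name set_mol2_charges is made up for illustration, and it assumes a single-molecule MOL2 whose atom records already carry a charge field (9 whitespace-separated columns or more).

```python
def set_mol2_charges(mol2_text, charges):
    """Return mol2_text with the charge column of every atom record replaced.

    Hypothetical helper: assumes one molecule per file and atom records of the
    form: id name x y z type subst_id subst_name charge [status].
    """
    out = []
    in_atoms = False
    i = 0  # index of the next value to take from `charges`
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # only the ATOM section is rewritten
            in_atoms = (line.strip() == "@<TRIPOS>ATOM")
            out.append(line)
            continue
        fields = line.split()
        if in_atoms and fields:
            if len(fields) < 9:
                raise ValueError("atom record without a charge column: %r" % line)
            if i >= len(charges):
                raise ValueError("more atoms than values")
            fields[8] = "%.4f" % charges[i]
            i += 1
            line = " ".join(fields)  # MOL2 records are whitespace-separated
        out.append(line)
    if i != len(charges):
        raise ValueError("expected %d atoms, found %d" % (len(charges), i))
    return "\n".join(out) + "\n"
```

The values passed in would be the per-atom logP contributions computed by the script above, in the same atom order as the MOL2 file.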

Regards,
F.


 Regards,

 Jim Metz


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss






Re: [Rdkit-discuss] Plotting values next to atoms

2018-11-05 Thread Francois Berenger

On 03/11/2018 04:27, Greg Landrum wrote:

Hi Eric,

On Fri, Nov 2, 2018 at 2:00 PM Eric Jonas  wrote:


Hello! I'm trying to figure out if there's any known or sane way to
automatically plot numerical values adjacent to atoms using the
rdkit drawing machinery. Ideally I'd like to annotate certain atoms
programmatically with values. I think the conventional way this is
done for publication is post-hoc editing in illustrator but it would
be great if there was an automatic or supported mechanism.


Hi Eric,

One hackish way is to export your molecules in 3D in the PDB format,
then shove your numeric values into the B-factor field.
Then, color your molecules by B-factor using your favorite viewer
(Chimera, PyMOL, etc.).
For this to work correctly, you may have to scale your values so that
they fall in the same range as B-factors.
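
A minimal sketch of the scaling step and of a fixed-column PDB record, in plain Python without RDKit. The function names, the 0 to 100 target range, and the residue name LIG / chain A / residue number 1 are all arbitrary choices made for illustration.

```python
def scale_to_range(values, lo=0.0, hi=100.0):
    """Linearly rescale values so that min -> lo and max -> hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def hetatm_line(serial, element, x, y, z, bfactor):
    """Format one 78-character PDB HETATM record; bfactor goes in columns 61-66."""
    symbol = element.upper()
    name = symbol.rjust(2).ljust(4)  # element symbol right-justified in columns 13-14
    return ("HETATM%5d %s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f%10s%2s"
            % (serial, name, "LIG", "A", 1, x, y, z, 1.0, bfactor, "", symbol))
```

Each atom then gets one line, built from its coordinates and its scaled value, and the viewer's "color by B-factor" option does the rest.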

With a little programming, you could also export your molecules (in 3D,
still) directly as balls in the BILD format of UCSF Chimera, and look at
them in Chimera.


https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/bild.html
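
As a rough illustration of the BILD route, the sketch below writes one colored sphere per atom using the .color and .sphere commands of the BILD format. The blue-white-red color ramp, the default radius, and the function names are assumptions made for this example.

```python
def value_to_rgb(v, vmin, vmax):
    """Map v to a blue (low) / white (middle) / red (high) color, components in 0..1."""
    if vmax == vmin:
        return (1.0, 1.0, 1.0)
    t = (v - vmin) / float(vmax - vmin)  # position of v within the value range
    if t < 0.5:
        s = 2.0 * t
        return (s, s, 1.0)
    s = 2.0 * (1.0 - t)
    return (1.0, s, s)


def to_bild(atoms, radius=0.5):
    """atoms: iterable of (x, y, z, value). Returns the text of a BILD file."""
    atoms = list(atoms)
    values = [a[3] for a in atoms]
    vmin, vmax = min(values), max(values)
    lines = []
    for x, y, z, v in atoms:
        lines.append(".color %.3f %.3f %.3f" % value_to_rgb(v, vmin, vmax))
        lines.append(".sphere %.3f %.3f %.3f %.3f" % (x, y, z, radius))
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file with a .bild extension should be enough for Chimera to open it as a set of spheres.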

Regards,
Francois.


Doing this correctly is on the list of high-priority things to do, and
I really hope to have something done for the 2019.03 release, but
there's no way I can guarantee that (it's a hard problem).

In the meantime, there's a way to at least do something that is,
hopefully, better than nothing:
https://gist.github.com/greglandrum/8cf8ecc3253abf0a5139a776a5095163
[1]

displayed here:
https://nbviewer.jupyter.org/gist/greglandrum/8cf8ecc3253abf0a5139a776a5095163
[2]


Links:
--
[1] 
https://gist.github.com/greglandrum/8cf8ecc3253abf0a5139a776a5095163

[2]
https://nbviewer.jupyter.org/gist/greglandrum/8cf8ecc3253abf0a5139a776a5095163



Re: [Rdkit-discuss] Stable format for long-term storage

2018-10-08 Thread Francois Berenger

On 06/10/2018 02:27, Eric Jonas wrote:

Hello! Is there a recommended stable format for long-term storage of
RDKit molecules? Will ToBinary() give me what I need? (the
documentation / purpose seems to be a bit... spartan)  I'd like to
save topology, conformers, and properties (at the atom, bond, and
molecule level) to disk a format that is likely to persist across
RDKit versions / other libraries. I've had bad experiences
historically with pickle, which is not recommended for long-term
storage.


Isn't a (compressed?) sdf file the way to go then?


Thanks,

...Eric


Re: [Rdkit-discuss] Butina clustering with additional output

2018-09-26 Thread Francois Berenger

On 21/09/2018 16:53, Chris Earnshaw wrote:

Hi

I'm afraid I can't help with an RDkit solution to your question, but
there are a couple of issues which should be borne in mind:
1) The centroid of a cluster is a vector mean of the fingerprints of
all the members of the cluster and probably will not be represented
_exactly_ by any member of the cluster; in this case no structures
will have a distance of 0.0 from the centroid. Do you want to
calculate the distances from the true centroid or from the
structure(s) closest to the centroid?


I have seen 'clustroid' in the literature to mean
cluster member nearest to the centroid of that cluster.
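
For the original request (cluster number plus distance to the cluster representative for every ligand), a minimal sketch in plain Python. It assumes fingerprints given as sets of on-bits and clusters given as tuples of molecule indices. As a stand-in for the clustroid it uses the medoid, i.e. the member with the smallest summed distance to the other members, which is not necessarily the member nearest to the true centroid. All function names are made up for illustration.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto for two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / float(union)


def medoid(members, fps):
    """Index of the member with the smallest summed distance to all members."""
    return min(members,
               key=lambda i: sum(tanimoto_distance(fps[i], fps[j]) for j in members))


def annotate(clusters, fps):
    """Return {mol index: (cluster number, distance to the cluster's medoid)}."""
    result = {}
    for cluster_id, members in enumerate(clusters):
        rep = medoid(members, fps)
        for i in members:
            result[i] = (cluster_id, tanimoto_distance(fps[i], fps[rep]))
    return result
```

With this choice the representative itself always gets a distance of 0.0, which matches what the original question expected for "the centroid".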


2) The Tanimoto metric doesn't obey the triangle inequality and is
therefore sub-optimal for this kind of analysis. It's better to use an
alternative which does obey the triangle inequality - e.g. the Cosine
metric.


The opposite is true: the Tanimoto (Jaccard) distance does obey the
triangle inequality.

Sven Kosub. A note on the triangle inequality for the Jaccard distance.
CoRR, abs/1612.02696, 2016.

Alan H. Lipkus. A proof of the triangle inequality for the Tanimoto
distance. Journal of Mathematical Chemistry, 26(1):263–265, Oct 1999.

Cosine similarity, on the other hand, is not a proper metric, according
to Wikipedia.

I'm not a mathematician, but I think (1 - Tanimoto) is a proper distance
as long as the molecules are encoded with only non-negative values.
So, Boolean fingerprints are OK, and counted unfolded fingerprints
as well.
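
A small brute-force check of both statements on Boolean fingerprints, in plain Python. It is only a sanity check over all non-empty fingerprints on 4 bits, not a proof; "cosine distance" is taken here to mean 1 - cosine similarity, and the function names are made up for this example.

```python
import itertools
import math


def tanimoto_distance(a, b):
    """1 - Tanimoto for two non-empty sets of on-bits."""
    return 1.0 - len(a & b) / float(len(a | b))


def cosine_distance(a, b):
    """1 - cosine similarity for two non-empty sets of on-bits."""
    return 1.0 - len(a & b) / math.sqrt(len(a) * len(b))


def violations(dist, fps, eps=1e-12):
    """All triples (x, y, z) with dist(x, z) > dist(x, y) + dist(y, z)."""
    return [(x, y, z) for x, y, z in itertools.product(fps, repeat=3)
            if dist(x, z) > dist(x, y) + dist(y, z) + eps]


# every non-empty fingerprint on 4 bits (15 of them)
fps = [frozenset(c) for r in range(1, 5)
       for c in itertools.combinations(range(4), r)]

tanimoto_bad = violations(tanimoto_distance, fps)  # no triple violates it
cosine_bad = violations(cosine_distance, fps)      # some triples do
```

One of the cosine counterexamples is x = {0}, y = {0, 1}, z = {1}: the distance from x to z is 1, while going through y costs only about 0.59.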

Regards,
Francois.


Regards,
Chris Earnshaw

On Thu, 20 Sep 2018 at 21:55, James T. Metz via Rdkit-discuss
 wrote:


RDkit Discussion Group,

I note that RDkit can perform Butina clustering. Given an SDF of
small molecules I would like to cluster the ligands, but obtain
additional information from the clustering algorithm. In particular,
I would like to obtain the cluster number and Tanimoto distance from
the centroid for every ligand in the SDF. The centroid would obviously
have a distance of 0.00.

Has anyone written additional RDkit code to extract this additional
information?

Thank you.

Regards,

Jim Metz



Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018

2018-01-15 Thread Francois BERENGER
On 01/16/2018 06:43 AM, Dimitri Maziuk wrote:
> On 01/15/2018 02:43 PM, Tim Dudgeon wrote:
> 
>> Could there be something in a more general project to bridge the
>> compound (mol/smiles), sequence (protein/nucleotide seq + alignments)
>> and structure (pdb/mmcif/mmtf) worlds?
> 
> FWIW PDB builds everything up from structure because they can derive
> bonds from the coordinates and that's the only way you can do it in the
> code. Without bonds, trying to link compounds in a sequence doesn't
> really work even if you have two cysteines in a bog standard protein
> sequence, with generic compounds it gets too hard fast.
> 
> PDB has in the mmCIF chem. comp. model "leaving atom flag" that marks
> *a* bonding site but it doesn't tell you what kind of bond can form
> there, nor what to do if there's more than one. You need a whole lot of
> other code to figure out how to link two compounds into a sequence.

Chimera can do the PDB to MOL2 conversion (so there might be some
code to look at since UCSF Chimera is open source).

If you want their algorithm, I think it is in there:

@article {JCC:JCC540120716,
author = {Meng, Elaine C. and Lewis, Richard A.},
title = {Determination of molecular topology and atomic hybridization
states from heavy atom coordinates},
journal = {Journal of Computational Chemistry},
volume = {12},
number = {7},
publisher = {John Wiley & Sons, Inc.},
issn = {1096-987X},
url = {http://dx.doi.org/10.1002/jcc.540120716},
doi = {10.1002/jcc.540120716},
pages = {891--898},
year = {1991},
}
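
To give an idea of what "deriving bonds from the coordinates" means in its simplest form, here is a distance-based sketch in plain Python. It is not the Meng and Lewis algorithm (which also assigns hybridization states) and not Chimera's code: two atoms are simply called bonded when their distance is below the sum of their covalent radii plus a tolerance. The radii are approximate literature values and the 0.4 angstrom tolerance is an assumption.

```python
import math

# Approximate single-bond covalent radii in angstroms; extend as needed.
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def perceive_bonds(atoms, tolerance=0.4):
    """atoms: list of (element, x, y, z). Returns a list of bonded index pairs."""
    bonds = []
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            ei, xi, yi, zi = atoms[i]
            ej, xj, yj, zj = atoms[j]
            d = math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2)
            # bonded if closer than the two radii plus some slack
            if d <= COVALENT_RADIUS[ei] + COVALENT_RADIUS[ej] + tolerance:
                bonds.append((i, j))
    return bonds
```

This only yields connectivity; bond orders, charges and the disulfide-type cases mentioned above need more work on top of it.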

> And then there's structure calculation that I don't know if there's
> anything that works on not proteins, or can predict disordered regions
> well etc.
> 
> If anyone's counting votes, pretty 2D depictions get mine.
> 
> 
> 


Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018

2018-01-15 Thread Francois BERENGER
On 01/16/2018 05:51 AM, Tim Dudgeon wrote:
> Incorporating and "industrialising" Matt's MolVS tautomer and
> standardizer code?
> http://molvs.readthedocs.io/en/latest/index.html

If we can vote, I would vote for this one.

> On 15/01/18 07:09, Greg Landrum wrote:
>> Dear all,
>>
>> We've been invited again to participate in the OpenChemistry
>> application for Google Summer of Code.
>>
>> In order to participate we need ideas for projects and mentors to go
>> along with them.
>>
>> The current list of RDKit ideas is being maintained here:
>> http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas
>>
>> (Note: at the point that I'm pressing "send", that's still a copy of
>> last year's project ideas).
>>
>> If you're willing to be a mentor (please ask me about the ~5
>> hours/week required here) or have ideas, please reply to this thread.
>>
>> Best,
>> -greg
>>
>>


Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018

2018-01-15 Thread Francois BERENGER
Supporting mol2 files as input would be nice.

There is already some code out there, people have worked on it and
several people would like to have the feature...

On 01/15/2018 04:09 PM, Greg Landrum wrote:
> Dear all,
> 
> We've been invited again to participate in the OpenChemistry application
> for Google Summer of Code.
> 
> In order to participate we need ideas for projects and mentors to go
> along with them.
> 
> The current list of RDKit ideas is being maintained here:
> http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas
> 
> (Note: at the point that I'm pressing "send", that's still a copy of
> last year's project ideas).
> 
> If you're willing to be a mentor (please ask me about the ~5 hours/week
> required here) or have ideas, please reply to this thread.
> 
> Best,
> -greg
> 
> 


Re: [Rdkit-discuss] Sanitizing SD file

2017-12-13 Thread Francois BERENGER
On 12/14/2017 02:10 PM, Greg Landrum wrote:
> 
> On Thu, Dec 14, 2017 at 4:22 AM, Francois BERENGER
> <beren...@bioreg.kyushu-u.ac.jp <mailto:beren...@bioreg.kyushu-u.ac.jp>>
> wrote:
> 
> On 12/14/2017 05:15 AM, Sundar wrote:
> > Hi RDkit users,
> >
> > I encounter following sanitize issue while I was trying to load an SD
> > file using
> > Chem.SDMolSupplier('lig.sdf')
> >
> > Explicit valence for atom # 16 N, 4, is greater than permitted
> > ERROR: Could not sanitize molecule ending on line 3145
> 
> I also encounter this exact error sometimes.
> 
> Is there a way to tell rdkit to automatically correct this atom type?
> 
> 
> The code currently only automatically corrects cases where it's really,
> really obvious what the correction should be, like C-N(=O)=O ->
> C-[N+](=O)[O-].

Where is this in the code?
I might have a look one day.

> If there are additional commonly "misdrawn" functional groups, it would
> be straightforward to add them
>
> I guess that sanitization failure means the molecule
> goes to the trash, which is terrible when there are so few molecules to
> learn from.
> 
> The philosophy taken in the RDKit is that it's better to have a bad
> structure be rejected than it is to try and learn from it.
> If you disagree with this, it is pretty easy to switch off the
> sanitization checks and keep the bad molecules.

I understand. I also guess unsanitized molecules would make some things
crash, just later.

> -greg
> 
>  
> 
> 
> > The molecule RDkit complains about has a charged N atom.
> > How do I sanitize it to fix these errors without losing its charge and
> > 3D coordinates?
> > Or how to disregard all these errors and get all the molecules read with
> > nothing missing?
> >
> > Thanks,
> > Sundar
> >
> >
> >
> >
> >
> 


Re: [Rdkit-discuss] Sanitizing SD file

2017-12-13 Thread Francois BERENGER
On 12/14/2017 05:15 AM, Sundar wrote:
> Hi RDkit users,
> 
> I encounter following sanitize issue while I was trying to load an SD
> file using
> Chem.SDMolSupplier('lig.sdf')
> 
> Explicit valence for atom # 16 N, 4, is greater than permitted
> ERROR: Could not sanitize molecule ending on line 3145

I also encounter this exact error sometimes.

Is there a way to tell rdkit to automatically correct this atom type?

I guess that sanitization failure means the molecule
goes to the trash, which is terrible when there are so few molecules to
learn from.

> The molecule RDkit complains about has a charged N atom.
> How do I sanitize it to fix these errors without losing its charge and
> 3D coordinates?
> Or how to disregard all these errors and get all the molecules read with
> nothing missing?
> 
> Thanks,
> Sundar
> 
> 
> 
> 


Re: [Rdkit-discuss] read sdf file without removing hydrogen atoms.

2017-12-12 Thread Francois BERENGER
On 12/13/2017 11:03 AM, Chicago Ji wrote:
> Dear RDkit Users,
> 
> Rdkit would delete all hydrogen atoms when read in a sdf file.
> Since I want to use charge information of all atoms in sdf file, I want
> to keep all hydrogen atoms when readin.
> Is there something like  Chem.SmilesParserParams() for sdfsupplier ?

Maybe you want to do this:

---
import rdkit
from rdkit import Chem
[...]
for mol in Chem.SDMolSupplier(sdf_filename, removeHs = False):
    [...]
---

Regards,
F.

> Many thanks for your help.
> 
> Best,
> Changge
> 
> 


[Rdkit-discuss] it would be nice to have a working 'brew install rdkit' on Mac OS X

2017-12-11 Thread Francois BERENGER
Hello,

Apparently, the homebrew recipe for rdkit is broken on Mac.

That's not very cool, since brew is the reference tool to install
software from source (automatically) on the Mac.

Regards,
F.




Re: [Rdkit-discuss] RPM distros

2017-11-27 Thread Francois BERENGER
On 11/28/2017 12:42 AM, Tim Dudgeon wrote:
> I see exactly the same when I build with those cmake args.

Maybe you are missing some of the dependencies.
I don't think the packages we create have all the dependency information:

fonts-freefont-ttf,
libboost-python1.58.0,
libboost-regex1.58.0,
libboost-system1.58.0,
libboost-thread1.58.0,
libc6 (>= 2.14),
libgcc1 (>= 1:4.1.1),
libpython2.7 (>= 2.7),
libstdc++6 (>= 5.2),
python (<< 2.8),
python (>= 2.7~)

You should install the ones you are missing and test again.
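
A quick way to see which shared libraries the system cannot locate, sketched with the standard library only. The function name is made up for illustration, and the library names to pass in depend on the distribution (they are given without the "lib" prefix and without version suffix), so treat any concrete list as an assumption.

```python
import ctypes.util


def missing_shared_libs(names, finder=ctypes.util.find_library):
    """Return the names for which the library lookup finds nothing.

    `finder` defaults to ctypes.util.find_library, which returns None
    when a library cannot be located.
    """
    return [n for n in names if finder(n) is None]
```

For the ImportError shown in this thread, something like missing_shared_libs(["python2.7"]) would be the first thing to try; an empty list means the lookup succeeded.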

> On 27/11/2017 09:11, Francois BERENGER wrote:
>> On 11/27/2017 06:01 PM, Tim Dudgeon wrote:
>>> I did:
>>>
>>> cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_INSTALL_INTREE=OFF
>>> -DCMAKE_INSTALL_PREFIX=/usr/ ..
>> Try this instead, just for the cmake part:
>>
>> cmake -Wno-dev \
>>  -DRDK_INSTALL_INTREE=OFF \
>>  -DRDK_BUILD_INCHI_SUPPORT=ON \
>>  -DRDK_BUILD_AVALON_SUPPORT=ON \
>>  -DRDK_BUILD_PYTHON_WRAPPERS=ON \
>>  -DCMAKE_INSTALL_PREFIX=/usr \
>>  -DRDKit_VERSION=`date +%Y.%m` \
>>  ../
>>
>> then do the rest (cpack ...) and test again
>> after an install of the freshly created package.
>>
>> I advise to wipe out any prior rdkit install from your machine
>> before installing the new packages (so that we test what we intend to
>> test).
>>
>> On a Debian-like:
>> sudo apt-get remove $(dpkg -l | grep rdkit | awk '{print $2}')
>>
>>> cpack -G DEB
>>> cpack -G RPM
>>>
>>>
>>> On 27/11/2017 00:05, Francois BERENGER wrote:
>>>> Hello,
>>>>
>>>> What are the exact commands you used to configure and compile rdkit?
>>>>
>>>> The script in there is my best attempt:
>>>>
>>>> https://github.com/rdkit/rdkit/pull/1655
>>>>
>>>> Regards,
>>>> F.
>>>>
>>>> On 11/25/2017 12:50 AM, Tim Dudgeon wrote:
>>>>> I got round to testing the debs and rpms but without success.
>>>>>
>>>>> For the debs the following were built:
>>>>>
>>>>> RDKit-2018.03.1.dev1-Linux-Development.deb
>>>>> RDKit-2018.03.1.dev1-Linux-Extras.deb
>>>>> RDKit-2018.03.1.dev1-Linux-Python.deb
>>>>> RDKit-2018.03.1.dev1-Linux-Runtime.deb
>>>>>
>>>>> On a clean Ubuntu Xenial system, with just python added (apt-get -y
>>>>> install python) the packages installed fine:
>>>>>
>>>>> # dpkg -i *.deb
>>>>> Selecting previously unselected package rdkit-development.
>>>>> (Reading database ... 5666 files and directories currently installed.)
>>>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Development.deb ...
>>>>> Unpacking rdkit-development (2018.03.1.dev1) ...
>>>>> Selecting previously unselected package rdkit-extras.
>>>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Extras.deb ...
>>>>> Unpacking rdkit-extras (2018.03.1.dev1) ...
>>>>> Selecting previously unselected package rdkit-python.
>>>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Python.deb ...
>>>>> Unpacking rdkit-python (2018.03.1.dev1) ...
>>>>> Selecting previously unselected package rdkit-runtime.
>>>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Runtime.deb ...
>>>>> Unpacking rdkit-runtime (2018.03.1.dev1) ...
>>>>> Setting up rdkit-development (2018.03.1.dev1) ...
>>>>> Setting up rdkit-extras (2018.03.1.dev1) ...
>>>>> Setting up rdkit-python (2018.03.1.dev1) ...
>>>>> Setting up rdkit-runtime (2018.03.1.dev1) ...
>>>>>
>>>>> There seem to be header files in /usr/include/rdkit and the RDKit
>>>>> installation (.py and .so files) in
>>>>> /usr/lib/python2.7/dist-packages/rdkit
>>>>>
>>>>> But RDKit doesn't work from Python:
>>>>>
>>>>> # python
>>>>> Python 2.7.12 (default, Nov 19 2016, 06:48:10)
>>>>> [GCC 5.4.0 20160609] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>>>> import rdkit
>>>>> Traceback (most recent call last):
>>>>>     File "<stdin>", line 1, in <module>
>>>>>     File "/usr/lib/python2.7/dist-packages/rdkit/__init__.py", line 2, in <module>
>>>

Re: [Rdkit-discuss] RPM distros

2017-11-27 Thread Francois BERENGER
On 11/27/2017 06:01 PM, Tim Dudgeon wrote:
> I did:
> 
> cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_INSTALL_INTREE=OFF
> -DCMAKE_INSTALL_PREFIX=/usr/ ..

Try this instead, just for the cmake part:

cmake -Wno-dev \
-DRDK_INSTALL_INTREE=OFF \
-DRDK_BUILD_INCHI_SUPPORT=ON \
-DRDK_BUILD_AVALON_SUPPORT=ON \
-DRDK_BUILD_PYTHON_WRAPPERS=ON \
-DCMAKE_INSTALL_PREFIX=/usr \
-DRDKit_VERSION=`date +%Y.%m` \
../

then do the rest (cpack ...) and test again
after an install of the freshly created package.

I advise to wipe out any prior rdkit install from your machine
before installing the new packages (so that we test what we intend to test).

On a Debian-like:
sudo apt-get remove $(dpkg -l | grep rdkit | awk '{print $2}')

> cpack -G DEB
> cpack -G RPM
> 
> 
> On 27/11/2017 00:05, Francois BERENGER wrote:
>> Hello,
>>
>> What are the exact commands you used to configure and compile rdkit?
>>
>> The script in there is my best attempt:
>>
>> https://github.com/rdkit/rdkit/pull/1655
>>
>> Regards,
>> F.
>>
>> On 11/25/2017 12:50 AM, Tim Dudgeon wrote:
>>> I got round to testing the debs and rpms but without success.
>>>
>>> For the debs the following were built:
>>>
>>> RDKit-2018.03.1.dev1-Linux-Development.deb
>>> RDKit-2018.03.1.dev1-Linux-Extras.deb
>>> RDKit-2018.03.1.dev1-Linux-Python.deb
>>> RDKit-2018.03.1.dev1-Linux-Runtime.deb
>>>
>>> On a clean Ubuntu Xenial system, with just python added (apt-get -y
>>> install python) the packages installed fine:
>>>
>>> # dpkg -i *.deb
>>> Selecting previously unselected package rdkit-development.
>>> (Reading database ... 5666 files and directories currently installed.)
>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Development.deb ...
>>> Unpacking rdkit-development (2018.03.1.dev1) ...
>>> Selecting previously unselected package rdkit-extras.
>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Extras.deb ...
>>> Unpacking rdkit-extras (2018.03.1.dev1) ...
>>> Selecting previously unselected package rdkit-python.
>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Python.deb ...
>>> Unpacking rdkit-python (2018.03.1.dev1) ...
>>> Selecting previously unselected package rdkit-runtime.
>>> Preparing to unpack RDKit-2018.03.1.dev1-Linux-Runtime.deb ...
>>> Unpacking rdkit-runtime (2018.03.1.dev1) ...
>>> Setting up rdkit-development (2018.03.1.dev1) ...
>>> Setting up rdkit-extras (2018.03.1.dev1) ...
>>> Setting up rdkit-python (2018.03.1.dev1) ...
>>> Setting up rdkit-runtime (2018.03.1.dev1) ...
>>>
>>> There seem to be header files in /usr/include/rdkit and the RDKit
>>> installation (.py and .so files) in 
>>> /usr/lib/python2.7/dist-packages/rdkit
>>>
>>> But RDKit doesn't work from Python:
>>>
>>> # python
>>> Python 2.7.12 (default, Nov 19 2016, 06:48:10)
>>> [GCC 5.4.0 20160609] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> import rdkit
>>> Traceback (most recent call last):
>>>    File "<stdin>", line 1, in <module>
>>>    File "/usr/lib/python2.7/dist-packages/rdkit/__init__.py", line 2, in <module>
>>>  from .rdBase import rdkitVersion as __version__
>>> ImportError: libpython2.7.so.1.0: cannot open shared object file: No
>>> such file or directory
>>>
>>> For the rpms the story is similar. The same 4 files are built as rpms.
>>> Installing them on a clean centos7 machine went fine and the files seem
>>> to get installed to the same places.
>>> But RDKit again couldn't be used from Python, but with a different
>>> error:
>>>
>>> # python
>>> Python 2.7.5 (default, Aug  4 2017, 00:39:18)
>>> [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> import rdkit
>>> Traceback (most recent call last):
>>>    File "<stdin>", line 1, in <module>
>>> ImportError: No module named rdkit
>>>
>>> On 15/11/2017 20:18, David Hall wrote:
>>>> apt install rpm
>>>>
>>>> should get you rpmbuild
>>>>
>>>> -David
>>>>
>>>> On Nov 15, 2017, at 2:59 PM, Tim Dudgeon <tdudgeon...@gmail.com
>>>> <mailto:tdudgeon...@gmail.com>> wrote:
>>>>

Re: [Rdkit-discuss] RPM distros

2017-11-26 Thread Francois BERENGER
braries:
>>>>> --   python
>>>>> Python Install directory /usr/lib/python2.7/dist-packages
>>>>> -- Could NOT find Eigen3 (missing:  EIGEN3_INCLUDE_DIR
>>>>> EIGEN3_VERSION_OK) (Required is at least version "2.91.0")
>>>>> Eigen3 not found, disabling the Descriptors3D build.
>>>>> -- Boost version: 1.62.0
>>>>> -- Found the following Boost libraries:
>>>>> --   thread
>>>>> --   system
>>>>> --   chrono
>>>>> --   date_time
>>>>> --   atomic
>>>>> -- Boost version: 1.62.0
>>>>> -- Found the following Boost libraries:
>>>>> --   serialization
>>>>> == Using strict rotor definition
>>>>> == Updating Filters.cpp from pains file
>>>>> == Done updating pains files
>>>>> -- Boost version: 1.62.0
>>>>> -- Found the following Boost libraries:
>>>>> --   regex
>>>>> -- Configuring done
>>>>> -- Generating done
>>>>> -- Build files have been written to: /rdkit/build
>>>>>
>>>>>
>>>>> root@f083c3e3b6a1:/rdkit/build# cpack -G DEB
>>>>> CPack: Create package using DEB
>>>>> CPack: Install projects
>>>>> CPack: - Run preinstall target for: RDKit
>>>>> CPack: - Install project: RDKit
>>>>> CPack: -   Install component: runtime
>>>>> CPack: -   Install component: base
>>>>> CPack: -   Install component: data
>>>>> CPack: -   Install component: docs
>>>>> CPack: -   Install component: dev
>>>>> CPack: -   Install component: python
>>>>> CPack: -   Install component: extras
>>>>> CPack: Create package
>>>>> CPack: - package:
>>>>> /rdkit/build/RDKit-2018.03.1.dev1-Linux-Development.deb generated.
>>>>> CPack: - package:
>>>>> /rdkit/build/RDKit-2018.03.1.dev1-Linux-Extras.deb generated.
>>>>> CPack: - package:
>>>>> /rdkit/build/RDKit-2018.03.1.dev1-Linux-Python.deb generated.
>>>>> CPack: - package:
>>>>> /rdkit/build/RDKit-2018.03.1.dev1-Linux-Runtime.deb generated.
>>>>>
>>>>>
>>>>> root@f083c3e3b6a1:/rdkit/build# cpack -G RPM
>>>>> CPack: Create package using RPM
>>>>> CPack: Install projects
>>>>> CPack: - Run preinstall target for: RDKit
>>>>> CPack: - Install project: RDKit
>>>>> CPack: -   Install component: runtime
>>>>> CPack: -   Install component: base
>>>>> CPack: -   Install component: data
>>>>> CPack: -   Install component: docs
>>>>> CPack: -   Install component: dev
>>>>> CPack: -   Install component: python
>>>>> CPack: -   Install component: extras
>>>>> CPack: Create package
>>>>> CMake Error at /usr/share/cmake-3.7/Modules/CPackRPM.cmake:1573
>>>>> (message):
>>>>>   RPM package requires rpmbuild executable
>>>>> Call Stack (most recent call first):
>>>>>   /usr/share/cmake-3.7/Modules/CPackRPM.cmake:2442
>>>>> (cpack_rpm_generate_package)
>>>>>
>>>>>
>>>>> CPack Error: Error while execution CPackRPM.cmake
>>>>> CPack Error: Error while execution CPackRPM.cmake
>>>>> CPack Error: Error while execution CPackRPM.cmake
>>>>> CPack Error: Error while execution CPackRPM.cmake
>>>>> CPack Error: Problem compressing the directory
>>>>> CPack Error: Error when generating package: RDKit
>>>>>
>>>>>
>>>>> So it looks like the building the debs works OK (I didn't test
>>>>> them) but building rpms fails.
>>>>>
>>>>> I'm probably doing something stupid here as I'm not that familiar
>>>>> with cmake and cpack.
>>>>>
>>>>>
>>>>> On 10/11/2017 00:03, Francois BERENGER wrote:
>>>>>> On 11/08/2017 08:47 PM, Tim Dudgeon wrote:
>>>>>>> There is mention of RPM distributions of RDKit
>>>>>>> (https://copr.fedorainfracloud.org/coprs/giallu/rdkit/).
>>>>>>>
>>>>>>> But on trying these:
>>>>>>>
>>>>>>> 1. the distro is based on the 2017_03_1 release
>>>>>>> 2. it fails due to missing libinc

Re: [Rdkit-discuss] RPM distros

2017-11-09 Thread Francois BERENGER

On 11/08/2017 08:47 PM, Tim Dudgeon wrote:
There is mention of RPM distributions of RDKit 
(https://copr.fedorainfracloud.org/coprs/giallu/rdkit/).


But on trying these:

1. the distro is based on the 2017_03_1 release
2. it fails due to missing libinchi.so.1 dependency.


In the bugtracker, there is an issue about the .deb:

https://github.com/rdkit/rdkit/issues/911

and there is a pull request by Patrick Avery
to fix them:

https://github.com/rdkit/rdkit/pull/1580

Maybe you can read the pull request, replace DEB by RPM and see
if that produces usable RPMs for your distro.


This is presumably no longer being maintained?
Anything that can be done to help with fixing this?

Tim




Re: [Rdkit-discuss] Question regarding 3D pharmacophores

2017-10-22 Thread Francois BERENGER

On 10/21/2017 01:58 AM, Andy Jennings wrote:

Hi,

I'm curious if anyone has figured out a way to compare two molecules 
based upon their pharmacophore similarities. Specifically, I want to 
compare 2 molecules in their _absolute_ locations, and not simply see if 
they have 2 pharmacophores that match well via a translation/rotation. 
From what I can see in the RDKit code the current implementation,
whilst excellent, is limited to distance matrices and not absolute 
coordinates.


My use-case is where I have aligned 2 molecules in some fashion and then 
want to compare how similar their electrostatic surfaces/pharmacophores 
are in that specific relative orientation. Think 'ROCS' color for RDKit,
if that helps. I appreciate that thinking about bringing more 3D 
functionality to a cheminformatics toolkit may be heresy, but I'll run 
that risk.


There is an open source tool called Pharao.
I guess it can do some scoring of two already placed molecules in 3D in
the pharmacophore features space.

I also want more 3D in rdkit (like surfacing of atoms with controllable
atomic radii); many people work with structural data.

My current nasty hack is to locate all pharmacophores in query/reference 
molecules, assign coordinates and orientation to them, and loop over any 
pharmacophoric patterns that both molecules contain.
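
The loop described above could be turned into a score along these lines. This is only a sketch in plain Python: it is neither what ROCS does nor what Pharao does, the Gaussian width sigma is an arbitrary assumption, and feature perception (finding the donors, acceptors, etc. and their coordinates) is assumed to have been done already. No translation or rotation is applied, so the molecules are compared exactly where they are.

```python
import math


def overlap(feats_a, feats_b, sigma=1.0):
    """Sum of Gaussian overlaps between features of the same type.

    feats_*: list of (feature_type, (x, y, z)) in a shared coordinate frame.
    """
    total = 0.0
    for type_a, pa in feats_a:
        for type_b, pb in feats_b:
            if type_a == type_b:
                d2 = sum((u - v) ** 2 for u, v in zip(pa, pb))
                total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total


def feature_tanimoto(feats_a, feats_b, sigma=1.0):
    """Tanimoto-style score: 1.0 for identical feature sets, near 0 if nothing overlaps.

    Both feature lists must be non-empty.
    """
    o_ab = overlap(feats_a, feats_b, sigma)
    o_aa = overlap(feats_a, feats_a, sigma)
    o_bb = overlap(feats_b, feats_b, sigma)
    return o_ab / (o_aa + o_bb - o_ab)
```

Feature orientation (e.g. donor direction), which the post also mentions, is ignored here; it could enter as an extra factor in each pair term.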


Thanks in advance,
Andy







[Rdkit-discuss] If we want a molecular surfacing implementation in rdkit ...

2017-10-16 Thread Francois BERENGER
Hello,

I found this open source implementation recently:

Website: https://zhanglab.ccmb.med.umich.edu/EDTSurf/

C++ Code: https://zhanglab.ccmb.med.umich.edu/EDTSurf/EDTSurf.zip

Paper: "Generating Triangulated Macromolecular Surfaces by Euclidean Distance 
Transform"
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0008140

Such good things are rare.

Regards,
F.



[Rdkit-discuss] recent packages for Ubuntu

2017-09-05 Thread Francois BERENGER

Hello,

If the procedure for updating the Ubuntu/Debian binary packages
were documented somewhere, it would help people who
want to publish binary packages of rdkit as soon as
a new rdkit release is out.

I think we should have a PPA for people who want to use
the bleeding edge version of rdkit.

Regards,
F.



Re: [Rdkit-discuss] ErG: 2D Pharmacophore Similarity Searches

2017-09-03 Thread Francois BERENGER
On 09/02/2017 11:05 PM, Konrad Koehler wrote:
> Hi everyone,
> 
> As a followup to my previous post, in reading the Stiefl paper (J. Chem. 
> Inf. Model. 2006, 46, 208-220) more closely, I see my question about 
> converting the ErG numpy array into a bit vector was a little naive.  It 
> turns out that the ErG array contains floating point numbers, not integers. 
> One could bin these numbers and then convert into a count vector, but 
> this seems like a lot of work.

There are better things than histograms out there, this one for example:

@article{Wand1994,
title = {Fast {Computation} of {Multivariate} {Kernel} {Estimators}},
volume = {3},
issn = {1061-8600},
url = {http://amstat.tandfonline.com/doi/abs/10.1080/10618600.1994.10474656},
doi = {10.1080/10618600.1994.10474656},
number = {4},
journal = {Journal of Computational and Graphical Statistics},
author = {Wand, M. P.},
month = dec,
year = {1994},
pages = {433--445}
}

"Linear binning" they call it.
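
For readers who have not met the idea, here is a small one-dimensional sketch of linear binning: instead of dropping each value into a single histogram bin, its unit weight is split between the two nearest grid points in proportion to proximity. The function name `linear_bin`, the grid bounds and the choice to ignore out-of-range values are assumptions made for illustration; the paper treats the multivariate case.

```python
def linear_bin(values, lo, hi, n_grid):
    """Spread each value over its two neighbouring grid points.

    The grid has n_grid equally spaced points from lo to hi. Values
    outside [lo, hi] are skipped (a simplification for this sketch).
    """
    if n_grid < 2 or hi <= lo:
        raise ValueError("need n_grid >= 2 and hi > lo")
    delta = (hi - lo) / float(n_grid - 1)
    counts = [0.0] * n_grid
    for x in values:
        if x < lo or x > hi:
            continue
        pos = (x - lo) / delta
        # Left grid point; capped so that x == hi lands on the last point.
        i = min(int(pos), n_grid - 2)
        frac = min(pos - i, 1.0)
        counts[i] += 1.0 - frac
        counts[i + 1] += frac
    return counts
```

For example, `linear_bin([0.3], 0.0, 1.0, 3)` puts roughly 0.4 of the weight on the grid point at 0.0 and 0.6 on the one at 0.5, where a plain histogram would assign the whole count to one bin.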

> If anyone is interested, I was able to create and store ErG fingerprints of 
> a large database (one million compounds), and then do an ErG fingerprint 
> similarity search in several minutes without using PostgreSQL (see 
> example scripts below).  Binary fingerprint searches are much faster, 
> but the method below is fast enough for my purposes.
> 
> Cheers,
> 
> Konrad
> 
> Using the following post as a guide:
> 
> https://iwatobipen.wordpress.com/2016/01/16/ergfingerprint-in-rdkit/
> 
> The first script below will create ErG fingerprints for a SMILES file 
> and the second script will sort the SMILES file based on the ErG 
> Tanimoto coefficient to a query molecule.
> 
> = CreateErGFingerprints.py =
> 
> import gzip, cPickle
> from rdkit import Chem
> from rdkit.Chem import AllChem, rdReducedGraphs
> 
> mols = Chem.SmilesMolSupplier("MolSupplier.smi", titleLine=False,
>     smilesColumn=0, nameColumn=1, delimiter=" \t")
> ergfps = [rdReducedGraphs.GetErGFingerprint(mol) for mol in mols]
> fp=gzip.open('MolSupplier_ergfps.pkl.gz','wb')
> cPickle.dump(ergfps,fp)
> fp.close()
> 
> = ErGFingerprintSimilaritySearch.py =
> 
> import gzip, cPickle
> from rdkit import Chem
> from rdkit.Chem import AllChem, rdReducedGraphs
> import numpy as np
> import os, re, sys
> import fileinput
> 
> # ErG FP is not bit vect.
> def calc_ergtc( fp1, fp2 ):
>     denominator = ( np.sum( np.dot(fp1,fp1) ) + np.sum( np.dot(fp2,fp2) )
>                     - np.sum( np.dot(fp1,fp2) ) )
>     numerator = np.sum( np.dot(fp1,fp2) )
>     return numerator / denominator
> 
> for line in fileinput.input():
>     (query_smiles, query_name) = line.strip().split("\t")
> 
> query_mol   = Chem.MolFromSmiles(query_smiles)
> query_ergfp = rdReducedGraphs.GetErGFingerprint(query_mol)
> 
> fp_file = "MolSupplier_ergfps.pkl.gz"
> 
> mols = Chem.SmilesMolSupplier("MolSupplier.smi", titleLine=False,
>     smilesColumn=0, nameColumn=1, delimiter=" \t")
> 
> fp=gzip.open(fp_file,'rb')
> ergfps=cPickle.load(fp)
> fp.close()
> 
> results = []
> f = open("MolSupplier.smi",'r')
> for index, line in enumerate(f.readlines()):
>     (smiles, name) = line.strip().split("\t")
>     ergtc = calc_ergtc(query_ergfp, ergfps[index])
>     results.append([smiles, name, ergtc])
> sorted_results = sorted(results, key = lambda x: x[2], reverse=True)
> 
> print '%s\t%s\t%.4f' % (query_smiles, query_name, 1.0)
> for result in sorted_results:
>     (smiles, name, ergtc) = result
>     print '%s\t%s\t%.4f' % (smiles, name, ergtc)
> 
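
The scripts above target Python 2.7 (`cPickle`, `print` statements); under Python 3 the standard `pickle` module and the `print()` function take their place. As a dependency-free restatement of the similarity measure those scripts compute, here is a sketch of the Tanimoto coefficient for real-valued vectors. The name `continuous_tanimoto` is hypothetical, and returning 0.0 for two all-zero vectors is an assumption of this sketch.

```python
def continuous_tanimoto(fp1, fp2):
    """Tanimoto coefficient for real-valued vectors of equal length.

    T = a.b / (a.a + b.b - a.b)
    """
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    dot12 = sum(a * b for a, b in zip(fp1, fp2))
    dot11 = sum(a * a for a in fp1)
    dot22 = sum(b * b for b in fp2)
    denom = dot11 + dot22 - dot12
    return dot12 / denom if denom else 0.0
```

It accepts plain lists as well as the numpy arrays returned by `rdReducedGraphs.GetErGFingerprint`, although for a million compounds a vectorized numpy version would be much faster.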
>> On 1 Sep 2017, at 09:36, Konrad Koehler wrote:
>>
>> Hi,
>>
>> I am trying to add ErG fingerprints to PostgreSQL using the following 
>> post as a guide:
>>
>> https://github.com/greglandrum/rdkit_blog/blob/master/notebooks/Custom%20fingerprint%20in%20PostgreSQL.ipynb
>>
>> My installation is as follows:
>>
>> Debian 3.16.43-2
>> rdkit 2016.03.4   np111py27_1rdkit
>> rdkit-postgresql  2016.03.4py27_1rdkit
>>
>> In the example linked above, I had to replace the following line with 
>> the next line (presumably because I am running python 2.7 instead of 3.x):
>> m = Chem.Mol(pkl.tobytes())
>> m = Chem.Mol(str(pkl))
>>
>> I then run into the following two problems:
>>
>> *First problem*: Null characters.  When I run the example script 
>> (using the Sheridan bit vector fingerprints), I generate the following 
>> error message:
>>
>> curs.executemany('insert into fps values (%s,bfp_from_binary_text(%s))',
>>     [(x,DataStructs.BitVectToBinaryText(y)) for x,y in fps])
>> ValueError: A string literal cannot contain NUL (0x00) characters.
>>
>> I am not sure what I should do here. I could strip the null characters 
>> from the binary text, but are the null characters supposed to be 
>> there? Should I use the bytea data type on PostgreSQL side?
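
One possible way around the NUL error, sketched under the assumptions that the driver is psycopg2 and that `bfp_from_binary_text` takes a `bytea` argument: under Python 2 a plain `str` is sent as a text literal, which cannot contain NUL bytes, whereas a value wrapped in `psycopg2.Binary` is sent as `bytea`. The helper names `fingerprint_rows` and `insert_fingerprints` are hypothetical. Stripping the NUL bytes would alter the fingerprint, so that is probably not the right fix.

```python
def fingerprint_rows(fps, to_binary_text, wrap_binary):
    """Build (id, wrapped_bytes) rows ready for executemany().

    fps            -- iterable of (molecule id, fingerprint) pairs
    to_binary_text -- callable turning a fingerprint into a byte string
    wrap_binary    -- callable marking the bytes as binary for the driver
    """
    return [(mol_id, wrap_binary(to_binary_text(fp))) for mol_id, fp in fps]


def insert_fingerprints(curs, fps):
    """Insert bit-vector fingerprints through an open psycopg2 cursor."""
    # Imported here so fingerprint_rows() can be used on its own.
    import psycopg2
    from rdkit import DataStructs

    rows = fingerprint_rows(fps, DataStructs.BitVectToBinaryText,
                            psycopg2.Binary)
    curs.executemany(
        'insert into fps values (%s,bfp_from_binary_text(%s))', rows)
```

The table name `fps` and the SQL statement are taken from the error message quoted above; everything else is an illustration rather than a tested recipe for this installation.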
>>
>> *Second problem*: 

Re: [Rdkit-discuss] troubles going from 2D to 3D

2017-08-16 Thread Francois BERENGER

On 08/17/2017 03:19 AM, Bennion, Brian wrote:

Hello All,

I am parsing a set of 2D sd files in rdkit in order to generate a 3D 
structure.  The code is below and is based on what I could find on the 
list for errors in generating 3D coordinates.


Temp.mol is the downloaded molfile from chembl for compound 
CHEMBL500809.  I must be doing something incorrectly in the code below 
as it still throws a -1 at the embed step.


I have a script that can handle the SMILES for this compound
and produce a 3D SDF file (input and output files are attached).

The code is here:
https://github.com/UnixJunkie/smi2sdf3d

Regards,
F.


 >>suppl = Chem.SDMolSupplier('temp.mol')
 >>ms = [x for x in suppl if x is not None]
 >>print ("This is the number of entries read in",len(ms))
1
 >>for m in ms:
 >>    tmp=AllChem.Compute2DCoords(m)
 >>    m3=Chem.AddHs(m)
 >>    print (AllChem.EmbedMolecule(m3,useRandomCoords=True))
-1
Just finished embedding Molecule
Traceback (most recent call last):
  File "sdf2D2Canonical3DSDF.py", line 45, in <module>
    AllChem.UFFOptimizeMolecule(m3,4000)
ValueError: Bad Conformer Id
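
The "Bad Conformer Id" error follows from the -1 above: `EmbedMolecule` returns -1 when it fails, the molecule then has no conformer, and the UFF optimization has nothing to work on. A minimal sketch of checking the return value before optimizing is shown below. The helper names `embed_with_fallback` and `make_3d`, the seed and the particular option sets are assumptions for illustration, and there is no guarantee that any of them succeeds for this compound, since the session above already failed with `useRandomCoords=True`.

```python
def embed_with_fallback(mol, embed, option_sets):
    """Call embed(mol, **options) for each option set in turn.

    Returns the first conformer id that is not -1, or -1 if all fail.
    """
    for options in option_sets:
        conf_id = embed(mol, **options)
        if conf_id != -1:
            return conf_id
    return -1


def make_3d(mol):
    """Return a 3D copy of mol with hydrogens, or None if embedding fails."""
    # Imported here so embed_with_fallback() can be used on its own.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol_h = Chem.AddHs(mol)
    conf_id = embed_with_fallback(
        mol_h,
        AllChem.EmbedMolecule,
        [
            {"randomSeed": 42},
            {"randomSeed": 42, "useRandomCoords": True, "maxAttempts": 200},
        ],
    )
    if conf_id == -1:
        return None  # no conformer was generated, so skip the optimization
    AllChem.UFFOptimizeMolecule(mol_h, maxIters=4000)
    return mol_h
```

A caller can then skip or log the molecules for which `make_3d` returns `None` instead of hitting the traceback.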

This is the 2D structure from CHEMBL

   SciTegic12231509382D CHEMBL500809

 68 77  0  0  0  0999 V2000
   -1.7837    1.4355    0.0000 C   0  0
   -2.4967    1.0206    0.0000 C   0  0  1  0  0  0
   -3.2126    1.4306    0.0000 C   0  0  1  0  0  0
   -3.2155    2.2556    0.0000 O   0  0
   -3.9256    1.0156    0.0000 C   0  0  1  0  0  0
   -4.7139    0.7721    0.0000 O   0  0
   -2.9339    1.0873    0.0000 O   0  0
   -2.8213    0.3944    0.0000 C   0  0
   -3.2069   -0.2194    0.0000 C   0  0  1  0  0  0
   -3.2040   -1.0444    0.0000 C   0  0  1  0  0  0
   -3.9170   -1.4594    0.0000 C   0  0
   -4.6329   -1.0494    0.0000 C   0  0  1  0  0  0
   -5.3460   -1.4644    0.0000 C   0  0
   -6.0619   -1.0543    0.0000 C   0  0
   -6.0647   -0.2293    0.0000 C   0  0
   -6.7806    0.1807    0.0000 O   0  0
   -5.3517    0.1856    0.0000 C   0  0  1  0  0  0
   -5.3546    1.0106    0.0000 O   0  0
   -4.6358   -0.2244    0.0000 C   0  0  1  0  0  0
   -4.8935    0.5594    0.0000 C   0  0
   -3.9228    0.1906    0.0000 C   0  0  1  0  0  0
   -5.3431   -2.2894    0.0000 C   0  0
   -2.4881   -1.4544    0.0000 O   0  0
   -1.7751   -1.0394    0.0000 C   0  0
   -1.0592   -1.4495    0.0000 O   0  0
   -1.7779   -0.2144    0.0000 C   0  0  2  0  0  0
   -1.0649    0.2005    0.0000 O   0  0
   -0.3490   -0.2095    0.0000 C   0  0
   -0.3461   -1.0345    0.0000 O   0  0
    0.3640    0.2055    0.0000 C   0  0
    0.3612    1.0305    0.0000 O   0  0
    1.0799   -0.2045    0.0000 O   0  0
    1.7930    0.2105    0.0000 C   0  0  2  0  0  0
    2.2833   -0.1399    0.0000 C   0  0  1  0  0  0
    2.5117   -1.0245    0.0000 C   0  0  1  0  0  0
    1.7987   -1.4395    0.0000 C   0  0
    3.2276   -1.4346    0.0000 C   0  0  1  0  0  0
    3.2305   -2.2595    0.0000 O   0  0
    3.9407   -1.0196    0.0000 C   0  0  1  0  0  0
    4.7289   -0.7761    0.0000 O   0  0
    2.9489   -1.0912    0.0000 O   0  0
    2.6107   -0.3387    0.0000 C   0  0
    3.2219    0.2154    0.0000 C   0  0  2  0  0  0
    3.2190    1.0404    0.0000 C   0  0  1  0  0  0
    3.9321    1.4554    0.0000 C   0  0
    4.6480    1.0454    0.0000 C   0  0  1  0  0  0
    5.3610    1.4604    0.0000 C   0  0
    6.0769    1.0504    0.0000 C   0  0
    6.0798    0.2254    0.0000 C   0  0
    6.7957   -0.1846    0.0000 O   0  0
    5.3667   -0.1896    0.0000 C   0  0  1  0  0  0
    5.3696   -1.0146    0.0000 O   0  0
    4.6508    0.2204    0.0000 C   0  0  1  0  0  0
    4.9085   -0.5633    0.0000 C   0  0
    3.9378   -0.1946    0.0000 C   0  0  1  0  0  0
    5.3581    2.2854    0.0000 C   0  0
    2.5031    1.4504    0.0000 O   0  0
    1.7901    1.0355    0.0000 C   0  0
    1.0742    1.4455    0.0000 O   0  0
   -2.4938    0.1956    0.0000 C   0  0  1  0  0  0
   -1.6295    0.6986    0.0000 H   0  0
    4.8056   -0.6916    0.0000 H   0  0
    4.6445    2.0454    0.0000 H   0  0
    3.2155    2.0404    0.0000 H   0  0
    1.4588   -0.7058    0.0000 H   0  0
   -4.7905    0.6876    0.0000 H   0  0
   -4.6294   -2.0494    0.0000 H   0  0
   -3.2005   -2.0444    0.0000 H   0  0
  2  1  1  6
  2  3  1  0
  3  4  1  6
  3  5  1  0
  5  6  1  6
  5  7  1  0
  7  8  1  0
  9  8  1  1
  9 10  1  0
 10 11  1  0
 12 11  1  0
 12 13  1  0
 13 14  2  0
 14 15  1  0
 15 16  2  0
 15 17  1  0
 17 18  1  1
 17 19  1  0
 19 12  1  0
 19 20  1  1
 21 19  1  0
 21  5  1  0
 21  9  1  0
 13 22  1  0
 10 23  1  0
 23 24  1  0
 24 25  2  0
 24 26  1  0
 26 27  1  1
 27 28  1  0
 28 29  2  0
 28 30  1  0
 30 31  2  0
 30 32  1  0
 33 32  1  1
 34 33  1  0
 34 35  1  0
 35 36  1  6
 35 37  1  0
 37 38  1  6
 37 39  1  0
 39 40  1  6
 39 41  1  0
 41 42  1  0
 43 42  1  1
 43 34  1  0
 43 44  1  0
 44 45  1  0
 46 45  1  0


Re: [Rdkit-discuss] can't kekulize molecule

2017-08-16 Thread Francois BERENGER

On 08/16/2017 06:14 PM, Greg Landrum wrote:


On Wed, Aug 16, 2017 at 3:55 AM, Francois BERENGER 
<beren...@bioreg.kyushu-u.ac.jp> wrote:


On 08/16/2017 03:36 PM, Greg Landrum wrote:

The RDKit Mol2 parser is really only validated for the atom
types generated by corina. I'm not surprised that the output from
open babel would not be understood. This is documented:

http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File


It would be really nice if open babel MOL2 output could directly be read
in by rdkit.

Adding this support is not an impossible task for someone who 
understands the open babel interpretation of the Mol2 atom types. Nik's 
code for dealing with the cleanup of the corina atom types is quite well 
documented and creating a bunch of test cases using OpenBabel would be 
pretty straightforward. It would take time and care though.


Can you point out that code?

I may have a look one day.

I'd guess that in the end it's easier and more straightforward to just 
let open babel do the translation.


I often find myself running
$ obabel in.mol2 -O out.sdf
just for that purpose.


The question I always end up asking here is: Why do you have open babel 
mol2 files in the first place?
If you're reading those into another piece of software (the usual 
answer): are you sure that the other software and open babel interpret 
the atom types the same way? Really sure?


-greg





Re: [Rdkit-discuss] can't kekulize molecule

2017-08-16 Thread Francois BERENGER

On 08/16/2017 03:36 PM, Greg Landrum wrote:

Hi Shuai,

The RDKit Mol2 parser is really only validated for the atom types 
generated by corina. I'm not surprised that the output from open babel 
would not be understood. This is documented:

http://rdkit.org/docs/api/rdkit.Chem.rdmolfiles-module.html#MolFromMol2File


It would be really nice if open babel MOL2 output could directly be read
in by rdkit.

I often find myself running
$ obabel in.mol2 -O out.sdf
just for that purpose.

An aside: If you have an SDF file you can read that directly into the 
RDKit. It seems like you shouldn't need the openbabel translation step 
at all.


-greg


On Wed, Aug 16, 2017 at 12:13 AM, David Liu wrote:


Dear all,

I am having trouble kekulizing a molecule with rdkit; an example is below.

The example.mol2 file looks like this:

@<TRIPOS>MOLECULE
example
46 49 0 0 0
SMALL
GASTEIGER

@<TRIPOS>ATOM
1 C -4.5556 -0.2844 1.1718 C.3 1 LIG1 -0.0109
2 C -6.0291 -0.7271 1.2334 C.3 1 LIG1 0.0493
3 C -6.4413 -0.5958 -1.0493 C.3 1 LIG1 0.0493
4 C -5.1977 0.3130 -1.1927 C.3 1 LIG1 -0.0109
5 C 5.5992 -2.5640 -0.8780 C.ar 1 LIG1 -0.0253
6 O -6.3822 -1.4588 0.0764 O.3 1 LIG1 -0.3796
7 C 2.8943 1.6722 0.9911 C.ar 1 LIG1 0.2664
8 C 5.1745 -2.0407 0.3480 C.ar 1 LIG1 0.1371
9 C -1.6179 0.4017 0.1577 C.ar 1 LIG1 0.2173
10 C -4.0573 -0.1702 -0.2838 C.3 1 LIG1 0.0275
11 C 0.8767 -0.2307 1.1489 C.ar 1 LIG1 0.0370
12 C 2.1438 -0.5325 1.6439 C.ar 1 LIG1 -0.0306
13 C 6.1958 -1.7294 -1.8279 C.ar 1 LIG1 -0.0590
14 C 6.3717 -0.3702 -1.5525 C.ar 1 LIG1 -0.0605
15 C 5.9487 0.1564 -0.3282 C.ar 1 LIG1 -0.0452
16 C 0.6358 1.0320 0.5744 C.ar 1 LIG1 0.1483
17 C -0.1716 -1.1537 1.2042 C.ar 1 LIG1 0.0418
18 C 3.1618 0.4153 1.5592 C.ar 1 LIG1 0.0780
19 C 5.3424 -0.6749 0.6231 C.ar 1 LIG1 0.0480
20 C 1.3530 3.2786 -0.1013 C.3 1 LIG1 0.0167
21 F 4.6032 -2.8623 1.2640 F 1 LIG1 -0.2043
22 S 4.7969 0.0115 2.1898 S.3 1 LIG1 -0.0812
23 N -1.3906 -0.8211 0.7091 N.ar 1 LIG1 -0.
24 O 3.8206 2.5277 0.9363 O.2 1 LIG1 -0.2664
25 N 1.6412 1.9659 0.5033 N.ar 1 LIG1 -0.2949
26 N -0.6088 1.3106 0.0937 N.ar 1 LIG1 -0.1964
27 N -2.9091 0.7394 -0.3655 N.pl3 1 LIG1 -0.3104
28 H -3.9262 -1.0225 1.7144 H 1 LIG1 0.0305
29 H -4.4544 0.6942 1.6907 H 1 LIG1 0.0305
30 H -6.1785 -1.3738 2.1237 H 1 LIG1 0.0560
31 H -6.6965 0.1565 1.3647 H 1 LIG1 0.0560
32 H -7.3658 0.0220 -1.0063 H 1 LIG1 0.0560
33 H -6.5227 -1.2302 -1.9574 H 1 LIG1 0.0560
34 H -4.8575 0.3261 -2.2513 H 1 LIG1 0.0305
35 H -5.4753 1.3532 -0.9112 H 1 LIG1 0.0305
36 H 5.4676 -3.6168 -1.0922 H 1 LIG1 0.0646
37 H -3.7461 -1.1771 -0.6436 H 1 LIG1 0.0500
38 H 2.3428 -1.4998 2.0895 H 1 LIG1 0.0638
39 H 6.5237 -2.1362 -2.7758 H 1 LIG1 0.0618
40 H 6.8363 0.2748 -2.2870 H 1 LIG1 0.0618
41 H 6.0904 1.2094 -0.1219 H 1 LIG1 0.0630
42 H -0.0243 -2.1352 1.6372 H 1 LIG1 0.0838
43 H 2.2342 3.9528 -0.1073 H 1 LIG1 0.0457
44 H 0.5450 3.7853 0.4685 H 1 LIG1 0.0457
45 H 1.0258 3.1432 -1.1544 H 1 LIG1 0.0457
46 H -3.0166 1.6655 -0.8392 H 1 LIG1 0.1492
@<TRIPOS>BOND
1 1 2 1
2 1 10 1
3 2 6 1
4 3 4 1
5 3 6 1
6 4 10 1
7 5 8 ar
8 5 13 ar
9 7 18 ar
10 7 24 2
11 7 25 ar
12 8 19 ar
13 8 21 1
14 9 23 ar
15 9 26 ar
16 9 27 1
17 10 27 1
18 11 12 ar
19 11 16 ar
20 11 17 ar
21 12 18 ar
22 13 14 ar
23 14 15 ar
24 15 19 ar
25 16 25 ar
26 16 26 ar
27 17 23 ar
28 18 22 1
29 19 22 1
30 20 25 1
31 1 28 1
32 1 29 1
33 2 30 1
34 2 31 1
35 3 32 1
36 3 33 1
37 4 34 1
38 4 35 1
39 5 36 1
40 10 37 1
41 12 38 1
42 13 39 1
43 14 40 1
44 15 41 1
45 17 42 1
46 20 43 1
47 20 44 1
48 20 45 1
49 27 46 1

And the example.py code looks like this:

from rdkit.Chem import AllChem
from rdkit import Chem

rdkit_mol = Chem.MolFromMol2File("example.mol2", sanitize=False,
removeHs=False)
mol = AllChem.RemoveHs(rdkit_mol)

Running example.py returns the following error:

ValueError: Sanitization error: Can't kekulize mol. Unkekulized
atoms: 8 10 11 15 16 17 22 24 25

It seems rdkit cannot interpret the molecule when it tries to remove
the hydrogens, probably because of the format of the mol2 file I
used here. I used openbabel to convert an sdf file into this mol2
file. Is there a plan to support parsing mol2 files like this, or
do I need to process the mol2 file further? I would appreciate any
advice!


Thanks,

Shuai




[Rdkit-discuss] Is it possible to compute pi and sigma partial charges with rdkit?

2017-07-25 Thread Francois BERENGER

Hello,

Is it possible to decompose partial charges with rdkit?

I am afraid that Gasteiger-Marsili (PEOE) is mostly about sigma bonds, 
but I might be wrong.


Regards,
F.


