Re: [Rdkit-discuss] Question on substructure search

2023-06-14 Thread Patrick Walters
Hi Joey,

You can't use "[#0]" in a SMARTS.  This is not a wildcard character.  If
you want a wildcard, you can use "*" for any atom, "A" for any aliphatic
atom, "a" for any aromatic atom or "[A,a]" for any atom.
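
As a quick check of the wildcard primitives listed above, here is a minimal sketch that counts how many atoms each one matches. The test molecule (ethylbenzene) is an arbitrary choice for illustration and is not from the original question.

```python
from rdkit import Chem

# Ethylbenzene: two aliphatic carbons plus a six-membered aromatic ring.
mol = Chem.MolFromSmiles("CCc1ccccc1")

counts = {}
for smarts in ["*", "A", "a", "[A,a]"]:
    pat = Chem.MolFromSmarts(smarts)
    # Each match is a one-atom tuple, so the number of matches is the atom count.
    counts[smarts] = len(mol.GetSubstructMatches(pat))

print(counts)
```

Here "*" and "[A,a]" should each match all eight heavy atoms, "A" the two aliphatic carbons, and "a" the six aromatic ones.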

You also can't roundtrip SMARTS->SMILES->SMARTS.

For more information, check out my SMARTS tutorial.

https://colab.research.google.com/github/PatWalters/practical_cheminformatics_tutorials/blob/main/fundamentals/SMARTS_tutorial.ipynb


Pat

On Wed, Jun 14, 2023 at 10:49 AM Storer, Joey (J)  wrote:

> Hi Patrick and RDKit,
>
>
>
> *rdkit.__version__ =  2023.03.1*
>
>
>
> Here is a slightly more explicit variant tried because neither worked to
> find a match:
>
>
>
>
>
> Respectfully,
>
> Joey Storer
>
>
>
> General Business
>
> *From:* Patrick Walters 
> *Sent:* Tuesday, June 13, 2023 5:54 PM
> *To:* Storer, Joey (J) 
> *Cc:* rdkit-discuss@lists.sourceforge.net
> *Subject:* Re: [Rdkit-discuss] Question on substructure search
>
>
>
> * CAUTION:* This email originated from outside of the organization. Do
> not click links or open attachments unless you recognize the sender and
> know the content is safe.
>
>
>
> Hi Joey,
>
>
>
> You can get the intended result like this
>
>
>
> pat = Chem.MolFromSmarts("*=C1*C=C*1")
> mol = Chem.MolFromSmiles("C=C1SC=CS1")
> mol.HasSubstructMatch(pat)
>
>
>
> Pat
>
>
>
> On Tue, Jun 13, 2023 at 4:49 PM Storer, Joey (J) via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
> Hi RDKit masters,
>
>
>
> *rdkit.__version__ =  2023.03.1*
>
>
>
> I am trying to match structures with a double bond in a 5-membered ring.
>
>
>
>
>
> Then checking if this works in the sulfur case:
>
>
>
>
>
>
>
> Thanks for your thoughts.
>
>
>
> Joey Storer
>
> Core R, Dow Inc.
>
>
>
>
>
>
>
>
>
> General Business
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Question on substructure search

2023-06-13 Thread Patrick Walters
Hi Joey,

You can get the intended result like this

pat = Chem.MolFromSmarts("*=C1*C=C*1")
mol = Chem.MolFromSmiles("C=C1SC=CS1")
mol.HasSubstructMatch(pat)

Pat

On Tue, Jun 13, 2023 at 4:49 PM Storer, Joey (J) via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hi RDKit masters,
>
>
>
> *rdkit.__version__ =  2023.03.1*
>
>
>
> I am trying to match structures with a double bond in a 5-membered ring.
>
>
>
>
>
> Then checking if this works in the sulfur case:
>
>
>
>
>
>
>
> Thanks for your thoughts.
>
>
>
> Joey Storer
>
> Core R, Dow Inc.
>
>
>
>
>
>
>
> General Business
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] RDKit in Google Colab

2022-08-03 Thread Patrick Walters
Actually, you can now just
!pip install rdkit


From: Jan Halborg Jensen 
Sent: Wednesday, August 3, 2022 9:47:20 AM
To: Eduardo Mayo 
Cc: RDKit Discuss 
Subject: Re: [Rdkit-discuss] RDKit in Google Colab

!pip install rdkit-pypi

No need to use anaconda for Colab RDKit installation anymore!

Best regards, Jan

On 3 Aug 2022, at 15.40, Eduardo Mayo wrote:

Hello,

I have used RDKit in Google Colab before (a few months ago). However, when I
tried today, I got the following error message:

ImportError: /usr/local/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found 
(required by 
/usr/local/lib/python3.7/site-packages/rdkit/Chem/../../../../libRDKitFileParsers.so.1)

Does anyone know a workaround?

All the best,
Eduardo
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Permutation of multiple enumeration

2022-07-06 Thread Patrick Walters
Here's a simple example showing the enumeration of a 3 component library
based on a reaction
https://gist.github.com/PatWalters/7439099598b4f08a331a81b209f88baa
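
For comparison, the string-based closure rewriting that Andrew describes in the quoted message below can be sketched in a few lines of plain Python. This is not his attached program and not the gist above; the function names are hypothetical, and the regular expressions assume single-bond attachment points written in one of the three forms from his examples (leading, trailing, or a one-atom branch).

```python
import re
from itertools import product

# One SMILES atom token: a bracket atom, a two-letter halogen, or a
# one-letter organic-subset atom (aliphatic or aromatic).
ATOM = r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops])"

def core_to_closures(core):
    """Rewrite [*:n] attachment points in a core as ring closures %9n."""
    # Branch form: X([*:n]) -> X%9n
    s = re.sub(r"\(\[\*:(\d)\]\)", r"%9\1", core)
    # Leading form: [*:n]X... -> X%9n...
    s = re.sub(r"^\[\*:(\d)\]" + ATOM, r"\2%9\1", s)
    # Trailing form: ...X[*:n] -> ...X%9n
    s = re.sub(r"\[\*:(\d)\]$", r"%9\1", s)
    return s

def rgroup_to_closure(frag, n):
    """Rewrite the '*' in an R-group fragment as ring closure %9n."""
    if frag.startswith("*"):
        return re.sub(r"^\*" + ATOM, rf"\g<1>%9{n}", frag)
    if frag.endswith("*"):
        return frag[:-1] + f"%9{n}"
    raise ValueError(f"no leading or trailing '*' in {frag!r}")

cores = ["[*:1]c1ncc([*:2])cn1", "CC([*:2])O[*:1]"]
r1 = ["*F", "Cl*", "Br*"]
r2 = ["*CCO", "CO*"]

# Dot-disconnected SMILES for every core x R1 x R2 combination.
library = [
    ".".join([core_to_closures(c), rgroup_to_closure(a, 1), rgroup_to_closure(b, 2)])
    for c, a, b in product(cores, r1, r2)
]
for smi in library:
    print(smi)
```

Each printed string can then be passed to Chem.MolFromSmiles and Chem.MolToSmiles to obtain the joined, canonical product.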


On Wed, Jul 6, 2022 at 4:57 PM Andrew Dalke 
wrote:

> Hi Carsten,
>
>   How are the fragments expressed? With attachment points marked with
> "[*:1]", "[*:2]" and "[*:3]" atoms?
>
> One technique is to rewrite the SMILES to use closures. (See
> https://onlinelibrary.wiley.com/doi/10.1002/qsar.200310008 or
> http://www.dalkescientific.com/writings/diary/archive/2005/05/07/attachment_points.html
> ).
>
> For example, if your core SMILES are:
>
> [*:1]c1ncc([*:2])cn1
> CC([*:2])O[*:1]
>
> and your R1 contains
>
> *F
> Cl*
> Br*
>
> and your R2 contains
>
> *CCO
> CO*
>
> then you could rewrite these to use "%91" to connect the [*:1] with the R1
> "*" and use "%92" to connect the [*:2] with the R2 "*", using
> dot-disconnected terms.
>
> For example:
>
>   [*:1]c1ncc([*:2])cn1 + *F + *CCO
>
> can be rewritten as
>
>   c%911ncc%92cn1.F%91.C%92CO
>
> which is parsed and canonicalized to:
>
>   OCCc1cnc(F)nc1
>
> Rewriting the SMILES this way is a bit tricky. I've attached a program
> which does it for you.
>
>
> Running it on the above gives:
>
> % cat core.smi
> [*:1]c1ncc([*:2])cn1
> CC([*:2])O[*:1]
>
> % cat r1.smi
> *F
> Cl*
> Br*
>
> % cat r2.smi
> *CCO
> CO*
>
> % python enumerate.py --R1 r1.smi --R2 r2.smi core.smi
> c1%91ncc%92cn1.F%91.C%92CO -> OCCc1cnc(F)nc1
> c1%91ncc%92cn1.F%91.CO%92 -> COc1cnc(F)nc1
> c1%91ncc%92cn1.Cl%91.C%92CO -> OCCc1cnc(Cl)nc1
> c1%91ncc%92cn1.Cl%91.CO%92 -> COc1cnc(Cl)nc1
> c1%91ncc%92cn1.Br%91.C%92CO -> OCCc1cnc(Br)nc1
> c1%91ncc%92cn1.Br%91.CO%92 -> COc1cnc(Br)nc1
> CC(O%91)%92.F%91.C%92CO -> CC(CCO)OF
> CC(O%91)%92.F%91.CO%92 -> COC(C)OF
> CC(O%91)%92.Cl%91.C%92CO -> CC(CCO)OCl
> CC(O%91)%92.Cl%91.CO%92 -> COC(C)OCl
> CC(O%91)%92.Br%91.C%92CO -> CC(CCO)OBr
> CC(O%91)%92.Br%91.CO%92 -> COC(C)OBr
>
> It also supports --R3 if your core has 3 R-groups, with the third core
> point labeled [*:3].
>
> Best regards
>
>
> Andrew
> da...@dalkescientific.com
>
>
>
>
>
> > On Jul 6, 2022, at 21:00, Carsten Bauer 
> wrote:
> >
> > Hello
> >
> > I have a structure with three substituents R1, R2 and R3
> > R1 is an enumeration of 30+ SMILES
> > R2 and R3 each is an enumeration of <5 SMILES
> > Chemical space = 30 x 5 x 5 = 750+ in-silico compounds
> >
> > Can anyone share (i.e publish in a citable form) an RDKit code for this
> permutation?
> > Is there a textbook example illustrating this daily question from the
> lab in an example, please?
> >
> > I can’t follow
> > https://www.rdkit.org/docs/cppapi/EnumerationStrategyBase_8h_source.html
> >
> > Sorry.
> >
> > Many thanks for getting back.
> > Kindest regards
> > C.
> >
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Patrick Walters
Similarity search on a database of 4 million is pretty quick with ChemFp or
fpsim2.  Do you need to do the clustering?

Here are a couple of relevant blog posts.

http://practicalcheminformatics.blogspot.com/2020/10/what-do-molecules-that-look-like-this.html

http://practicalcheminformatics.blogspot.com/2021/09/similarity-search-and-some-cool-pandas.html
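
To make the "search rather than cluster" idea concrete, here is a brute-force sketch using only the RDKit. It is a stand-in for illustration: chemfp and FPSim2 have their own APIs (not shown here) and would be the practical choice at 4 million molecules. The five-molecule database and the query are made up.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprint(smiles):
    """Morgan (radius 2) bit-vector fingerprint for one SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

# Toy database and query, chosen only for illustration.
database = ["CCO", "CCN", "c1ccccc1", "c1ccccc1O", "CC(=O)O"]
db_fps = [fingerprint(s) for s in database]
query_fp = fingerprint("c1ccccc1O")

# One Tanimoto similarity per database entry, in database order.
sims = DataStructs.BulkTanimotoSimilarity(query_fp, db_fps)

# Keep the three most similar database molecules.
hits = sorted(zip(sims, database), reverse=True)[:3]
for sim, smi in hits:
    print(f"{sim:.2f} {smi}")
```

With a set of actives, the same loop would be run once per active and the hit lists merged.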

Pat



On Sun, May 1, 2022 at 12:12 PM Tristan Camilleri <
tristan.camilleri...@um.edu.mt> wrote:

> Thank you both for the feedback.
>
> My primary aim is to run an LBVS experiment (similarity search) using a
> set of actives and the dataset of cluster representatives.
>
>
>
> On Sun, 1 May 2022, 17:09 Patrick Walters,  wrote:
>
>> For me, a lot of this depends on what you intend to do with the
>> clustering.  If you want to pick a "representative" subset from a larger
>> dataset, k-means may do the trick.  As Rajarshi mentioned, Practical
>> Cheminformatics has a k-means implementation that runs with FAISS.
>> Depending on your goal, choosing a subset with a diversity picker may fit
>> the bill.  One annoying aspect of diversity pickers is that the initial
>> selections tend to consist of strange molecules.
>>
>> @Tristan can you provide more information on what you want to do with the
>> clustering results?
>>
>>
>> Pat
>>
>> On Sun, May 1, 2022 at 10:46 AM Rajarshi Guha 
>> wrote:
>>
>>> You could consider using FAISS. An example of clustering 2.1M cmpds is
>>> described at
>>> http://practicalcheminformatics.blogspot.com/2019/04/clustering-21-million-compounds-for-5.html
>>>
>>>
>>> On Sun, May 1, 2022 at 9:23 AM Tristan Camilleri <
>>> tristan.camilleri...@um.edu.mt> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am attempting to cluster a database of circa 4M small molecules and I
>>>> have hit several snags.
>>>> Using BulkTanimoto is not possible due to the resources that are required.
>>>> I am now working with fpsim2 and chemfp to get a distance matrix (sparse
>>>> matrix). However, I am finding it very challenging to identify an
>>>> appropriate clustering algorithm. I have considered both k-medoids and
>>>> DBSCAN. Each of these has its own limitations, stating the number of
>>>> clusters for k-medoids and not obtaining centroids for DBSCAN.
>>>>
>>>> I was wondering whether there is an implementation of the stochastic
>>>> clustering analysis for clustering purposes, described in
>>>> https://doi.org/10.1021/ci970056l .
>>>>
>>>> Any suggestions on the best method for clustering large datasets, with
>>>> code suggestions, would be greatly appreciated. I am new to the subject and
>>>> would appreciate any help.
>>>>
>>>> Regards,
>>>> Tristan
>>>>
>>>>
>>>> ___
>>>> Rdkit-discuss mailing list
>>>> Rdkit-discuss@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>>
>>>
>>>
>>> --
>>> Rajarshi Guha | http://blog.rguha.net | @rguha
>>> <https://twitter.com/rguha>
>>>
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Clustering

2022-05-01 Thread Patrick Walters
For me, a lot of this depends on what you intend to do with the
clustering.  If you want to pick a "representative" subset from a larger
dataset, k-means may do the trick.  As Rajarshi mentioned, Practical
Cheminformatics has a k-means implementation that runs with FAISS.
Depending on your goal, choosing a subset with a diversity picker may fit
the bill.  One annoying aspect of diversity pickers is that the initial
selections tend to consist of strange molecules.
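
As an illustration of the diversity-picker option, here is a minimal sketch using the RDKit's MaxMinPicker. The molecule list, the number of picks, and the seed are arbitrary assumptions; on a real dataset you would inspect the first few picks for the "strange molecule" effect mentioned above.

```python
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.SimDivFilters.rdSimDivPickers import MaxMinPicker

# Toy dataset, chosen only for illustration.
smiles = ["CCO", "CCCO", "CCCCO", "c1ccccc1", "c1ccccc1C", "CC(=O)O", "CCN"]
fps = [
    AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
    for s in smiles
]

# Pick 3 mutually dissimilar molecules; distances are computed lazily
# from the bit vectors, so no full distance matrix is built.
picker = MaxMinPicker()
picks = list(picker.LazyBitVectorPick(fps, len(fps), 3, seed=42))

for idx in picks:
    print(idx, smiles[idx])
```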

@Tristan can you provide more information on what you want to do with the
clustering results?


Pat

On Sun, May 1, 2022 at 10:46 AM Rajarshi Guha 
wrote:

> You could consider using FAISS. An example of clustering 2.1M cmpds is
> described at
> http://practicalcheminformatics.blogspot.com/2019/04/clustering-21-million-compounds-for-5.html
>
>
> On Sun, May 1, 2022 at 9:23 AM Tristan Camilleri <
> tristan.camilleri...@um.edu.mt> wrote:
>
>> Hi,
>>
>> I am attempting to cluster a database of circa 4M small molecules and I
>> have hit several snags.
>> Using BulkTanimoto is not possible due to the resources that are required. I
>> am now working with fpsim2 and chemfp to get a distance matrix (sparse
>> matrix). However, I am finding it very challenging to identify an
>> appropriate clustering algorithm. I have considered both k-medoids and
>> DBSCAN. Each of these has its own limitations, stating the number of
>> clusters for k-medoids and not obtaining centroids for DBSCAN.
>>
>> I was wondering whether there is an implementation of the stochastic
>> clustering analysis for clustering purposes, described in
>> https://doi.org/10.1021/ci970056l .
>>
>> Any suggestions on the best method for clustering large datasets, with
>> code suggestions, would be greatly appreciated. I am new to the subject and
>> would appreciate any help.
>>
>> Regards,
>> Tristan
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
>
> --
> Rajarshi Guha | http://blog.rguha.net | @rguha 
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] pharmacophore

2022-03-29 Thread Patrick Walters
One way to compare interactions (pharmacophores) in a binding site is to
use interaction fingerprints.  I've had a good experience with ProLIF.
https://github.com/chemosim-lab/ProLIF

On Tue, Mar 29, 2022 at 6:26 AM Muhammad Akram 
wrote:

> Hello Everybody,
>
>
>
> I am looking if there is a way to extract a pharmacophore from
> co-crystallized ligand using RDKit.
>
>
>
> Thank you so much in advance.
>
>
>
> Kind Regards,
>
> Mu
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] problem saving rdSubstructLibrary.

2022-03-13 Thread Patrick Walters
Thanks for the quick response, Greg.  I was able to use the method
described in your blog post
<https://greglandrum.github.io/rdkit-blog/tutorial/substructure/2021/12/20/substructlibrary-search-order.html>
to pickle the database.

Pat

On Sun, Mar 13, 2022 at 1:26 AM Greg Landrum  wrote:

> Hi Pat,
>
> I don't think you're doing anything wrong. This looks like a bug in the
> RDKit.
> It seems to be connected to the PatternHolder... I will  look into it.
>
> -greg
>
>
> On Sat, Mar 12, 2022 at 10:26 PM Patrick Walters 
> wrote:
>
>> Hi All,
>>
>> I'd appreciate any insight on what I'm doing wrong.  I'm trying to save
>> an rdSubstructLibrary.SubstructLibrary with library.ToStream().  When the
>> library is empty I can save it with library.ToStream(); however, once I've
>> added molecules to the library, I get this error message.
>>
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 121:
>> invalid continuation byte
>>
>> Example code below.  Any suggestions would be appreciated.
>>
>> Thanks,
>>
>> Pat
>>
>> #!/usr/bin/env python
>>
>> import sys
>> from rdkit import Chem
>> from rdkit.Chem import rdSubstructLibrary
>>
>> smiles_list = ["C","CC","CCC","","C"]
>> mol_list = [Chem.MolFromSmiles(x) for x in smiles_list]
>> library = rdSubstructLibrary.SubstructLibrary(
>>     rdSubstructLibrary.CachedSmilesMolHolder(),
>>     rdSubstructLibrary.PatternHolder())
>> # Error when molecules are added
>> # If the two lines below are commented, everything works
>> for mol in mol_list:
>>     library.AddMol(mol)
>> # -
>> with open("out.sslib","w") as f:
>>     library.ToStream(f)
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] problem saving rdSubstructLibrary.

2022-03-12 Thread Patrick Walters
Hi All,

I'd appreciate any insight on what I'm doing wrong.  I'm trying to save an
rdSubstructLibrary.SubstructLibrary with library.ToStream().  When the library
is empty I can save it with library.ToStream(); however, once I've added
molecules to the library, I get this error message.

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 121:
invalid continuation byte

Example code below.  Any suggestions would be appreciated.

Thanks,

Pat

#!/usr/bin/env python

import sys
from rdkit import Chem
from rdkit.Chem import rdSubstructLibrary

smiles_list = ["C","CC","CCC","","C"]
mol_list = [Chem.MolFromSmiles(x) for x in smiles_list]
library = rdSubstructLibrary.SubstructLibrary(
    rdSubstructLibrary.CachedSmilesMolHolder(),
    rdSubstructLibrary.PatternHolder())
# Error when molecules are added
# If the two lines below are commented, everything works
for mol in mol_list:
    library.AddMol(mol)
# -
with open("out.sslib","w") as f:
    library.ToStream(f)
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Find structures with "non-organic" atoms

2022-03-05 Thread Patrick Walters
Here's what I use.

from rdkit import Chem

# Matches any atom that is not in the allowed "organic" element list
not_organic_pat = Chem.MolFromSmarts("[!#1;!C;!O;!N;!S;!P;!F;!Cl;!Br;!I;!c;!o;!n;!s;!p;!Na;!K;!Mg;!Ca;!Li]")
cisplatin = Chem.MolFromSmiles("[NH3+]-[Pt-2](Cl)(Cl)[NH3+]")
cisplatin.HasSubstructMatch(not_organic_pat)
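
If you also want to know which atoms triggered the flag, the same element list can be written as a set of atomic numbers. This is an alternative sketch, not the pattern above; the helper name is hypothetical and the aspirin SMILES is only there as a negative control.

```python
from rdkit import Chem

# Atomic numbers for H, C, N, O, F, P, S, Cl, Br, I, Li, Na, Mg, K, Ca,
# i.e. the same elements the SMARTS pattern treats as acceptable.
ALLOWED = {1, 6, 7, 8, 9, 15, 16, 17, 35, 53, 3, 11, 12, 19, 20}

def non_organic_atoms(mol):
    """Return the symbols of all atoms outside the allowed element set."""
    return [a.GetSymbol() for a in mol.GetAtoms() if a.GetAtomicNum() not in ALLOWED]

cisplatin = Chem.MolFromSmiles("[NH3+]-[Pt-2](Cl)(Cl)[NH3+]")
aspirin = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")

print(non_organic_atoms(cisplatin))
print(non_organic_atoms(aspirin))
```

An empty list means the structure contains only elements from the allowed set.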



On Sat, Mar 5, 2022 at 8:08 PM Rafael L via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Dear all, I remember having used some SMARTS-based function to flag
> structures containing "non-organic" atoms. It seems that there is a knime
> node for that (
> https://forum.knime.com/t/rdkit-molecule-substructure-filter-incorrectly-matches-aromatic-sulfur-atoms-molecule-as-metal-containing-compounds/12935),
> but I wasn't able to find any Python implementations. Does anyone know
> where to find it?
> Thanks in advance
>
> --
> *Rafael da Fonseca Lameiro*
> PhD Student - Medicinal and Biological Chemistry Group (NEQUIMED)
> São Carlos Institute of Chemistry - University of São Paulo - Brazil
> https://orcid.org/-0003-4466-2682
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] File Formats with Partial Charges

2021-10-27 Thread Patrick Walters
Hi Hao,

As a long-time file format geek, I feel the need to jump into this one.

1. mol2
I'm not a fan of using mol2.  AFAIK, there is no definitive documentation
for the atom typing rules or the aromaticity model.

2.  sdf
The RDKit has had a facility for storing atom properties in an SDF since
2019.03.1.  This is documented in the "Atom Properties and SDF files"
section in the RDKit Book.  If I remember correctly, OEChem adopted the
same scheme.
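
A minimal sketch of that SDF scheme, as I understand it from the RDKit Book: per-atom values are collected into a molecule-level property list, which SDWriter then writes as an ordinary SD data field. The molecule and the charge values here are placeholders for illustration only.

```python
from io import StringIO
from rdkit import Chem

mol = Chem.MolFromSmiles("CO")

# Placeholder partial charges, one per atom (illustrative values only).
mol.GetAtomWithIdx(0).SetDoubleProp("PartialCharge", 0.008)
mol.GetAtomWithIdx(1).SetDoubleProp("PartialCharge", -0.298)

# Collect the per-atom values into the molecule property
# "atom.dprop.PartialCharge" so that they are written out with the SDF.
Chem.CreateAtomDoublePropertyList(mol, "PartialCharge")

buffer = StringIO()
writer = Chem.SDWriter(buffer)
writer.write(mol)
writer.close()

sdf_text = buffer.getvalue()
print(sdf_text)
```

The output contains an "atom.dprop.PartialCharge" data field holding one value per atom, in atom order.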

3. pqr
I don't have much experience with this one, but it appears to be derived
from the pdb format.  As with #4, I don't think the pdb format is
appropriate for small molecules.

4. pdbqt
No, just no.  I really wish the Autodock developers would get rid of pdbqt
once and for all.  Why would anyone use a small molecule file format that
throws away bond order information?

Ok, rant complete, I feel better now.  These are my views, your mileage may
vary.

Pat

On Wed, Oct 27, 2021 at 4:18 PM Hao  wrote:

> Hi!
>
> I was wondering if anyone has worked with file formats that keep track of
> partial charges that easily interface with rdkit / other programs. I did a
> little bit of searching and found several ways including 1. mol2 2. sdf
> storing partial charges as a list in the data field 3. pqr 4. pdbqt.
>
> Obviously there are pros and cons to all but I was wondering if there has
> been a system that has worked particularly well for you. The main things I
> am looking for in descending order of importance are 1. ease of storage 2.
> interfacing with rdkit (this could be through file conversions with other
> software) 3. visualization
>
> Thanks,
> Hao
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] how to make a database fingerprint

2021-09-15 Thread Patrick Walters
numpy!

import pandas as pd
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def smi2fp(smi):
    # Convert a SMILES to a 2048-bit Morgan fingerprint as a numpy array
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    arr = np.zeros((0,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp,arr)
    return arr

df = pd.read_csv("chembl_drugs.smi",sep=" ",names=["SMILES","Name"])
df['fp'] = df.SMILES.apply(smi2fp)
# Sum over molecules: one count per bit position
db_fp = np.stack(df.fp).sum(axis=0)
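
If you need a fingerprint object rather than an array of counts, one option is to threshold the per-bit frequencies and set the surviving bits in an ExplicitBitVect. This is a sketch under assumptions: the helper name and the 0.5 threshold are mine, and the small count array stands in for db_fp.

```python
import numpy as np
from rdkit import DataStructs

def counts_to_bitvect(counts, n_mols, threshold=0.5):
    """Set a bit wherever at least `threshold` of the molecules have it on."""
    bv = DataStructs.ExplicitBitVect(len(counts))
    frequencies = np.asarray(counts) / n_mols
    for i in np.nonzero(frequencies >= threshold)[0]:
        bv.SetBit(int(i))
    return bv

# Stand-in for db_fp: bit counts over a 3-molecule "database".
counts = np.array([3, 1, 0, 2])
database_bv = counts_to_bitvect(counts, n_mols=3)

print(list(database_bv.GetOnBits()))
```

The resulting bit vector can be passed to DataStructs.BulkTanimotoSimilarity together with ordinary molecule fingerprints of the same length.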

On Wed, Sep 15, 2021 at 9:32 AM Giovanni Tricarico <
giovanni.tricar...@glpg.com> wrote:

> Hello,
>
> based on this article:
>
>
>
> https://jcheminf.biomedcentral.com/articles/10.1186/s13321-017-0195-1
>
>
>
> I have been trying to make what they call a ‘database fingerprint’.
>
>
>
> The first step seems to require obtaining the frequencies of each
> fingerprint bit in a database of molecules.
>
> To do that, I calculated the fingerprints of a list of molecules (much
> larger than the one below; this is just an example):
>
>
>
> ms = [Chem.MolFromSmiles(s) for s in ['c1c1','CCC','CCCO']]
>
> fps = [rdMolDescriptors.GetMorganFingerprint(m, 3, useCounts = False) for
> m in ms]
>
>
>
> My first attempt to obtain the database fingerprint was by looping through
> the fps and summing (+=), as that is reported to be an allowed operation
> for these fingerprints.
>
> This worked, but was very slow.
>
>
>
> My next attempt was to convert each fingerprint to a dictionary, and build
> the dictionary corresponding to the database fingerprint:
>
>
>
> database_fp_new = dict()
>
> for i,fp in enumerate(fps):
>     for fpbit in fp.GetNonzeroElements():
>         if fpbit in database_fp_new:
>             database_fp_new[fpbit] += 1
>         else:
>             database_fp_new[fpbit] = 1
>
>
>
> This worked, too, gave the same result as the ‘+=’ approach, and was much
> faster.
>
>
>
> {98513984: 1,
>
> 2763854213: 1,
>
> 3218693969: 1,
>
> 3741631696: 1,
>
> 2068133184: 1,
>
> 2245384272: 2,
>
> 2246728737: 2,
>
> 3542456614: 2,
>
> 864662311: 1,
>
> 1173125914: 1,
>
> 1365892349: 1,
>
> 1535166686: 1,
>
> 4023654873: 1}
>
>
>
> However, then I have a dictionary.
>
> But I need a fingerprint, because I want to do operations like similarity
> calculations (e.g.
> https://www.rdkit.org/docs/source/rdkit.DataStructs.cDataStructs.html?highlight=bulktanimoto#rdkit.DataStructs.cDataStructs.BulkTanimotoSimilarity
> ).
>
>
>
> Would anyone be able suggest if and how the dictionary can be turned back
> into a fingerprint, or perhaps advise how to make the database fingerprint
> in a different way, if the one I figured out is not optimal?
>
>
>
> Thank you
>
> --
>
> This e-mail and its attachment(s) (if any) may contain confidential and/or
> proprietary information and is intended for its addressee(s) only. Any
> unauthorized use of the information contained herein (including, but not
> limited to, alteration, reproduction, communication, distribution or any
> other form of dissemination) is strictly prohibited. If you are not the
> intended addressee, please notify the originator promptly and delete this
> e-mail and its attachment(s) (if any) subsequently. Neither Galapagos nor
> any of its affiliates shall be liable for direct, special, indirect or
> consequential damages arising from alteration of the contents of this
> message (by a third party) or as a result of a virus being passed on.
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] SMILES from sdf file

2021-09-12 Thread Patrick Walters
Hi Anthony,

This is pretty easy and you don't need to use PandasTools (although
PandasTools are very cool).

#!/usr/bin/env python

import sys
from rdkit import Chem

suppl = Chem.SDMolSupplier(sys.argv[1])
for mol in suppl:
    if mol:
        print(Chem.MolToSmiles(mol),mol.GetProp("_Name"))

By default, Chem.MolToSmiles produces canonical isomeric SMILES.
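
A small sketch of what "canonical" means in practice: different valid SMILES for the same molecule should come back as one identical string. The three ethanol spellings are just an illustration.

```python
from rdkit import Chem

# Three different ways of writing ethanol.
inputs = ["OCC", "CCO", "C(O)C"]

canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(smi)) for smi in inputs}

# A single entry means all inputs were canonicalized to the same string.
print(canonical)
```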

Here's the query I use to get drugs from ChEMBL.

select distinct canonical_smiles, chembl_id from compound_structures cs
join formulations f on cs.molregno = f.molregno
join products p on p.product_id = f.product_id
join compound_properties cp on cp.molregno = cs.molregno
join molecule_dictionary md on cp.molregno = md.molregno
where p.oral = 1
and cp.mw_freebase < 1000

If you just want the data, I have it here.

https://github.com/PatWalters/datafiles/blob/main/chembl_drugs.smi

Pat


On Sun, Sep 12, 2021 at 9:20 AM Anthony Nash 
wrote:

> Dear all,
>
> This sounded routine enough that I thought I'd seek guidance to save
> myself hours of hacking and potential misunderstanding.
>
> My objective is to generate a canonical SMILES for each compound in an sdf
> file. The sdf file was downloaded from ChEMBL and contains some +10,000
> drugs. I've had a brief look at the RDKit API and I noticed
> rdkit.Chem.PandasTools.LoadSDF.
>
> Unfortunately, there was no function argument documentation, so I'm unsure
> whether this function yields canonical SMILES data. However, the RDKit
> website includes the following example which suggests "something"
> concerning SMILES is being processed:
>
> >>> sdfFile = os.path.join(RDConfig.RDDataDir,'NCI/first_200.props.sdf')
> >>> frame = PandasTools.LoadSDF(sdfFile,smilesName='SMILES',molColName='Molecule',
> ...            includeFingerprints=True, removeHs=False, strictParsing=True)
>
>
> Any guidance is hugely appreciated.
>
> On the other hand, if anyone can suggest a one-shop list of SMILES in a
> file for e.g., experimental drugs, FDA approved drugs, "representative" of
> chemical space, etc., that would also be appreciated.
>
>
> Thanks
> Anthony
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Cheminformatics Graduate School Recommendations?

2021-07-20 Thread Patrick Walters
If you're looking for a more ML-oriented program, I'd recommend David Koes'
group at Pitt.
https://www.csb.pitt.edu/people/faculty/david-koes/

Jacob Durrant is also doing interesting work in another department at Pitt
https://durrantlab.pitt.edu/people/




On Tue, Jul 20, 2021 at 2:47 AM Stiefl, Nikolaus <
nikolaus.sti...@novartis.com> wrote:

> Hi Patrick,
>
> Sorry, yet another non-US-based lab, but I will still throw in Sereina
> Riniker’s group (Riniker, Sereina, Prof. Dr. | ETH Zurich)
> who recently attracted quite some talent ;-)
>
> Ciao
>
> Nik
>
>
>
> *From:* Markus Sitzmann 
> *Sent:* Monday, July 19, 2021 10:34 PM
> *To:* RDKit Discuss 
> *Subject:* Re: [Rdkit-discuss] Cheminformatics Graduate School
> Recommendations?
>
>
>
> *This Message is from an External Sender. Do not click links or open
> attachments unless you trust the sender.*
>
> Hi Patrick,
>
>
>
> labs I would take a look at (in no particular order and well, a bit heavy
> on European labs):
>
>
>
> Irwin Lab, UCSF: https://profiles.ucsf.edu/john.irwin
> 
>
> Bajorath Group, Bonn, Germany:
> https://www.limes-institut-bonn.de/forschung/arbeitsgruppen/unit-4/abteilung-bajorath/abt-bajorath-startseite/
> 
>
> Reymond Group, Bern, Switzerland: https://www.gdb.unibe.ch/
> 
>
> Rarey Group, Hamburg, Germany:
> https://www.zbh.uni-hamburg.de/personen/amd/mrarey.html
> 
>
> Leach Team, Cambridge, UK: https://www.ebi.ac.uk/about/people/andrew-leach
> 
>
> Czodrowski Lab, Dortmund, Germany: https://www.czodrowskilab.org/team
> 
>
>
>
> Best,
>
> Markus
>
>
>
>
>
> On Mon, Jul 19, 2021 at 6:17 PM Patrick Neal  wrote:
>
> Hi All,
>
> I apologize if this is too far off topic, but I got a recommendation to
> ask here since this community is the most likely to know!
>
> I'm about to graduate from my undergrad chemistry program and I'm looking
> for graduate schools. I started in traditional computational chemistry
> research, but have really loved the cheminformatics/datascience aspects of
> drug discovery. I'm hoping to ask the community if you all have any
> recommendations for academic labs (ideally US based) with interesting
> cheminformatics research?
>
> I'm specifically interested in fingerprinting methods (encoding
> 3D/conformational information), similarity search/clustering compounds at
> scale, and automation tools for QM calculations. But, I would be grateful
> to hear of any labs you think are doing great cheminformatics work!
>
> All the best,
>
> Patrick
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
> 
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Compare molecules, get matching atom indices

2021-06-09 Thread Patrick Walters
What about something like this?
https://gist.github.com/PatWalters/15352d9007c33a214ac13d3dda814624
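
One common way to get such an index mapping is through the maximum common substructure. The sketch below is my own illustration of that idea and may differ from what the gist does; the function name is hypothetical. Note that for symmetric or repeating structures it returns only one of several equally valid mappings.

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

def map_atoms(probe, ref):
    """Map probe atom indices to reference atom indices via the MCS."""
    mcs = rdFMCS.FindMCS([probe, ref])
    core = Chem.MolFromSmarts(mcs.smartsString)
    # Both matches are ordered by the atoms of the MCS query, so
    # position i in one tuple corresponds to position i in the other.
    probe_match = probe.GetSubstructMatch(core)
    ref_match = ref.GetSubstructMatch(core)
    return dict(zip(probe_match, ref_match))

# The same molecule written with two different atom orders.
probe = Chem.MolFromSmiles("CCO")
ref = Chem.MolFromSmiles("OCC")

match_dict = map_atoms(probe, ref)
print(match_dict)
```

Atoms outside the common substructure are simply absent from the dictionary, which gives a natural notion of "no usable match".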



On Wed, Jun 9, 2021 at 9:28 PM Christopher Schlicksup 
wrote:

> Hi Rdkit community,
>
> I have a case where I have pairs of molecules that are very similar, but
> have different atom indices. I would like to figure out what is the best
> way to get pairs of analogous atom indices:
>
> For example, a dictionary that maps each probe atom index to a reference
> atom index:
>
> match_dict = {0:4,1:7,2:10...} Maybe with some tolerance for what is
> considered a usable match.
>
> I have experimented with calculating a fingerprint distance matrix focused
> on all pairs of atoms, and then finding the minimum, but this doesn't seem
> to work so well for molecules with repeating structure. Does anyone have
> any pointers on the right way to do this?
>
> Thanks so much for any suggestions.
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] XYZ to mol ???

2021-06-06 Thread Patrick Walters
Hi Joey,

Have you looked at this?
https://github.com/jensengroup/xyz2mol

Pat

On Fri, Jun 4, 2021 at 8:57 PM Storer, Joey (J)  wrote:

> Dear all,
>
>
>
> For molecular modeling workflows and interoperability with QM/MM etc.,
>
>
>
> Can RDKit gain a Chem.XyzToMol(xyz) functionality?
>
>
>
> Thanks for considering this.
>
>
>
> Joey Storer
>
> Dow, Inc.
>
> General Business
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Are the path-based fingerprints formally described in the scientific literature?

2021-05-20 Thread Patrick Walters
There's also some information on path fingerprints in the Daylight Theory
Manual
https://www.daylight.com/dayhtml/doc/theory/theory.finger.html
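
For anyone who wants to try the path-based fingerprint discussed below, here is a small sketch that computes it for two molecules and compares them. The two benzoate esters and the parameter values are just illustrative choices.

```python
from rdkit import Chem, DataStructs

mols = [Chem.MolFromSmiles(s) for s in ("CCOC(=O)c1ccccc1", "COC(=O)c1ccccc1")]

# Hash all linear paths of 1 to 7 bonds into a 2048-bit vector.
fps = [Chem.RDKFingerprint(m, minPath=1, maxPath=7, fpSize=2048) for m in mols]

sim = DataStructs.TanimotoSimilarity(fps[0], fps[1])
print(f"{sim:.3f}")
```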



On Wed, May 19, 2021 at 10:47 PM Greg Landrum 
wrote:

> Hi Francois,
>
> On Thu, May 20, 2021 at 3:19 AM Francois Berenger 
> wrote:
>
>>
>> The other day, I was looking for a paper describing them
>> but the only thing I found was a reference to some Daylight
>> product.
>>
>> I know there is a paper (maybe several in fact) for ECFP for example.
>> Weren't the path-based FPs formally described somewhere?
>>
>
> No, they aren't. I did that implementation in the very early days of the
> RDKit because we needed something for calculating similarity and machine
> learning.
> There's no paper describing the fingerprint. If you're looking for an
> explanation or something to cite, the best thing is this section in the
> documentation: https://rdkit.org/docs/RDKit_Book.html#rdkit-fingerprints
>
> Best,
> -greg
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Using the RDKit with Dask

2021-03-22 Thread Patrick Walters
Thanks, Greg.  Yutong Zhao sent me the same solution and I was just about
to post his fix to the list.  It's funny how I posted to the list and a
colleague had the answer.

Thanks all, the RDKit community is awesome!
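
For reference, here is Greg's second suggestion from the message quoted below, written out as a standalone function that can be tried without Dask. The None check for unparsable SMILES is my addition.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors


def calc_bcut(smi):
    mol = Chem.MolFromSmiles(smi)
    if mol is None:  # unparsable SMILES
        return None
    # BCUT2D is looked up through the module at call time, so the function's
    # globals hold the module rather than the Boost.Python callable itself.
    return rdMolDescriptors.BCUT2D(mol)


values = calc_bcut("c1ccccc1O")
print(len(values))
```

Note that BCUT2D returns eight values per molecule, so the result may need to be expanded into several columns rather than stored as a single float.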

On Mon, Mar 22, 2021 at 9:55 AM Greg Landrum  wrote:

> Hi Pat,
>
> Solution, either change your calc_bcut function to:
> def calc_bcut(smi):
>     from rdkit.Chem.rdMolDescriptors import BCUT2D
>     mol = Chem.MolFromSmiles(smi)
>     return BCUT2D(mol)
>
> or change the import on line 8 at the top to:
> from rdkit.Chem import rdMolDescriptors
>
> and do:
> def calc_bcut(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return rdMolDescriptors.BCUT2D(mol)
>
> The second approach is probably more efficient.
>
> I'm not 100% sure what's happening, but it looks like dask is trying to
> somehow package up whatever is being used in calc_bcut() and is having a
> problem when it sees the BCUT2D object, which is a Boost.Python.function
> instead of a normal Python function:
>
> In [3]: type(MolWt)
> Out[3]: function
>
> In [4]: type(BCUT2D)
> Out[4]: Boost.Python.function
>
> By either explicitly doing the import in calc_bcut() or referencing the
> function through the module, dask seems to be able to figure out how to do
> the right thing.
>
> -greg
> p.s. in case you see different behavior:
> In [2]: dask.__version__
> Out[2]: '2020.12.0'
>
>
>
>
> On Mon, Mar 22, 2021 at 1:51 PM Patrick Walters 
> wrote:
>
>> Apologies, there was a bug in the code I sent in my previous message.
>> The problem is the same.  Here is the corrected code in a gist.
>>
>> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>>
>>
>>
>> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters 
>> wrote:
>>
>>> Hi All,
>>>
>>> I've been trying to calculate BCUT2D descriptors in parallel with Dask
>>> and get this error with the code below.
>>> TypeError: cannot pickle 'Boost.Python.function' object
>>>
>>> Everything works if I call mw_df, which calculates molecular weight, but
>>> I get the error above if I call bcut_df.  Does anyone have a workaround?
>>>
>>> Thanks,
>>>
>>> Pat
>>>
>>> #!/usr/bin/env python
>>>
>>> import sys
>>> import dask.dataframe as dd
>>> import pandas as pd
>>> from rdkit import Chem
>>> from rdkit.Chem.Descriptors import MolWt
>>> from rdkit.Chem.rdMolDescriptors import BCUT2D
>>> import time
>>>
>>> # --  molecular weight functions
>>> def calc_mw(smi):
>>>     mol = Chem.MolFromSmiles(smi)
>>>     return MolWt(mol)
>>>
>>> def mw_df(df):
>>>     return df.SMILES.apply(calc_mw)
>>>
>>> # -- bcut functions
>>> def bcut_df(df):
>>>     return df.apply(calc_bcut)
>>>
>>> def calc_bcut(smi):
>>>     mol = Chem.MolFromSmiles(smi)
>>>     return BCUT2D(mol)
>>>
>>> def main():
>>>     start = time.time()
>>>     df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
>>>     ddf = dd.from_pandas(df,npartitions=16)
>>>     ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
>>>     ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
>>>     print(time.time()-start)
>>>     print(ddf.head())
>>>
>>>
>>> if __name__ == "__main__":
>>>     main()
>>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Patrick Walters
2020.09.5

On Mon, Mar 22, 2021 at 9:24 AM Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Hi Pat,
>
>
>
> Hmm, I get the same error as you.
>
> By the way, I had to change the code to use this
>
> from rdkit.Chem.rdMolDescriptors import CalcExactMolWt
>
> to avoid another error.
>
> Which version of RDKit do you use?
>
>
>
> BR
>
>
>
> Guillaume
>
>
>
>
>
> *From: *Patrick Walters 
> *Date: *Monday, 22 March 2021 at 14:20
> *To: *Guillaume GODIN 
> *Cc: *rdkit-discuss 
> *Subject: *Re: [*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>
>
>
> The input is just SMILES and molecule name separated by a space.   I've
> attached an example.
>
>
>
> Thanks,
>
>
>
> Pat
>
>
>
>
>
> On Mon, Mar 22, 2021 at 9:13 AM Guillaume GODIN <
> guillaume.go...@firmenich.com> wrote:
>
> Hi Pat,
>
>
>
> Do you have a small example file to work with, or can I use esol.csv, for
> example?
>
>
>
> Thanks
>
>
>
> Guillaume
>
>
>
> *From: *Patrick Walters 
> *Date: *Monday, 22 March 2021 at 13:51
> *To: *rdkit-discuss 
> *Subject: *[*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>
> Apologies, there was a bug in the code I sent in my previous message.  The
> problem is the same.  Here is the corrected code in a gist.
>
>
>
> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>
>
>
>
>
>
>
> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters 
> wrote:
>
> Hi All,
>
>
>
> I've been trying to calculate BCUT2D descriptors in parallel with Dask and
> get this error with the code below.
>
> TypeError: cannot pickle 'Boost.Python.function' object
>
>
>
> Everything works if I call mw_df, which calculates molecular weight, but I
> get the error above if I call bcut_df.  Does anyone have a workaround?
>
>
>
> Thanks,
>
>
>
> Pat
>
>
>
> #!/usr/bin/env python
>
> import sys
> import dask.dataframe as dd
> import pandas as pd
> from rdkit import Chem
> from rdkit.Chem.Descriptors import MolWt
> from rdkit.Chem.rdMolDescriptors import BCUT2D
> import time
>
> # --  molecular weight functions
> def calc_mw(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return MolWt(mol)
>
> def mw_df(df):
>     return df.SMILES.apply(calc_mw)
>
> # -- bcut functions
> def bcut_df(df):
>     return df.apply(calc_bcut)
>
> def calc_bcut(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return BCUT2D(mol)
>
> def main():
>     start = time.time()
>     df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
>     ddf = dd.from_pandas(df,npartitions=16)
>     ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
>     ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
>     print(time.time()-start)
>     print(ddf.head())
>
>
> if __name__ == "__main__":
>     main()
>
>
> ***
> DISCLAIMER
> This email and any files transmitted with it, including replies and
> forwarded copies (which may contain alterations) subsequently transmitted
> from Firmenich, are confidential and solely for the use of the intended
> recipient. The contents do not represent the opinion of Firmenich except to
> the extent that it relates to their official business.
>
> ***
>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] [*External*] Re: Using the RDKit with Dask

2021-03-22 Thread Patrick Walters
The input is just SMILES and molecule name separated by a space.   I've
attached an example.

Thanks,

Pat


On Mon, Mar 22, 2021 at 9:13 AM Guillaume GODIN <
guillaume.go...@firmenich.com> wrote:

> Hi Pat,
>
>
>
> Do you have a small example file to work with, or can I use esol.csv, for
> example?
>
>
>
> Thanks
>
>
>
> Guillaume
>
>
>
> *From: *Patrick Walters 
> *Date: *Monday, 22 March 2021 at 13:51
> *To: *rdkit-discuss 
> *Subject: *[*External*] Re: [Rdkit-discuss] Using the RDKit with Dask
>
> Apologies, there was a bug in the code I sent in my previous message.  The
> problem is the same.  Here is the corrected code in a gist.
>
>
>
> https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd
>
>
>
>
>
>
>
> On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters 
> wrote:
>
> Hi All,
>
>
>
> I've been trying to calculate BCUT2D descriptors in parallel with Dask and
> get this error with the code below.
>
> TypeError: cannot pickle 'Boost.Python.function' object
>
>
>
> Everything works if I call mw_df, which calculates molecular weight, but I
> get the error above if I call bcut_df.  Does anyone have a workaround?
>
>
>
> Thanks,
>
>
>
> Pat
>
>
>
> #!/usr/bin/env python
>
> import sys
> import dask.dataframe as dd
> import pandas as pd
> from rdkit import Chem
> from rdkit.Chem.Descriptors import MolWt
> from rdkit.Chem.rdMolDescriptors import BCUT2D
> import time
>
> # --  molecular weight functions
> def calc_mw(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return MolWt(mol)
>
> def mw_df(df):
>     return df.SMILES.apply(calc_mw)
>
> # -- bcut functions
> def bcut_df(df):
>     return df.apply(calc_bcut)
>
> def calc_bcut(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return BCUT2D(mol)
>
> def main():
>     start = time.time()
>     df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
>     ddf = dd.from_pandas(df,npartitions=16)
>     ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
>     ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
>     print(time.time()-start)
>     print(ddf.head())
>
>
> if __name__ == "__main__":
>     main()
>
>
>


zinc_100.smi
Description: Binary data
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Using the RDKit with Dask

2021-03-22 Thread Patrick Walters
Apologies, there was a bug in the code I sent in my previous message.  The
problem is the same.  Here is the corrected code in a gist.

https://gist.github.com/PatWalters/ca41289a6990ebf7af1e5c44e188fccd



On Mon, Mar 22, 2021 at 8:16 AM Patrick Walters  wrote:

> Hi All,
>
> I've been trying to calculate BCUT2D descriptors in parallel with Dask and
> get this error with the code below.
> TypeError: cannot pickle 'Boost.Python.function' object
>
> Everything works if I call mw_df, which calculates molecular weight, but I
> get the error above if I call bcut_df.  Does anyone have a workaround?
>
> Thanks,
>
> Pat
>
> #!/usr/bin/env python
>
> import sys
> import dask.dataframe as dd
> import pandas as pd
> from rdkit import Chem
> from rdkit.Chem.Descriptors import MolWt
> from rdkit.Chem.rdMolDescriptors import BCUT2D
> import time
>
> # --  molecular weight functions
> def calc_mw(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return MolWt(mol)
>
> def mw_df(df):
>     return df.SMILES.apply(calc_mw)
>
> # -- bcut functions
> def bcut_df(df):
>     return df.apply(calc_bcut)
>
> def calc_bcut(smi):
>     mol = Chem.MolFromSmiles(smi)
>     return BCUT2D(mol)
>
> def main():
>     start = time.time()
>     df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
>     ddf = dd.from_pandas(df,npartitions=16)
>     ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
>     ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
>     print(time.time()-start)
>     print(ddf.head())
>
>
> if __name__ == "__main__":
>     main()
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Using the RDKit with Dask

2021-03-22 Thread Patrick Walters
Hi All,

I've been trying to calculate BCUT2D descriptors in parallel with Dask and
get this error with the code below.
TypeError: cannot pickle 'Boost.Python.function' object

Everything works if I call mw_df, which calculates molecular weight, but I
get the error above if I call bcut_df.  Does anyone have a workaround?

Thanks,

Pat

#!/usr/bin/env python

import sys
import dask.dataframe as dd
import pandas as pd
from rdkit import Chem
from rdkit.Chem.Descriptors import MolWt
from rdkit.Chem.rdMolDescriptors import BCUT2D
import time

# --  molecular weight functions
def calc_mw(smi):
    mol = Chem.MolFromSmiles(smi)
    return MolWt(mol)

def mw_df(df):
    return df.SMILES.apply(calc_mw)

# -- bcut functions
def bcut_df(df):
    return df.apply(calc_bcut)

def calc_bcut(smi):
    mol = Chem.MolFromSmiles(smi)
    return BCUT2D(mol)

def main():
    start = time.time()
    df = pd.read_csv(sys.argv[1],sep=" ",names=["SMILES","Name"])
    ddf = dd.from_pandas(df,npartitions=16)
    ddf['MW'] = ddf.map_partitions(mw_df,meta='float').compute(scheduler='processes')
    ddf['BCUT'] = ddf.map_partitions(bcut_df,meta='float').compute(scheduler='processes')
    print(time.time()-start)
    print(ddf.head())


if __name__ == "__main__":
    main()
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] XGboost and fingerprint error

2021-02-16 Thread Patrick Walters
I'm not sure why this was sent to rdkit-discuss, but I just pushed a fix to
github.  Sorry for the hassles.
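
The error message in the quoted report suggests that the model expects a 2D array rather than a list of fingerprint objects. Here is a minimal sketch of one way to build such an array; it may differ from the fix actually pushed to the repository, and the helper name fp_as_array and the example SMILES are made up for illustration.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def fp_as_array(smiles, n_bits=2048):
    """Morgan fingerprint (radius 2) as a 1D NumPy array of 0/1 values."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    arr = np.zeros((0,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)  # resizes arr to n_bits
    return arr


# Stack the per-molecule arrays into a (n_molecules, n_bits) matrix,
# which has the .shape attribute that a plain list lacks.
X = np.stack([fp_as_array(s) for s in ["CCO", "c1ccccc1"]])
print(X.shape)
```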

Pat

On Tue, Feb 16, 2021 at 10:15 AM Mandar Kulkarni <
mandar.kulkarni.c...@gmail.com> wrote:

> Hi,
>
> I am trying to repeat the xgboost tutorial from here:
> https://github.com/PatWalters/workshop/blob/master/predictive_models/simple_regression_model.ipynb
>
> when I try to fit training data as
>
> xgb.fit(list(train.fp),train.pIC50)
>
>
>
>
> I am getting an error:
>
> AttributeError: 'list' object has no attribute 'shape'
>
>
>
>
> I tried to use an array of fingerprints, both as it is and as a NumPy array,
> but I get a tuple index error.
>
> Please can anyone help me to solve this issue? Thanks in advance.
>
> Best Regards,
> Mandar Kulkarni
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] visualize substructure matches in Molecule Grid Image

2020-09-29 Thread Patrick Walters
I have an example in this blog post that does what you're looking for.

http://practicalcheminformatics.blogspot.com/2019/09/dissecting-hype-with-cheminformatics.html
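
As a short illustration of the point made in the thread quoted below (highlightAtomLists needs one list of atom indices per molecule), here is a minimal sketch. The molecules and the hydroxyl SMARTS are made up for the example.

```python
from rdkit import Chem
from rdkit.Chem import Draw

smiles = ["c1ccccc1O", "CCO"]
mols = [Chem.MolFromSmiles(s) for s in smiles]
patt = Chem.MolFromSmarts("[OX2H]")  # hydroxyl oxygen

# One list of matching atom indices per molecule, in the same order as mols.
hit_lists = [list(m.GetSubstructMatch(patt)) for m in mols]

svg = Draw.MolsToGridImage(mols, molsPerRow=2,
                           highlightAtomLists=hit_lists, useSVG=True)
```

A molecule with no match gets an empty list, so it is drawn without highlighting.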


On Tue, Sep 29, 2020 at 6:04 PM Markus Metz  wrote:

> Thank you Kangway.
> So it is a list of lists, one list for each molecule in the grid.
> Perfect.
> Markus
>
> On Sep 29, 2020, at 2:44 PM, Chuang, Kangway 
> wrote:
>
> Hi Markus,
>
> The highlightAtomLists argument is looking for a list for each mol in the
> mol list. Instead of highlightAtomLists=hit_ats, change it to
> highlightAtomLists=[hit_ats] instead.
>
> Kangway
> --
> *From:* Markus Metz 
> *Sent:* Tuesday, September 29, 2020 2:33 PM
> *To:* rdkit-discuss@lists.sourceforge.net <
> rdkit-discuss@lists.sourceforge.net>
> *Subject:* [Rdkit-discuss] visualize substructure matches in Molecule
> Grid Image
>
> Dear all:
> I am looking for some advice.
> Is there a way to highlight substructure matches in molecules displayed as
> a Molecule Grid Image?
> I found the highlightAtomLists and highlightBondLists options but they do
> not seem to work.
> Does anybody know how I can accomplish this?
> Please see below for an example.
> Best regards,
> Markus
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] GenerateDepictionMatching2DStructure question

2019-05-23 Thread Patrick Walters
Thanks, Lukas and Greg,

That did the trick.  I wrote a couple of functions to encapsulate these
ideas and wrote a gist to demonstrate.
https://gist.github.com/PatWalters/c4300cd354d4b9c2e87e51ce778a973d
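
For readers who want the idea without opening the gist, here is a rough sketch of rescaling a template so that its mean bond length is 1.5, the value Lukas reports below as roughly what RDKit uses. It is not the code from the gist; the helper names and the pyridine stand-in template are made up for illustration.

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor
from rdkit.Geometry import Point3D


def mean_bond_length(mol):
    """Average bond length in the molecule's default conformer."""
    conf = mol.GetConformer()
    lengths = [
        conf.GetAtomPosition(b.GetBeginAtomIdx()).Distance(
            conf.GetAtomPosition(b.GetEndAtomIdx()))
        for b in mol.GetBonds()
    ]
    return sum(lengths) / len(lengths)


def rescale_coords(mol, target=1.5):
    """Scale the coordinates in place so the mean bond length equals target."""
    factor = target / mean_bond_length(mol)
    conf = mol.GetConformer()
    for i in range(mol.GetNumAtoms()):
        p = conf.GetAtomPosition(i)
        conf.SetAtomPosition(i, Point3D(p.x * factor, p.y * factor, p.z * factor))


tmplt = Chem.MolFromSmiles("c1ccncc1")
rdDepictor.Compute2DCoords(tmplt)
rescale_coords(tmplt, 0.825)  # stand-in for a template drawn with short bonds
rescale_coords(tmplt, 1.5)    # bring it to the target length before aligning

mol = Chem.MolFromSmiles("Cc1ccncc1")
rdDepictor.GenerateDepictionMatching2DStructure(mol, tmplt)
```

Measuring the template's own bond length avoids having to know in advance what scale it was drawn at.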

Pat



On Thu, May 23, 2019 at 9:42 AM Greg Landrum  wrote:

> Thanks for that answer Lukas.
>
> This is something that's come up before (here's a relatively recent one:
> https://sourceforge.net/p/rdkit/mailman/message/35376573/) and it would
> probably be worth making this easier.
>
> rdDepictor.Compute2DCoords takes an optional bondLength argument that
> allows you to change the default bond length. We should probably add this
> to GenerateDepictionMatching2DStructure() too. Or would it be more useful
> for that to have some (optional) logic that determines the average bond
> length in the reference structure and uses that as the default?
>
> -greg
>
> On Thu, May 23, 2019 at 3:04 PM Lukas Pravda  wrote:
>
>> Hi Pat,
>>
>>
>>
>> From my experience rdkit uses more or less 1.5 units bond length for 2D
>> depictions. So it makes sense if you rescale your template so that the bond
>> length is 1.5.
>>
>>
>>
>> This is the code snippet I use for the same thing to upscale template
>> with bond lengths 1.0 to 1.5
>>
>>
>>
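>> import numpy
>> from rdkit import Chem
>> from rdkit.Chem import AllChem
>>
>> factor = 1.5
>>
>> # src and dst are the input and output molfile paths
>> mol = Chem.MolFromMolFile(src, sanitize=True)
>> # numpy.float was removed in NumPy 1.24; the builtin float is equivalent
>> matrix = numpy.zeros((4, 4), float)
>>
>> # uniform scaling in x, y and z as a 4x4 homogeneous transform
>> for i in range(3):
>>     matrix[i, i] = factor
>> matrix[3, 3] = 1
>>
>> AllChem.TransformMol(mol, matrix)
>> Chem.MolToMolFile(mol, dst)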
>>
>>
>>
>> Let me know if this is what you were looking for.
>>
>>
>>
>> Lukas
>>
>> *From: *Patrick Walters 
>> *Date: *Thursday, 23 May 2019 at 13:22
>> *To: *RDKIT mailing list 
>> *Subject: *[Rdkit-discuss] GenerateDepictionMatching2DStructure question
>>
>>
>>
>> Hi All,
>>
>>
>>
>> I'm trying to align a set of structures to a template that I have as a
>> molfile.  When I call GenerateDepictionMatching2DStructure it appears that
>> the coordinates of the template are copied directly.  This results in a
>> structure like the one below, where the bond lengths for the template are
>> different from those in the rest of the molecule.
>>
>>
>>
>> [image: image.png]
>>
>> Is there a way around this so that all of the bond lengths will be the
>> same?
>>
>>
>>
>> My code is below, thanks in advance,
>>
>>
>>
>> Pat
>>
>>
>>
>> from rdkit import Chem
>> from rdkit.Chem import rdDepictor
>>
>> mb = """
>>      RDKit          2D
>>
>>   9 10  0  0  0  0  0  0  0  0999 V2000
>>     2.1845    0.2000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     1.4701   -0.2125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     1.4701   -1.0375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     2.1845   -1.4500    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>     2.8990   -1.0375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     2.8990   -0.2125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>>     3.6836    0.0425    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>     3.6836   -1.2924    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>     4.1685   -0.6250    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>>   5  6  1  0
>>   7  9  1  0
>>   6  7  2  0
>>   8  9  1  0
>>   1  6  1  0
>>   1  2  2  0
>>   2  3  1  0
>>   3  4  2  0
>>   4  5  1  0
>>   5  8  2  0
>> M  END"""
>>
>> tmplt = Chem.MolFromMolBlock(mb)
>> smiles = "FC(F)(F)Oc1cccc(-n2nnc3ccc(NC4CCOCC4)nc32)c1"
>> mol = Chem.MolFromSmiles(smiles)
>> rdDepictor.GenerateDepictionMatching2DStructure(mol, tmplt)
>>
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] GenerateDepictionMatching2DStructure question

2019-05-23 Thread Patrick Walters
Hi All,

I'm trying to align a set of structures to a template that I have as a
molfile.  When I call GenerateDepictionMatching2DStructure it appears that
the coordinates of the template are copied directly.  This results in a
structure like the one below, where the bond lengths for the template are
different from those in the rest of the molecule.

[image: image.png]
Is there a way around this so that all of the bond lengths will be the same?

My code is below, thanks in advance,

Pat

from rdkit import Chem
from rdkit.Chem import rdDepictor

mb = """
     RDKit          2D

  9 10  0  0  0  0  0  0  0  0999 V2000
    2.1845    0.2000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4701   -0.2125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4701   -1.0375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.1845   -1.4500    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.8990   -1.0375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8990   -0.2125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6836    0.0425    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    3.6836   -1.2924    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.1685   -0.6250    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  5  6  1  0
  7  9  1  0
  6  7  2  0
  8  9  1  0
  1  6  1  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  8  2  0
M  END"""

tmplt = Chem.MolFromMolBlock(mb)
smiles = "FC(F)(F)Oc1cccc(-n2nnc3ccc(NC4CCOCC4)nc32)c1"
mol = Chem.MolFromSmiles(smiles)
rdDepictor.GenerateDepictionMatching2DStructure(mol, tmplt)
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Smarts conversion help

2019-03-26 Thread Patrick Walters
Hi Xiaobo,

There's an explicit hydrogen (the [#1] atom) in the SMARTS that shouldn't be
there.  I also wouldn't include the single bonds around the ring closures.

'[#8]=[#6]-3-c1c2c(ccc1)cccc2-[#6](-[#7]-3-[#1])=[#8]'

from rdkit import Chem
from rdkit.Chem import Draw

smi = "O=C(C1=C2C(C=CC=C23)=CC=C1)N([H])C3=O"
mol = Chem.MolFromSmiles(smi)
mol_list = [mol]
core = Chem.MolFromSmarts("[#8]=[#6]3-c1c2c(ccc1)cccc2-[#6](-[#7H]3)=[#8]")
Draw.MolsToGridImage(mol_list,highlightAtomLists=[x.GetSubstructMatch(core) for x in mol_list])

[image: image.png]
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Is there any way to protonate a molecule?

2019-03-25 Thread Patrick Walters
I haven't tried it yet, but this recent paper in the Journal of
Cheminformatics looks interesting.  The authors supply a git repo with code
based on the RDKit.

Dimorphite-DL: an open-source program for enumerating the ionization states
of drug-like small molecules

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0336-9
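
If the goal is simply to put a proton on one chosen atom, as asked in the message quoted below, it can be done by hand. This is a minimal sketch; the function name protonate_atom is made up, and it does not check whether the chosen site is chemically sensible (that is what a tool like Dimorphite-DL is for).

```python
from rdkit import Chem


def protonate_atom(mol, atom_idx):
    """Return a copy of mol with one extra H and +1 charge on the given atom."""
    rw = Chem.RWMol(mol)
    atom = rw.GetAtomWithIdx(atom_idx)
    # Fix the hydrogen count explicitly so sanitization does not recompute it.
    atom.SetNumExplicitHs(atom.GetTotalNumHs() + 1)
    atom.SetNoImplicit(True)
    atom.SetFormalCharge(atom.GetFormalCharge() + 1)
    Chem.SanitizeMol(rw)
    return rw.GetMol()


mol = Chem.MolFromSmiles("CN(C)C")  # trimethylamine, N is atom 1 (0-based)
ion = protonate_atom(mol, 1)
print(Chem.MolToSmiles(ion))
```

RDKit atom indices are 0-based, so an atom labelled 17 in a drawing that uses i+1 labels is index 16.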

Pat

On Mon, Mar 25, 2019 at 5:13 AM HC.Ji  wrote:

> I'm trying to simulate the fragmentation of ESI mass spectra based on the
> [M+H]+ ions. Thus, I want to simulate the ionisation by the addition of one
> proton to heteroatoms. For example,
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit.Chem.Draw import rdMolDraw2D
> from IPython.display import SVG
> # read mol
> mol = Chem.MolFromSmiles('O=C(O)C1=CC(=NNC2=CC=C(C=C2)C(=O)NCCC(=O)O)C=CC1=O')
>
> # draw the mol
> dr = rdMolDraw2D.MolDraw2DSVG(800,800)
> dr.SetFontSize(0.3)
> op = dr.drawOptions()
> # label each atom with its element symbol and 1-based index
> for i in range(mol.GetNumAtoms()):
>     op.atomLabels[i] = mol.GetAtomWithIdx(i).GetSymbol() + str((i+1))
> AllChem.Compute2DCoords(mol)
> dr.DrawMolecule(mol)
> dr.FinishDrawing()
> svg = dr.GetDrawingText()
> SVG(svg)
>
> If I want to add one proton to the N atom with the index of #17 and to
> ionize the molecule, what should I do in rdkit?
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] General Smarts Language to select molecules without H elements

2019-02-13 Thread Patrick Walters
Are you interested in aromatic c-H? It looks like 3 of 4 molecules have
hydrogens (if you count methyls)

from rdkit import Chem
from rdkit.Chem.Draw import MolsToGridImage

buff = r"""N#C/C(C#N)=C(C(F)=C/1F)\C(F)=C(F)C1=C(C#N)\C#N
N#C/C(C#C)=C(C=C/1F)\C(F)=C(F)C1=C(C#N)\C#N
FC1=C(F)C(C#N)=C(F)C(OC(Br)(Br)C)=C1C#N
CC1=C(F)C(C#N)=C(F)C(OC(Br)(Br)C)=C1C#N"""
smiles_list = buff.split("\n")
mol_list = [Chem.MolFromSmiles(x) for x in smiles_list]
MolsToGridImage(mol_list,molsPerRow=4)

[image: image.png]
aromatic_cH = Chem.MolFromSmarts("[cH]")
[x.HasSubstructMatch(aromatic_cH) for x in mol_list]

[False, True, False, False]


If you're just looking for hydrogens, you could do something like this.


from rdkit.Chem.rdMolDescriptors import CalcMolFormula

def has_hydrogen(mol):
    mf = CalcMolFormula(mol)
    return mf.find("H") >= 0


[has_hydrogen(x) for x in mol_list]


[False, True, True, True]
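
Since the original question asked for a SMARTS, here is an alternative sketch that tests for hydrogens directly instead of searching the formula string. It also avoids false positives from element symbols that contain the letter H, such as Hg. The mercury example is made up to show that case.

```python
from rdkit import Chem

# Matches any atom that carries at least one hydrogen, or an explicit H atom.
ANY_H = Chem.MolFromSmarts("[!H0,#1]")


def has_hydrogen(mol):
    return mol.HasSubstructMatch(ANY_H)


smiles_list = [
    "FC1=C(F)C(C#N)=C(F)C(F)=C1C#N",             # no hydrogens
    "CC1=C(F)C(C#N)=C(F)C(OC(Br)(Br)C)=C1C#N",   # methyl hydrogens
    "Cl[Hg]Cl",                                  # formula Cl2Hg contains "H"
]
results = [has_hydrogen(Chem.MolFromSmiles(s)) for s in smiles_list]
print(results)
```

To select the molecules without hydrogens, keep those for which has_hydrogen returns False.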


Pat



On Tue, Feb 12, 2019 at 6:40 PM Li, Xiaobo [xiaoboli] <
xiaobo...@liverpool.ac.uk> wrote:

> Dear all,
>
>
> I have a library of molecules, some of them don't have any H atom. I am
> wondering if there is a Smarts string to select these molecules using
> HasSubstructMatch function.
>
> For example,
> to select following molecules(no H atoms)
>
> N#C/C(C#N)=C(C(F)=C/1F)\C(F)=C(F)C1=C(C#N)\C#N
> FC1=C(F)C(C#N)=C(F)C(F)=C1C#N
> FC1=C(F)C(C#N)=C(F)C(OC(Br)(Br)Br)=C1C#N
>
> from
>
> N#C/C(C#N)=C(C(F)=C/1F)\C(F)=C(F)C1=C(C#N)\C#N
> N#C/C(C#C)=C(C=C/1F)\C(F)=C(F)C1=C(C#N)\C#N
> FC1=C(F)C(C#N)=C(F)C(OC(Br)(Br)C)=C1C#N
> CC1=C(F)C(C#N)=C(F)C(OC(Br)(Br)C)=C1C#N
>
>
> Or is there a better way to do it?
>
>
> Best regards,
>
>
> Xiaobo Li
>
>
>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] conda install rdkit

2019-02-07 Thread Patrick Walters
I've been running the conda version with Python 3.6.6 on a couple of Macs
with no issues.

Pat

On Thu, Feb 7, 2019 at 8:34 AM Greg Landrum  wrote:

> Hi Paul,
>
> That looks like some residual of the horrible problems caused by some
> conda changes that happened last year but that were fixed. I would guess
> that using a reasonably up-to-date version of conda along with the
> conda-forge version of the RDKit with either python 3.6 or 3.7 should work
> fine.
> Please create a new environment and give that a try to see if it helps. If
> there are problems, you may want to make sure that your PYTHONPATH and
> DYLD_FALLBACK_LIBRARY_PATH don't have anything suspicious in them.
>
> -greg
>
>
>
> On Thu, Feb 7, 2019 at 12:38 PM Czodrowski, Paul <
> paul.czodrow...@tu-dortmund.de> wrote:
>
>> Dear Taka,
>>
>>
>>
>> I’m facing trouble with Python3.6 and RDKit on Mac, see here:
>> https://sourceforge.net/p/rdkit/mailman/message/36415085/
>>
>>
>>
>> Therefore, I’m unfortunately restricted to stick to Python3.5
>>
>>
>>
>> I just tried out the conda-forge, but here rdkit is still in the 2018-03
>> version:
>>
>> rdkit:   2018.03.3.0-py35h04f5b5a_1 rdkit   -->
>> 2018.03.4-py35h557c172_0 conda-forge
>>
>>
>>
>>
>>
>> Cheers,
>>
>> Paul
>>
>>
>>
>> *From: *Taka Seri 
>> *Date: *Thursday, 7 February 2019 at 12:43
>> *To: *"Czodrowski, Paul" 
>> *Cc: *"rdkit-discuss@lists.sourceforge.net" <
>> rdkit-discuss@lists.sourceforge.net>
>> *Subject: *Re: [Rdkit-discuss] conda install rdkit
>>
>>
>>
>> Dear Paul,
>>
>>
>>
>> Current version of rdkit is provided for python3.6 and 2.7 from anaconda
>> "rdkit" and "conda-forge" channel.
>>
>> https://anaconda.org/rdkit/rdkit/files?version=2018.09.1.0
>>
>> https://anaconda.org/conda-forge/rdkit/files?version=2018.09.1
>>
>> I recommend creating a python3.6 environment to use the newest version of
>> rdkit if you don't have a specific reason to use python3.5.
>>
>>
>>
>> Kind regards,
>>
>>
>>
>> Taka
>>
>>
>>
>> On Thu, Feb 7, 2019 at 20:06, Czodrowski, Paul wrote:
>>
>> Dear RDKitters,
>>
>>
>>
>> in my python3.5 conda environment on macOS, I’m running rdkit version
>> 2018.03.3.0. I would like to update it to 2018.09, but simply “conda update
>> rdkit” or “conda update -c rdkit rdkit” does not help.
>>
>>
>>
>> Any help/comment would be highly appreciated.
>>
>>
>>
>>
>>
>> Cheers,
>>
>> Paul
>>
>> *Important note: The information included in this e-mail is confidential. It
>> is solely intended for the recipient. If you are not the intended recipient
>> of this e-mail please contact the sender and delete this message. Thank
>> you. Without prejudice of e-mail correspondence, our statements are only
>> legally binding when they are made in the conventional written form (with
>> personal signature) or when such documents are sent by fax.*
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
___
Rdkit-discuss mailing list

[Rdkit-discuss] SMILES validation question

2019-01-11 Thread Patrick Walters
Hi all,

I ran into a case that I found confusing.  If I convert this SMILES to an
RDKit molecule, I get a valid molecule.

In [2]: mol = Chem.MolFromSmiles("O=C(CC1SCCC1)c1c1N")

In [3]: mol
Out[3]: 

However, if I convert the molecule to SMILES and then convert it back to a
molecule, it is no longer valid.

In [4]: smi = Chem.MolToSmiles(mol)

In [5]: new_mol = Chem.MolFromSmiles(smi)
[20:35:06] Explicit valence for atom # 1 O, 3, is greater than permitted
RDKit ERROR: [20:35:06] Explicit valence for atom # 1 O, 3, is greater than
permitted

In [6]: new_mol

In [7]: new_mol is None
Out[7]: True

I'd like to be able to catch invalid molecules like this in one step rather
than two.  What am I doing wrong?

Thanks,

Pat


Re: [Rdkit-discuss] LogS (water solubility) descriptor

2018-08-30 Thread Patrick Walters
Hi Dimitar,

I put an RDKit implementation of the ESOL method from the original paper by
Delaney on my GitHub site.
I also refit the coefficients to maximize performance with the RDKit
calculated descriptors.
https://github.com/PatWalters/solubility
Note that solubility prediction is a really hard problem and ESOL is, at
best, a crude approximation.
Even measuring solubility is difficult, see this and other papers by
Mitchell for more information.
https://pubs.acs.org/doi/abs/10.1021/mp500103r
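
For readers who just want to see the shape of the model, here is a minimal
sketch of the ESOL linear form. The coefficients are the ones I recall from
Delaney's original paper, not the refit values in the repository above, and
the function name and argument names are only illustrative. The four
descriptors (cLogP, molecular weight, rotatable bond count, and aromatic
proportion, i.e. aromatic heavy atoms divided by heavy atoms) are assumed to
be calculated elsewhere, for example with the RDKit.

```python
def esol_log_s(clogp, mol_wt, rotatable_bonds, aromatic_proportion):
    """Estimate log10 aqueous solubility (mol/L) with the ESOL linear form.

    Coefficients are assumed to be those of the original Delaney paper;
    refit coefficients would simply replace the constants below.
    """
    return (0.16
            - 0.63 * clogp
            - 0.0062 * mol_wt
            + 0.066 * rotatable_bonds
            - 0.74 * aromatic_proportion)


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only
    print(round(esol_log_s(2.0, 100.0, 1, 0.5), 3))
```

Because the model is a plain linear combination, refitting it to a different
descriptor implementation only changes the five constants.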

Best,

Pat



On Thu, Aug 30, 2018 at 5:28 AM Dimitar Yonchev <
dimitar.g.yonc...@hotmail.com> wrote:

> Dear RDKitters,
>
>
> I would like to calculate logS (water solubility) descriptor values, and
> so far I haven't found a way to do it in a straightforward manner.
>
>
> https://sourceforge.net/p/rdkit/mailman/message/31218416/
>
>
> This seems to be the only somewhat relevant discussion on the topic when
> there was an idea to include the LogS descriptor to rdkit, so either I'm
> missing it or it, in fact, hasn't happened yet.
>
>
> Do you have any idea of a relatively straightforward LogS calculation
> except for the one suggested above?
>
>
> Many thanks!
>
>
> Cheers,
>
> Dimitar
>
>
> 
>
> Dimitar Yonchev
>
> Research Assistant
>
> Life Science Informatics Research Group
>
> Bonn Aachen Institute of Information Technology
>
> LIMES Institute
>
> University of Bonn, Germany
>


Re: [Rdkit-discuss] Tanimoto Similarity

2018-07-04 Thread Patrick Walters
I would highly recommend this paper where the authors describe an
alternative to arbitrary similarity cutoffs

https://pubs.acs.org/doi/pdf/10.1021/ci7004498
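
To make the discussion of cutoffs concrete, here is a small sketch of a
Tanimoto neighbor search in which the cutoff is an explicit parameter rather
than a fixed constant. Fingerprints are represented as plain Python sets of
"on" bit positions, which is an assumption made to keep the example
self-contained; the function names and the toy library are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two collections of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: define the similarity as 0.0
        return 0.0
    return len(a & b) / union


def neighbors(query_bits, library, cutoff):
    """Return (name, similarity) pairs at or above the cutoff, best first."""
    hits = []
    for name, bits in library.items():
        sim = tanimoto(query_bits, bits)
        if sim >= cutoff:
            hits.append((name, sim))
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Toy fingerprints, for illustration only
    library = {"a": {1, 2, 3, 4}, "b": {1, 2}, "c": {9}}
    print(neighbors({1, 2, 3, 4}, library, cutoff=0.5))
```

Running the same search at several cutoffs is a cheap way to see how
sensitive the neighbor list is to the threshold for a given fingerprint.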

Pat

On Wed, Jul 4, 2018 at 9:31 AM Maciek Wójcikowski 
wrote:

> Hi
>
> As Nils has mentioned, this is fingerprint dependent. For ECFP4 the
> significance cutoff is ~0.4, see https://pubs.acs.org/doi/10.1021/ci7004498
>
> 
> Pozdrawiam,  |  Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2018-07-04 8:44 GMT+02:00 Nils Weskamp :
>
>> Dear Phuong,
>>
>> unfortunately, there is no generic answer to this question since it is
>> highly dependent on the fingerprint, the type of compounds, your
>> specific application and also your chemical intuition. I can only
>> recommend to test a range of different cutoff values and to see how
>> happy you are with the results.
>>
>> If you have access to a list of analogs that you definitely want to find
>> ("known actives") and a large set of known irrelevant compounds, you
>> might be able to use statistical analyses to derive some kind of
>> "optimal" threshold.
>>
>> If we are talking about path-oriented fingerprints (like the RDKit
>> Chemical Fingerprints) and "normal" drug-like molecules, I would
>> typically go down to 0.70 - 0.75 and then manually weed out false hits.
>>
>> Hope this helps,
>> Nils
>>
>> On 04.07.2018 at 02:24, Phuong Chau wrote:
>> > To whom it may concern,
>> >
>> > I am trying to find a group of possible neighbor (similar) chemicals
>> > based on Tanimoto similarity. I am not sure what the optimal cutoff is
>> > for finding similar chemicals. I searched online and found 0.85
>> > suggested, but many exceptions were also mentioned.
>> > Do you have any suggestions?
>> >
>> > Thank you so much for your help


Re: [Rdkit-discuss] question on rdRGroupDecomposition

2018-05-15 Thread Patrick Walters
Hi Greg,

Don't expend a lot of effort on this.  I ended up writing my own
implementation of R-group decomposition.

Pat

On Tue, May 15, 2018 at 10:00 PM Greg Landrum <greg.land...@gmail.com>
wrote:

> Hi Pat,
>
> This one has me stumped.
> @Brian: do you understand what's going on here or should I fire up the
> debugger?
>
> -greg
>
>
>
> On Mon, May 14, 2018 at 4:24 AM Patrick Walters <wpwalt...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I'm hoping someone can help me with rdRGroupDecomposition.  I'd like to
>> be able to specify specific R-group locations AND match cases where R=H.
>>  The example below illustrates what I'm talking about.
>> When RGroupDecompositionParameters.onlyMatchAtRGroups = True, cases where R
>> == H are skipped.  I tried putting an explicit hydrogen on the core to
>> block a position, but it appears that the explicit hydrogen is ignored.
>>
>> from rdkit import Chem
>> from rdkit.Chem.rdRGroupDecomposition import (
>>     RGroupDecomposition,
>>     RGroupDecompositionParameters,
>> )
>>
>> # run an RGroupDecomposition on a set of molecules
>> def process_r_groups(core_mol, rg_params, mols):
>>     rg = RGroupDecomposition(core_mol, rg_params)
>>     for mol in mols:
>>         rg.Add(mol)
>>     rg.Process()
>>     return list(rg.GetRGroupsAsRows(asSmiles=True))
>>
>>
>> buff = """CCc1ccnc(C)n1
>> Cc1ncccn1
>> Cc1cnc(C)nc1"""
>>
>> mol_list = [Chem.MolFromSmiles(x) for x in buff.split("\n")]
>> core = Chem.MolFromSmiles("[H]c1cc([2*])nc([1*])n1")
>> # default parameters, note that 3 R-groups are returned, the
>> # explicit hydrogen is ignored
>> params_1 = RGroupDecompositionParameters()
>> for row in process_r_groups(core, params_1, mol_list):
>>     print(row)
>>
>> print()
>>
>> params_2 = RGroupDecompositionParameters()
>> params_2.onlyMatchAtRGroups = True
>> # run with the onlyMatchAtRGroups parameter
>> # now only one row is returned
>> for row in process_r_groups(core, params_2, mol_list):
>>     print(row)
>>
>> The output from the script above is
>>
>> {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
>> 'R2': '[H][*:2]', 'R3': '[H]C([H])([H])C([H])([H])[*:3]'}
>> {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
>> 'R2': '[H][*:2]', 'R3': '[H][*:3]'}
>> {'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]',
>> 'R2': '[H]C([H])([H])[*:2]', 'R3': '[H][*:3]'}
>>
>> {'Core': 'c1cc([*:2])nc([*:1])n1', 'R1': '[H]C([H])([H])[*:1]', 'R2':
>> '[H]C([H])([H])C([H])([H])[*:2]'}
>>
>> I'd like to figure out how I can only get the substituents at the labeled
>> positions, but have it match where R1 == H or R2 == H.
>>
>> Thanks in advance,
>>
>> Pat


[Rdkit-discuss] question on rdRGroupDecomposition

2018-05-13 Thread Patrick Walters
Hi All,

I'm hoping someone can help me with rdRGroupDecomposition.  I'd like to be
able to specify specific R-group locations AND match cases where R=H.   The
example below illustrates what I'm talking about.
When RGroupDecompositionParameters.onlyMatchAtRGroups = True, cases where R
== H are skipped.  I tried putting an explicit hydrogen on the core to
block a position, but it appears that the explicit hydrogen is ignored.

from rdkit import Chem
from rdkit.Chem.rdRGroupDecomposition import (
    RGroupDecomposition,
    RGroupDecompositionParameters,
)

# run an RGroupDecomposition on a set of molecules
def process_r_groups(core_mol, rg_params, mols):
    rg = RGroupDecomposition(core_mol, rg_params)
    for mol in mols:
        rg.Add(mol)
    rg.Process()
    return list(rg.GetRGroupsAsRows(asSmiles=True))


buff = """CCc1ccnc(C)n1
Cc1ncccn1
Cc1cnc(C)nc1"""

mol_list = [Chem.MolFromSmiles(x) for x in buff.split("\n")]
core = Chem.MolFromSmiles("[H]c1cc([2*])nc([1*])n1")
# default parameters, note that 3 R-groups are returned, the
# explicit hydrogen is ignored
params_1 = RGroupDecompositionParameters()
for row in process_r_groups(core, params_1, mol_list):
    print(row)

print()

params_2 = RGroupDecompositionParameters()
params_2.onlyMatchAtRGroups = True
# run with the onlyMatchAtRGroups parameter
# now only one row is returned
for row in process_r_groups(core, params_2, mol_list):
    print(row)

The output from the script above is

{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2':
'[H][*:2]', 'R3': '[H]C([H])([H])C([H])([H])[*:3]'}
{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2':
'[H][*:2]', 'R3': '[H][*:3]'}
{'Core': '*c1nc([*:1])nc([*:3])c1[*:2]', 'R1': '[H]C([H])([H])[*:1]', 'R2':
'[H]C([H])([H])[*:2]', 'R3': '[H][*:3]'}

{'Core': 'c1cc([*:2])nc([*:1])n1', 'R1': '[H]C([H])([H])[*:1]', 'R2':
'[H]C([H])([H])C([H])([H])[*:2]'}

I'd like to figure out how I can only get the substituents at the labeled
positions, but have it match where R1 == H or R2 == H.

Thanks in advance,

Pat


[Rdkit-discuss] SMARTS parsing error with isotopes

2018-05-04 Thread Patrick Walters
Hi All,

I've been playing around with some of the structural alerts from ChEMBL and
noticed that one alert was generating a SMARTS Parse Error with the RDKit.

[2H,3H,13C,14C,15N,125I,23F,22Na,32P,33P,35S,45Ca,57Co,103Ru,141Ce]

It appears that the issue was reported previously and that Greg fixed the
bug.

https://github.com/rdkit/rdkit/issues/1719

It seems like the problem is still there in 2018.03

>>> from rdkit import rdBase
>>> rdBase.rdkitVersion
'2018.03.1'
>>> from rdkit import Chem
>>> Chem.MolFromSmarts('[2H,13C]')
[22:06:46] SMARTS Parse Error: syntax error for input: '[2H,13C]'


Pat


[Rdkit-discuss] seg fault when importing Chem on OS-X 10.12

2018-04-16 Thread Patrick Walters
Hi All,

I installed the latest RDKit using conda

conda create -c rdkit -n rdkit_2017 rdkit

When I import Chem I get a seg fault

➜  ~ source activate rdkit_2017
(rdkit_2017) ➜  ~ python
Python 3.5.5 |Anaconda, Inc.| (default, Mar 12 2018, 16:25:05)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from rdkit import Chem
[1]85097 segmentation fault  python

Has anyone else encountered this?

Thanks,

Pat


Re: [Rdkit-discuss] reassembling a molecule from R-groups

2018-04-15 Thread Patrick Walters
Thanks Andrew, the SMILES approach seemed to have quite a few edge cases so
I wrote something to work directly on a molecule.


#!/usr/bin/env python

from collections import defaultdict

from rdkit import Chem
from rdkit.Chem.rdchem import EditableMol


# Thanks to steeveslab-blog for example of how to edit RDKit molecules
# http://asteeves.github.io/blog/2015/01/14/editing-in-rdkit/
# Thanks to Andrew Dalke for the function name


def weld_r_groups(input_mol):
    # First pass: loop over atoms and find the atoms with an AtomMapNum
    join_dict = defaultdict(list)
    for atom in input_mol.GetAtoms():
        map_num = atom.GetAtomMapNum()
        if map_num > 0:
            join_dict[map_num].append(atom)

    # Second pass: transfer the atom maps to the neighbor atoms
    for idx, atom_list in join_dict.items():
        if len(atom_list) == 2:
            atm_1, atm_2 = atom_list
            nbr_1 = [x.GetOtherAtom(atm_1) for x in atm_1.GetBonds()][0]
            nbr_1.SetAtomMapNum(idx)
            nbr_2 = [x.GetOtherAtom(atm_2) for x in atm_2.GetBonds()][0]
            nbr_2.SetAtomMapNum(idx)

    # Nuke all of the dummy atoms
    new_mol = Chem.DeleteSubstructs(input_mol, Chem.MolFromSmarts('[#0]'))

    # Third pass: collect the atoms with an AtomMapNum, these will be
    # connected
    bond_join_dict = defaultdict(list)
    for atom in new_mol.GetAtoms():
        map_num = atom.GetAtomMapNum()
        if map_num > 0:
            bond_join_dict[map_num].append(atom.GetIdx())

    # Make an editable molecule and add bonds between atoms with
    # corresponding AtomMapNum
    em = EditableMol(new_mol)
    for idx, atom_list in bond_join_dict.items():
        if len(atom_list) == 2:
            start_atm, end_atm = atom_list
            em.AddBond(start_atm, end_atm,
                       order=Chem.rdchem.BondType.SINGLE)

    final_mol = em.GetMol()

    # Remove the AtomMapNum values
    for atom in final_mol.GetAtoms():
        atom.SetAtomMapNum(0)
    final_mol = Chem.RemoveHs(final_mol)

    return final_mol


if __name__ == "__main__":
    mol_to_weld = Chem.MolFromSmiles(
        "CN(C)CC(Br)c1cc([*:2])c([*:1])cn1.[H]C([H])([H])[*:1].[H][*:2]")
    welded_mol = weld_r_groups(mol_to_weld)
    print(Chem.MolToSmiles(welded_mol))


Best,

Pat


On Sun, Apr 15, 2018 at 12:16 PM, Patrick Walters <wpwalt...@gmail.com>
wrote:

> Hi All,
>
> I was about to write a function to reassemble a molecule from a core +
> R-groups, but I thought I'd check and see if such a function already
> exists.  This is meant to work with the output of rdRGroupDecomposition.
>
> Given a core:
> CN(C)CC(Br)c1cc([*:2])c([*:1])cn1
>
> Plus a set of R-groups
> [H]C([H])([H])[*:1]
> [H][*:2]
>
> Reconnect the pieces to generate a molecule
> CN(C)CC(Br)c1ccc(C)cn1
>
> Thanks,
>
> Pat


[Rdkit-discuss] reassembling a molecule from R-groups

2018-04-15 Thread Patrick Walters
Hi All,

I was about to write a function to reassemble a molecule from a core +
R-groups, but I thought I'd check and see if such a function already
exists.  This is meant to work with the output of rdRGroupDecomposition.

Given a core:
CN(C)CC(Br)c1cc([*:2])c([*:1])cn1

Plus a set of R-groups
[H]C([H])([H])[*:1]
[H][*:2]

Reconnect the pieces to generate a molecule
CN(C)CC(Br)c1ccc(C)cn1

Thanks,

Pat


Re: [Rdkit-discuss] comparing two or more tables of molecules

2016-11-29 Thread Patrick Walters
The Layered InChI (LyChi), developed by Trung Nguyen at NCATS, was designed
to directly address the problem you describe.  I don't have any first-hand
experience with this method (yet), but it looks intriguing.

https://github.com/ncats/lychi


Pat

On Mon, Nov 28, 2016 at 11:25 AM, Stephen O'hagan 
wrote:

> Has anyone come up with fool-proof way of matching structurally equivalent
> molecules?
>
>
>
> Unique SMILES or InChI string comparisons don’t appear to work, presumably
> because there are different but equivalent structures, e.g. explicit vs
> non-explicit H’s, Kekule vs Aromatic, isomeric forms vs non-isomeric form,
> tautomers etc.
>
>
>
> I also expect that comparing InChI strings might need something more than
> just a simple string comparison, such as masking off stereo information
> when you don’t care about stereo isomers.
>
>
>
> I assume there are suitable tools within RDKit that can do this?
>
>
>
> N.B. I need to collate tables from several sources that have a mix of
> smiles / InChI / sdf molecular representations.
>
>
>
> I usually use RDKit via Python and/or Knime.
>
>
>
> Cheers,
>
> Steve.


Re: [Rdkit-discuss] Clustering functions in Java API

2015-02-23 Thread Patrick Walters
I agree that there are plenty of implementations of clustering, machine
learning, etc.  It would be better for the RDKit developers to focus on
cheminformatics.   This being said, there are some opportunities for domain
specific performance enhancement.  One of the slow steps in many clustering
algorithms is the calculation of a distance matrix and identification of
neighbors.  If you're clustering fingerprints, I'd recommend looking at Andrew
Dalke's ChemFP http://chemfp.com/.  Andrew has applied a multitude of
tricks that can make clustering blazingly fast.   The ChemFP examples
include an implementation of Taylor-Butina clustering.  Even better, ChemFP
works out of the box with the RDKit.
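
For anyone who has not seen Taylor-Butina clustering written out, here is a
short pure-Python sketch of the basic idea. It is not the ChemFP or RDKit
implementation; the function name, the callable `distance(i, j)` interface,
and the toy data are assumptions made for illustration. The neighbor search
in the first loop is exactly the slow step mentioned above.

```python
def butina_cluster(n_items, distance, cutoff):
    """Simple Taylor-Butina clustering.

    distance(i, j) returns the distance between items i and j.
    Returns a list of clusters; the first index in each is the centroid.
    """
    # Neighbor lists: all pairs no farther apart than the cutoff
    nbrs = [set() for _ in range(n_items)]
    for i in range(n_items):
        for j in range(i):
            if distance(i, j) <= cutoff:
                nbrs[i].add(j)
                nbrs[j].add(i)

    # Visit items from most to fewest neighbors (ties broken by index)
    order = sorted(range(n_items), key=lambda i: (-len(nbrs[i]), i))

    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        # The centroid claims every neighbor that is still unassigned
        members = [i] + sorted(nbrs[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Toy 1-D data, for illustration only
    points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
    print(butina_cluster(len(points),
                         lambda i, j: abs(points[i] - points[j]),
                         cutoff=0.15))
```

With fingerprints, `distance` would typically be 1 minus the Tanimoto
similarity, and the all-pairs loop is where a specialized search pays off.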

Pat



On Mon, Feb 23, 2015 at 7:02 AM, Maciek Wójcikowski mac...@wojcikowski.pl
wrote:

 Hello,

 If interested in clustering in python I can recommend, as usual, sklearn:
 http://scikit-learn.org/stable/modules/clustering.html
 It's pretty much all you should need. Have fun!

 
 Pozdrawiam,  |  Best regards,
 Maciek Wójcikowski
 mac...@wojcikowski.pl

 2015-02-23 11:43 GMT+01:00 Anthony Bradley anthony.brad...@worc.ox.ac.uk
 :

   Hi Anthony,



 On Sun, Feb 22, 2015 at 11:03 AM, Anthony Bradley 
 anthony.brad...@worc.ox.ac.uk wrote:

 Hi all,

 I am currently working with RDKit from the Java API (well jython
 actually).

 As has been discussed most of the documentation for this is found by
 trawling:

 Code/JavaWrappers/gmwrapper/src-test/org/RDKit/
 and
 Code/JavaWrappers/gmwrapper/src/org/RDKit/

 However I'm trying to perform a simple clustering. I can build my
 distance matrix - but I can't see where the actual clustering algorithms
 live.

 It may well be my grepping skills are not what they should be!



 No need to have any concerns about your skills with grep, the clustering
 functionality is not exposed via the SWIG wrappers. As currently configured
 the code isn't available as a library, it's really only useable from
 python. It's a medium-sized amount of work to convert this to a library, so
 it's doable, but I'm not sure it's worth it.



 That seems fair enough and there are definitely other options out there.
 It was more of method consistency thing – so I could be using the same code
 from the python / jython side.



 I've been assuming that there are high(er) quality replacements available
 for most of the RDKit machine learning functionality. Since it's somewhat
 removed from the cheminformatics focus, I haven't really put any time
 into that code in the past few years. Does this sound wrong to anyone? Any
 arguments that the clustering code is worth investing some time in?



 Unless anybody else is interested – I can see why it would be low
 priority!



 -greg



 Thanks a lot for responding so quickly and effectively!



 Best,



 Anthony






Re: [Rdkit-discuss] portable PostgreSQL + RDKit cartridge?

2014-08-28 Thread Patrick Walters
If you want everything in one nice package, you may want to look at
MyChEMBL.  This is a VM with PostgreSQL, KNIME, Python, the RDKit, and
ChEMBL.

http://chembl.blogspot.com/2013/10/chembl-virtual-machine-aka-mychembl.html

Pat



On Thu, Aug 28, 2014 at 10:11 AM, Michal Krompiec michal.kromp...@gmail.com
 wrote:

 Dear Jan, Thanks a lot. It remains a painstaking DIY job still, then.

 Greg: would it be possible to add the binary of the cartridge
 (compiled with, say, the latest PostgreSQL) to the binary Win32
 distribution? It would allow to have a portable
 python+rdkit+postgresql+knime+... bundle, which would simplify the
 lives of many ;)

 Best wishes,
 Michal

 On 28 August 2014 15:04, Jan Holst Jensen j...@biochemfusion.com wrote:
  On 2014-08-28 14:34, Michal Krompiec wrote:
 
  Hello, has anybody tried to compile a portable Windows binary of
  PostgreSQL with RDKit cartridge? There is a portable PostgreSQL at
  http://sourceforge.net/projects/postgresqlportable/ and I wonder if it
  is possible to use it with the cartridge.
  Best regards,
  Michal
 
 
  Hi Michal,
 
  I got through building a Windows version of the RDKit cartridge a while
  back, but I didn't end up using it for real. I would think that the
  instructions still mostly apply:
 
  http://sourceforge.net/p/rdkit/mailman/message/30127487/
 
  If the resulting DLL and the extension control files are put in the right
  place in the portable image, I guess you should be able to get it
 working.
 
  Cheers
  -- Jan




Re: [Rdkit-discuss] Chem.AddHs() doesn't care about compound layout

2014-08-21 Thread Patrick Walters
Your input molfile lacks the "2D" dimension code on line 2 (the program
line of the header), e.g.


 RDKit  2D
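
If the molfile comes from a tool you don't control, one option is to patch
the header before parsing. The sketch below assumes the usual CTfile layout,
in which the dimension code occupies columns 21-22 of the second header
line; the helper name is hypothetical and it overwrites whatever is in those
two columns, so check that your program line doesn't carry other text there.

```python
def set_dimension_code(molblock, code="2D"):
    """Return a copy of molblock with the dimension code set on header line 2.

    Assumes the CTfile convention that columns 21-22 of the second line
    hold '2D' or '3D'.
    """
    if code not in ("2D", "3D"):
        raise ValueError("code must be '2D' or '3D'")
    lines = molblock.split("\n")
    if len(lines) < 4:
        raise ValueError("molblock is too short to contain a header")
    # Pad the program line so that columns 21-22 exist, then overwrite them
    header = lines[1].ljust(22)
    lines[1] = header[:20] + code + header[22:]
    return "\n".join(lines)


if __name__ == "__main__":
    # Minimal empty molblock, for illustration only
    block = "\n     RDKit\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END"
    print(set_dimension_code(block).split("\n")[1])
```

The patched block can then be passed to Chem.MolFromMolBlock as usual.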

Pat


On Thu, Aug 21, 2014 at 5:34 AM, Michał Nowotka mmm...@gmail.com wrote:

 OK, I'm closer to finding bug in my code. So I have this ctab:

  print ctab

 Converted by chembl_beaker ver. 0.5.20

  10 11  0  0  0  0  1 V2000
    -0.7367    1.8295    0.0000 C   0  0
    -1.4511    1.4170    0.0000 C   0  0
    -1.4511    0.5919    0.0000 C   0  0
    -0.7367    0.1794    0.0000 C   0  0
    -0.0222    0.5919    0.0000 C   0  0
    -0.0222    1.4170    0.0000 C   0  0
    -0.7367   -0.6457    0.0000 C   0  0
    -0.0223   -1.0582    0.0000 C   0  0
     0.6922   -0.6458    0.0000 C   0  0
     0.6923    0.1793    0.0000 C   0  0
   1  2  2  0
   2  3  1  0
   3  4  2  0
   4  5  1  0
   5  6  2  0
   6  1  1  0
   7  8  2  0
   8  9  1  0
   9 10  2  0
  10  5  1  0
   4  7  1  0
 M  END


 I read it into rdkit:

 mol = Chem.MolFromMolBlock(ctab)

  print Chem.MolToMolBlock(mol)

  RDKit  3D

  10 11  0  0  0  0  0  0  0  0999 V2000
    -0.7367    1.8295    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -1.4511    1.4170    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -1.4511    0.5919    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.7367    0.1794    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.0222    0.5919    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.0222    1.4170    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.7367   -0.6457    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    -0.0223   -1.0582    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     0.6922   -0.6458    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     0.6923    0.1793    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   1  2  2  0
   2  3  1  0
   3  4  2  0
   4  5  1  0
   5  6  2  0
   6  1  1  0
   7  8  2  0
   8  9  1  0
   9 10  2  0
  10  5  1  0
   4  7  1  0
 M  END

  mol.GetConformer().Is3D()
 True

 Can you tell my why RDKit thinks this is 3D? The column with z
 coordinates is all zeros. How can I build my input ctab in such a way
 to clearly indicate that this is 2D depiction?


 On Thu, Aug 21, 2014 at 8:16 AM, Michał Nowotka mmm...@gmail.com wrote:
  Thank you Greg. I have to verify your example with my instance of
  RDKit, maybe my old version behaves differently.
  Which part of the code (C++ or Python part) is responsible for
  calculating coordinates?
 
  On Thu, Aug 21, 2014 at 4:40 AM, Greg Landrum greg.land...@gmail.com
 wrote:
  I can't seem to reproduce the problem.
 
  Here's an example showing that the original atom coordinates are
 preserved
  when Chem.AddHs is called with the addCoords argument:
 
  #
 
  In [9]: mb="""
  ...:   Mrv0541 08211405312D
  ...:
  ...:   6  6  0  0  0  0            999 V2000
  ...:    -3.5578    0.8250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  ...:    -4.2723    0.4125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  ...:    -4.2723   -0.4125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  ...:    -3.5578   -0.8250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  ...:    -1.8268   -0.7219    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  ...:    -2.8433    0.4125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  ...:   1  2  1  0  0  0  0
  ...:   1  6  2  0  0  0  0
  ...:   2  3  2  0  0  0  0
  ...:   3  4  1  0  0  0  0
  ...:   4  5  2  0  0  0  0
  ...:   5  6  1  0  0  0  0
  ...: M  END
  ...: """
 
  In [10]: m = Chem.MolFromMolBlock(mb)
 
  In [11]: nm = Chem.AddHs(m,addCoords=True)
 
  In [12]: print Chem.MolToMolBlock(nm)
 
   RDKit  2D
 
   12 12  0  0  0  0  0  0  0  0999 V2000
     -3.5578    0.8250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     -4.2723    0.4125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     -4.2723   -0.4125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     -3.5578   -0.8250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     -1.8268   -0.7219    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     -2.8433    0.4125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
     -3.5578    1.9250    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
     -5.2249    0.9625    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
     -5.2249   -0.9625    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
     -3.8108   -1.8955    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
     -0.8095   -1.1404    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
     -2.1500    1.2665    0.0000 H   0  0  0  0  0  1  0  0  0  0  0  0
1  2  2  0
1  6  1  0
2  3  1  0
3  4  2  0
4  5  1  0
5  6  2  0
1  7  1  0
2  8  1  0
3  9  1  0
4 10  1  0
5 11  1  0
6 12  1  0
  M  END
 
  #
 
  Note that addCoords does add the Hs in 3D:
 
  #
 
  In [5]: print Chem.MolToMolBlock(m)
 
   RDKit  2D
 
2  1  0  0  0  0 

Re: [Rdkit-discuss] Sanitization Errors

2014-04-24 Thread Patrick Walters
It looks like the problem here is a covalent bond to the counter ion.

Pat


On Thu, Apr 24, 2014 at 6:04 AM, Christos Kannas chriskan...@gmail.com wrote:

 Hi all,

 I have a dozen compounds, some of which have a charged atom
 (see the attached SMILES file).

 When I parse the file, I get sanitization errors on the compounds with the
 charged atoms.
 But when I view them with MarvinView 6.2.0 all goes fine.

 I'm using an RDKit build from github, version 2014.03.1pre.

 In order to see what sanitization error occurs in each case I did the
 following:

 1. To parse all compounds without sanitization

 suppl = Chem.SmilesMolSupplier('data/SurfactantTestCompounds.smi',
                                titleLine=True, sanitize=False)
 molsList = [x for x in suppl if x is not None]
 print(len(molsList))

 2. Sanitize the compounds and catch specific errors

 for m in molsList:
     error = Chem.SanitizeMol(m, catchErrors=True)
     if error:
         print(m.GetProp('_Name'), Chem.MolToSmiles(m), error)

 2.1 the output is as follows

 NaLAS ()c1ccc(S(=O)(=O)O[Na+])cc1 SANITIZE_PROPERTIES
 NaOLAS ()C1=CC=CC=C1S(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 SLES3EO OCCOCCOCCOS(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 SLES2EO OCCOCCOS(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 SLES1EO OCCOS(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 SDS OS(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 DTAC [N+](C)(C)(C)[Cl-] SANITIZE_PROPERTIES
 Sdoc (=O)O[Na+] SANITIZE_PROPERTIES

 3. Visualize compounds

 Draw.MolsToGridImage(molsList, molsPerRow=5, legends=[x.GetProp('_Name')
 for x in molsList], kekulize=True)

 For visualized output check
 http://nbviewer.ipython.org/gist/anonymous/11248962/Sanitization_Errors.ipynb

 Is this an expected behaviour?
 Is there something I can do as a fix?

 Regards,

 Christos

 Christos Kannas

 Researcher
 Ph.D Student

 Mob (UK): +44 (0) 7447700937
 Mob (Cyprus): +357 99530608



Re: [Rdkit-discuss] Sanitization Errors

2014-04-24 Thread Patrick Walters
Yes, exactly


On Thu, Apr 24, 2014 at 7:19 AM, Christos Kannas chriskan...@gmail.com wrote:

 Hi Patrick,

 Thanks.

 So the correct form would be that sodium should not have an explicit bond to
 the oxygen.
 From O=S(c1ccc(C(CCC))cc1)(O-[Na+])=O I should
 have O=S(c1ccc(C(CCC))cc1)([O-])=O.[Na+]

 Similar to the rest of my compounds.

 And regarding nitrogen, it already has 4 bonds to carbons, so the chloride
 should be disconnected:
 [N+]([Cl-])(C)(C)C -> [N+](C)(C)C.[Cl-]
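
 The disconnection described above can be sketched in code. This is only an
 illustration under assumptions: the helper name disconnect_sodium is made up
 for this example, it handles only O-Na bonds, and the sodium dodecyl sulfate
 SMILES is an illustrative input rather than one taken from the attached file.

```python
from rdkit import Chem


def disconnect_sodium(smiles):
    """Break O-Na bonds and return the SMILES of the resulting salt.

    Hypothetical helper for illustration; it only handles oxygen-sodium bonds.
    """
    # Parse without sanitization so the covalently bonded counter-ion is accepted.
    mol = Chem.RWMol(Chem.MolFromSmiles(smiles, sanitize=False))

    # Collect the atom index pairs first, then edit the molecule afterwards.
    pairs = []
    for bond in mol.GetBonds():
        atoms = (bond.GetBeginAtom(), bond.GetEndAtom())
        if {a.GetSymbol() for a in atoms} == {"O", "Na"}:
            pairs.append(tuple(a.GetIdx() for a in atoms))

    for i, j in pairs:
        mol.RemoveBond(i, j)
        for idx in (i, j):
            atom = mol.GetAtomWithIdx(idx)
            if atom.GetSymbol() == "O":
                atom.SetFormalCharge(-1)  # the oxygen keeps the negative charge

    Chem.SanitizeMol(mol)  # raises if the edited molecule is still invalid
    return Chem.MolToSmiles(mol)


print(disconnect_sodium("CCCCCCCCCCCCOS(=O)(=O)O[Na+]"))
```

 The sodium already carries its +1 charge in the input, so only the oxygen
 needs a new formal charge once the bond is gone.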

 Regards,

 Christos

 Christos Kannas

 Researcher
 Ph.D Student

 Mob (UK): +44 (0) 7447700937
 Mob (Cyprus): +357 99530608



 On 24 April 2014 11:37, Patrick Walters wpwalt...@gmail.com wrote:

 It looks like the problem here is a covalent bond to the counter ion.

 Pat


 On Thu, Apr 24, 2014 at 6:04 AM, Christos Kannas
 chriskan...@gmail.com wrote:

 Hi all,

 I have a dozen compounds, some of which have a charged atom
 (see the attached SMILES file).

 When I parse the file I get sanitization errors on the compounds with
 the charged atoms.
 But when I view them with MarvinView 6.2.0 all goes fine.

 I'm using an RDKit build from github, version 2014.03.1pre.

 In order to see what sanitization error occurs in each case I did the
 following:

 1. To parse all compounds without sanitization

 suppl = Chem.SmilesMolSupplier('data/SurfactantTestCompounds.smi',
 titleLine=True, sanitize=False)
 molsList = [x for x in suppl if x is not None]
 print(len(molsList))

 2. Sanitize the compounds and catch specific errors

 for m in molsList:
     error = Chem.SanitizeMol(m, catchErrors=True)
     if error:
         print(m.GetProp('_Name'), Chem.MolToSmiles(m), error)

 2.1 the output is as follows

 NaLAS ()c1ccc(S(=O)(=O)O[Na+])cc1 SANITIZE_PROPERTIES
 NaOLAS ()C1=CC=CC=C1S(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 SLES3EO OCCOCCOCCOS(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 SLES2EO OCCOCCOS(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 SLES1EO OCCOS(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 SDS OS(=O)(=O)O[Na+] SANITIZE_PROPERTIES
 DTAC [N+](C)(C)(C)[Cl-] SANITIZE_PROPERTIES
 Sdoc (=O)O[Na+] SANITIZE_PROPERTIES

 3. Visualize compounds

 Draw.MolsToGridImage(molsList, molsPerRow=5, legends=[x.GetProp('_Name')
 for x in molsList], kekulize=True)

 For visualized output check
 http://nbviewer.ipython.org/gist/anonymous/11248962/Sanitization_Errors.ipynb

 Is this an expected behaviour?
 Is there something I can do as a fix?

 Regards,

 Christos

 Christos Kannas

 Researcher
 Ph.D Student

 Mob (UK): +44 (0) 7447700937
 Mob (Cyprus): +357 99530608





[Rdkit-discuss] build error

2014-04-04 Thread Patrick Walters
Hi All,

I ran into an error building the RDKit from the last git pull on OS-X 10.8

Has anyone else run into this?

Thanks,

Pat


Linking CXX shared library ../../../lib/libSubstructMatch.dylib
[ 30%] Built target SubstructMatch
[ 30%] [BISON][SmilesY] Building parser with bison 2.3
smiles.yy:48.9-16: syntax error, unexpected identifier, expecting string
make[2]: *** [../Code/GraphMol/SmilesParse/smiles.tab.cpp] Error 1
make[1]: *** [Code/GraphMol/SmilesParse/CMakeFiles/SmilesParse.dir/all] Error 2
make: *** [all] Error 2


Re: [Rdkit-discuss] build error

2014-04-04 Thread Patrick Walters
Thanks Greg, that did the trick


On Fri, Apr 4, 2014 at 12:26 PM, Greg Landrum greg.land...@gmail.com wrote:

 Hi Pat,

 If you add -DRDK_USE_FLEXBISON=OFF to your cmake command line (rerun
 cmake with this added), the problem should go away.

 I need to make this the default. Sorry for the inconvenience.

 -greg

 On Friday, April 4, 2014, Patrick Walters wpwalt...@gmail.com wrote:

 Hi All,

 I ran into an error building the RDKit from the last git pull on OS-X 10.8

 Has anyone else run into this?

 Thanks,

 Pat


 Linking CXX shared library ../../../lib/libSubstructMatch.dylib
 [ 30%] Built target SubstructMatch
 [ 30%] [BISON][SmilesY] Building parser with bison 2.3
 smiles.yy:48.9-16: syntax error, unexpected identifier, expecting string
 make[2]: *** [../Code/GraphMol/SmilesParse/smiles.tab.cpp] Error 1
 make[1]: *** [Code/GraphMol/SmilesParse/CMakeFiles/SmilesParse.dir/all] Error 2
 make: *** [all] Error 2




[Rdkit-discuss] 2D pharmacophore question

2013-09-03 Thread Patrick Walters
Hi All,

I was working through the 2D pharmacophore example in the Getting Started
docs

http://www.rdkit.org/docs/GettingStartedInPython.html#d-pharmacophore-fingerprints

and I got an exception that I don't understand.  Here's my code:

==
#!/usr/bin/env python

from rdkit import Chem
from rdkit.Chem import ChemicalFeatures
from rdkit.Chem.Pharm2D.SigFactory import SigFactory
from rdkit.Chem.Pharm2D import Generate

mol = Chem.MolFromSmiles("CC12CCC3C(CCc4cc(O)ccc34)C2CC(O)C1O")

fdefName = 'MinimalFeatures.fdef'
featFactory = ChemicalFeatures.BuildFeatureFactory(fdefName)

sigFactory = SigFactory(featFactory,minPointCount=2,maxPointCount=3)
sigFactory.SetBins([(0,2),(2,5),(5,8)])
sigFactory.Init()
print(sigFactory.GetSigSize())

Generate.Gen2DFingerprint(mol,sigFactory)

==

MinimalFeatures.fdef comes from $RDBASE/Docs/Book/data/MinimalFeatures.fdef

Here's the exception

===

  File "./test2.py", line 18, in <module>
    Generate.Gen2DFingerprint(mol,sigFactory)
  File "/Users/walters/software/RDKIT/2013_09_01/rdkit-master/rdkit/Chem/Pharm2D/Generate.py", line 154, in Gen2DFingerprint
    _ShortestPathsMatch(match,perm,sig,dMat,sigFactory)
  File "/Users/walters/software/RDKIT/2013_09_01/rdkit-master/rdkit/Chem/Pharm2D/Generate.py", line 69, in _ShortestPathsMatch
    idx = sigFactory.GetBitIdx(featureSet,dist,sortIndices=False)
  File "/Users/walters/software/RDKIT/2013_09_01/rdkit-master/rdkit/Chem/Pharm2D/SigFactory.py", line 248, in GetBitIdx
    raise IndexError,'distance bin not found: feats: %s; dists=%s; bins=%s; scaffolds: %s'%(fams,dists,self._bins,self._scaffolds)
IndexError: distance bin not found: feats: ['Acceptor', 'Aromatic',
'Donor']; dists=[1, 5, 1]; bins=[(0, 2), (2, 5), (5, 8)]; scaffolds: [0,
[(0,), (1,), (2,)], 0, [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1,
2), (0, 2, 1), (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1,
1, 1), (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 1), (2, 0, 2),
(2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2)], 0]

==

Can someone tell me what I'm doing wrong? At first I thought I just needed
to increase the maximum of the last bin boundary, but that doesn't seem to
do it.
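
One way to read the error message is to redo the bin lookup by hand. The
sketch below is plain Python and does not call RDKit. It assumes that a
distance d falls in bin (lo, hi) when lo <= d < hi, which is an assumption
about SigFactory and not something confirmed in this thread. The helper
bin_index is hypothetical, and the numbers are copied from the IndexError
above.

```python
# Values copied from the IndexError message in the traceback.
bins = [(0, 2), (2, 5), (5, 8)]
dists = [1, 5, 1]
three_point_scaffolds = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (0, 1, 2), (0, 2, 1),
    (0, 2, 2), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 1, 0), (1, 1, 1),
    (1, 1, 2), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 1), (2, 0, 2),
    (2, 1, 0), (2, 1, 1), (2, 1, 2), (2, 2, 0), (2, 2, 1), (2, 2, 2),
]


def bin_index(dist, bins):
    """Return the index of the bin holding dist, assuming half-open [lo, hi) bins."""
    for i, (lo, hi) in enumerate(bins):
        if lo <= dist < hi:
            return i
    raise IndexError("no bin for distance %s" % dist)


# Map each distance to its bin, then look the combination up in the scaffold list.
bin_combo = tuple(bin_index(d, bins) for d in dists)
print(bin_combo, bin_combo in three_point_scaffolds)  # prints: (0, 2, 0) False

# Combinations of three bin indices that the scaffold list does not contain.
all_combos = [(a, b, c) for a in range(3) for b in range(3) for c in range(3)]
missing = [c for c in all_combos if c not in three_point_scaffolds]
print(missing)  # prints: [(0, 0, 2), (0, 2, 0), (2, 0, 0)]
```

Under that assumption every distance does land in a bin, so widening the last
bin would not change anything. What is missing is the combination (0, 2, 0)
itself: the scaffold list leaves out exactly the three combinations with two
short distances and one long one.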

Thanks in advance,

Pat


Re: [Rdkit-discuss] random compound

2013-07-04 Thread Patrick Walters
I'm not an expert on this, but I think this function just randomizes the
order of the atoms in a molecule.  Generating a random molecule for a
particular molecular formula that is consistent with rules of valence is
kind of tricky.  If you're interested in doing this sort of thing, you may
want to look at the work of Jean-Louis Reymond.

http://reymond.dcb.unibe.ch/
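
A small sketch of why shuffling the atom order gives back the same structure.
It uses Chem.RenumberAtoms as a stand-in for the reordering, which is my
assumption about what RandomizeMol does and not a copy of its code. The SMILES
is hand-written for what I believe is the compound in the InChI quoted below,
and the fixed seed is only there to make the run repeatable.

```python
import random

from rdkit import Chem

# Hand-written SMILES for the C10H9N3O compound (an assumption, see above).
mol = Chem.MolFromSmiles("CC1=NN=C(c2ccccc2)C(=O)N1")

# Build a shuffled atom ordering; order[i] is the old index of new atom i.
order = list(range(mol.GetNumAtoms()))
random.Random(0).shuffle(order)
shuffled = Chem.RenumberAtoms(mol, order)

# Canonical SMILES does not depend on the input atom order.
same = Chem.MolToSmiles(mol) == Chem.MolToSmiles(shuffled)
print(same)
```

The atoms are stored in a different order, but the connectivity is untouched,
so any canonical identifier (SMILES or InChI) comes out the same.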

Pat


On Thu, Jul 4, 2013 at 4:31 PM, Yingfeng Wang ywang...@gmail.com wrote:

 This is the case I tested.

 from rdkit import Chem
 from rdkit.Chem import Randomize

 mymol = Chem.MolFromInchi(
     "InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)")
 rmol  = Chem.Randomize.RandomizeMol(mymol)
 Chem.MolToInchi(rmol)

 Finally, I got the same structure.

 InChI=1S/C10H9N3O/c1-7-11-10(14)9(13-12-7)8-5-3-2-4-6-8/h2-6H,1H3,(H,11,12,14)


 On Thu, Jul 4, 2013 at 3:35 PM, Yingfeng Wang ywang...@gmail.com wrote:

 Dear Greg,

 Could you please help me to check whether function RandomizeMol(mol) is
 able to generate a random compound?


 http://www.rdkit.org/docs/api/rdkit.Chem.Randomize-module.html#RandomizeMol

 If so, could you please give me more details?

 Thanks.


 On Tue, Jul 2, 2013 at 11:22 AM, Greg Landrum greg.land...@gmail.com wrote:


 On Tue, Jul 2, 2013 at 5:20 PM, Yingfeng Wang ywang...@gmail.com wrote:

 If I have the formula, is RDKit able to generate a random compound
 based on the given formula?


 It's not, but it is kind of an interesting problem to think about.[1]

 -greg
 [1] Note: this is absolutely *not* me saying that I'm going to do it. :-)







Re: [Rdkit-discuss] New module for RDKit - PANDAS integration

2013-04-22 Thread Patrick Walters
I just started playing around with the Pandas module; this is very cool
stuff.  Thanks so much, Nikolas, for the contribution.  I definitely owe you
a beer at the UGM.  It might be worth noting that you need to install
PIL in order to use the Pandas module.  Everything will install without a
problem, but you'll get an exception like this when you try to print a
dataframe without PIL.

  File "/Users/walters/python/RDKIT_2013_04_21/rdkit/sping/PIL/pidPIL.py", line 33, in <module>
    import Image, ImageFont, ImageDraw

Best,

Pat



On Sun, Apr 21, 2013 at 5:00 PM, Taka Seri serit...@gmail.com wrote:

 Dear Greg.

 Thank you for your quick reply!
 The modified version worked without AvalonTools.
 That's a nice tool.
 I appreciate your kindness.

 Takayuki

 2013/4/22 Greg Landrum greg.land...@gmail.com

 Dear Takayuki,

 On Sun, Apr 21, 2013 at 1:30 PM, Taka Seri serit...@gmail.com wrote:
 
  I'm interested in this work
  I want to use PandasTools.
  But I got error message, ImportError: cannot import name
 pyAvalonTools.

 I just checked in a modified version that will work when the avalon
 tools are not installed.

 If you want to install the avalon tools anyway, there's information
 below that shows how:

 
  So, I tried to rebuild RDKit like this.
  $ cmake -D RDK_BUILD_AVALON_SUPPORT=ON
  But the build failed.
 
  -- Configuring done
  CMake Error at Code/cmake/Modules/RDKitUtils.cmake:35 (add_library):
Cannot find source file:
 
  /common/layout.c
 
Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm
 .hpp
.hxx .in .txx
  Call Stack (most recent call first):
External/AvalonTools/CMakeLists.txt:43 (rdkit_library)
 
  If anyone has a suggestion, please help me.

 You need to tell it where to find the source for the avalon tools.

 - Download the source from here:

 http://sourceforge.net/projects/avalontoolkit/files/AvalonToolkit_1.1_beta/AvalonToolkit_1.1_beta.source.tar/download

 - Create an avalon tools directory somewhere, for example in
 /usr/local/src/avalontools.
 - Extract the tar file in that directory.
 - Run cmake as follows:
 cmake -DAVALONTOOLS_DIR=/usr/local/src/avalontools/SourceDistribution
 -DRDK_BUILD_AVALON_SUPPORT=ON

 Best,
 -greg






Re: [Rdkit-discuss] non-smallest rings

2013-01-22 Thread Patrick Walters
If you're just looking for 6-membered rings, you can define a SMARTS that
matches 6-membered rings like this: *1~*~*~*~*~*1.  You can also use this
approach to identify all rings (at least those within reason).  You can use
an expression like this
["*1" + "".join(["*~"] * x) + "*1" for x in range(1, 19)]
to generate SMARTS for all rings of size 3 to 20.  Now you can match
these to your molecule and get all of the rings (example below).

from rdkit import Chem

class RingFinder:
    def __init__(self):
        # SMARTS for rings of size 3 to 20
        self.ringSmartsList = ["*1" + "".join(["*~"] * x) + "*1" for x in range(1, 19)]
        # Pair each pattern with its ring size (the number of "*" atoms)
        self.ringPatList = [(x.count("*"), Chem.MolFromSmarts(x)) for x in self.ringSmartsList]

    def findAllRings(self, mol):
        # Collect [size, atom indices] for every ring pattern that matches
        ringList = []
        for size, pat in self.ringPatList:
            for match in mol.GetSubstructMatches(pat):
                ringList.append([size, match])
        return ringList

ringFinder = RingFinder()
smiles = "COc1ccc(cc1O[C@H]1C[C@@H]2CC[C@H]1C2)C1CNC(=O)NC1"
mol = Chem.MolFromSmiles(smiles)
print(ringFinder.findAllRings(mol))

If you run this you'll get two 5 membered rings and 3 six membered rings
for the molecule above.
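
As a quick sanity check on the SMARTS generation, the strings can be built and
inspected without RDKit. This is just a sketch; the variable names are mine.

```python
# Build the ring SMARTS as plain strings; no RDKit call is needed for this step.
ring_smarts = ["*1" + "*~" * x + "*1" for x in range(1, 19)]

# The ring size is the number of "*" atoms in each pattern.
sizes = [s.count("*") for s in ring_smarts]

# Show the four smallest patterns with their sizes.
for size, smarts in list(zip(sizes, ring_smarts))[:4]:
    print(size, smarts)
# prints:
# 3 *1*~*1
# 4 *1*~*~*1
# 5 *1*~*~*~*1
# 6 *1*~*~*~*~*1
```

Note that the generated patterns are not identical to the hand-written
*1~*~*~*~*~*1. The first bond and the ring-closure bond are left unspecified,
which SMARTS reads as "single or aromatic", so a ring would be missed only if
no atom in it has two single or aromatic ring bonds.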

Pat


On Mon, Jan 21, 2013 at 10:13 AM, Paul Emsley pems...@mrc-lmb.cam.ac.uk wrote:


 I am making heavy weather of the following problem - and am wondering if I
 am missing something (such as a useful RDKit function).

 I am working on this beasty (as an example):

 http://www.rcsb.org/pdb/ligand/ligandsummary.do?hetId=0CP

 COc1ccc(cc1O[C@H]1C[C@@H]2CC[C@H]1C2)C1CNC(=O)NC1

 which has a norbornane substituent. I am trying to prepare input for a
 downstream program that needs to know if the norbornane atoms are in a
 6-membered ring [1].  RingInfo gives me the 2 5-membered rings.  I am
 struggling to make use of that information to find 6-membered rings.  I have
 been using makeRingNeighborMap() and pickFusedRings().  Am I missing an
 RDKit function that finds all rings?

 Cheers,

 Paul.


 [1] actually, all atoms but it is the norbornane atoms with which I
 struggle.






[Rdkit-discuss] SLN Parse Errors

2013-01-03 Thread Patrick Walters
Hi All,

I've been trying to use RDKit to parse the SLN queries in a recent paper
from Jonathan Baell at Monash
http://pubs.acs.org/doi/abs/10.1021/ci300461a

RDKit is able to successfully parse most of the queries, but is unable to
handle 18 of 539.  It looks like the problem is with the NOT construct.
 Has anyone run into this?  Is anyone aware of a workaround?

Thanks,

Pat

HetC(=O)O-[!R]C[8]:Hev(Any[NOT=O,S[TAC=2],C[TAC=4],N[TAC=3]]):Hev:Hev(Any[IS=Hal,C#N,C(F)(F)F,S(=O)=O,C(=O)NOT=C(=O)OH]):Hev:Hev(Any[NOT=O,S[TAC=2],C[TAC=4],N[TAC=3]]):@8
HetC(=O)O-[!R]C[8]:Hev(Any[IS=Hal,C#N,C(F)(F)F,S(=O)=O,C(=O)NOT=C(=O)OH]):Hev:Hev(Any[NOT=O,S[TAC=2],C[TAC=4],N[TAC=3]]):Hev:Hev(Any[NOT=O,S[TAC=2],C[TAC=4],N[TAC=3]]):@8
CCH=[!R]C(Any[IS=H,C])Any[IS=C#N,C(=O)NOT=C(=O)Any[IS=N,O]]
Any[IS=C,O]CH=[!R]C(Any[IS=C(=O),S(=O),C#N,Hal,C(Hal)(Hal)HalNOT=C(=O)OH])Any[IS=C(=O),S(=O),C#NNOT=C(=O)OH]
C[1](C(=O)OC[5]:C:C:C:C:C:@5CH
=@1)Any[IS=C(=O),C(=S),S(=O),C#N,Hal,C(Hal)(Hal)HalNOT=C(=O)OH]
C[1](=CHOC[5]:C:C:C:C:C:@5C
@1=O)Any[IS=C(=O),C(=S),S(=O),C#N,Hal,C(Hal)(Hal)HalNOT=C(=O)OH]
C[1]:N:Any(Any[IS=Hal,S(=O)(=O)C]):Any:Any(Any[IS=Hal,C(C)=NO,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):Any:@1
C[1]:N:Any(Any[IS=Hal,C(C)=NO,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):Any:Any(Any[IS=Hal,S(=O)(=O)C]):Any:@1
C[1]:N:Any(Any[IS=Hal,S(=O)(=O)C]):Any(Any[IS=Hal,C(C)=NO,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):Any:Any:@1
C[1]:N:Any(Any[IS=Hal,S(=O)(=O)C]):Any:Any:Any(Any[IS=Hal,C(C)=NO,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):@1
C[1](Any[IS=Hal,C(C)=NO,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):N:Any(Any[IS=Hal,S(=O)(=O)C]):Any(Any[NOT=N]):Any(Any[NOT=N]):Any:@1
C[1]:N:Any:Any(Any[IS=Hal,C(C)=NO,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):Any(Any[IS=Hal,S(=O)(=O)C]):Any:@1
C[1](Any[NOT=N,O]):C(Any[IS=Hal,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):C(Any[IS=Hal,S(=O)(=O)C,C[r](=O)NC]):C(Any[NOT=N,O]):C(Any[NOT=N,O]):C(Any[IS=Hal,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):@1
C[1](Any[IS=Hal,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):C(Any[IS=Hal,S(=O)(=O)C,C[r](=O)NC]):C(Any[IS=Hal,C#N,C(=O),C(F)(F)F,S(=O)=ONOT=C(=O)OH]):C(Any[NOT=N,O]):C(Any[NOT=N,O]):C(Any[NOT=N,O]):@1
N(Any[IS=H,C[TAC=4]NOT=C[TAC=4]-[R]C[TAC=4]N])(Any[IS=H,C[TAC=4]NOT=C[TAC=4]-[R]C[TAC=4]N])(Any[IS=H,C[TAC=4]NOT=C[TAC=4]-[R]C[TAC=4]N])max=1
Any[IS=H,CNOT=C=O]N[!r](Any[IS=H,CNOT=C=O])C(=O)Cmax=2
Hev[!rNOT=NC(=O)NC(=O)]Hev[!rNOT=NC(=O)NC(=O)]Hev[!rNOT=NC(=O)NC(=O)]Hev[!rNOT=NC(=O)NC(=O)]Hev[!rNOT=NC(=O)NC(=O)]Hev[!rNOT=NC(=O)NC(=O)]Hev[!rNOT=NC(=O)NC(=O)]Hev[!rNOT=NC(=O)NC(=O)]
Hev[!rNOT=C=O,S(=O)(=O),N*S(=O),N*(C=O)]Hev[!rNOT=C=O,S(=O)(=O),N*S(=O),N*(C=O)]Hev[!rNOT=C=O,S(=O)(=O),N*S(=O),N*(C=O)]Hev[!rNOT=C=O,S(=O)(=O),N*S(=O),N*(C=O)]Hev[!rNOT=C=O,S(=O)(=O)]Any[IS=CH3,OH,NH2,N(CH3)CH3]