Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-28 Thread JP Ebejer
Hi Paolo!

Nice to hear from you -- and thanks for the lightning-fix+working example.
Very helpful as usual.  (I don't imagine you need me to open a github issue
on this, but I'd be happy to if you think that is helpful/want to keep
a record).

Any thoughts on whether it is useful to reionize after neutralizing charges
in the pipeline above?

Many thanks,

On Thu, 24 Jun 2021 at 18:58, Paolo Tosco 
wrote:

> Hi JP,
>
> the problem is caused by the reaction SMARTS that standardizes pyridine
> *N*-oxides being not very specific and also hitting your molecule, which
> is not actually an *N*-oxide but rather a *N*-hydroxypyridinium ion.
> I will submit a PR to fix the reaction pattern; in the meantime you can
> fix the problem by loading a custom list of normalization reaction SMARTS
> as shown in this gist:
>
> https://gist.github.com/ptosco/2b19142ff8fd6afdfee12836cec73d4f
>
> HTH, cheers
> p.
>
> On Thu, Jun 24, 2021 at 11:40 AM JP Ebejer 
> wrote:
>
>> Apologies I took my sweet time to reply, I went down the standardization
>> rabbit-hole and went through most of the material (thanks Matthew and
>> Francois, but also links from other notebooks).  The recording of the
>> OpenScience session is excellent and crystal clear as usual Greg.  I
>> enjoyed that.
>>
>> I have collated code to do the standardization as follows (I am putting
>> this here, for when my future self searches this list for the same thing in
>> 6 years time*):
>>
>> 0. Cleanup
>> 1. FragmentParent
>> 2. Uncharge
>> 3. Canonicalize Tautomer
>>
>> My only question left, is whether I should reionize between steps 2 and
>> 3.  What do you think?  My opinion is, probably, that there is no harm in
>> doing so (so I should do it).  Earlier, Greg said that cleanup does
>> reionization, but perhaps it is worth redoing after the uncharge step?  Or
>> is this just a waste of CPU cycles?  Any thoughts?
>>
>> Also, there is something slightly weird going on.  A (successfully)
>> sanitized mol from SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O" starts
>> spitting out "can't kekulize" errors when passed to Cleanup(...).  I have
>> created a jupyter notebook to highlight this:
>> https://nbviewer.jupyter.org/gist/jp-um/7cd80faa794b3545e8aedf838a1e7f6b
>> Any ideas what is going on?  IMHO cleanup should not choke on sanitized
>> (correct) molecules.  Is there a way to catch when these errors happen?  As
>> a bonus, FragmentParent(...) on the original sanitized molecule also
>> exhibits this unexpected behaviour (not shown in the notebook). Could this
>> be because it's doing an internal cleanup?
>>
>> * The exact code is here:
>> https://bitsilla.com/blog/2021/06/standardizing-a-molecule-using-rdkit/
>>
>>
>>
>>
>> On Fri, 18 Jun 2021 at 15:08, Greg Landrum 
>> wrote:
>>
>>> Hi JP,
>>>
>>> On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer 
>>> wrote:
>>>
>>>>
>>>> I am trying to standardize(/normalize?) some molecules from different
>>>> sources, to generate a set of descriptors for them.  I have done this a
>>>> number of times, and each time I find the process slightly confusing.  I
>>>> have the following questions please, if you don't mind:
>>>>
>>>>
>>> As a starting point in case you want more information about this topic.
>>> I did a webinar/presentation on this topic earlier this year as part of
>>> the RSC Open Science series.
>>>
>>> My materials for that are in github:
>>> https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
>>> and there's a youtube recording:
>>> https://www.youtube.com/watch?v=eWTApNX8dJQ
>>>
>>>
>>>
>>>> 1.  What is the relation between molvs and rdkit (I remember there was
>>>> an integration project between the two a while back).  When I call
>>>> rdMolStandardize does rdkit code or molvs code get called?  The github repo
>>>> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
>>>>
>>>
>>> When you call operations from rdMolStandardize it invokes RDKit code.
>>> That code was started by Susan Leung as a Google Summer of Code project and
>>> we have continued to improve and expand that code since then.
>>>
>>>
>>>> 2.  What is the difference between standardization and normalization of
>>>> a molecule?  Does one automatically imply the other or should these two
>>>> processes b

Re: [Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-24 Thread JP Ebejer
Apologies I took my sweet time to reply, I went down the standardization
rabbit-hole and went through most of the material (thanks Matthew and
Francois, but also links from other notebooks).  The recording of the
OpenScience session is excellent and crystal clear as usual Greg.  I
enjoyed that.

I have collated code to do the standardization as follows (I am putting
this here, for when my future self searches this list for the same thing in
6 years time*):

0. Cleanup
1. FragmentParent
2. Uncharge
3. Canonicalize Tautomer

My only question left, is whether I should reionize between steps 2 and 3.
What do you think?  My opinion is, probably, that there is no harm in doing
so (so I should do it).  Earlier, Greg said that cleanup does reionization,
but perhaps it is worth redoing after the uncharge step?  Or is this just a
waste of CPU cycles?  Any thoughts?
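
The four steps above can be sketched roughly as follows. This is only a minimal sketch, not the exact code from the blog post: the function name standardize and the reionize flag are made up for illustration, and the optional reionize step is included only to show where it would sit, not because it is known to be needed.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize(mol, reionize=False):
    """Cleanup -> FragmentParent -> Uncharge -> (optional Reionize) -> canonical tautomer.

    'standardize' and 'reionize' are illustrative names, not RDKit API.
    """
    # Step 0: general cleanup (normalization, reionization, etc.)
    mol = rdMolStandardize.Cleanup(mol)
    # Step 1: keep the parent fragment, dropping counter-ions and the like
    mol = rdMolStandardize.FragmentParent(mol)
    # Step 2: neutralize charges where possible
    mol = rdMolStandardize.Uncharger().uncharge(mol)
    # Optional: the extra reionization step asked about in this message
    if reionize:
        mol = rdMolStandardize.Reionizer().reionize(mol)
    # Step 3: pick the canonical tautomer
    return rdMolStandardize.TautomerEnumerator().Canonicalize(mol)


if __name__ == "__main__":
    salt = Chem.MolFromSmiles("CC(=O)[O-].[Na+]")
    print(Chem.MolToSmiles(standardize(salt)))
```

Running it with and without reionize=True on a representative set and comparing the output SMILES is one cheap way to see whether the extra step changes anything for your data.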

Also, there is something slightly weird going on.  A (successfully)
sanitized mol from SMILES "Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O" starts
spitting out "can't kekulize" errors when passed to Cleanup(...).  I have
created a jupyter notebook to highlight this:
https://nbviewer.jupyter.org/gist/jp-um/7cd80faa794b3545e8aedf838a1e7f6b
Any ideas what is going on?  IMHO cleanup should not choke on sanitized
(correct) molecules.  Is there a way to catch when these errors happen?  As
a bonus, FragmentParent(...) on the original sanitized molecule also
exhibits this unexpected behaviour (not shown in the notebook). Could this
be because it's doing an internal cleanup?
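
On catching these failures: a minimal sketch, assuming the failure surfaces as a Python exception (if RDKit only writes the message to its log, this wrapper will not see it). The helper name safe_cleanup is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def safe_cleanup(mol):
    """Return (cleaned_mol, None) on success or (None, error_message) on failure.

    'safe_cleanup' is an illustrative helper, not part of RDKit.
    """
    try:
        return rdMolStandardize.Cleanup(mol), None
    except Exception as err:  # broad on purpose: the exact exception type is not assumed
        return None, str(err)


if __name__ == "__main__":
    cleaned, error = safe_cleanup(Chem.MolFromSmiles("CCO"))
    print(Chem.MolToSmiles(cleaned) if cleaned is not None else error)
```

Returning the message instead of re-raising makes it easy to collect the problem molecules in a list while the rest of a batch carries on.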

* The exact code is here:
https://bitsilla.com/blog/2021/06/standardizing-a-molecule-using-rdkit/




On Fri, 18 Jun 2021 at 15:08, Greg Landrum  wrote:

> Hi JP,
>
> On Thu, Jun 17, 2021 at 8:37 PM JP Ebejer  wrote:
>
>>
>> I am trying to standardize(/normalize?) some molecules from different
>> sources, to generate a set of descriptors for them.  I have done this a
>> number of times, and each time I find the process slightly confusing.  I
>> have the following questions please, if you don't mind:
>>
>>
> As a starting point in case you want more information about this topic.
> I did a webinar/presentation on this topic earlier this year as part of
> the RSC Open Science series.
>
> My materials for that are in github:
> https://github.com/greglandrum/RSC_OpenScience_Standardization_202104
> and there's a youtube recording:
> https://www.youtube.com/watch?v=eWTApNX8dJQ
>
>
>
>> 1.  What is the relation between molvs and rdkit (I remember there was an
>> integration project between the two a while back).  When I call
>> rdMolStandardize does rdkit code or molvs code get called?  The github repo
>> for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
>>
>
> When you call operations from rdMolStandardize it invokes RDKit code. That
> code was started by Susan Leung as a Google Summer of Code project and we
> have continued to improve and expand that code since then.
>
>
>> 2.  What is the difference between standardization and normalization of a
>> molecule?  Does one automatically imply the other or should these two
>> processes be both run on a molecule?
>>
>
> I would be surprised if there were universal agreement about this, but
> when I use the terms, normalization typically refers to making changes to
> molecules to get "functional groups" (loosely defined) into a normal form,
> while standardization is getting the molecules into a standard form in
> preparation for doing something with them. Normalization is often part of
> standardization; standardization can also include things like stripping
> salts, neutralizing molecules, etc.
> Normalization involves applying transformations like converting -N(=O)=O
> to -[N+](=O)[O-] and converting -[S+2]([O-])[O-] to -S(=O)=O;
>
>
>> 3.  Specifically, what is the difference between
>> rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
>> rdMolStandardize.Normalize(mol)?  Should I call any of these three manually
>> after I run "standardization/cleaning operations" such as uncharging,
>> reionizing, etc?
>>
>
> SanitizeMol() is different from the others: it does a small amount of
> normalization - fixing groups like nitro which are commonly drawn in a
> hypervalent state but which can be represented in a charge-separated form
> without needing weird valences - and some validation - rejecting molecules
> with atoms that have non-physical valences, rejecting molecules that cannot
> be kekulized - and a bunch of chemistry perception - ring finding,
> calculating valences, finding aromatic systems, etc.
>
> rdMolStandardize.Normalize() applies a bunch of standard transformations
> to a molecule.
>
> rdMolSta

[Rdkit-discuss] RDKit molecule standardization/normalization protocol

2021-06-17 Thread JP Ebejer
Dear all,

I am trying to standardize(/normalize?) some molecules from different
sources, to generate a set of descriptors for them.  I have done this a
number of times, and each time I find the process slightly confusing.  I
have the following questions please, if you don't mind:

1.  What is the relation between molvs and rdkit (I remember there was an
integration project between the two a while back).  When I call
rdMolStandardize does rdkit code or molvs code get called?  The github repo
for molvs hasn't been updated in a while (2 yrs), but rdMolStandardize has.
2.  What is the difference between standardization and normalization of a
molecule?  Does one automatically imply the other or should these two
processes be both run on a molecule?
3.  Specifically, what is the difference between
rdMolStandardize.Cleanup(mol), Chem.SanitizeMol(mol), and
rdMolStandardize.Normalize(mol)?  Should I call any of these three manually
after I run "standardization/cleaning operations" such as uncharging,
reionizing, etc?
4.  I understand what uncharge does, but what does reionizer do?
5.  Is there a way to chain operations together, e.g.
standardize+ChooseLargestFragment+uncharge+normalize (I am not sure the order
makes sense here), other than creating a class instance for each, calling
the method, returning a new mol and using this mol in the next operation?
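
To make question 5 concrete, here is a minimal sketch of the kind of chaining meant. It is plain Python, not an RDKit feature: the helper chain is hypothetical, and the demo uses string functions so that it runs without RDKit. Any callables that take a mol and return a new mol (for example rdMolStandardize.Cleanup or rdMolStandardize.FragmentParent) should slot in the same way.

```python
from functools import reduce


def chain(*steps):
    """Build one function that applies each step, in order, to its input.

    'chain' is an illustrative helper; each step must take one value and
    return the value to hand to the next step.
    """
    def run(value):
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


if __name__ == "__main__":
    # Stand-in steps; with RDKit these would be mol -> mol callables.
    tidy = chain(str.strip, str.lower)
    print(tidy("  AbC "))
```

The order of the arguments is the order of application, which keeps the pipeline definition readable in one place.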

Apologies for the many questions.  Have I missed the documentation about
this?  I have found some excellent examples here:
https://github.com/susanhleung/rdkit/blob/dev/GSOC2018_MolVS_Integration/rdkit/Chem/MolStandardize/tutorial/MolStandardize.ipynb
(thanks!).  This is not exactly a cleaning pipeline, but still quite
helpful to understand these methods.

Many thanks,
JP
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Sanitization of molecules breaks something for conformer gen (Possible Bug?)

2018-11-14 Thread JP
> It would be great if you (or your student) could create a github issue
> for this, I will go ahead and take a look.

I will, thanks for looking into this.


On Thu, 15 Nov 2018 at 06:35, Greg Landrum  wrote:

> Hi JP,
>
> I am able to reproduce this.
> It's not directly connected to the standardization itself, since a
> standardized molecule works fine with the embedding:
>
> In [8]: omol = Chem.MolFromSmiles('C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O')
>
> In [11]: nmol = rdMolStandardize.Cleanup(omol)
> [06:27:57] Initializing MetalDisconnector
> [06:27:57] Running MetalDisconnector
> [06:27:57] Initializing Normalizer
> [06:27:57] Running Normalizer
>
> In [12]: nomh = Chem.AddHs(nmol)
>
> In [13]: AllChem.EmbedMolecule(nomh)
> Out[13]: 0
>
>
> The actual problem is connected to the way the RDKit interprets the smiles
> that it generates for the input molecule (no standardization required):
>
> In [14]: omol = Chem.MolFromSmiles('C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O')
>
> In [15]: smi = Chem.MolToSmiles(omol)
>
> In [16]: nmol = Chem.MolFromSmiles(smi)
>
> In [17]: hnmol = Chem.AddHs(nmol)
>
> In [18]: AllChem.EmbedMolecule(hnmol)
> Out[18]: -1
>
> In [19]: print(smi)
> O=C1N=CN=C2[C@H]1C=NN2[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O
>
>
> It would be great if you (or your student) could create a github issue for
> this, I will go ahead and take a look.
>
>
> Best,
> -greg
>
>
>
>
> On Wed, Nov 14, 2018 at 4:17 PM JP  wrote:
>
>> Dear all,
>>
>> Using the latest/greatest 2018.09.1.
>>
>> I have an MSc student who is working on some targets in DUDE.
>>
>> If we take some specific molecules from there (e.g.
>> "C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O"),
>> sanitize them using rdMolStandardize.StandardizeSmiles(smiles), and we
>> then generate conformers (with EmbedMultipleConfs and ETKDGv2) -- the
>> conformer generation step hangs.  If we omit the sanitization step, conf.
>> gen. works fine as expected.
>>
>> Any clues as to what may be causing this?  My bet in the above example is
>> something to do with chirality, i.e. [C@@H].  Any hint on a possible
>> solution?
>>
>> I'd also like to thank whoever it was who worked on integrating the
>> cleaning code (molvs) into RDKit.  This is such a critical, common task -
>> great to have something out of the box to do it.
>>
>> We have an example jupyter notebook which highlights the problem here:
>> https://nbviewer.jupyter.org/gist/jp-um/528a300f6b46251377f3129576b61616
>>
>> Also, a list of other molecules which exhibit this same behaviour (just
>> the ones we came across, as we only looked at a small subset of DUDE
>> targets):
>>
>> Adenosine A2a receptor (GPCR)/ 28499( C1=CC2=c3nn/c(=N\N=C\[C@@H]4C=CC=N4)[nH]c3=N[C@@H]2C=C1 )
>> Adenosine A2a receptor (GPCR)/ 9903( [NH3+]NCC1=C(C(=O)[O-])[C@H]2C=Cc3cnc(Cl)cc3C2=N1 )
>> Adenosine A2a receptor (GPCR)/ 23728( C1=C[C@@H]2N=CC=C2C=C1[C@@H]1N=CN=C1c1c1 )
>> Progesterone Receptor/ 14194( Cc1ccc(S(=O)(=O)C(Sc2c2)=S=NC23CC4CC(CC(C4)C2)C3)cc1 )
>> Progesterone Receptor/ 14821( Cc1ccc(N2C(=O)[C@@H]3[C@@H]4C[C@H]5[C@H](O[C@@]2(C(C)C)[C@@H]53)[C@@H]4O)c(C)c1 )
>> Adenosine A2a receptor (GPCR)/ 4014( CCc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(OC)ccc3OC)=N[C@@H]1[NH+]=2 )
>> Progesterone Receptor/ 61( CC(C)=C/C=C1\Oc2ccc(F)cc2-c2ccc3c(c21)C(C)=CC(C)(C)N3 )
>> Progesterone Receptor/ 67( CC1=CC(C)(C)Nc2ccc3c(c21)C(=C1SCCCS1)Oc1ccc(F)cc1-3 )
>> Adenosine A2a receptor (GPCR)/ 29753( Cc1ccc2c(c1)=C(CCN1C=C[C@H]3C(=CNc4nc(C)nn43)C1=O)C[NH+]=2 )
>> Adenosine A2a receptor (GPCR)/ 14471( CCc1nn2c(c1-c1ccc(Cl)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@@H]12 )
>> Adenosine A2a receptor (GPCR)/ 2411( Cc1ccc2c(c1)=C1N=NC(N/N=C/c3cc(O)c(O)c(Br)c3)=[NH+][C@H]1[NH+]=2 )
>> HIVPR/ 21585( O=[N+]([O-])c1ccc(N/N=C2/c3cc(Cl)ccc3N3c4c4[C@@H]2N3c2c2)c([N+](=O)[O-])c1 )
>> Adenosine A2a receptor (GPCR)/ 13221( Cc1nn2c(c1-c1ccc(F)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@H]12 )
>> Leukotriene A4 hydrolase (Protease)/ 8094( C[C@@H]([NH3+])[C@@H]1[C@H]2CC[C@H]3C[C@H](C2)C[C@@H]31 )
>> Leukotriene A4 hydrolase (Protease)/ 4803( CC1=N[C@@H]2C=C(OC[C@@H]3CCN(C(=O)OC(C)(C)C)C3)C=C[C@H]2S1 )
>> Adenosine A2a receptor (GPCR)/ 8106( O=CC1=CN=C2C=CC(c3([N+](=O)[O-])c3)=C[C@H]12 )
>> Thymidine kinase/ 2696( CC(=O)O[C@H]1CC[C@@]2(COS(C)(=O)=O)[C@@H]3

[Rdkit-discuss] Sanitization of molecules breaks something for conformer gen (Possible Bug?)

2018-11-14 Thread JP
Dear all,

Using the latest/greatest 2018.09.1.

I have an MSc student who is working on some targets in DUDE.

If we take some specific molecules from there (e.g.
"C1=NN(C2=NC=NC(=O)[C@@H]21)[C@H]3[C@@H]([C@@H]([C@H](O3)CO)O)O"),
sanitize them using rdMolStandardize.StandardizeSmiles(smiles), and we
then generate conformers (with EmbedMultipleConfs and ETKDGv2) -- the
conformer generation step hangs.  If we omit the sanitization step, conf.
gen. works fine as expected.

Any clues as to what may be causing this?  My bet in the above example is
something to do with chirality, i.e. [C@@H].  Any hint on a possible
solution?
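
One way to narrow a problem like this down is to take standardization out of the picture and check whether the canonical SMILES that RDKit writes for a molecule parses back to the same canonical SMILES. This is only a diagnostic sketch: the helper name canonical_round_trip is hypothetical, and a stable round trip does not by itself prove that embedding will succeed.

```python
from rdkit import Chem


def canonical_round_trip(smiles):
    """Return (first, second): canonical SMILES before and after one re-parse.

    'canonical_round_trip' is an illustrative helper, not part of RDKit.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %s" % smiles)
    first = Chem.MolToSmiles(mol)
    # Re-parse RDKit's own output; None signals that it could not be read back
    reparsed = Chem.MolFromSmiles(first)
    second = Chem.MolToSmiles(reparsed) if reparsed is not None else None
    return first, second


if __name__ == "__main__":
    first, second = canonical_round_trip("OCC")
    print(first, second, first == second)
```

If the two strings differ for one of the molecules listed below, the SMILES writer/parser pair is a more likely culprit than the standardizer.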

I'd also like to thank whoever it was who worked on integrating the
cleaning code (molvs) into RDKit.  This is such a critical, common task -
great to have something out of the box to do it.

We have an example jupyter notebook which highlights the problem here:
https://nbviewer.jupyter.org/gist/jp-um/528a300f6b46251377f3129576b61616

Also, a list of other molecules which exhibit this same behaviour (just the
ones we came across, as we only looked at a small subset of DUDE targets):

Adenosine A2a receptor (GPCR)/ 28499( C1=CC2=c3nn/c(=N\N=C\[C@@H]4C=CC=N4)[nH]c3=N[C@@H]2C=C1 )
Adenosine A2a receptor (GPCR)/ 9903( [NH3+]NCC1=C(C(=O)[O-])[C@H]2C=Cc3cnc(Cl)cc3C2=N1 )
Adenosine A2a receptor (GPCR)/ 23728( C1=C[C@@H]2N=CC=C2C=C1[C@@H]1N=CN=C1c1c1 )
Progesterone Receptor/ 14194( Cc1ccc(S(=O)(=O)C(Sc2c2)=S=NC23CC4CC(CC(C4)C2)C3)cc1 )
Progesterone Receptor/ 14821( Cc1ccc(N2C(=O)[C@@H]3[C@@H]4C[C@H]5[C@H](O[C@@]2(C(C)C)[C@@H]53)[C@@H]4O)c(C)c1 )
Adenosine A2a receptor (GPCR)/ 4014( CCc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(OC)ccc3OC)=N[C@@H]1[NH+]=2 )
Progesterone Receptor/ 61( CC(C)=C/C=C1\Oc2ccc(F)cc2-c2ccc3c(c21)C(C)=CC(C)(C)N3 )
Progesterone Receptor/ 67( CC1=CC(C)(C)Nc2ccc3c(c21)C(=C1SCCCS1)Oc1ccc(F)cc1-3 )
Adenosine A2a receptor (GPCR)/ 29753( Cc1ccc2c(c1)=C(CCN1C=C[C@H]3C(=CNc4nc(C)nn43)C1=O)C[NH+]=2 )
Adenosine A2a receptor (GPCR)/ 14471( CCc1nn2c(c1-c1ccc(Cl)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@@H]12 )
Adenosine A2a receptor (GPCR)/ 2411( Cc1ccc2c(c1)=C1N=NC(N/N=C/c3cc(O)c(O)c(Br)c3)=[NH+][C@H]1[NH+]=2 )
HIVPR/ 21585( O=[N+]([O-])c1ccc(N/N=C2/c3cc(Cl)ccc3N3c4c4[C@@H]2N3c2c2)c([N+](=O)[O-])c1 )
Adenosine A2a receptor (GPCR)/ 13221( Cc1nn2c(c1-c1ccc(F)cc1)NC=C1C(=O)N(c3ncn[nH]3)C=C[C@H]12 )
Leukotriene A4 hydrolase (Protease)/ 8094( C[C@@H]([NH3+])[C@@H]1[C@H]2CC[C@H]3C[C@H](C2)C[C@@H]31 )
Leukotriene A4 hydrolase (Protease)/ 4803( CC1=N[C@@H]2C=C(OC[C@@H]3CCN(C(=O)OC(C)(C)C)C3)C=C[C@H]2S1 )
Adenosine A2a receptor (GPCR)/ 8106( O=CC1=CN=C2C=CC(c3([N+](=O)[O-])c3)=C[C@H]12 )
Thymidine kinase/ 2696( CC(=O)O[C@H]1CC[C@@]2(COS(C)(=O)=O)[C@@H]3CC[C@]4(C)C5=C(O)C(=O)CO[C@@]4(CC5)[C@@H]3CC[C@]2(O)C1 )
Adenosine A2a receptor (GPCR)/ 5075( Cc1ccc2c(c1)=C1N=[NH+]C(N/N=C/c3cc(Br)c(O)c(O)c3Br)=N[C@@H]1[NH+]=2 )
Adenosine A2a receptor (GPCR)/ 22643( COc1cc([N+](=O)[O-])ccc1NC(=O)[C@@H]1[C@@H]2C[C@@H]3OC(=O)[C@@H]1[C@@H]3C2 )
Progesterone Receptor/ 182( CC1=CC(C)(C)Nc2ccc3c(c21)/C(=C/C1C1)Oc1ccc(F)cc1-3 )

Many thanks for your attention, looking forward to hearing any insights about
this issue.

JP


[Rdkit-discuss] RDKit 2018.03.1, python 3, Postgresql and conda cocktail

2018-05-31 Thread JP
Dear all,

Long time, no type.

I am on Debian 9.4 (stretch) and I want to install RDKit latest and
greatest (2018.03.1 -- as I am after the excellent ETKDGv2), and the
Postgresql cartridge all in python3 (as python2 is for the damned! :-) ).

Reading the install instructions at http://www.rdkit.org/docs/Install.html,
the best way is to use conda (which I am).  But when I run conda
install -c rdkit rdkit-postgresql in the rdkit env, my rdkit version gets
thrown back in time and is downgraded to:

rdkit 2016.03.4   np111py35_1   rdkit

(from 2018.03.1).

If I install conda package rdkit-postgresql95, the RDKit package installed
is 2017.09.2.0-hd1c72e7_1 (so still no ETKDGv2).  Is there a clean conda
solution for this?

If a conda solution is not available, I don't mind installing Postgresql
(10.4), Rdkit (2018.03) manually from source, I usually do this (I'm new to
conda).  But the problem then becomes how to tell RDKit to throw its
libraries in the python3.5 system installation ... so import rdkit finds
all the stuff in the py35 installation.

Is there something I am missing?  What would you recommend?

Thanks,
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot


Re: [Rdkit-discuss] RDKit and Google Summer of Code 2018

2018-01-16 Thread JP
Joining the fray, +1 for MolVS

On 16 January 2018 at 16:00, Brian Cole  wrote:

> +1 to the MolVS project as well.
>
> Perhaps an easy bite-size project is to incorporate the open source mae
> parser code into core RDKit: https://github.com/schrodinger/maeparser
>
>
> On Mon, Jan 15, 2018 at 9:08 PM, Francois BERENGER <
> beren...@bioreg.kyushu-u.ac.jp> wrote:
>
>> On 01/16/2018 05:51 AM, Tim Dudgeon wrote:
>> > Incorporating and "industrialising" Matt's MolVS tautomer and
>> > standardizer code?
>> > http://molvs.readthedocs.io/en/latest/index.html
>>
>> If we can vote, I would vote for this one.
>>
>> > On 15/01/18 07:09, Greg Landrum wrote:
>> >> Dear all,
>> >>
>> >> We've been invited again to participate in the OpenChemistry
>> >> application for Google Summer of Code.
>> >>
>> >> In order to participate we need ideas for projects and mentors to go
>> >> along with them.
>> >>
>> >> The current list of RDKit ideas is being maintained here:
>> >> http://wiki.openchemistry.org/GSoC_Ideas_2018#RDKit_Project_Ideas
>> >>
>> >> (Note: at the point that I'm pressing "send", that's still a copy of
>> >> last year's project ideas).
>> >>
>> >> If you're willing to be a mentor (please ask me about the ~5
>> >> hours/week required here) or have ideas, please reply to this thread.
>> >>
>> >> Best,
>> >> -greg
>> >>


Re: [Rdkit-discuss] Docker with (latest) rdkit+jupyter

2017-11-22 Thread JP
Thanks Markus -- great pointer.

This kind of works for me now (just as you suggested -- install python 3.5
and everything in that environment).

FROM continuumio/miniconda3
MAINTAINER Greg Landrum 

ENV LANG C

# install and activate python 3.5 -- this is a requirement below
RUN conda create -n py35 python=3.5
ENV PATH /opt/conda/envs/py35/bin:$PATH

# install the RDKit:
RUN conda config --add channels  https://conda.anaconda.org/rdkit
# note including jupyter in this brings in rather a lot of extra stuff
RUN conda install --name py35 -y nomkl rdkit pandas cairo cairocffi jupyter

RUN mkdir /notebooks
CMD jupyter-notebook --ip="*" --no-browser --allow-root --notebook-dir=/notebooks

On 21 November 2017 at 18:51, Markus Sitzmann 
wrote:

> Hi JP,
>
> From the Docker log you posted it is obvious that the build starts from
> the latest miniconda version, which then uses python 3.6 as default;
> however, one of the python packages still relies on python 3.5.
>
> One thing you can try is to tell the conda install command in the docker
> script to go back to python 3.5 or create a python 3.5 based environment.
> Unfortunately I just don't remember off the top of my head which option you
> have to use for this, but you will find it in the conda documentation.
>
> And as much as I like the idea of conda, it is unfortunately one of the
> biggest troublemakers in my personal projects.
>
> Another point is, if you look for one of the recent posts from Greg here on
> the list, there is another problem with the latest conda version you might
> run into.
>
>
> Markus
>
>
> -
> |  Markus Sitzmann
> |  markus.sitzm...@gmail.com
>
> On 21. Nov 2017, at 16:53, Tim Dudgeon  wrote:
>
> I've got some dockerfiles that might be worth a look.
> https://github.com/InformaticsMatters/docker_jupyter
>
> Not sure if they will help.
>
> Tim
>
>
>
> On 21/11/2017 15:25, JP wrote:
>
> Yo RDKitters,
>
> I am running a CADD workshop for a group of MSc students and would like to
> show them some RDKit awesomeness.
>
> I thought the best way to do this is to use an rdkit enabled docker image
> + jupyter notebooks (they are comfortable with python).
>
> In preparation, I tried building the docker image from the docker file at
> https://github.com/rdkit/rdkit_containers/tree/master/docker/run_conda3
> but this fails on Ubuntu 16.04.3 LTS with the following error:
>
> $ docker build -t run_rdkit_conda https://raw.githubusercontent.com/rdkit/rdkit_containers/master/docker/run_conda3/Dockerfile
> Downloading build context from remote url: https://raw.githubusercontent.com/rdkit/rdkit_containers/master/docker/run_conda3/Dockerfile 357B
> Sending build context to Docker daemon  2.048kB
> Step 1/7 : FROM continuumio/miniconda3
> latest: Pulling from continuumio/miniconda3
> 85b1f47fba49: Pull complete
> 6b3cb0c49789: Pull complete
> fecb432dacf0: Pull complete
> f461f7e3890d: Pull complete
> Digest: sha256:604cda0c0be5d40cc26db31912d8b1b7276840a56544b846abef441b32d987fc
> Status: Downloaded newer image for continuumio/miniconda3:latest
>  ---> f700f7f570c7
> Step 2/7 : MAINTAINER Greg Landrum 
>  ---> Running in ad6a648c18ba
>  ---> 18e6d6093d5b
> Removing intermediate container ad6a648c18ba
> Step 3/7 : ENV PATH /opt/conda/bin:$PATH
>  ---> Running in e21cf8e5332f
>  ---> ddef65292068
> Removing intermediate container e21cf8e5332f
> Step 4/7 : ENV LANG C
>  ---> Running in efa12ef17f37
>  ---> 137d7e20350d
> Removing intermediate container efa12ef17f37
> Step 5/7 : RUN conda config --add channels  https://conda.anaconda.org/rdkit
>  ---> Running in 79566bf4b6e9
>  ---> 032965875391
> Removing intermediate container 79566bf4b6e9
> Step 6/7 : RUN conda install -y nomkl rdkit pandas cairo cairocffi jupyter
>  ---> Running in c5aa6417a63a
> Fetching package metadata .
> Solving package specifications: .
>
> UnsatisfiableError: The following specifications were found to be in
> conflict:
>   - cairocffi -> python 3.5* -> xz 5.0.5
>   - python 3.6*
> Use "conda info <package>" to see the dependencies for each package.
>
> The command '/bin/sh -c conda install -y nomkl rdkit pandas cairo
> cairocffi jupyter' returned a non-zero code: 1
>
> Any ideas?
> JP
>
>

[Rdkit-discuss] Docker with (latest) rdkit+jupyter

2017-11-21 Thread JP
Yo RDKitters,

I am running a CADD workshop for a group of MSc students and would like to
show them some RDKit awesomeness.

I thought the best way to do this is to use an rdkit enabled docker image +
jupyter notebooks (they are comfortable with python).

In preparation, I tried building the docker image from the docker file at
https://github.com/rdkit/rdkit_containers/tree/master/docker/run_conda3 but
this fails on Ubuntu 16.04.3 LTS with the following error:

$ docker build -t run_rdkit_conda
https://raw.githubusercontent.com/rdkit/rdkit_containers/master/docker/run_conda3/Dockerfile
Downloading build context from remote url:
https://raw.githubusercontent.com/rdkit/rdkit_containers/master/docker/run_conda3/Dockerfile
   357B
Sending build context to Docker daemon  2.048kB
Step 1/7 : FROM continuumio/miniconda3
latest: Pulling from continuumio/miniconda3
85b1f47fba49: Pull complete
6b3cb0c49789: Pull complete
fecb432dacf0: Pull complete
f461f7e3890d: Pull complete
Digest:
sha256:604cda0c0be5d40cc26db31912d8b1b7276840a56544b846abef441b32d987fc
Status: Downloaded newer image for continuumio/miniconda3:latest
 ---> f700f7f570c7
Step 2/7 : MAINTAINER Greg Landrum 
 ---> Running in ad6a648c18ba
 ---> 18e6d6093d5b
Removing intermediate container ad6a648c18ba
Step 3/7 : ENV PATH /opt/conda/bin:$PATH
 ---> Running in e21cf8e5332f
 ---> ddef65292068
Removing intermediate container e21cf8e5332f
Step 4/7 : ENV LANG C
 ---> Running in efa12ef17f37
 ---> 137d7e20350d
Removing intermediate container efa12ef17f37
Step 5/7 : RUN conda config --add channels  https://conda.anaconda.org/rdkit
 ---> Running in 79566bf4b6e9
 ---> 032965875391
Removing intermediate container 79566bf4b6e9
Step 6/7 : RUN conda install -y nomkl rdkit pandas cairo cairocffi jupyter
 ---> Running in c5aa6417a63a
Fetching package metadata .
Solving package specifications: .

UnsatisfiableError: The following specifications were found to be in
conflict:
  - cairocffi -> python 3.5* -> xz 5.0.5
  - python 3.6*
Use "conda info <package>" to see the dependencies for each package.

The command '/bin/sh -c conda install -y nomkl rdkit pandas cairo cairocffi
jupyter' returned a non-zero code: 1

Any ideas?
JP


Re: [Rdkit-discuss] Is python 2.6(.6) still supported?

2017-07-19 Thread JP
Thanks for the eye opener Greg!  Will investigate anaconda.  I have always
somewhat resisted ...

Have a good evening!  And thanks for the impressive <5min time_to_reply



On 19 July 2017 at 22:09, Greg Landrum  wrote:

> Hi JP,
>
> Python 2.6 is no longer supported (it's not supported by the python
> community either). My recommendation for using the RDKit on Centos6 (really
> on any system) would be to use anaconda python.
>
> -greg
>
>
>
> On Wed, Jul 19, 2017 at 10:00 PM, JP  wrote:
>
>>
>> Dear all,
>>
>> A quick question - is python 2.6.6 still supported (this is the default
>> python on my current Centos distrib)?
>>
>> I am trying to install THE RDKit 2017.03.3 on Centos 6.3 (for my sins)
>> and everything works fine (just needed to install boost-serialization as
>> well due to a weird linking error).  The problem is that the following
>> tests fail:
>>
>>   6 - pyBV (Failed)
>>  35 - pyDepictor (Failed)
>>  49 - pyChemReactions (Failed)
>>  65 - pyPartialCharges (Failed)
>>  98 - pyGraphMolWrap (Failed)
>> 107 - pyRanker (Failed)
>> 111 - pythonTestDirML (Failed)
>> 112 - pythonTestDirDataStructs (Failed)
>> 113 - pythonTestDirDbase (Failed)
>> 116 - pythonTestDirChem (Failed)
>>
>> On investigating further, this is due to the assertRaisesRegexp(...) and
>> assertIn(...) calls, which were added to unittest.TestCase in 2.7 and so
>> are missing in mine.
>>
>> Thanks,
>> JP
>>
>> 
>>
>


[Rdkit-discuss] Is python 2.6(.6) still supported?

2017-07-19 Thread JP
Dear all,

A quick question - is python 2.6.6 still supported (this is the default
python on my current Centos distrib)?

I am trying to install the RDKit 2017.03.3 on Centos 6.3 (for my sins) and
everything works fine (just needed to install boost-serialization as well
due to a weird linking error).  The problem is that the following tests
fail:

  6 - pyBV (Failed)
 35 - pyDepictor (Failed)
 49 - pyChemReactions (Failed)
 65 - pyPartialCharges (Failed)
 98 - pyGraphMolWrap (Failed)
107 - pyRanker (Failed)
111 - pythonTestDirML (Failed)
112 - pythonTestDirDataStructs (Failed)
113 - pythonTestDirDbase (Failed)
116 - pythonTestDirChem (Failed)

On investigating further, this is due to the assertRaisesRegexp(...) and
assertIn(...) calls, which were added to unittest.TestCase in Python 2.7 ...
so they are missing in mine.
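
To illustrate the gap, here is a minimal sketch of a compatibility shim. It is
not part of the RDKit test suite: the CompatTestCase and Demo names are
hypothetical, and only assertIn is covered. The shim defines assertIn only when
the running unittest.TestCase lacks it, so on Python 2.7 or later it is a no-op.

```python
import unittest


class CompatTestCase(unittest.TestCase):
    """Hypothetical base class: supplies assertIn only if the runtime lacks it."""

    if not hasattr(unittest.TestCase, "assertIn"):
        def assertIn(self, member, container, msg=None):
            # Fallback for interpreters older than Python 2.7.
            if member not in container:
                self.fail(msg or "%r not found in %r" % (member, container))


class Demo(CompatTestCase):
    def test_membership(self):
        self.assertIn("C", ["C", "N", "O"])


# Run the single demo test and keep the result object for inspection.
suite = unittest.TestLoader().loadTestsFromTestCase(Demo)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests would then inherit from the shim class instead of unittest.TestCase.
Whether that is worth the effort compared with moving to a newer Python is a
separate question.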

Thanks,
JP


Re: [Rdkit-discuss] Installation woes: Ubuntu 16.04, Boost 1.61, and RDKit not living so peacefully together after all ...

2017-06-22 Thread JP
Just to give this closure on the list -- the problem was that I had static
libraries for boost (.a files).  Once I rebuilt boost as shared libraries
(.so) everything worked fine.
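
For anyone checking the same thing, a quick way to see which flavours of a
boost component are installed is to list the library files. This is only a
sketch: boost_library_kinds is a hypothetical helper, and the directory passed
at the end is simply the lib directory mentioned earlier in this thread.

```python
import glob
import os


def boost_library_kinds(lib_dir, component="serialization"):
    """Group libboost_<component>* files in lib_dir into static and shared."""
    pattern = os.path.join(lib_dir, "libboost_%s*" % component)
    kinds = {"static": [], "shared": []}
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        if name.endswith(".a"):
            kinds["static"].append(name)   # static archive
        elif ".so" in name:
            kinds["shared"].append(name)   # shared object, possibly versioned
    return kinds


# An empty "shared" list would point to the situation described above.
print(boost_library_kinds("/opt/boost_1_61_0/lib"))
```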

Thanks to Greg and Paolo for their help.

On 21 June 2017 at 11:02, Greg Landrum  wrote:

> Let's move this off the list for a bit until we get it resolved.
>
> Can you please do: "VERBOSE=1 make testReaction" and send the output just
> to me?
>
> Thanks,
> -greg
>
>
> On Wed, Jun 21, 2017 at 10:24 AM, JP  wrote:
>
>> Looks like it is finding the correct version of boost (and it is finding
>> boost serialize).
>>
>> /opt/rdkit/rdkit-Release_2017_03_2/build$ cmake -D
>> RDK_BUILD_INCHI_SUPPORT=ON -D BOOST_ROOT=/opt/boost_1_61_0/ -D
>> Boost_NO_SYSTEM_PATHS=ON ..
>> -- Could NOT find InChI in system locations (missing:  INCHI_LIBRARY
>> INCHI_INCLUDE_DIR)
>> CUSTOM_INCHI_PATH = /opt/rdkit/rdkit-Release_2017_03_2/External/INCHI-API
>> -- Found InChI software locally
>> -- Boost version: 1.61.0
>> -- Found the following Boost libraries:
>> --   python
>> -- Boost version: 1.61.0
>> -- Found the following Boost libraries:
>> --   thread
>> --   system
>> --   chrono
>> --   date_time
>> --   atomic
>> -- Boost version: 1.61.0
>> -- Found the following Boost libraries:
>> --   serialization
>> == Using strict rotor definition
>> == Updating Filters.cpp from pains file
>> == Done updating pains files
>> -- Boost version: 1.61.0
>> -- Found the following Boost libraries:
>> --   regex
>> -- Configuring done
>> -- Generating done
>> -- Build files have been written to:
>> /opt/rdkit/rdkit-Release_2017_03_2/build
>>
>> But the error prevails.
>>
>> If I set up rdkit to use just the system wide boost (1.58), not my
>> specific install (1.61), rdkit builds successfully.  I do this by clearing
>> BOOST_ROOT, LD_LIBRARY_PATH, and removing the boost-related cmake flags.
>> So this is certainly an issue with wrong libraries being picked up.
>>
>>
>>
>>
>> On 21 June 2017 at 07:29, Greg Landrum  wrote:
>>
>>> did you build boost serialize?
>>>
>>> On Mon, Jun 19, 2017 at 12:03 PM, JP 
>>> wrote:
>>>
>>>> Hi Greg !
>>>>
>>>> Unfortunately that didn't help (I delete everything in my build
>>>> directory, then):
>>>>
>>>> cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DBOOST_ROOT=/opt/boost_1_61_0/
>>>> -DBoost_NO_SYSTEM_PATHS=ON ..
>>>>
>>>> and make as usual.
>>>>
>>>> [ 62%] Linking CXX executable testReaction
>>>> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference
>>>> to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load_override(boost::archive::class_name_type&)'
>>>> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference
>>>> to 
>>>> `boost::archive::archive_exception::archive_exception(boost::archive::archive_exception
>>>> const&)'
>>>> collect2: error: ld returned 1 exit status
>>>> Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/build.make:116:
>>>> recipe for target 'Code/GraphMol/ChemReactions/testReaction' failed
>>>> make[2]: *** [Code/GraphMol/ChemReactions/testReaction] Error 1
>>>> CMakeFiles/Makefile2:4157: recipe for target
>>>> 'Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all' failed
>>>> make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all]
>>>> Error 2
>>>> Makefile:160: recipe for target 'all' failed
>>>> make: *** [all] Error 2
>>>>
>>>> Perhaps this isn't related to the system vs user-install, boost after
>>>> all?
>>>>
>>>>
>>>> On 19 June 2017 at 10:38, Greg Landrum  wrote:
>>>>
>>>>> If you have a system boost install that you do not want to use, you
>>>>> should be sure to add "-D Boost_NO_SYSTEM_PATHS=ON" to the cmake 
>>>>> arguments.
>>>>> This will (well, should) disable any usage of the system boost.
>>>>>
>>>>> -greg
>>>>>
>>>>>
>>>>> On Mon, Jun 19, 2017 at 9:39 AM, JP 
>>>>> wrote:
>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> Funny you should mention that.  I have boost 1.61 (installed manua

Re: [Rdkit-discuss] Installation woes: Ubuntu 16.04, Boost 1.61, and RDKit not living so peacefully together after all ...

2017-06-21 Thread JP
Looks like it is finding the correct version of boost (and it is finding
boost serialize).

/opt/rdkit/rdkit-Release_2017_03_2/build$ cmake -D
RDK_BUILD_INCHI_SUPPORT=ON -D BOOST_ROOT=/opt/boost_1_61_0/ -D
Boost_NO_SYSTEM_PATHS=ON ..
-- Could NOT find InChI in system locations (missing:  INCHI_LIBRARY
INCHI_INCLUDE_DIR)
CUSTOM_INCHI_PATH = /opt/rdkit/rdkit-Release_2017_03_2/External/INCHI-API
-- Found InChI software locally
-- Boost version: 1.61.0
-- Found the following Boost libraries:
--   python
-- Boost version: 1.61.0
-- Found the following Boost libraries:
--   thread
--   system
--   chrono
--   date_time
--   atomic
-- Boost version: 1.61.0
-- Found the following Boost libraries:
--   serialization
== Using strict rotor definition
== Updating Filters.cpp from pains file
== Done updating pains files
-- Boost version: 1.61.0
-- Found the following Boost libraries:
--   regex
-- Configuring done
-- Generating done
-- Build files have been written to:
/opt/rdkit/rdkit-Release_2017_03_2/build

But the error prevails.

If I set up rdkit to use just the system wide boost (1.58), not my specific
install (1.61), rdkit builds successfully.  I do this by clearing
BOOST_ROOT, LD_LIBRARY_PATH, and removing the boost-related cmake flags.
So this is certainly an issue with wrong libraries being picked up.
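
The two configurations compared above can be written down as a small sketch.
The helper names cmake_command and system_boost_env are hypothetical; the
flags and the two environment variables are the ones used in this thread.

```python
import os


def cmake_command(boost_root=None):
    """Return the cmake argument list; boost_root=None means use the system boost."""
    args = ["cmake", "-DRDK_BUILD_INCHI_SUPPORT=ON"]
    if boost_root:
        # Point cmake at the custom install and ignore the system copy.
        args += ["-DBOOST_ROOT=%s" % boost_root, "-DBoost_NO_SYSTEM_PATHS=ON"]
    return args + [".."]


def system_boost_env(environ=None):
    """Copy the environment without the variables that point at a custom boost."""
    env = dict(os.environ if environ is None else environ)
    for name in ("BOOST_ROOT", "LD_LIBRARY_PATH"):
        env.pop(name, None)
    return env


print(" ".join(cmake_command("/opt/boost_1_61_0/")))  # custom boost 1.61
print(" ".join(cmake_command()))                      # system boost
```

The cleaned environment could be passed to subprocess.call(..., env=...) when
running cmake from a clean build directory.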




On 21 June 2017 at 07:29, Greg Landrum  wrote:

> did you build boost serialize?
>
> On Mon, Jun 19, 2017 at 12:03 PM, JP  wrote:
>
>> Hi Greg !
>>
>> Unfortunately that didn't help (I delete everything in my build
>> directory, then):
>>
>> cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DBOOST_ROOT=/opt/boost_1_61_0/
>> -DBoost_NO_SYSTEM_PATHS=ON ..
>>
>> and make as usual.
>>
>> [ 62%] Linking CXX executable testReaction
>> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference
>> to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load_override(boost::archive::class_name_type&)'
>> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference
>> to 
>> `boost::archive::archive_exception::archive_exception(boost::archive::archive_exception
>> const&)'
>> collect2: error: ld returned 1 exit status
>> Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/build.make:116:
>> recipe for target 'Code/GraphMol/ChemReactions/testReaction' failed
>> make[2]: *** [Code/GraphMol/ChemReactions/testReaction] Error 1
>> CMakeFiles/Makefile2:4157: recipe for target
>> 'Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all' failed
>> make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all]
>> Error 2
>> Makefile:160: recipe for target 'all' failed
>> make: *** [all] Error 2
>>
>> Perhaps this isn't related to the system vs user-install, boost after all?
>>
>>
>> On 19 June 2017 at 10:38, Greg Landrum  wrote:
>>
>>> If you have a system boost install that you do not want to use, you
>>> should be sure to add "-D Boost_NO_SYSTEM_PATHS=ON" to the cmake arguments.
>>> This will (well, should) disable any usage of the system boost.
>>>
>>> -greg
>>>
>>>
>>> On Mon, Jun 19, 2017 at 9:39 AM, JP  wrote:
>>>
>>>> Hi Paul,
>>>>
>>>> Funny you should mention that.  I have boost 1.61 (installed manually
>>>> in /opt) and system boost I installed via sudo apt-get install
>>>>
>>>> /opt/rdkit/rdkit-Release_2017_03_2/build$ dpkg -s libboost-dev | grep
>>>> 'Version'
>>>> Version: 1.58.0.1ubuntu1
>>>>
>>>> However I pass the BOOST path to cmake via:
>>>>
>>>> cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DBOOST_ROOT=/opt/boost_1_61_0/  ..
>>>>
>>>> (I also have the $BOOST_ROOT env variable set, so I think that is
>>>> redundant.  Whatever).  cmake output clearly shows it is finding/using
>>>> boost 1.61
>>>>
>>>> Using make VERBOSE=1 I get:
>>>>
>>>> [ 62%] Linking CXX executable testReaction
>>>> cd /opt/rdkit/rdkit-Release_2017_03_2/build/Code/GraphMol/ChemReactions
>>>> && /usr/bin/cmake -E cmake_link_script CMakeFiles/testReaction.dir/link.txt
>>>> --verbose=1
>>>> /usr/bin/c++ -mpopcnt -Wno-deprecated -Wno-unused-function
>>>> -fno-strict-aliasing -fPIC -Wall -Wextra -O3 -DNDEBUG
>>>> CMakeFiles/testReaction.dir/testReaction.cpp.o  -o testReaction
>>>> -rdynamic ../../../lib/libRDKitChemReactions.so.1.2017.03.2
>>>> ../../../lib/libRDKitChemTransforms.so.1.2017.03.2
>>>>

Re: [Rdkit-discuss] Installation woes: Ubuntu 16.04, Boost 1.61, and RDKit not living so peacefully together after all ...

2017-06-19 Thread JP
Hi Greg !

Unfortunately that didn't help (I delete everything in my build directory,
then):

cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DBOOST_ROOT=/opt/boost_1_61_0/
-DBoost_NO_SYSTEM_PATHS=ON ..

and make as usual.

[ 62%] Linking CXX executable testReaction
../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference to
`boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load_override(boost::archive::class_name_type&)'
../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference to
`boost::archive::archive_exception::archive_exception(boost::archive::archive_exception
const&)'
collect2: error: ld returned 1 exit status
Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/build.make:116:
recipe for target 'Code/GraphMol/ChemReactions/testReaction' failed
make[2]: *** [Code/GraphMol/ChemReactions/testReaction] Error 1
CMakeFiles/Makefile2:4157: recipe for target
'Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all' failed
make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all]
Error 2
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

Perhaps this isn't related to the system vs user-install, boost after all?


On 19 June 2017 at 10:38, Greg Landrum  wrote:

> If you have a system boost install that you do not want to use, you should
> be sure to add "-D Boost_NO_SYSTEM_PATHS=ON" to the cmake arguments. This
> will (well, should) disable any usage of the system boost.
>
> -greg
>
>
> On Mon, Jun 19, 2017 at 9:39 AM, JP  wrote:
>
>> Hi Paul,
>>
>> Funny you should mention that.  I have boost 1.61 (installed manually in
>> /opt) and system boost I installed via sudo apt-get install
>>
>> /opt/rdkit/rdkit-Release_2017_03_2/build$ dpkg -s libboost-dev | grep
>> 'Version'
>> Version: 1.58.0.1ubuntu1
>>
>> However I pass the BOOST path to cmake via:
>>
>> cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DBOOST_ROOT=/opt/boost_1_61_0/  ..
>>
>> (I also have the $BOOST_ROOT env variable set, so I think that is
>> redundant.  Whatever).  cmake output clearly shows it is finding/using
>> boost 1.61
>>
>> Using make VERBOSE=1 I get:
>>
>> [ 62%] Linking CXX executable testReaction
>> cd /opt/rdkit/rdkit-Release_2017_03_2/build/Code/GraphMol/ChemReactions
>> && /usr/bin/cmake -E cmake_link_script CMakeFiles/testReaction.dir/link.txt
>> --verbose=1
>> /usr/bin/c++ -mpopcnt -Wno-deprecated -Wno-unused-function
>> -fno-strict-aliasing -fPIC -Wall -Wextra -O3 -DNDEBUG
>> CMakeFiles/testReaction.dir/testReaction.cpp.o  -o testReaction
>> -rdynamic ../../../lib/libRDKitChemReactions.so.1.2017.03.2
>> ../../../lib/libRDKitChemTransforms.so.1.2017.03.2
>> ../../../lib/libRDKitDescriptors.so.1.2017.03.2
>> ../../../lib/libRDKitFingerprints.so.1.2017.03.2
>> ../../../lib/libRDKitDepictor.so.1.2017.03.2
>> ../../../lib/libRDKitFileParsers.so.1.2017.03.2 -lboost_serialization
>> ../../../lib/libRDKitPartialCharges.so.1.2017.03.2
>> ../../../lib/libRDKitMolTransforms.so.1.2017.03.2
>> ../../../lib/libRDKitEigenSolvers.so.1.2017.03.2
>> ../../../lib/libRDKitFilterCatalog.so.1.2017.03.2
>> ../../../lib/libRDKitSubgraphs.so.1.2017.03.2
>> ../../../lib/libRDKitSmilesParse.so.1.2017.03.2
>> ../../../lib/libRDKitSubstructMatch.so.1.2017.03.2
>> ../../../lib/libRDKitGraphMol.so.1.2017.03.2
>> ../../../lib/libRDKitRDGeometryLib.so.1.2017.03.2
>> ../../../lib/libRDKitDataStructs.so.1.2017.03.2 -lboost_serialization
>> ../../../lib/libRDKitCatalogs.so.1.2017.03.2
>> ../../../lib/libRDKitRDGeneral.so.1.2017.03.2 -lboost_thread
>> -lboost_system -lpthread
>> -Wl,-rpath,/opt/rdkit/rdkit-Release_2017_03_2/build/lib
>> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference
>> to `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load_override(boost::archive::class_name_type&)'
>> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference
>> to 
>> `boost::archive::archive_exception::archive_exception(boost::archive::archive_exception
>> const&)'
>> collect2: error: ld returned 1 exit status
>> Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/build.make:116:
>> recipe for target 'Code/GraphMol/ChemReactions/testReaction' failed
>> make[2]: *** [Code/GraphMol/ChemReactions/testReaction] Error 1
>> make[2]: Leaving directory '/opt/rdkit/rdkit-Release_2017_03_2/build'
>> CMakeFiles/Makefile2:4157: recipe for target
>> 'Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all' failed
>> make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all]

Re: [Rdkit-discuss] Installation woes: Ubuntu 16.04, Boost 1.61, and RDKit not living so peacefully together after all ...

2017-06-19 Thread JP
Hi Paul,

Funny you should mention that.  I have boost 1.61 (installed manually in
/opt) and system boost I installed via sudo apt-get install

/opt/rdkit/rdkit-Release_2017_03_2/build$ dpkg -s libboost-dev | grep
'Version'
Version: 1.58.0.1ubuntu1

However I pass the BOOST path to cmake via:

cmake -DRDK_BUILD_INCHI_SUPPORT=ON -DBOOST_ROOT=/opt/boost_1_61_0/  ..

(I also have the $BOOST_ROOT env variable set, so I think that is
redundant.  Whatever).  cmake output clearly shows it is finding/using
boost 1.61

Using make VERBOSE=1 I get:

[ 62%] Linking CXX executable testReaction
cd /opt/rdkit/rdkit-Release_2017_03_2/build/Code/GraphMol/ChemReactions &&
/usr/bin/cmake -E cmake_link_script CMakeFiles/testReaction.dir/link.txt
--verbose=1
/usr/bin/c++ -mpopcnt -Wno-deprecated -Wno-unused-function
-fno-strict-aliasing -fPIC -Wall -Wextra -O3 -DNDEBUG
CMakeFiles/testReaction.dir/testReaction.cpp.o  -o testReaction -rdynamic
../../../lib/libRDKitChemReactions.so.1.2017.03.2
../../../lib/libRDKitChemTransforms.so.1.2017.03.2
../../../lib/libRDKitDescriptors.so.1.2017.03.2
../../../lib/libRDKitFingerprints.so.1.2017.03.2
../../../lib/libRDKitDepictor.so.1.2017.03.2
../../../lib/libRDKitFileParsers.so.1.2017.03.2 -lboost_serialization
../../../lib/libRDKitPartialCharges.so.1.2017.03.2
../../../lib/libRDKitMolTransforms.so.1.2017.03.2
../../../lib/libRDKitEigenSolvers.so.1.2017.03.2
../../../lib/libRDKitFilterCatalog.so.1.2017.03.2
../../../lib/libRDKitSubgraphs.so.1.2017.03.2
../../../lib/libRDKitSmilesParse.so.1.2017.03.2
../../../lib/libRDKitSubstructMatch.so.1.2017.03.2
../../../lib/libRDKitGraphMol.so.1.2017.03.2
../../../lib/libRDKitRDGeometryLib.so.1.2017.03.2
../../../lib/libRDKitDataStructs.so.1.2017.03.2 -lboost_serialization
../../../lib/libRDKitCatalogs.so.1.2017.03.2
../../../lib/libRDKitRDGeneral.so.1.2017.03.2 -lboost_thread -lboost_system
-lpthread -Wl,-rpath,/opt/rdkit/rdkit-Release_2017_03_2/build/lib
../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference to
`boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load_override(boost::archive::class_name_type&)'
../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference to
`boost::archive::archive_exception::archive_exception(boost::archive::archive_exception
const&)'
collect2: error: ld returned 1 exit status
Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/build.make:116:
recipe for target 'Code/GraphMol/ChemReactions/testReaction' failed
make[2]: *** [Code/GraphMol/ChemReactions/testReaction] Error 1
make[2]: Leaving directory '/opt/rdkit/rdkit-Release_2017_03_2/build'
CMakeFiles/Makefile2:4157: recipe for target
'Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all' failed
make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all]
Error 2
make[1]: Leaving directory '/opt/rdkit/rdkit-Release_2017_03_2/build'
Makefile:160: recipe for target 'all' failed
make: *** [all] Error 2

/opt/boost_1_61_0/lib is also the first path in LD_LIBRARY_PATH.

Thanks for your help and time.  I really appreciate it.

Cheers

On 16 June 2017 at 14:12, Paul Emsley  wrote:

> On 16/06/2017 12:08, JP wrote:
>
>> Hi Folks,
>>
>> Must have been eons ago last time I posted to this mailing list.  The
>> Italians have a saying "chi non muore si rivede".
>>
>> I am trying to install the RDKit (release 2017_03_2) from source, without
>> conda, and I thought this will be a breeze.  But I am getting an error.  I
>> am pretty sure this is because of the boost version I am using (1.61).
>>
>> The error is:
>>
>> [ 62%] Linking CXX executable testReaction
>> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined
>> reference to
>> `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load_override(boost::archive::class_name_type&)'
>> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined
>> reference to
>> `boost::archive::archive_exception::archive_exception(boost::archive::archive_exception
>> const&)'
>> collect2: error: ld returned 1 exit status
>> Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/build.make:116:
>> recipe for
>> target 'Code/GraphMol/ChemReactions/testReaction' failed
>> make[2]: *** [Code/GraphMol/ChemReactions/testReaction] Error 1
>> CMakeFiles/Makefile2:4157: recipe for target
>> 'Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all' failed
>> make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all]
>> Error 2
>> Makefile:160: recipe for target 'all' failed
>> make: *** [all] Error 2
>>
>
> If you are using an outdated version of boost, this is not th

[Rdkit-discuss] Installation woes: Ubuntu 16.04, Boost 1.61, and RDKit not living so peacefully together after all ...

2017-06-16 Thread JP
Hi Folks,

Must have been eons ago last time I posted to this mailing list.  The
Italians have a saying "chi non muore si rivede".

I am trying to install the RDKit (release 2017_03_2) from source, without
conda, and I thought this would be a breeze.  But I am getting an error.  I
am pretty sure this is because of the boost version I am using (1.61).

The error is:

[ 62%] Linking CXX executable testReaction
> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference to
> `boost::archive::text_iarchive_impl<boost::archive::text_iarchive>::load_override(boost::archive::class_name_type&)'
> ../../../lib/libRDKitChemReactions.so.1.2017.03.2: undefined reference to
> `boost::archive::archive_exception::archive_exception(
> boost::archive::archive_exception const&)'
> collect2: error: ld returned 1 exit status
> Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/build.make:116:
> recipe for target 'Code/GraphMol/ChemReactions/testReaction' failed
> make[2]: *** [Code/GraphMol/ChemReactions/testReaction] Error 1
> CMakeFiles/Makefile2:4157: recipe for target
> 'Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all' failed
> make[1]: *** [Code/GraphMol/ChemReactions/CMakeFiles/testReaction.dir/all]
> Error 2
> Makefile:160: recipe for target 'all' failed
> make: *** [all] Error 2


My setup:
Ubuntu 16.04.2
(manually installed) boost v1.61
RDKit release - 2017.03.2

But I read somewhere that "Boost 1.61 and the RDKit work together.".
https://sourceforge.net/p/rdkit/mailman/message/35124509/  - so my question
is, is this combo expected to work or not?

As a side note, but this is just a deprecation warning, I do get a ton of
these,
/opt/boost_1_61_0/include/boost/type_traits/detail/template_arity_spec.hpp:13:84:
note: #pragma message: NOTE: Use of this header (template_arity_spec.hpp)
is deprecated
 # pragma message("NOTE: Use of this header (template_arity_spec.hpp) is
deprecated")

I know, I know, not the most glamorous of a comeback ...

Beers and Cheers,
JP


Re: [Rdkit-discuss] [off topic] What comes next

2015-12-08 Thread JP
May the force be with you!

(Looking forward to some excellent and very exciting times for the RDKit
community!)


On 8 December 2015 at 06:06, Greg Landrum  wrote:

> TL;DR: I'm leaving Novartis at the end of January. Starting in February I
> will be splitting my time between a position at KNIME.com and a company I
> will start around the RDKit.
>
> Some more info, in some kind of order. After being with NIBR for more than
> 9 years I decided that I wanted to switch back to a small organization. I
> wasn't quite sure what exactly that would be, but something involving the
> RDKit was a must. This is where I've landed.
>
> At KNIME.com I will be working with the rest of the team to expand KNIME's
> footprint and impact in the life sciences industry. It's a small company,
> so I'm sure I will be offering opinions in other areas as well. More to
> come on that once I've officially started. It's going to be fun. :-)
>
> I'm not 100% sure how the RDKit company is going to develop. I don't even
> have a name for it yet! The company will definitely offer support and
> services (training, sponsored development, collaborative projects, etc)
> around the RDKit. It will have a strong commitment to the open-source code.
> It doesn't seem likely that I'm going to end up pursuing a model with paid
> "enterprise" components or where customers pay for early access to
> features. These things work for some open-source projects, but they don't
> feel right to me. Having said that, really small companies need to remain
> very flexible, so this could all change (well, not all of it, the strong
> commitment to the open-source code will remain a fixed point). We'll see,
> it's exciting!
>
> I'll let you know once I've got the company started; I am planning to do a
> lot of my thinking about the business model "out loud" and will definitely
> be looking for feedback.
>
> Best,
> -greg
>
>
>
>
>


Re: [Rdkit-discuss] RDKit Tools for the IPython Notebook

2015-07-02 Thread JP
Just WOW Axel.

This is useful.  Perhaps they will be merged in some future version of
RDKit?

-
Jean-Paul Ebejer
Early Stage Researcher

On 2 July 2015 at 15:20, George Papadatos  wrote:

> Axel, this is seriously cool!
> Many thanks!
>
> George
>
> On 2 July 2015 at 13:31, Axel Pahl  wrote:
>
>>  Dear fellow RDKitters,
>>
>> the RDKit community is always so helpful that I wanted to share back two
>> functions that I use in the IPython Notebook and that I thought could be
>> of use to others as well.
>>
>> - show_table:
>> Display a list of molecules in a table with molecule properties as
>> columns.
>> When an ID property is given, the table becomes interactive and compounds
>> can be selected.
>> I know that this can be also done with PandasTools but that might be
>> overkill in some situations. Also the table from Pandas is not interactive
>> to my knowledge.
>>
>> - jsme:
>> Display Peter Ertl's JavaScript Molecule Editor to enter a molecule
>> directly in the IPython notebook (how cool is that??)
>>
>> If you are interested, please have a look at the GitHub repo and the
>> example notebook.
>>
>> Kind regards,
>> Axel
>>
>>
>>
>>
>
>
>
>


Re: [Rdkit-discuss] Memory management during conformer generation

2015-06-24 Thread JP
Isn't the problem here that you are keeping an array (jobs) and you keep
adding molecules to it never letting the garbage collector collect/clear
any memory ?  If your file has a million molecules, you will have an array
of a million molecules in memory...

Why don't you process each single molecule (set name / remove similar confs
etc / remove high energy stuff), write it to file and release it ? in the
if mol: clause...
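
The idea can be sketched without RDKit. This is a hedged illustration only:
stream_results, work, handle and max_pending are hypothetical names, and a
thread pool stands in for the ProcessPoolExecutor of the original script so
the example stays self-contained. The number of in-flight jobs is capped, and
each result is handed to a writer as soon as it is ready, so finished
molecules can be released instead of piling up in a list.

```python
from concurrent import futures


def stream_results(items, work, handle, executor, max_pending=32):
    """Submit work(item) lazily and pass each result to handle() when it is done."""
    pending = set()
    for item in items:
        pending.add(executor.submit(work, item))
        if len(pending) >= max_pending:
            # Block until at least one job finishes, then drop the finished futures.
            done, pending = futures.wait(
                pending, return_when=futures.FIRST_COMPLETED)
            for fut in done:
                handle(fut.result())
    # Drain whatever is still running once the input is exhausted.
    for fut in futures.as_completed(pending):
        handle(fut.result())


written = []  # stands in for writer.write(...) in the real script
with futures.ThreadPoolExecutor(max_workers=4) as pool:
    stream_results(range(100), lambda x: x * x, written.append, pool,
                   max_pending=8)
```

Results arrive in completion order rather than input order, which should not
matter if each molecule carries its own name.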

Cheers
JP

-
Jean-Paul Ebejer
Early Stage Researcher

On 24 June 2015 at 16:47, az  wrote:

>  Hi
>
>  Using the cookbook code as basis (apologies if I should have posted in
> the corresponding topic), I've put together a script to generate conformers
> for my smiles library. Works like a charm too, aside from the fact that
> after 10-20 hours, I'm out of RAM and swap (the memory consumption seems to
> be accumulating with each iteration). I'd appreciate any hints for getting
> this resolved (any other ones as well).
>
> Thanks a lot,
> Adam
>
> the code
>
> import datetime
> import glob
> import os
> import sys
> from concurrent import futures
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> from rdkit.Chem.PropertyMol import PropertyMol
>
> max_workers = 16
>
> def generateconformations(m, n, name=''):
>     m = Chem.AddHs(m)
>     ids = AllChem.EmbedMultipleConfs(m, numConfs=n, pruneRmsThresh=0.5,
>                                      randomSeed=1)
>     etable = []  ## Gathers conformer energies
>
>     for id in ids:
>         ff = AllChem.UFFGetMoleculeForceField(m, confId=id)
>         ff.Minimize()
>         etable.append(ff.CalcEnergy())
>
>     return PropertyMol(m), list(ids), etable, name
>
> input_dir, output_dir = sys.argv[1:3]
> n = 75  ## Conformer number
>
> os.chdir(input_dir)
> for ifile in glob.glob('*.s*'):
>
>     raw_file = open(ifile, 'r')  ## To get back molecule name later on
>     ofile = os.path.join(output_dir, 'conf_' + ifile)
>
>     if 'smiles' in ifile:
>         suppl = Chem.SmilesMolSupplier(ifile, titleLine=False,
>                                        delimiter='\t')
>         ofile = ofile.replace('.smiles', '.sdf')
>         sdfinput = False
>
>     if not os.path.isfile(ofile):
>
>         writer = Chem.SDWriter(ofile)
>
>         print('Processing %s %s' % (os.path.abspath(ifile),
>                                     datetime.datetime.now()))
>
>         if sdfinput == False:
>             with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
>                 # Submit a set of asynchronous jobs
>                 jobs = []
>
>                 for mol in suppl:
>                     if mol:
>                         ## extracting molecule name from the original ifile
>                         raw_line = raw_file.readline().split()[1]
>                         ## returns molecules and associated ids / until here
>                         ## the conformers cannot be pickled
>                         job = executor.submit(generateconformations, mol,
>                                               n, raw_line)
>                         jobs.append(job)
>
>                 for job in jobs:
>                     mol, ids, etable, name = job.result()
>                     mol.SetProp("_Name", name)  ## Restoring lost property
>                     mine = min(etable)  ## Lowest conformer energy
>
>                     for i in ids:
>                         ## Conformers with energies greater than min+20
>                         ## will not be written
>                         if etable[i] > mine + 20:
>                             ids.remove(i)
>                     for i in ids:
>                         for j in ids:
>                             if i != j:
>                                 ## 0.5 A threshold for keeping conformers
>                                 if AllChem.GetConformerRMS(mol, i, j) < 0.5:
>                                     ids.remove(j)
>                     for id in ids:
>                         writer.write(mol, confId=id)
>
>         writer.close()
>
>     else:
>         print("%s exists, skipping" % ofile)
>
> ===
>
>
>
>
>
>
>


Re: [Rdkit-discuss] Molecular dis / similarity using fingerprints

2015-05-27 Thread JP
Thanks all for the lit. references (and for the ever useful TL;DR).  It now
seems clear that 0.7 is too high a value for ECFP4 (you convinced me).

Yes George, that was what I was trying to do - make statements like "this
compound library is more diverse than this other", and quantify that
diversity with a set of numbers.
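
One way to put such a number on a library is the fraction of compound pairs
whose Tanimoto similarity reaches a chosen threshold. A minimal sketch,
assuming fingerprints are represented as plain Python sets of on-bit indices
(with RDKit these would normally come from Morgan fingerprints instead); the
function names and the toy fingerprints are made up for illustration.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return float(len(a & b)) / union if union else 0.0


def fraction_similar_pairs(fps, threshold=0.5):
    """Fraction of all pairs with similarity >= threshold (lower = more diverse)."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if tanimoto(a, b) >= threshold)
    return float(hits) / len(pairs)


# Toy library: the first two "compounds" share 3 of 5 distinct bits.
library = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
for cutoff in (0.5, 0.7):
    print(cutoff, fraction_similar_pairs(library, cutoff))
```

Comparing this fraction at the same cutoff for two libraries gives one crude
diversity measure; the all-pairs loop is quadratic, so very large sets may
need sampling.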

-
Jean-Paul Ebejer
Early Stage Researcher

On 26 May 2015 at 12:57, George Papadatos  wrote:

> Hi JP,
>
> Aha, so you're looking for a threshold that will exhibit the optimal
> balance between the false positives and false negatives in the
> *biological* *activity* space. This threshold varies depending on the
> fingerprint and the dataset of course.
> See here for some generalised insights:
>
> (1) Papadatos, G.; Cooper, A. W. J.; Kadirkamanathan, V.; Macdonald, S.
> J. F.; McLay, I. M.; Pickett, S. D.; Pritchard, J. M.; Willett, P.; Gillet,
> V. J. Analysis of Neighborhood Behavior in Lead Optimization and Array
> Design. *J. Chem. Inf. Model.* *2009*, *49*, 195–208.
>
> especially Figure 17, and
>
> (2) Muchmore, S. W.; Debe, D. A.; Metz, J. T.; Brown, S. P.; Martin, Y.
> C.; Hajduk, P. J. Application of Belief Theory to Similarity Data Fusion
> for Use in Analog Searching and Lead Hopping. *J. Chem. Inf. Model.*
> *2008*, *48*, 941–948.
>
> and also Greg's blog post:
>
> http://rdkit.blogspot.co.uk/2013/10/fingerprint-thresholds.html
>
>
> The TL;DR version is that for ECFP_4, this threshold should be around
> 0.45-0.55.
> Wrt methodology, are you trying to score/rank the
> intra-diversity/heterogeneity for different structure sets?
>
>
> Cheers,
>
> George
>
>
>
> On 26 May 2015 at 11:59, JP  wrote:
>
>>
>> On 25 May 2015 at 22:23, Tim Dudgeon  wrote:
>>
>>> Maybe a clustering approach may work? Something like sphere exclusion
>>> clustering (counting the number of clusters at 0.9 - 0.8 similarity)?
>>> With 30K structures it sounds computationally tractable?
>>
>>
>> Thanks Tim for this idea.  I hadn't heard of sphere exclusion.  The
>> problem is we still need a distance / similarity function (and using ECFP
>> with a high similarity threshold of 0.8-0.9 would result in very few
>> compounds being thrown out).  I think the real issue here is selecting a
>> sensible similarity threshold which defines my idea of "similarity".  But
>> that is a tricky number to get right - too high and you remove nothing,
>> too low and you start catching "different" molecules.  I guess the best
>> thing is to try a few values (0.5, 0.6, 0.7, 0.8, 0.9) and have a visual
>> look at the remaining compounds.
>>
>> -
>> JP
>>
>>


Re: [Rdkit-discuss] Molecular dis / similarity using fingerprints

2015-05-26 Thread JP
On 25 May 2015 at 22:23, Tim Dudgeon  wrote:

> Maybe a clustering approach may work? Something like sphere exclusion
> clustering (counting the number of clusters at 0.9 - 0.8 similarity)?
> With 30K structures it sounds computationally tractable?


Thanks Tim for this idea.  I hadn't heard of sphere exclusion.  The problem
is we still need a distance / similarity function (and using ECFP with a
high similarity threshold of 0.8-0.9 would result in very few compounds being
thrown out).  I think the real issue here is selecting a sensible similarity
threshold which defines my idea of "similarity".  But that is a tricky
number to get right - too high and you remove nothing, too low and you
start catching "different" molecules.  I guess the best thing is to try a few
values (0.5, 0.6, 0.7, 0.8, 0.9) and have a visual look at the remaining
compounds.
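
For the archive, here is a rough sketch of the sphere exclusion idea Tim mentions, in plain Python rather than RDKit.  Fingerprints are modelled as sets of on-bit indices, and `tanimoto` and `sphere_exclusion` are illustrative names I made up, not RDKit functions.  The number of centres picked at each threshold is one crude "diversity number" for a set.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def sphere_exclusion(fps, threshold):
    """Greedy sphere exclusion: take the first unassigned item as a centre,
    drop everything at least `threshold` similar to it, repeat until empty."""
    remaining = list(range(len(fps)))
    centres = []
    while remaining:
        centre = remaining.pop(0)
        centres.append(centre)
        remaining = [i for i in remaining
                     if tanimoto(fps[centre], fps[i]) < threshold]
    return centres


# Toy "fingerprints": two close analogues plus one unrelated compound.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}]
for threshold in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(threshold, len(sphere_exclusion(fps, threshold)))
```

With real data the sets would come from a fingerprint of your choice; the result also depends on the order of the input, which this sketch does not try to address.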

-
JP


[Rdkit-discuss] Molecular dis / similarity using fingerprints

2015-05-25 Thread JP
RDKitters,

I have a partial RDKit / partial Methodology question.  I hope this email
isn't much of the "how long is a piece of string" nature.

I have a set of molecules (~30,000) which I would like to get a structural
"diversity index" for.  So I thought easy - generate some fingerprint I
fancy (ECFP-like, rad 2), take a threshold I fancy (0.7), select a
similarity metric I fancy (Tanimoto) and apply these to the set in a
pairwise fashion (you can only do this for a small-ish number of
molecules).  The resulting distribution of Tanimoto scores defines the
similarity (or dissimilarity) of the set.
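
To make the pairwise idea concrete, a minimal sketch in plain Python follows.  It assumes fingerprints are given as sets of on-bit indices; `pairwise_similarities` and `fraction_below` are hypothetical helper names, and 0.7 is just the threshold mentioned above.  Note that n molecules give n*(n-1)/2 pairs, so 30,000 molecules means roughly 450 million comparisons.

```python
from itertools import combinations
from statistics import mean


def tanimoto(a, b):
    """Tanimoto similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def pairwise_similarities(fps):
    """All n*(n-1)/2 pairwise Tanimoto scores for a list of fingerprints."""
    return [tanimoto(a, b) for a, b in combinations(fps, 2)]


def fraction_below(sims, threshold):
    """Share of pairs that count as 'dissimilar' at the given threshold."""
    return sum(s < threshold for s in sims) / len(sims)


# Toy data: two close analogues plus one unrelated compound.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}]
sims = pairwise_similarities(fps)
print(mean(sims), fraction_below(sims, 0.7))
```

The mean and the fraction below a threshold are only two possible summaries of the distribution; a histogram of `sims` would say more.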

First of all, is there a better way to do this? Does anyone have a feel for
the numbers to use (fingerprint type, radius, no. of bits)?  Is there some
'industry standard'?  Which method should I use,
GetMorganFingerprintAsBitVect or GetMorganFingerprint (considering I wanted
ECFP-like fingerprints)?  What determines when to use one over the other?

All my scores are rather low even for relatively similar structures -- so I
think one of my parameters must be off.  Just adding (or removing) a
carbonyl drops my score to 0.43.
I made this notebook example:
http://nbviewer.ipython.org/gist/malteseunderdog/6af446c0dbb1ac9840e7

To the RDKit question: GetMorganFingerprintAsBitVect and
GetMorganFingerprint give different Tanimoto scores (with the same radius: 2).
This is of course because for the explicit bit vector we can set the length
of the vector/fingerprint.  Is there an equivalence between the two (say,
using n bits gives the same results as GetMorganFingerprint)?  How come the
GetMorganFingerprint method has no user-defined length for the
fingerprint?  What are the hashed equivalents of these fingerprints (e.g.
GetHashedMorganFingerprint)?
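
A rough illustration of why the two calls can disagree, in plain Python.  The assumptions here are mine: the sparse fingerprint is modelled as feature-id counts, the bit vector as those ids folded modulo the vector length, and the feature ids (5, 9, 14, 22) are invented.  Two effects show up: counts versus presence/absence, and collisions when the vector is short.

```python
from collections import Counter


def tanimoto_counts(a, b):
    """Tanimoto on count vectors: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 1.0


def fold(counts, n_bits):
    """Fold sparse feature ids into a fixed-length bit set (id modulo n_bits)."""
    return {feature_id % n_bits for feature_id in counts}


def tanimoto_bits(a, b):
    """Tanimoto on sets of on-bit indices."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0


# Hypothetical feature ids with occurrence counts, standing in for two molecules.
mol_a = Counter({5: 3, 9: 1, 14: 1})
mol_b = Counter({5: 1, 9: 1, 22: 1})

print(tanimoto_counts(mol_a, mol_b))                        # counts
print(tanimoto_bits(fold(mol_a, 2048), fold(mol_b, 2048)))  # long bit vector
print(tanimoto_bits(fold(mol_a, 8), fold(mol_b, 8)))        # short: 14 and 22 collide
```

In this toy case no choice of length makes the bit-vector score equal the count-based one, because the counts themselves carry information the bits discard.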

Take care,
JP

ps A small suggestion, if I am allowed.  The fingerprint classes could do
with an informative toString (or non Java equivalent) - I know there is
ToBitString, but you need to call that explicitly when printing


Re: [Rdkit-discuss] Help building the RDKit cookbook

2015-04-30 Thread JP
The build for 'make singlehtml' works (so does 'make html' and removing the
dep).  But the generated documentation has dead links.

I think these APIs are needed because the last (bottom) section,
'Additional Information', has links to the Python and C++ APIs, which of
course are not present (so the links are dead).  (An ugly hack could be to
link to the online versions of these?)  Where are these two API docs
stored on the file system?


[Rdkit-discuss] Help building the RDKit cookbook

2015-04-30 Thread JP
Yo Folks,

I need some help building the RDKit documentation (how meta, I need
documentation on the documentation).

I go in $RDBASE/Docs/Book and I 'make html' which barfs the following:

mkdir -p _build/html/api
mkdir -p _build/html/cppapi
cp /opt/RDKit_master/rdkit/docs/*  _build/html/api
cp: cannot stat ‘/opt/RDKit_master/rdkit/docs/*’: No such file or directory
make: *** [apidocs] Error 1

It is obvious why ($RDBASE/rdkit/docs/ has nothing in it! - but the dir
exists).  I also did a find $RDBASE -name "docs" but this returns only one
directory which I created myself. I guess my question has two parts to it.

The first is what is the make process trying to copy (what are the correct
values for APIDOCSHOME  and CPPAPIDOCSHOME)?  And the second is should this
work out of the box (why don't the defaults of 'make html' just work) ?

Thank you!

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Clustering in RDKit (take 2 - missing wiki link)

2015-04-14 Thread JP
This is now at:
https://github.com/rdkit/rdkit/blob/master/Docs/Book/Cookbook.rst

-
Jean-Paul Ebejer
Early Stage Researcher

On 11 April 2015 at 10:46, JP  wrote:

> Hi RDKitters!
>
> I have a bit of python RDKit clustering code using Butina which is
> commented with:
> # Ripped off from https://code.google.com/p/rdkit/wiki/ClusteringMolecules
>
> Sadly, and as it happens I need to refer back to this.
>
> This was written by Greg, I think as a result of this "nudge":
>
> https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg02449.html
>
> This page doesn't seem to exist anymore in the wiki.  Is this because of a
> technical / administrative glitch?  Or has this been removed purposefully
> (perhaps the functionality is not supported anymore)?
>
> Thanks!
>
> -
> Jean-Paul Ebejer
> Early Stage Researcher
>


[Rdkit-discuss] Clustering in RDKit (take 2 - missing wiki link)

2015-04-11 Thread JP
Hi RDKitters!

I have a bit of python RDKit clustering code using Butina which is
commented with:
# Ripped off from https://code.google.com/p/rdkit/wiki/ClusteringMolecules

Sadly, and as it happens I need to refer back to this.

This was written by Greg, I think as a result of this "nudge":
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg02449.html

This page doesn't seem to exist anymore in the wiki.  Is this because of a
technical / administrative glitch?  Or has this been removed purposefully
(perhaps the functionality is not supported anymore)?

Thanks!

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] XAMPP server No module named RDkit

2015-03-05 Thread JP
I don't have access to a Windows OS to play with this.  Sorry I cannot be
of more help.

On my system (Apache2 and Ubuntu) - you hardcode the env variables in a
file called envvars in /etc/apache2.

You should be able to use the SetEnv directive in the apache conf file
or in the sites conf files.
This is explained here:
http://httpd.apache.org/docs/current/mod/mod_env.html

A really terrible way to do this would be directly in PHP before your
shell_exec call (http://php.net/manual/en/function.putenv.php).  This is
bound to bite you later.
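
The underlying point is that a child process only sees the environment it is handed, which for a web server is usually not your login shell's.  A small sketch in Python (not PHP, and not specific to XAMPP) shows this; `run_with_env` is a made-up helper and the path is simply the install location mentioned in this thread.

```python
import os
import subprocess
import sys


def run_with_env(extra_env):
    """Start a child Python with `extra_env` added to a copy of our own
    environment, and return what the child sees for PYTHONPATH."""
    env = dict(os.environ)
    env.update(extra_env)
    code = "import os; print(os.environ.get('PYTHONPATH', '<unset>'))"
    result = subprocess.run([sys.executable, "-c", code],
                            env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# The child reports exactly the value passed in, whatever the parent shell had.
print(run_with_env({"PYTHONPATH": "/opt/RDKit_2014_03_1"}))
```

Running the same one-liner from the web server's side (for example via the PHP shell_exec call) is a quick way to see which variables actually reach the script.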



-
Jean-Paul Ebejer
Early Stage Researcher

On 5 March 2015 at 10:25, Sujit Tangadpalliwar <
sujit.tangadpalli...@gmail.com> wrote:

> Dear Ebejer,
>
> Thanks for your reply.
> I don't have the python and RDKit paths in my XAMPP environment variable path,
> but the python and RDKit paths are set on my system (Windows Server 2003).
> Could you please suggest how to add the python and RDKit paths to the XAMPP
> environment variables?
>
> Thanks in advance
>
> Regards
> Sujit
>
>
> On Wed, Mar 4, 2015 at 3:44 AM, JP  wrote:
>
>> (Make sure your LD_LIBRARY_PATH env variable is also set to $RDBASE\lib,
>> as you don't list this)
>>
>> In Unix and Linux environments, the issue is that the user running apache
>> is a system account and has no default shell or bash environment.  This has
>> been tackled before here:
>> http://comments.gmane.org/gmane.science.chemistry.rdkit.user/2812
>>
>> I don't know how apache behaves under Windows.  If you want to check that
>> your environment is set up correctly create a php page with
>>
>> <?php
>> phpinfo();
>> ?>
>>
>> Access this page and have a look at the environment section.  You can see
>> that all the RDKit required variables are here, in my case:
>> Environment
>> Variable          Value
>> LD_LIBRARY_PATH   /opt/RDKit_2014_03_1/lib:
>> APACHE_RUN_DIR    /var/run/apache2
>> APACHE_PID_FILE   /var/run/apache2/apache2.pid
>> PATH              /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
>> APACHE_LOCK_DIR   /var/lock/apache2
>> LANG              C
>> APACHE_RUN_USER   www-data
>> APACHE_RUN_GROUP  www-data
>> APACHE_LOG_DIR    /var/log/apache2
>> PWD               /
>> PYTHONPATH        /opt/RDKit_2014_03_1:
>>
>>
>>
>> -
>> Jean-Paul Ebejer
>> Early Stage Researcher
>>
>> On 4 March 2015 at 11:39, Sujit Tangadpalliwar <
>> sujit.tangadpalli...@gmail.com> wrote:
>>
>>> Dear all,
>>> Greetings...
>>>
>>> I am trying to execute a python script from PHP (test.php) on an XAMPP server.
>>> My python script (test.py) is unable to import rdkit and shows the error
>>> message "ImportError: No module named rdkit".
>>> When I run the same python script from the system through cmd or through
>>> the python shell, it runs properly.
>>>
>>> My environmental variable are
>>> RDBASE: C:\RDKit_2013_09_1
>>> PYTHONPATH: %RDBASE%
>>> path: C:\RDKit_2013_09_1\lib
>>>
>>> My files look like:
>>> %%test.py
>>> print "111"
>>> from rdkit import Chem
>>> print "222"
>>>
>>> %%test.php
>>> <?php
>>> $result = shell_exec("C:/Python27/python.exe D:/Databases/XAMPP/htdocs/jsmol/PredictR/python_prog/asas.py  2>&1");
>>> echo $result;
>>> ?>
>>>
>>> %%output
>>> 111
>>> Traceback (most recent call last):
>>>   File "path/test.py", line 2, in <module>
>>>     import rdkit
>>> ImportError: No module named rdkit
>>>
>>> Please help me out to sort this problem
>>> Thanks in advance.
>>>
>>>
>>> --
>>> Warm regards
>>> Sujit R. Tangadpalliwar
>>> PhD Research Scholar,
>>> NIPER, Mohali.
>>>
>>>
>>>
>>
>
>
> --
> Warm regards
> Sujit R. Tangadpalliwar
> PhD Research Scholar,
> NIPER, Mohali.
>
>


Re: [Rdkit-discuss] XAMPP server No module named RDkit

2015-03-04 Thread JP
(Make sure your LD_LIBRARY_PATH env variable is also set to $RDBASE\lib, as
you don't list this)

In Unix and Linux environments, the issue is that the user running apache
is a system account and has no default shell or bash environment.  This has
been tackled before here:
http://comments.gmane.org/gmane.science.chemistry.rdkit.user/2812

I don't know how apache behaves under Windows.  If you want to check that
your environment is set up correctly create a php page with

<?php
phpinfo();
?>

Access this page and have a look at the environment section.  You can see
that all the RDKit required variables are here, in my case:
Environment
Variable          Value
LD_LIBRARY_PATH   /opt/RDKit_2014_03_1/lib:
APACHE_RUN_DIR    /var/run/apache2
APACHE_PID_FILE   /var/run/apache2/apache2.pid
PATH              /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
APACHE_LOCK_DIR   /var/lock/apache2
LANG              C
APACHE_RUN_USER   www-data
APACHE_RUN_GROUP  www-data
APACHE_LOG_DIR    /var/log/apache2
PWD               /
PYTHONPATH        /opt/RDKit_2014_03_1:



-
Jean-Paul Ebejer
Early Stage Researcher

On 4 March 2015 at 11:39, Sujit Tangadpalliwar <
sujit.tangadpalli...@gmail.com> wrote:

> Dear all,
> Greetings...
>
> I am trying to execute a python script from PHP (test.php) on an XAMPP server.
> My python script (test.py) is unable to import rdkit and shows the error
> message "ImportError: No module named rdkit".
> When I run the same python script from the system through cmd or through
> the python shell, it runs properly.
>
> My environmental variable are
> RDBASE: C:\RDKit_2013_09_1
> PYTHONPATH: %RDBASE%
> path: C:\RDKit_2013_09_1\lib
>
> My files look like:
> %%test.py
> print "111"
> from rdkit import Chem
> print "222"
>
> %%test.php
> <?php
> $result = shell_exec("C:/Python27/python.exe D:/Databases/XAMPP/htdocs/jsmol/PredictR/python_prog/asas.py  2>&1");
> echo $result;
> ?>
>
> %%output
> 111
> Traceback (most recent call last):
>   File "path/test.py", line 2, in <module>
>     import rdkit
> ImportError: No module named rdkit
>
> Please help me out to sort this problem
> Thanks in advance.
>
>
> --
> Warm regards
> Sujit R. Tangadpalliwar
> PhD Research Scholar,
> NIPER, Mohali.
>
>
>


Re: [Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-23 Thread JP
Ok so I got out my test set of 6,940,083 molecules.  First, I generated the
inchi using 2014_09_2.  I then checked out (and built) the master (with
Greg's latest commits) from github and regenerated the inchis for all these
molecules.

3,257 molecules (of 6,940,083) gave me different InChIs between the
current production version and the development (github) one.

For these 3,257 molecules I hammered the
http://cactus.nci.nih.gov/chemical/structure/%s/stdinchi site and assumed
this to be the 'correct' InChI (those great guys will have seen an interesting
spike in their web traffic last Fri evening).  In 6 (out of 3,257) cases we
get different InChIs from cactus.nci.nih.gov vs the RDKit github development
version (2015.03.1pre).

Here is the list (first inchi is the 2014_09_2, second one is the
2015.03.1pre generated one, third inchi is the cactus.nci.nih.gov):

O=C(/N=c1/[nH]ncs1)[C@H]1CC[C@H](Cn2cnc3c3c2=O)CC1
MPQBIWRBISQCLJ-BETUJISGSA-N MPQBIWRBISQCLJ-JOCQHMNTSA-N
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13+
# RDKit 2014_09_2
InChI=1S/C18H19N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-
# RDKit 2015.03.1pre
InChI=1S/C18H29N5O2S/c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15-4-2-1-3-14(15)17(23)25/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?
# cactus.nci.nih.gov

O=C(/N=c1\[nH]c(-c2n2)cs1)[C@H]1CC[C@H](Cn2cnc3c3c2=O)CC1
CZKXHWCYFFXKGH-CALCHBBNSA-N CZKXHWCYFFXKGH-QAQDUYKDSA-N
InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17+
InChI=1S/C24H23N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h1-7,12,14-17H,8-11,13H2,(H,27,28,30)/t16-,17-
InChI=1S/C24H39N5O2S/c30-22(28-24-27-21(14-32-24)20-7-3-4-12-25-20)17-10-8-16(9-11-17)13-29-15-26-19-6-2-1-5-18(19)23(29)31/h16-21,25-26H,1-15H2,(H,27,28,30)/t16-,17-,18?,19?,20?,21?

CCOC(=O)Cc1cs/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)[nH]1
GAXCPQSXDNGSQV-IYBDPMFKSA-N GAXCPQSXDNGSQV-WKILWMFISA-N
InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16+
InChI=1S/C23H26N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h3-6,13-16H,2,7-12H2,1H3,(H,25,26,29)/t15-,16-
InChI=1S/C23H36N4O4S/c1-2-31-20(28)11-17-13-32-23(25-17)26-21(29)16-9-7-15(8-10-16)12-27-14-24-19-6-4-3-5-18(19)22(27)30/h15-19,24H,2-14H2,1H3,(H,25,26,29)/t15-,16-,17?,18?,19?

COCc1n[nH]/c(=N/C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)s1
YVZJPKUMKXPZTK-OKILXGFUSA-N YVZJPKUMKXPZTK-HDJSIYSDSA-N
InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14+
InChI=1S/C20H23N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h2-5,12-14H,6-11H2,1H3,(H,22,24,26)/t13-,14-
InChI=1S/C20H33N5O3S/c1-28-11-17-23-24-20(29-17)22-18(26)14-8-6-13(7-9-14)10-25-12-21-16-5-3-2-4-15(16)19(25)27/h13-17,21,23H,2-12H2,1H3,(H,22,24,26)/t13-,14-,15?,16?,17?

COC(=O)c1[nH]/c(=N\C(=O)[C@H]2CC[C@H](Cn3cnc4c4c3=O)CC2)sc1C(C)C
KNDSLDLCZNAXPK-IYBDPMFKSA-N KNDSLDLCZNAXPK-WKILWMFISA-N
InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16+
InChI=1S/C24H28N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h4-7,13-16H,8-12H2,1-3H3,(H,26,27,29)/t15-,16-
InChI=1S/C24H38N4O4S/c1-14(2)20-19(23(31)32-3)26-24(33-20)27-21(29)16-10-8-15(9-11-16)12-28-13-25-18-7-5-4-6-17(18)22(28)30/h14-20,25H,4-13H2,1-3H3,(H,26,27,29)/t15-,16-,17?,18?,19?,20?

CC(C)[C@H]1CC[C@H](C(=O)N[C@H](Cc2c2)C(=O)/N=c2\[nH]ncs2)CC1
OKTRHZCAACPPLC-FGTMMUONSA-N OKTRHZCAACPPLC-KZNAEPCWSA-N
InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17+,18-/m1/s1
InChI=1S/C21H28N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h3-7,13-14,16-18H,8-12H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1
InChI=1S/C21H36N4O2S/c1-14(2)16-8-10-17(11-9-16)19(26)23-18(12-15-6-4-3-5-7-15)20(27)24-21-25-22-13-28-21/h14-18,22H,3-13H2,1-2H3,(H,23,26)(H,24,25,27)/t16-,17-,18-/m1/s1


I have looked at these molecules in MarvinSketch to try to figure out why
different inchis are being generated.  Perhaps there is a problem in RDKit
which is always detecting one of the rings as aromatic (the Inchi doesn't
seem to agree on the aromaticity).
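
One quick way to see where the strings diverge is to split them into layers.  The sketch below is plain Python (no RDKit, no InChI library); `inchi_layers` is a hypothetical helper and it only handles the simple single-component strings listed above.  For the first example it shows that the two RDKit versions differ only in the /t (stereo) layer, while the cactus string also has a different formula (H29 instead of H19), so cactus is not looking at the same structure.

```python
def inchi_layers(inchi):
    """Return (formula, {layer_prefix: layer_body}) for a standard InChI string."""
    parts = inchi.split("/")
    formula = parts[1]
    layers = {part[0]: part[1:] for part in parts[2:]}
    return formula, layers


# Connectivity layer shared by all three InChIs of the first example above.
CONN = ("c24-16(21-18-22-20-11-26-18)13-7-5-12(6-8-13)9-23-10-19-15"
        "-4-2-1-3-14(15)17(23)25")

examples = {
    "RDKit 2014_09_2": "InChI=1S/C18H19N5O2S/" + CONN
                       + "/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13+",
    "RDKit 2015.03.1pre": "InChI=1S/C18H19N5O2S/" + CONN
                          + "/h1-4,10-13H,5-9H2,(H,21,22,24)/t12-,13-",
    "cactus.nci.nih.gov": "InChI=1S/C18H29N5O2S/" + CONN
                          + "/h12-15,19-20H,1-11H2,(H,21,22,24)/t12-,13-,14?,15?",
}

for name, inchi in examples.items():
    formula, layers = inchi_layers(inchi)
    print(name, formula, "t=" + layers.get("t", "<none>"))
```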

I hope this is helpful.
JP



-
Jean-Paul Ebejer
Ea

[Rdkit-discuss] Round-tripping Aromatic (And Radical) N using SMILES - Fails

2015-02-19 Thread JP
Hi there RDKitters,

I'm (round) tripping a lot these days.  First it was InChIs now it's kekule
radical Nitrogen atoms.

The round trip SMILES -> RDKit Mol -> SMILES -> RDKit Mol fails, but I
don't think it should (if it was successfully instanced to an RDKit
molecule the first time, converting it to SMILES and back to the molecule
should not 'break' my molecule subsequently).  Note that the non-radical
versions of the molecules work.

Here's my notebook explaining the problem: http://goo.gl/Ma9klv

Perhaps, bug #340 ( https://github.com/rdkit/rdkit/issues/340 ) is more
general than first thought, and doesn't apply only to CTABs?  This somehow
looks related.

Good Thursday!

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] RDKit compile is successful, but python does see RDKit?

2015-02-17 Thread JP
Hi Stephen,

As Christos pointed out, it is almost always the environment variables
which get you.  What is the error message you are getting?

Some installation instructions specific to Ubuntu (working and tested up to
version 14.04) may be found at:
http://www.blopig.com/blog/2013/02/how-to-install-rdkit-on-ubuntu-12-04/

Take Care,
JP


-
Jean-Paul Ebejer
Early Stage Researcher

On 17 February 2015 at 17:20, Stephen O'hagan 
wrote:

> Hi,
>
> On one of our Ubuntu machines, I’ve installed RDKit (compiled from source to
> get the latest version);  ctest passed all tests.
>
> Cmake seemed to detect the correct python version and boost libs.
>
> However, python does not see the RDKit module(s).
>
> Any ideas what might be going wrong?
>
> Dr. Steve O'Hagan,
> Computer Officer,
> Bioanalytical Sciences Group,
> School of Chemistry,
> Manchester Institute of Biotechnology,
> University of Manchester,
> 131, Princess St,
> MANCHESTER M1 7DN.
>
> Email: soha...@manchester.ac.uk
> Phone: 0161 306 4562
>


[Rdkit-discuss] RDKit, Inchi, Stereochemistry !

2015-02-17 Thread JP
Hi there,

I have a question for the 3D enabled of you (I wish the world looked like
GTA2 !)

I am seeing a case of an RDKit mol -> Inchi -> RDKit mol, that I think is
changing the  stereochemistry of the molecule.  I have 12 example-pairs
where this happens (but all very structurally similar).  I don't care much
that the last rdkit molecule is a different tautomer than the starting one
- but if this is the case the stereochemistry should still be conserved, no?

I did an ipython notebook (most useful tool of the decade after RDKit?)
gist here:

http://nbviewer.ipython.org/urls/gist.githubusercontent.com/anonymous/7c158926a0f3bf9a4978/raw/d91cc808ac91eccc8bf0e45d9eacd2af382e5105/gistfile1.txt

I appreciate if anyone could shed some light.  I'd just like to understand.

Thank you for your time!

-
JP


Re: [Rdkit-discuss] Inchi installation in postgresql database driving me mad

2015-02-13 Thread JP
This! This! This was it.  Great find Jan. Thanks.  First instance in my
life where commenting a bit of code broke it. :)
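
For anyone finding this thread later: as far as I understand GNU make, it is the spaces in front of the `#`, rather than the comment text itself, that end up in the variable value, so the comparison against a bare 1 fails.  The Python sketch below only imitates that parsing rule; `make_value` is an invented name and this is not how make is implemented.

```python
def make_value(line):
    """Approximate how GNU make reads `NAME=value  # comment`:
    the comment is dropped and whitespace right after '=' is dropped,
    but whitespace in front of the '#' stays in the value."""
    _, _, rhs = line.partition("=")
    rhs = rhs.split("#", 1)[0]
    return rhs.lstrip().rstrip("\n")


with_comment = make_value("USE_INCHI=1  # enables InChI functions")
without_comment = make_value("USE_INCHI=1")

# repr() makes the trailing spaces visible.
print(repr(with_comment), with_comment == "1")
print(repr(without_comment), without_comment == "1")
```

Putting the comment on its own line above the assignment avoids the problem.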

-
Jean-Paul Ebejer
Early Stage Researcher

On 12 February 2015 at 22:36, Jan Holst Jensen 
wrote:

>  On 2015-02-12 17:50, JP wrote:
>
> My Makefile now looks:
>
>>
>>>  # -
>>> # Variables used and default values:
>>> USE_INCHI=1  # enables InChI functions; requires rdkit built with
>>> inchi support
>>> # USE_AVALON=0 # enables avalon fingerprint; requires rdkit built
>>> with avalon support
>>> USE_POPCOUNT=1   # enables use of the CPU's popcount instruction
>>> # USE_THREADS=0# links against boost.system; required with
>>> non-ancient boost versions if inchi is enabled or the rdkit is built with
>>> threadsafe SSS
>>> # STATIC_LINK=1# link against the static RDKit libraries
>>> # 
>>>
>>>
> Hi JP,
>
> You were almost there. I had this problem too. The USE_INCHI line should
> read
>
> USE_INCHI=1
>
> and not
>
> USE_INCHI=1  # enables InChI functions; requires rdkit built with
> inchi support
>
>  I guess the comment gets included into the USE_INCHI variable and then
> the check for "is USE_INCHI == 1" in the makefile fails.
>
> Cheers
> -- Jan
>


Re: [Rdkit-discuss] Inchi installation in postgresql database driving me mad

2015-02-12 Thread JP
select mol_inchikey(mol_from_smiles('CC'));
mol_inchikey
-
 OTMSDBZUPAUEDD-UHFFFAOYSA-N

But this was a lot of work.

Whatever I did, I could not get into the inchi section of the makefile
(which I filled up with $(info debuggingmessages) in true testing
tradition).  My make version is 3.81.  So I just commented the inchi ifeq
and endif lines (lol).  The make actually failed after this.  From the
error message I noticed I required a sudo apt-get install
libboost-system-dev.  I am using boost 1.42.0 for the interested.  Then
everything as normal:

make clean
make
sudo make install

in  $RDBASE/Code/PgSQL/rdkit

dropdb testdb
createdb testdb

and

create extension rdkit;
select mol_inchikey(mol_from_smiles('CC'));
mol_inchikey
-
 OTMSDBZUPAUEDD-UHFFFAOYSA-N

And then a tear of joy and a laptop kicked out of the window.

Thanks Greg and Riccardo for your ideas -- I wouldn't have noticed this if
you hadn't pointed me in the right direction!




-
Jean-Paul Ebejer
Early Stage Researcher

On 12 February 2015 at 16:57, JP  wrote:

>
> On 12 February 2015 at 16:31, JP  wrote:
>
>> No such luck - yet.
>>
>> I stopped the database.
>>
>> My Makefile now looks:
>>
>> # -
>> # Variables used and default values:
>> USE_INCHI=1  # enables InChI functions; requires rdkit built with
>> inchi support
>> # USE_AVALON=0 # enables avalon fingerprint; requires rdkit built
>> with avalon support
>> USE_POPCOUNT=1   # enables use of the CPU's popcount instruction
>> # USE_THREADS=0# links against boost.system; required with
>> non-ancient boost versions if inchi is enabled or the rdkit is built with
>> threadsafe SSS
>> # STATIC_LINK=1# link against the static RDKit libraries
>> # 
>>
>> (how on earth did I miss this first time around)
>>
>> I then:
>>
>> make clean
>> make
>> sudo make install
>>
>>
> Just to be clear I do these in $RDBASE/Code/PgSQL/rdkit
>
>


Re: [Rdkit-discuss] Inchi installation in postgresql database driving me mad

2015-02-12 Thread JP
On 12 February 2015 at 16:31, JP  wrote:

> No such luck - yet.
>
> I stopped the database.
>
> My Makefile now looks:
>
> # -
> # Variables used and default values:
> USE_INCHI=1  # enables InChI functions; requires rdkit built with
> inchi support
> # USE_AVALON=0 # enables avalon fingerprint; requires rdkit built with
> avalon support
> USE_POPCOUNT=1   # enables use of the CPU's popcount instruction
> # USE_THREADS=0# links against boost.system; required with non-ancient
> boost versions if inchi is enabled or the rdkit is built with threadsafe SSS
> # STATIC_LINK=1# link against the static RDKit libraries
> # 
>
> (how on earth did I miss this first time around)
>
> I then:
>
> make clean
> make
> sudo make install
>
>
Just to be clear I do these in $RDBASE/Code/PgSQL/rdkit


Re: [Rdkit-discuss] Inchi installation in postgresql database driving me mad

2015-02-12 Thread JP
No such luck - yet.

I stopped the database.

My Makefile now looks:

# -
# Variables used and default values:
USE_INCHI=1  # enables InChI functions; requires rdkit built with inchi support
# USE_AVALON=0 # enables avalon fingerprint; requires rdkit built with avalon support
USE_POPCOUNT=1   # enables use of the CPU's popcount instruction
# USE_THREADS=0# links against boost.system; required with non-ancient boost versions if inchi is enabled or the rdkit is built with threadsafe SSS
# STATIC_LINK=1# link against the static RDKit libraries
# 

(how on earth did I miss this first time around)

I then:

make clean
make
sudo make install

I switch on the DB here...

make installcheck runs like a charm (9 tests)

createdb testdb
psql testdb
create extension rdkit;

testdb=# select mol_inchi(mol_from_smiles('CC'));
  mol_inchi
-
 InChI not available
(1 row)

BOOM !!!








-
Jean-Paul Ebejer
Early Stage Researcher

On 12 February 2015 at 15:52, Riccardo Vianello  wrote:

> Hi Jean-Paul,
>
>
> On Thu, Feb 12, 2015 at 3:37 PM, JP  wrote:
>
>> cd $RDBASE/Code/PgSQL/rdkit
>> make clean
>> make
>> sudo make install # to get this to work I had to change this line in the
>> # Makefile: PG_CONFIG  = /opt/postgresql-9.3.4/bin/pg_config
>> make installcheck # before this I restart postgresql
>>
>
> just a guess, but maybe in the above you need to edit the Makefile to set
> the value for the USE_INCHI variable, or pass this variable on the make
> command line? (default configuration is documented at the top of the
> Makefile)
>
> Best,
> Riccardo
>
>


[Rdkit-discuss] Inchi installation in postgresql database driving me mad

2015-02-12 Thread JP
Hi there RDKitters,

I am trying to install inchi functionality from the database, and all I
keep getting is:

testdb=# select mol_inchi('CCC'::mol);
  mol_inchi
-
 InChI not available
(1 row)

So, what I have done so far:

From external/INCHI-API I execute ./download-inchi.sh
This downloads all the correct inchi library files.  Then, following the
instructions on https://code.google.com/p/rdkit/wiki/BuildingTheCartridge
(but of course, turning inchi support on):

cmake -DRDK_BUILD_INCHI_SUPPORT=ON ..
make clean
make
make install

And now if I go in python:

Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import rdkit
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem
>>> AllChem.INCHI_AVAILABLE
True
>>>

Brilliant.  Here I stop my running postgresql installation.

cd $RDBASE/Code/PgSQL/rdkit
sudo rm /opt/postgresql-9.3.4/lib/rdkit.so

cd $RDBASE/Code/PgSQL/rdkit
make clean
make
sudo make install # to get this to work I had to change this line in the
# Makefile: PG_CONFIG  = /opt/postgresql-9.3.4/bin/pg_config
make installcheck # before this I restart postgresql

All final 9 tests pass.  And I can see a new rdkit.so in the postgresql lib
directory.

I log in to a database
createdb testdb
testdb=#create extension rdkit;
testdb=# load 'rdkit.so' # just to be sure latest is loaded...
testdb=# select mol_inchi('CCC'::mol);
  mol_inchi
-
 InChI not available
(1 row)

This also looks good:
echo $LD_LIBRARY_PATH
/opt/postgresql-9.3.4/lib:/opt/RDKit_2014_09_2/lib:



I am going crazy trying iterations and permutations of the above.  Can
anyone tell me what I am doing wrong?  One added complication is perhaps
that I have other databases which have the rdkit extension installed (from
a previous version).

This is using RDKit 2014_09_2





-
Jean-Paul Ebejer
Early Stage Researcher


[Rdkit-discuss] molecule not draw well

2015-02-02 Thread JP
Just a FYI

The following molecule: Cc1ccc(C[NH+]2C32CC(NC(=S)Nc2c2C)C3)cc1
looks broken when drawn with 2014.09.1 (attached).

Thanks,

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Ridiculously easy problem - capture RDKit's error messages

2015-01-23 Thread JP
yo Folks,

You cannot use exception handling for this - as the molecule-input methods
do not throw exceptions.  You can see this with:

#!/usr/bin/env python

import rdkit
from rdkit import Chem
try:
    print(1)
    Chem.MolFromSmiles('XXX')  # logs a parse error and returns None
    print(2)
except Exception as msg:
    # never reached: the parser does not raise
    print(3)
    print("That's broken mate: %s" % msg)

# Outputs:
1
[19:27:21] SMILES Parse Error: syntax error for input: XXX
2
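Since a failed parse comes back as None rather than as an exception, the check has to be explicit. A minimal sketch of that pattern, assuming Python 3: `parse_all` and `toy_parse` are made-up names for this illustration, and `toy_parse` is only a stand-in so the snippet runs without RDKit. In real use one would pass Chem.MolFromSmiles as the parser.

```python
def parse_all(parse, records):
    """Split records into (parsed, failed) using a parser that returns None on failure."""
    parsed, failed = [], []
    for index, record in enumerate(records):
        mol = parse(record)
        if mol is None:
            # Remember which input failed instead of aborting the whole run.
            failed.append((index, record))
        else:
            parsed.append(mol)
    return parsed, failed


def toy_parse(smiles):
    """Stand-in parser for this sketch only: accepts strings made of C, N, O."""
    return smiles if smiles and set(smiles) <= set("CNO") else None


good, bad = parse_all(toy_parse, ["CCO", "XXX", "CN"])
print(good)  # ['CCO', 'CN']
print(bad)   # [(1, 'XXX')]
```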

I suspect this is the behaviour you want - I frequently read multi-million
molecule files, and I don't want the whole process to fail because of a few
invalid molecules.  So returning None on those few molecules is kind of
intuitive.  I can see people arguing the other way round (they'd rather get
an exception and handle invalid cases - but then you'd need this
boilerplate every time) - so it is hard to take a side in this argument.  It
is, of course, always good to know why a molecule failed (and which one),
but as Greg said it's something on the todo list.  Meanwhile I came up with
this code, which I think does what I want without opening another process (I
am leaving it here as I know I will be looking this up in a few months'
time).

#!/usr/bin/env python

import os
import sys

import rdkit
from rdkit import Chem

# keep the current stderr
stderr_fileno = sys.stderr.fileno()
stderr_save = os.dup(stderr_fileno)
# file descriptor of log file
stderr_fd = open('error.log', 'w')
os.dup2(stderr_fd.fileno(), stderr_fileno)

Chem.MolFromSmiles('X') # logging from this goes into error.log

# close the log file
stderr_fd.close()
# restore old sys err
os.dup2(stderr_save, stderr_fileno)

Chem.MolFromSmiles('Y') # logging from this goes to std err (terminal?)



The idea for this is lifted from
http://stackoverflow.com/questions/24277488/in-python-how-to-capture-the-stdout-from-a-c-shared-library-to-a-variable
- but this isn't very pythonesque and/or understandable.  So all in all an
unproductive afternoon fighting with 6 to 8 lines of code, but I can say
the python has been finally tamed.  For today.
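The same descriptor-swapping idea can be wrapped in a context manager so the restore step cannot be forgotten. This is only a sketch under assumptions (Python 3, a POSIX-style descriptor table); `capture_fd` is a hypothetical helper name, and the os.write call stands in for a C/C++ library that logs straight to descriptor 2.

```python
import contextlib
import os
import sys
import tempfile


@contextlib.contextmanager
def capture_fd(fd=2):
    """Collect everything written to file descriptor `fd` (default: stderr)."""
    captured = []                      # filled in once the block exits
    sys.stderr.flush()                 # keep earlier buffered output out
    saved = os.dup(fd)                 # keep the original descriptor
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), fd)      # point fd at the temporary file
        try:
            yield captured
        finally:
            os.dup2(saved, fd)         # restore the original descriptor
            os.close(saved)
            tmp.seek(0)
            captured.append(tmp.read().decode("utf-8", "replace"))


with capture_fd() as log:
    # Stand-in for a library call that logs to descriptor 2.
    os.write(2, b"SMILES Parse Error: syntax error for input: XXX\n")

print(log[0].strip())  # SMILES Parse Error: syntax error for input: XXX
```

The captured text is only available after the with block has exited, which is why it is handed back through a list instead of a return value.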

Have a good weekend folks and thanks for all your answers.

Beertime,
JP

p.s. I am somewhat relieved this is not as straightforward as I first
thought.

-
Jean-Paul Ebejer
Early Stage Researcher

On 23 January 2015 at 16:13, Abhik Seal  wrote:

> Hi JP,
>
> I think you can use try and except:
>
> try:
>     throws()
>     return 0
> except Exception as err:
>     sys.stderr.write('ERROR: %s\n' % str(err))
>     return 1
>
>  Hope this helps.
>
> Abhik Seal
> Indiana University Bloomington
> School of Informatics and Computing
> Cheminformatics and Chemgenomics group <http://registratio54.wix.com/ccrg>
> abs...@indiana.edu
> http://mypage.iu.edu/~abseal/index.htm
>
> On Fri, Jan 23, 2015 at 8:58 AM, JP  wrote:
>
>>
>> Yo RDKitters,
>>
>> I am stuck on something so basic, it's embarrassing.  But for the life of
>> me I cannot figure it out on my own.  This is probably more of a python
>> question than an RDKit one.
>>
>> I want to capture the RDKit warning/error message from python.  e.g.
>>
>> >>> import rdkit
>> >>> from rdkit import Chem
>> >>> Chem.MolFromSmiles('XXX')
>> [14:51:32] SMILES Parse Error: syntax error for input: XXX
>>
>> I want to capture that error message (which in this case isn't very
>> informative, but if you read in a mol2 you can get something like
>> [14:25:46] 3ZGZ.A: warning - O.co2 with non C.2 or S.o2 neighbor, which I
>> am also interested in).  The big picture is that I built a web app where
>> you upload an sdf, mol, mol2 or smi file - and I want to show RDKit's error
>> message if there is something funny in the query file.
>>
>> I have tried the obvious (I think):
>>
>> #!/usr/bin/env python
>> import sys
>> import rdkit
>> from rdkit import Chem
>>
>> f = open("err.log", "w")
>> original_stderr = sys.stderr
>> sys.stderr = f
>> Chem.MolFromSmiles('XXX')
>> sys.stderr = original_stderr
>> f.close()
>>
>> This still shows the error message in the terminal and not in the file.
>> I tried the same for stdout, still no cigar.
>>
>> Any ideas ?
>>
>> THANKS!
>> JP
>>
>>

[Rdkit-discuss] Ridiculously easy problem - capture RDKit's error messages

2015-01-23 Thread JP
Yo RDKitters,

I am stuck on something so basic, it's embarrassing.  But for the life of me
I cannot figure it out on my own.  This is probably more of a python
question than an RDKit one.

I want to capture the RDKit warning/error message from python.  e.g.

>>> import rdkit
>>> from rdkit import Chem
>>> Chem.MolFromSmiles('XXX')
[14:51:32] SMILES Parse Error: syntax error for input: XXX

I want to capture that error message (which in this case isn't very
informative, but if you read in a mol2 you can get something like
[14:25:46] 3ZGZ.A: warning - O.co2 with non C.2 or S.o2 neighbor, which I
am also interested in).  The big picture is that I built a web app where you
upload an sdf, mol, mol2 or smi file - and I want to show RDKit's error
message if there is something funny in the query file.

I have tried the obvious (I think):

#!/usr/bin/env python
import sys
import rdkit
from rdkit import Chem

f = open("err.log", "w")
original_stderr = sys.stderr
sys.stderr = f
Chem.MolFromSmiles('XXX')
sys.stderr = original_stderr
f.close()

This still shows the error message in the terminal and not in the file.  I
tried the same for stdout, still no cigar.
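A likely reason, sketched in Python 3 under the assumption that the message is written by compiled code directly to file descriptor 2: rebinding sys.stderr only changes where Python-level writes go, while the descriptor itself is untouched. The `demo` function below is a made-up illustration; os.write stands in for the library, and descriptor 2 is temporarily pointed at a scratch file purely so that its traffic can be inspected.

```python
import io
import os
import sys
import tempfile


def demo():
    """Return (text seen by a swapped sys.stderr, text that reached descriptor 2)."""
    original_stderr = sys.stderr
    original_stderr.flush()
    saved_fd = os.dup(2)
    fake_stderr = io.StringIO()
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)           # observe descriptor 2 in a temp file
        sys.stderr = fake_stderr           # the swap tried in the script above
        try:
            sys.stderr.write("python-level message\n")   # lands in fake_stderr
            os.write(2, b"library-level message\n")      # bypasses sys.stderr
        finally:
            sys.stderr = original_stderr
            os.dup2(saved_fd, 2)
            os.close(saved_fd)
        tmp.seek(0)
        fd_text = tmp.read().decode()
    return fake_stderr.getvalue(), fd_text


python_text, fd_text = demo()
print(repr(python_text))  # 'python-level message\n'
print(repr(fd_text))      # 'library-level message\n'
```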

Any ideas ?

THANKS!
JP


[Rdkit-discuss] Count formal charges of molecule in database (RDKit Puzzle Time)

2014-09-22 Thread JP
Ola RDKitters,

I have a molecule in postgresql, and I would like to calculate the overall
formal charge of the molecule as separate + and - counts.

So far I have come up with (warning: hack ahead!):

substruct_count(rdkitmol, mol_from_smarts('[-]'), true) +
(substruct_count(rdkitmol, mol_from_smarts('[--]'), true) * 2) +
(substruct_count(rdkitmol, mol_from_smarts('[---]'), true) * 3) as neg

But this being RDKit, there probably is a better way (and what about my
[U+4] ?).
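Outside the database, one way to avoid enumerating [-], [--], [---] is to read the charge off every bracket atom and add it up, which also covers cases like [U+4]. The sketch below is an assumption-laden illustration in Python 3 that works on the SMILES text with regular expressions; `charge_counts` is a hypothetical helper, not an RDKit or cartridge function.

```python
import re

# A bracket atom such as [N+], [O-], [Fe++] or [U+4].
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")
# A charge inside a bracket atom: repeated signs, or one sign plus a number.
CHARGE = re.compile(r"(\++|-+)(\d*)")


def charge_counts(smiles):
    """Return (total positive charge, total negative charge) for a SMILES string."""
    positive = negative = 0
    for atom in BRACKET_ATOM.findall(smiles):
        match = CHARGE.search(atom)
        if match is None:
            continue                      # bracket atom without a charge
        signs, digits = match.groups()
        magnitude = int(digits) if digits else len(signs)
        if signs[0] == "+":
            positive += magnitude
        else:
            negative += magnitude
    return positive, negative


print(charge_counts("CCN=[N+]=[N-]"))        # (1, 1)
print(charge_counts("[U+4]"))                # (4, 0)
print(charge_counts("[O-]C(=O)CC([O-])=O"))  # (0, 2)
```

This relies on the SMILES rule that a formal charge can only be written inside brackets; it is not meant for SMARTS patterns, where '-' can also denote a bond.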

Thanks and, hopefully, see you soon,

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Having problems with installing RDKit_2014_03_1 in Ubuntu

2014-07-26 Thread JP
Not a direct solution to your problem, but have you tried the Ubuntu
specific instructions at:
http://www.blopig.com/blog/2013/02/how-to-install-rdkit-on-ubuntu-12-04/

I have installed it successfully on 14.04.


-
Jean-Paul Ebejer
Early Stage Researcher


On 24 July 2014 15:40, Jessica Krause  wrote:

> Dear all,
>
> I tried to install RDKit 2014 on Ubuntu 14.04 but I did
> not succeed!
>
>
> While executing the make command in the RDKit_2014_03_1/build directory, I
> received the following error:
>
> [  0%] Built target inchi_support
> [  1%] Built target RDGeneral
> [  3%] Built target RDGeneral_static
> [  3%] Built target testDict
> Linking CXX shared library ../../lib/libRDBoost.so
> /usr/bin/ld: /usr/local/lib/libpython2.7.a(exceptions.o): relocation
> R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared
> object; recompile with -fPIC
> /usr/local/lib/libpython2.7.a: error adding symbols: Bad value
> collect2: error: ld returned 1 exit status
> make[2]: *** [lib/libRDBoost.so.1.2014.03.1] Error 1
> make[1]: *** [Code/RDBoost/CMakeFiles/RDBoost.dir/all] Error 2
> make: *** [all] Error 2
>
>
>
>
> the environmental variables that I have used are:
>
> export RDBASE=opt/RDKit_2014_03_1/
> export
> LD_LIBRARY_PATH=opt/RDKit_2014_03_1/build/lib/:usr/local/src/boost_1_55_0/libs/
> export PYTHONPATH=opt/RDKit_2014_03_1/
>
>
> Please help me with this problem.
>
> Thanks in advance.
>
> Regards,
> Jessica Krause
>
>
>
>


Re: [Rdkit-discuss] Error in reading sdf format

2014-07-23 Thread JP
The problem is in the data file -- I added an example, benzene, from
wikipedia, and fixed the first one of your molecules for you (attached).

Amongst other things - the first three lines of each record are header lines
(see http://en.wikipedia.org/wiki/Chemical_table_file); you only have two of
those.

I find the specification very useful when I hit these kinds of problems.
Here it is: http://c4.cabrillo.edu/404/ctfile.pdf
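A quick pre-check along these lines can point at the offending records before RDKit sees them. This is a rough Python 3 sketch, assuming V2000 records separated by $$$$ lines; `check_counts_lines` is a made-up helper and the two records are toy data, not the attached file.

```python
def check_counts_lines(sdf_text):
    """Return a list of (record number, problem) for records with a bad counts line."""
    problems = []
    records = sdf_text.split("$$$$\n")
    for number, record in enumerate(records, start=1):
        if not record.strip():
            continue                      # empty text after the last $$$$
        lines = record.split("\n")
        if len(lines) < 4:
            problems.append((number, "fewer than 4 lines"))
            continue
        counts = lines[3]                 # three header lines come first
        try:
            int(counts[0:3])              # number of atoms
            int(counts[3:6])              # number of bonds
        except ValueError:
            problems.append((number, "line 4 is not a counts line: %r" % counts))
    return problems


good_record = (
    "methane\n  program\ncomment\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0\nM  END\n$$$$\n"
)
# Same record with only two header lines, so the counts line sits on line 3.
bad_record = (
    "title\n  program\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0\nM  END\n$$$$\n"
)

for number, problem in check_counts_lines(good_record + bad_record):
    print(number, problem)
```

With a header line missing, line 4 is already an atom line, which is consistent with a "Cannot convert to int on line 4" style of error.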


On 23 July 2014 22:26, Abhik Seal  wrote:

> Hi RDkiters,
>
> I have an sdf file attached (2 molecules only). When I try to simply
> print each mol using the code below I get an error, and I am not sure
> what is wrong with the file:
>
> >>> from rdkit import Chem
> >>> suppl = Chem.SDMolSupplier('data.sdf')
> >>> for mol in suppl:
> ...     print(mol)
> ...
> [21:15:14] ERROR: Cannot convert to int on line 4
> [21:15:14] ERROR: moving to the begining of the next molecule
> None
> [21:15:14] ERROR: Cannot convert to int on line 106
> [21:15:14] ERROR: moving to the begining of the next molecule
>
> Any help on this problem?
>
> Abhik Seal
> Indiana University Bloomington
> School of Informatics and Computing
> Cheminformatics and Chemgenomics group 
> abs...@indiana.edu
> http://mypage.iu.edu/~abseal/index.htm
>
>
>
>


data2.sdf
Description: Binary data


[Rdkit-discuss] Need for speed -- postgresql / rdkit use of indices(/indexes)

2014-07-14 Thread JP
suggests to the query planner whether to use the index or not (based on a
sample of the table) - but I haven't noticed any difference.  Does the
RDKit indexing technology support this?

Apologies for the long email and happy Monday to everyone!
JP






-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Explicit valence error when reading sdf files

2014-07-12 Thread JP
On 11 July 2014 23:41, Wendy Carande  wrote:

> 10104489
>   TRC 05231419153D
> PM6 optimization, min free energy conformation
>  14 14  0  0  0  0  0  0  0  0999 V2000
>    -0.4307    2.0889    0.2792 H   0  0  0  0  0  0  0  0  0  0  0  0
>     0.0407    1.1071    0.2148 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.4008    0.9484    0.5227 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.6973   -0.0195   -0.1759 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.9941    1.8122    0.8291 H   0  0  0  0  0  0  0  0  0  0  0  0
>     1.9923   -0.3134    0.4365 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -0.1378   -1.2635   -0.2668 N   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.1710    0.0301   -0.5321 C   0  0  0  0  0  0  0  0  0  0  0  0
>     3.0439   -0.4753    0.6673 H   0  0  0  0  0  0  0  0  0  0  0  0
>     1.1631   -1.3724    0.0355 C   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.8766    0.5689    0.4954 F   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.3775    0.9405   -1.5182 F   0  0  0  0  0  0  0  0  0  0  0  0
>    -2.6216   -0.9493   -0.8245 H   0  0  0  0  0  0  0  0  0  0  0  0
>     1.6684   -3.1599   -0.1690 Br  0  0  0  0  0  0  0  0  0  0  0  0
>   2  1  1  0  0  0  0
>   2  3  1  0  0  0  0
>   3  5  1  0  0  0  0
>   4  2  1  0  0  0  0
>   6  3  2  0  0  0  0
>   6  9  1  0  0  0  0
>   7  4  2  0  0  0  0
>   7 10  2  0  0  0  0
>   8  4  1  0  0  0  0
>   8 11  1  0  0  0  0
>  10  6  1  0  0  0  0
>  12  8  1  0  0  0  0
>  13  8  1  0  0  0  0
>  14 10  1  0  0  0  0
> M  RAD  1   2   2
> M  END
>
>
>

This is not a problem with RDKit, but a chemistry problem.

Your structure has a tetravalent N (an uncharged nitrogen atom in the ring
with bond orders adding up to 4).  If you add a + charge to the nitrogen
(M  CHG line in the sdf, see below), RDKit is able to read in your
structure.  You can easily do this using a free program such as
MarvinSketch (it also shows you where your original error is).
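The valence problem can also be spotted without a sketcher by adding up bond orders per atom. A minimal Python 3 sketch, assuming a correctly column-aligned V2000 mol block; `neutral_overvalent_nitrogens` and the three-atom toy block are illustrative inventions, and the check ignores aromatic bond types and hydrogens that are not listed as atoms.

```python
def neutral_overvalent_nitrogens(molblock):
    """Return 1-based indices of uncharged N atoms whose bond orders sum to more than 3."""
    lines = molblock.split("\n")
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    symbols = [line[31:34].strip() for line in atom_lines]
    # A non-zero charge code in the atom block marks a charged atom.
    charged = set(i for i, line in enumerate(atom_lines, 1) if int(line[36:39]) != 0)
    for line in lines:
        if line.startswith("M  CHG"):
            fields = line.split()[3:]      # atom, charge, atom, charge, ...
            for atom, charge in zip(fields[0::2], fields[1::2]):
                if int(charge) != 0:
                    charged.add(int(atom))

    valence = [0] * (n_atoms + 1)
    for line in bond_lines:
        begin, end, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        valence[begin] += order
        valence[end] += order

    return [i for i in range(1, n_atoms + 1)
            if symbols[i - 1] == "N" and i not in charged and valence[i] > 3]


# Toy block: C=N=C, i.e. a nitrogen with two double bonds.
MOLBLOCK = "\n".join([
    "",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.3000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.6000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  2  0  0  0  0",
    "  2  3  2  0  0  0  0",
    "M  END",
])

print(neutral_overvalent_nitrogens(MOLBLOCK))  # [2]
fixed = MOLBLOCK.replace("M  END", "M  CHG  1   2   1\nM  END")
print(neutral_overvalent_nitrogens(fixed))     # []
```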



--- PYTHON CODE 

>>> import rdkit
>>> from rdkit import Chem
>>> s = Chem.SDMolSupplier('/tmp/test_fixed.sdf')
>>> next(s)

>>>


--- FIXED SDF FILE 


  Mrv0541 07121410173D -76.23192
PM6 optimization, min free energy conformation
 14 14  0  0  0  0            999 V2000
   -0.4307    2.0889    0.2792 H   0  0  0  0  0  0  0  0  0  0  0  0
    0.0407    1.1071    0.2148 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4008    0.9484    0.5227 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6973   -0.0195   -0.1759 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.9941    1.8122    0.8291 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.9923   -0.3134    0.4365 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1378   -1.2635   -0.2668 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.1710    0.0301   -0.5321 C   0  0  1  0  0  0  0  0  0  0  0  0
    3.0439   -0.4753    0.6673 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.1631   -1.3724    0.0355 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.8766    0.5689    0.4954 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3775    0.9405   -1.5182 F   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6216   -0.9493   -0.8245 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.6684   -3.1599   -0.1690 Br  0  0  0  0  0  0  0  0  0  0  0  0
  2  1  1  0  0  0  0
  2  3  1  0  0  0  0
  3  5  1  0  0  0  0
  4  2  1  0  0  0  0
  6  3  2  0  0  0  0
  6  9  1  0  0  0  0
  7  4  2  0  0  0  0
  7 10  2  0  0  0  0
  8  4  1  0  0  0  0
  8 11  1  0  0  0  0
 10  6  1  0  0  0  0
 12  8  1  0  0  0  0
 13  8  1  0  0  0  0
 14 10  1  0  0  0  0
M  CHG  1   7   1
M  RAD  1   2   2
M  END
>  
Cs(1)

>  
-105.218958525368

>  
36.3827 1.8490
118.9528 0.6797
121.3287 1.7880
146.7422 3.2258
230.2547 4.6726
300.5702 5.7138
328.8019 1.4348
361.6117 0.5034
402.5823 0.1183
552.4995 20.1778
573.7578 0.7088
621.4207 13.4753
682.9353 27.8339
701.6618 5.4059
844.3396 76.4557
881.5112 51.8745
935.0009 0.0366
986.5135 4.6250
1020.1213 12.4436
1073.2578 27.3170
1132.0055 17.4835
1149.0508 5.7188
1174.1069 3.8183
1193.8903 14.3170
1225.8361 3.1755
1250.0146 94.6662
1258.0689 25.6122
1333.4544 115.3666
1444.9060 96.2140
1474.6392 0.2878
1604.5610 58.4422
1630.9742 34.4613
2636.6222 77.6239
2737.5860 21.2480
2745.9489 233.0648
2756.3587 214.6328

>  
1046.3408417

>  
72.95

>  
-16.3678412270001

>  
-8.0618952540101

>  
-0.00499623410644148

>  
-0.492544809359508

>  
0.00599090900680561

>  
98.5832126490001

>  
-48.5566496802496

>  
0.492544809149928

>  
82.215371422

>  
40.4947544262395

>  
180.798584071

>  
0.54643002523

>  
0.00692393043054713

>  
167.141981607954

>  
12.6954699037459

>  
-65.71527629439

>  
-76.6311751243748

>  
2.8934

>  
-5.447286492133e+02

>  
-76.2319210753677

>  
745.714161110022

>  
0.33918

>  
-0.37718

>  
1836.33033030781

>  
-3.721651577705e+01

>  
-0.03800

>  
-0.0105562568746138

>  
-1.166109

>  
3.105243637196e+02

>  
0.0

[Rdkit-discuss] Minimizing a Boron containing molecule with MMFF94 ... surprising

2014-07-03 Thread JP
When I have a boron atom in a compound I am not able to get an MMFF94
force field (RDKit 2014_03_1).  Is this an issue with the MMFF94 force field
specification or with the implementation of it in RDKit?

The following code reproduces this problem.


import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem

def emin(smiles):
    mol = Chem.MolFromSmiles(smiles) # no problem mol
    mol_h = AllChem.AddHs(mol) # always add Hs
    conformer_ids = AllChem.EmbedMultipleConfs(mol_h, numConfs=5)
    prop = AllChem.MMFFGetMoleculeProperties(mol_h, mmffVariant="MMFF94")
    for conf_id in conformer_ids:
        # confId must be passed by keyword; the third positional
        # argument is the non-bonded threshold
        ff = AllChem.MMFFGetMoleculeForceField(mol_h, prop, confId=conf_id)
        print(ff) # None for the boron compound

emin("CCC(C)(C)C") # works
emin("BCC(C)(C)C") # does not work



Output:

jp@jp-Galago-UltraPro:~/tmp$ ./emin.py





None
None
None
None
None

Any ideas what is going on in the Boron-containing case?
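One plausible explanation, offered as an assumption and not as a confirmed diagnosis: as far as I know, MMFF94 defines no atom types for boron, so no parameters can be assigned and no force field can be built. The Python 3 sketch below screens a SMILES string for elements outside the set MMFF94 covers. The element list is my reading of the published force field and should be checked against the MMFF94 papers; `unsupported_elements` is a hypothetical helper based on a simple regular expression, not an RDKit call.

```python
import re

# Elements that have MMFF94 atom types, to the best of my knowledge (assumption).
MMFF94_ELEMENTS = {
    "H", "C", "N", "O", "F", "Si", "P", "S", "Cl", "Br", "I",
    "Li", "Na", "K", "Mg", "Ca", "Fe", "Cu", "Zn",
}

# Either the element symbol of a bracket atom (after an optional isotope),
# or an organic-subset atom written without brackets.
ATOM = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]|(Cl|Br|[BCNOSPFI]|[bcnops])")


def unsupported_elements(smiles):
    """Return the sorted element symbols in `smiles` that are not in MMFF94_ELEMENTS."""
    found = set()
    for bracket, bare in ATOM.findall(smiles):
        symbol = bracket or bare
        found.add(symbol.capitalize())   # aromatic 'c' counts as 'C'
    return sorted(found - MMFF94_ELEMENTS)


print(unsupported_elements("CCC(C)(C)C"))  # []
print(unsupported_elements("BCC(C)(C)C"))  # ['B']
```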


-
Jean-Paul Ebejer
Early Stage Researcher


[Rdkit-discuss] Behaviour change between RDKit versions (HasSubstructMatch)

2014-06-19 Thread JP
yo RDKitters,

writing this email while waiting eagerly for the Uruguay-England match in
an hour or so (blame the beer for any lack of consistency beneath).

Can someone explain which changes in the new RDKit result in the following
behaviour change?  Somehow "all" (as in all five of them) my tests are
failing now - which is fine, I'll change my code to use MolFromSmarts
instead (this works).

Using RDKit_2013_09_2:

>>> import rdkit
>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles('CCN=[N+]=[N-]')
>>> npos = Chem.MolFromSmiles("[N+]")
>>> mol.HasSubstructMatch(npos)
True

Using RDKit_2014_03_1

>>> import rdkit
>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles('CCN=[N+]=[N-]')
>>> npos = Chem.MolFromSmiles("[N+]")
>>> mol.HasSubstructMatch(npos)
False

mol.Debug() and npos.Debug() seem to be giving me the same output in both
versions.  I understand the workaround (which is go with MolFromSmarts,
which creates QueryAtoms), but I'd like to understand what is going on
behind the scenes which triggered the behaviour change.

Thank-you!

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Opposite of GetSubstructureMatches()

2014-04-17 Thread JP
Thank you Christos, Greg,

As usual, very helpful...


On 17 April 2014 13:40, Christos Kannas  wrote:

> Hi Greg,
>
> That's why I had that strange "bug" with my hydrophobic - hydrophilic
> fragmentation, I was using PathToSubmol with a list of atom indices too.
>
> And the solution I've actually found is, as Greg said, iterate through the
> atoms of the molecule and find the bonds that connect my query pattern,
> hydrophobic atoms, to atoms that are not part of it, aka are hydrophilic.
> Then I used Chem.FragmentOnBonds to break the molecule on that list of
> bonds.
>
> Here is the IPython Notebook that shows what I'm doing
> http://nbviewer.ipython.org/gist/CKannas/10975497
> I do not break rings, if some ring atoms are mapped as hydrophobic, and I
> also keep terminal carbons (CH3) connected to the adjacent hydrophobic
> group, if any. I hope these assumptions are chemically correct.
>
> Best,
>
> Christos
>
> Christos Kannas
>
> Researcher
> Ph.D Student
>
> Mob (UK): +44 (0) 7447700937
> Mob (Cyprus): +357 99530608
>
>
>
> On 17 April 2014 10:18, Greg Landrum  wrote:
>
>>
>> On Thu, Apr 17, 2014 at 10:32 AM, JP  wrote:
>>
>>>
>>> On 16 April 2014 19:13, Christos Kannas  wrote:
>>>
>>>> Chem.PathToSubmol(mol, path)
>>>
>>>
>>> Hi there Christos,
>>>
>>> Many thanks for your reply (and idea of using nbviewer)
>>>
>>> There is still something strange happening which I cannot figure out -
>>> my atom index is a tuple with six elements - and in the resulting "submol"
>>> I get seven atoms.  Also the ring is opened in a chain (so some of the
>>> properties are changing).
>>>
>>> A simple example here:
>>> http://nbviewer.ipython.org/gist/anonymous/10964449
>>>
>>> Any ideas?
>>>
>>
>> PathToSubmol is underdocumented. It's expecting a list/tuple of bond
>> indices; not atom indices.
>>
>> What you need to do is loop over the atoms in your match and find all the
>> bonds that they are involved in that go to other atoms in the match. Pass
>> that tuple/list to PathToSubmol and you should get what you want.
>>
>> If you're ok having dummies marking attachment points (which I suspect
>> you aren't), you could use Chem.ReplaceSidechains(), but otherwise I don't
>> think there's an easier way to do this.
>>
>> -greg
>>
>>
>>
>>
>>
>
>
>
>


Re: [Rdkit-discuss] Opposite of GetSubstructureMatches()

2014-04-17 Thread JP
On 16 April 2014 19:13, Christos Kannas  wrote:

> Chem.PathToSubmol(mol, path)


Hi there Christos,

Many thanks for your reply (and idea of using nbviewer)

There is still something strange happening which I cannot figure out - my
atom index is a tuple with six elements - and in the resulting "submol" I
get seven atoms.  Also the ring is opened in a chain (so some of the
properties are changing).

A simple example here:
http://nbviewer.ipython.org/gist/anonymous/10964449

Any ideas?
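
For the record, a minimal sketch of going from atom indices to a
submolecule, on the assumption (per Greg's reply in this thread) that
PathToSubmol wants bond indices rather than atom indices.  The helper name
submol_from_atoms and the test molecule are made up for illustration.

```python
from rdkit import Chem


def submol_from_atoms(mol, atom_ids):
    """Build a submolecule from the bonds whose two ends are both in atom_ids."""
    atom_set = set(atom_ids)
    # collect the indices of bonds that stay inside the selected atoms
    bond_ids = [
        b.GetIdx()
        for b in mol.GetBonds()
        if b.GetBeginAtomIdx() in atom_set and b.GetEndAtomIdx() in atom_set
    ]
    return Chem.PathToSubmol(mol, bond_ids)


mol = Chem.MolFromSmiles('c1ccccc1CCO')
# atom indices of the benzene ring, as returned by a substructure match
match = mol.GetSubstructMatch(Chem.MolFromSmarts('c1ccccc1'))
ring = submol_from_atoms(mol, match)
print(ring.GetNumAtoms(), ring.GetNumBonds())
```

The returned fragment is not sanitized, and a selection of a single atom
has no bonds at all, so that case would need separate handling.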

-
Jean-Paul Ebejer
Early Stage Researcher


[Rdkit-discuss] Opposite of GetSubstructureMatches()

2014-04-16 Thread JP
Hi there RDKitters,

This is probably an easy one, but I cannot find anything in the docs or the
mailing list.

I have a tuple of atom Ids (e.g. 21,22,24,26,27) and a mol and I would like
to extract the substructure (molecule) which matches those indices.  Note
that in my case this will be a connected subgraph of the molecule (no
fragmentation).

This is pretty much the opposite of GetSubstruct family of methods which
give Mol -> Indices.  I want Indices -> Mol.

Is there a convenience method to do this?

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] cite?

2014-04-16 Thread JP
BibTeX entry:

@MISC{rdkit,
  title = {{RDK}it: Open-source cheminformatics},
  howpublished = {\url{http://www.rdkit.org}},
  note = {[Online; accessed 11-April-2013]},
  key = {RDKit, online}
}


On 16 April 2014 08:55, Greg Landrum  wrote:

>
>
> On Wed, Apr 16, 2014 at 8:16 AM,  wrote:
>
>>
>> I have used this citation:
>>
>> RDKit, Open-Source Cheminformatics. http://www.rdkit.org.
>
>
> There has not (yet) been an RDKit paper published, so there's no
> "official" citation. Paul's suggestion above is what I would also use.
>
> -greg
>
>
>


Re: [Rdkit-discuss] An ultimate way to compute 3D coordinates?

2014-04-05 Thread JP
I don't know about the "ultimate way", but this works for me (to generate n
conformers):

from rdkit import Chem
from rdkit.Chem import AllChem

writer = Chem.SDWriter('some_file.sdf')
# add Hydrogens
molH = Chem.AddHs(mol)
# create n conformers for molecule
confIds = AllChem.EmbedMultipleConfs(molH, n)
for confId in confIds:
    # E optimize
    AllChem.UFFOptimizeMolecule(molH, confId=confId)
    # write to output file
    writer.write(molH, confId=confId)
writer.close()

You should replace EmbedMultipleConfs with EmbedMolecule if you are
interested in generating only one conformer.  UFFOptimizeMolecule(...)
returns an integer: 0 tells you the optimization has converged, 1 that it
has not.

UFF is significantly faster, and I do not think the results are worse
than the ones generated by MMFF, at least for the small molecules I was
looking at, though I am sure there are exceptions to this.  Paolo has done a
lot of excellent work on the forcefields, and I think the amide and
carbonyl planarity issues for UFF have now been fixed.
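
A small sketch of the single-conformer variant with the return codes
checked.  Ethanol, the random seed and maxIters=500 are arbitrary choices
for illustration, not anything from the original question.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles('CCO'))

# EmbedMolecule returns the new conformer id, or -1 if embedding failed
conf_id = AllChem.EmbedMolecule(mol, randomSeed=42)
if conf_id < 0:
    raise RuntimeError('embedding failed')

# 0 means converged, 1 means more iterations are needed
status = AllChem.UFFOptimizeMolecule(mol, confId=conf_id, maxIters=500)
print('converged' if status == 0 else 'not converged')
```

If the status comes back as 1, calling the optimizer again (or raising
maxIters) is the usual next step.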






-
Jean-Paul Ebejer
Early Stage Researcher


On 5 April 2014 13:35, Michał Nowotka  wrote:

> Hi,
>
> I've found this (
> http://code.google.com/p/rdkit/wiki/Generating3DCoordinates) wiki page
> suggesting how to compute 3D coordinates:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
>
> m = Chem.MolFromSmiles('c1c1C(=O)O')
> AllChem.EmbedMolecule(m)
> # the molecule now has a crude conformation, clean it up:
> AllChem.UFFOptimizeMolecule(m)
>
> On the other hand, "Getting started document" describes this differently:
>
>
> AllChem.EmbedMolecule(m2)
> AllChem.UFFOptimizeMolecule(m2)
>
> In the meantime, someone suggested that I should call:
>
> Chem.AddHs(m)
>
> Before calculating 3D properties.
>
> So what is an ultimate way of doing this? Let's assume I already have an rdkit
> molecule:
>
> m = Chem.MolFromSmiles('Cc1c1')
>
>
> or:
>
> m = Chem.MolFromMolFile('data/input.mol')
>
> what should I do with 'm' to compute 3D coordinates?
>
> Also, once we have MMFF implemented in rdkit, is there any benefit of using 
> UFF (apart from maybe backwards compatibility, as this is a new feature)?
>
>
> Is UFF significantly faster than MMFF?
>
> Kind regards,
>
> Michał Nowotka
>
>
>
>


Re: [Rdkit-discuss] Possible rotatable bonds replacement

2014-01-31 Thread JP
My 2p worth:

I am not a big fan of outright replacing the NumRotatableBonds
implementation (option 2).  This is quite a popular descriptor which is
used in many ways (e.g. QSAR models, conformer generation, property
calculation, etc.).  IF we are lucky (or skilful, or have had enough time),
we have tests written for everything, and these will break as soon as
we get different rotatable bond counts and different results.  We can
then revalidate our protocols using the new (strict) rotatable counts.
Perhaps we get better correlations/enrichments/AUCs etc! Yeah!

On the other hand option (1), having two methods NumRotatableBonds() and
NumStrictRotatableBonds() will lead to some confusion.  Greg has a point
about different people and/or libraries intermixing between the two.

Like Paul, I prefer option (3) - with the default behaviour giving the old
rotatable counts (not strict).  This does not come for free either, as the
API becomes slightly less clean (and what to do in the future when, for
example, someone finds a non-SMARTS based way to do this -- add another
parameter?).  Still, I think this is the least of all evils.
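
For what it's worth, later RDKit releases appear to expose this choice as
an argument on the descriptor call.  The sketch below assumes that
rdMolDescriptors.CalcNumRotatableBonds accepts a NumRotatableBondsOptions
value as its second argument; that is an assumption about the installed
version, not something settled in this thread.  Butane is just a
convenient test case.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

mol = Chem.MolFromSmiles('CCCC')
options = rdMolDescriptors.NumRotatableBondsOptions

# assumed API: the second positional argument picks the definition
n_strict = rdMolDescriptors.CalcNumRotatableBonds(mol, options.Strict)
n_old = rdMolDescriptors.CalcNumRotatableBonds(mol, options.NonStrict)
print(n_strict, n_old)
```

Calling it both ways on the same input set is a cheap way to see which
existing protocols would shift if the default changed.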

Thanks Toby & Greg!
JP


On 31 January 2014 06:54,  wrote:

> > I could add the new descriptor as Toby provided it. People are then
> > free to pick between NumRotatableBonds() and NumStrictRotatableBonds
> > (). This has the advantage of maintaining strict backwards
> > compatibility, but I could imagine it being confusing/irritating to
> > people using the code to have to choose between them (or, worse, using
> both).
> >
> > Another option is to just replace the current NumRotatableBonds()
> > SMARTS with the new one.
> > This loses backwards compatibility, but replaces NumRotatableBonds()
> > with something more correct.
> >
> > Finally, I could take a hybrid approach: replace the default
> > NumRotatableBonds() with the new one, but add an extra argument that
> > allows the old one to be used.
>
> >
> > I'm leaning towards the second option. I'd normally go with the
> > third, but I almost view this as a bug fix for the rotatable bonds
> definition.
> >
> > Comments? suggestions? Other options?
>
> I like your idea of your hybrid approach which would mean backwards
> compatibility.
>
>
> paul
>
>
>


Re: [Rdkit-discuss] Fw: Add a charge to an atom and modify Hs accordingly (protonation)

2014-01-15 Thread JP
Hi there Chan, Others,

First of all, many thanks for the helpful example and the explanation --
but I require some further clarification more specifically about the
following:

from rdkit import Chem

m = Chem.MolFromSmiles('O')
print(m.GetAtomWithIdx(0).GetNumImplicitHs())
print(m.GetAtomWithIdx(0).GetNumExplicitHs())
print(m.GetAtomWithIdx(0).GetTotalNumHs())

m = Chem.MolFromSmiles('[OH2]')
print(m.GetAtomWithIdx(0).GetNumImplicitHs())
print(m.GetAtomWithIdx(0).GetNumExplicitHs())
print(m.GetAtomWithIdx(0).GetTotalNumHs())

m = Chem.MolFromSmiles('O([H])[H]')
print(m.GetAtomWithIdx(0).GetNumImplicitHs())
print(m.GetAtomWithIdx(0).GetNumExplicitHs())
print(m.GetAtomWithIdx(0).GetTotalNumHs())

## Results:
2 0 2
0 2 2
2 # ?? 0 # ?? 2

Why aren't the results of [OH2] and O([H])[H] the same?  Is this another
SMILES gotcha?  Am I not declaring the Hs explicitly in both cases?  You
see why I am confused?
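
A sketch that may help show where the difference comes from.  As far as I
can tell, the SMILES parser by default strips hydrogens written as separate
[H] atoms and folds them into the heavy atom's hydrogen count, whereas Hs
written inside the bracket are stored as an explicit count on the atom.
The removeHs switch on SmilesParserParams is used here to keep the
hydrogens as real atoms.

```python
from rdkit import Chem

# default parsing: the two [H] atoms are removed from the graph
m = Chem.MolFromSmiles('O([H])[H]')
print(m.GetNumAtoms(), m.GetAtomWithIdx(0).GetTotalNumHs())

# keep the hydrogens as real atoms in the graph
params = Chem.SmilesParserParams()
params.removeHs = False
m_h = Chem.MolFromSmiles('O([H])[H]', params)
oxygen = m_h.GetAtomWithIdx(0)
# includeNeighbors=True also counts hydrogens that are graph neighbours
print(m_h.GetNumAtoms(), oxygen.GetTotalNumHs(includeNeighbors=True))
```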

Your example works fine, unless the user specifies C(=O)[OH] - in that case
I need to remove the H manually in an editable mol (or I get the incorrect
valency errors).
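
One possible way to cover both the C(=O)O and the C(=O)[OH] input, sketched
under the assumption that the hydrogens are not real atoms in the graph
(i.e. AddHs has not been called).  The helper name deprotonate is made up
for illustration.

```python
from rdkit import Chem


def deprotonate(mol, idx):
    """Return a copy of mol with one H taken off atom idx and its charge lowered by 1."""
    rw = Chem.RWMol(mol)
    atom = rw.GetAtomWithIdx(idx)
    if atom.GetTotalNumHs() == 0:
        raise ValueError('atom has no hydrogen to remove')
    # Hs written inside brackets, e.g. [OH], are stored as an explicit count
    # and have to be decremented by hand; implicit Hs are recomputed below
    if atom.GetNumExplicitHs() > 0:
        atom.SetNumExplicitHs(atom.GetNumExplicitHs() - 1)
    atom.SetFormalCharge(atom.GetFormalCharge() - 1)
    Chem.SanitizeMol(rw)
    return rw.GetMol()


for smi in ('C(=O)O', 'C(=O)[OH]'):
    print(smi, Chem.MolToSmiles(deprotonate(Chem.MolFromSmiles(smi), 2)))
```

Working on a copy keeps the input molecule untouched, which is handy when
several protonation states of the same structure are needed.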

Also this must be related to Greg's "Rethinking the RDKit's implicit
Hydrogen handling" (
https://www.mail-archive.com/rdkit-devel@lists.sourceforge.net/msg00077.html).
 Does anyone know if this has been implemented? (My guess by reading the
Atom API doc is no
http://www.rdkit.org/Python_Docs/rdkit.Chem.rdchem.Atom-class.html)

Thank you all for your help!
JP


-
Jean-Paul Ebejer
Early Stage Researcher


On 15 January 2014 06:28, S.L. Chan  wrote:

>
>   - Forwarded Message -
>  *From:* S.L. Chan 
> *To:* JP 
> *Sent:* Tuesday, January 14, 2014 10:03 PM
> *Subject:* Re: [Rdkit-discuss] Add a charge to an atom and modify Hs
> accordingly (protonation)
>
> Hello JP,
>
> You need to bear in mind that by default RDKit does not store the
> hydrogens explicitly. RemoveHs() generally does not do anything on
> a fresh molecule, and GetNumExplicitHs() generally returns 0 for all
> atoms.
>
> Implicit hydrogens mean the hydrogen atoms implied.
> GetNumImplicitHs() returns the number of hydrogens EXPECTED to
> be bonded to that atom. This is why it returned 1 even though
> your molecule did not have explicit hydrogens.
>
> With these in mind you should be able to achieve what you want:
>
> >>> m = Chem.MolFromSmiles('C(=O)O')
> >>> m2 = Chem.AddHs(m)
> >>> m2.GetNumAtoms()
> 5
> >>> m.GetAtomWithIdx(2).SetFormalCharge(-1)
> >>> Chem.SanitizeMol(m)
> >>> m3 = Chem.AddHs(m)
> >>> m3.GetNumAtoms()
> 4
> >>> Chem.MolToSmiles(m)
> 'O=C[O-]'
>
> Finally, remember that AddHs() does not generate coordinates for
> the hydrogens. You need to use AllChem.EmbedMolecule() if you
> need coordinates for them.
>
> Ling
>
>
>
>   --
>  *From:* JP 
> *To:* "rdkit-discuss@lists.sourceforge.net" <
> rdkit-discuss@lists.sourceforge.net>
> *Sent:* Tuesday, January 14, 2014 10:58 AM
> *Subject:* [Rdkit-discuss] Add a charge to an atom and modify Hs
> accordingly (protonation)
>
> Hi there,
>
> This must be really easy -- but anything I am trying is failing and I am
> losing my mind.  I want to add a charge (+ / -) to an atom and add or
> delete a connected H accordingly.
>
> I thought an easy way to do this was to remove all Hs from the molecule
> (removeHs), add a charge (SetFormalCharge), and re-add all Hs (AddHs).
>  This doesn't work (the sanitization check fails in AddHs, incorrect
> valence - which I understand) - what is the best way to do this?
>
> Also,
>
> >>> m = Chem.MolFromSmiles('C(=O)O')
> >>> m.Debug()
> Atoms:
> 0 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
>  1 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
> 2 8 O chg: 0  deg: 1 exp: 1 imp: 1 hyb: 3 arom?: 0 chi: 0
> Bonds:
> 0 0->1 order: 2 conj?: 1 aromatic?: 0
> 1 0->2 order: 1 conj?: 1 aromatic?: 0
> >>> m_noHs = Chem.RemoveHs(m)
> >>> m_noHs.Debug()
> Atoms:
> 0 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
> 1 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
>  2 8 O chg: 0  deg: 1 exp: 1 imp: 1 hyb: 3 arom?: 0 chi: 0
> Bonds:
> 0 0->1 order: 2 conj?: 1 aromatic?: 0
>  1 0->2 order: 1 conj?: 1 aromatic?: 0
>
> >>> m_noHs.GetAtomWithIdx(2).GetNumExplicitHs()
> 0
> >>> m_noHs.GetAtomWithIdx(2).GetNumImplicitHs()
> 1
>
> Why is there still an implicit H after I "removed" them?
>
> I have tried to use ReplaceSubstructs() for this (a bit of an overkill)
> but I then lose the 3D information of the orig

Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-15 Thread JP
Thanks Greg!  Much appreciated.

-
Jean-Paul Ebejer
Early Stage Researcher


On 15 January 2014 08:38, Greg Landrum  wrote:

>
> On Tue, Jan 14, 2014 at 11:48 AM, Greg Landrum wrote:
>
>> ok, it looks like something bad happened[1] when the PDB branch was
>> merged into trunk before the last release. Here's an example that worked
>> properly at the time of the UGM:
>>
>> In [5]: m =Chem.MolFromPDBFile('data/2FVD.pdb')
>> In [6]: Chem.MolToSmiles(m,canonical=False)
>> Out[6]: 'NC(C(O)NC(C  '
>>
>> Here's the notebook showing what's supposed to happen:
>>
>> http://nbviewer.ipython.org/github/rdkit/UGM_2013/blob/master/Notebooks/Whats_new.ipynb
>>
>> I'll look into this as soon as I can and get it fixed.
>>
>
> I just tracked this down and fixed it. The changes are checked into
> github. Details about what happened are below.
>
> Here's my example from above now:
>
> In [3]: m = Chem.MolFromPDBFile('./2FVD.pdb')
> In [4]: Chem.MolToSmiles(m,canonical=False)
> Out[4]: 'NC(C(=O)NC(C(=O)NC(C(=O'
>
>
> This is, IMO, a major enough problem that it's worth doing a patch release
> to address it. Over the next few days, I will put together a list of fixes
> (not new features) that should be in the 2013_09_2 release and adjust the
> milestones for those issues. Please feel free to suggest additions. The
> list (currently empty) can be found here:
> https://github.com/rdkit/rdkit/issues?milestone=6
>
> For those who care, here's how the bug came about.
> The bond-type assignment code for standard PDB residues tests bonded atoms
> to make sure they are in the same residue. This code compares the two
> atoms' AtomPDBResidueInfo structures. Shortly before the 2013_09_1 release,
> I added an explicit residueNumber property to the AtomPDBResidueInfo class
> and switched the serialNumber property (previously used to store the
> residueNumber) to capture the actual serial number of the atom. I forgot to
> update the residue comparison code to reflect this change, so
> the SamePDBResidue() function was returning false unless the two atoms were
> the same. silly mistake.
>
> Best,
> -greg
>
>
>
>


[Rdkit-discuss] Add a charge to an atom and modify Hs accordingly (protonation)

2014-01-14 Thread JP
Hi there,

This must be really easy -- but anything I am trying is failing and I am
losing my mind.  I want to add a charge (+ / -) to an atom and add or
delete a connected H accordingly.

I thought an easy way to do this was to remove all Hs from the molecule
(removeHs), add a charge (SetFormalCharge), and re-add all Hs (AddHs).
 This doesn't work (the sanitization check fails in AddHs, incorrect
valence - which I understand) - what is the best way to do this?

Also,

>>> m = Chem.MolFromSmiles('C(=O)O')
>>> m.Debug()
Atoms:
0 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
 1 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
2 8 O chg: 0  deg: 1 exp: 1 imp: 1 hyb: 3 arom?: 0 chi: 0
Bonds:
0 0->1 order: 2 conj?: 1 aromatic?: 0
1 0->2 order: 1 conj?: 1 aromatic?: 0
>>> m_noHs = Chem.RemoveHs(m)
>>> m_noHs.Debug()
Atoms:
0 6 C chg: 0  deg: 2 exp: 3 imp: 1 hyb: 3 arom?: 0 chi: 0
1 8 O chg: 0  deg: 1 exp: 2 imp: 0 hyb: 3 arom?: 0 chi: 0
 2 8 O chg: 0  deg: 1 exp: 1 imp: 1 hyb: 3 arom?: 0 chi: 0
Bonds:
0 0->1 order: 2 conj?: 1 aromatic?: 0
 1 0->2 order: 1 conj?: 1 aromatic?: 0

>>> m_noHs.GetAtomWithIdx(2).GetNumExplicitHs()
0
>>> m_noHs.GetAtomWithIdx(2).GetNumImplicitHs()
1

Why is there still an implicit H after I "removed" them?

I have tried to use ReplaceSubstructs() for this (a bit of an overkill) but
I then lose the 3D information of the original atom.

Many Thanks... and sorry for the repeated spamming :

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-14 Thread JP
Apologies all -- but I am still having problems with this.

Reading
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03485.html

"As far as I understood, the PDB reader assigns bond orders to the amino
acids in a protein, but if a ligand is present it puts all bonds of it to
SINGLE bonds as auto bond-type perception is not trivial (see Roger's
comments)."

However I am unable to get bond orders for the protein side - am I doing
something wrong or is this the intended behaviour ?
I imagine I can use AssignBondOrdersFromTemplate() for the 20 amino acids
and set these myself -- or is there a better way to do this?

Also, is there a way to make AssignBondOrdersFromTemplate assign bond
orders to all matches?

>>> import rdkit
>>> from rdkit import Chem
>>> temp = Chem.MolFromSmiles('C=O')
>>> mol = Chem.MolFromSmiles('C(O)CC(O)')
>>> from rdkit.Chem import AllChem
>>> m2 = AllChem.AssignBondOrdersFromTemplate(temp, mol)
[12:24:56] WARNING: More than one matching pattern found - picking one
>>> print(Chem.MolToSmiles(m2)) # was expecting O=CCC=O
O=CCCO
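
A sketch of one way around this, assuming the template is meant to describe
the whole molecule rather than a single functional group: give it both
carbonyls and let the function map the complete structure.  The all-single
SMILES stands in for a structure read without bond orders.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# stand-in for a structure read without bond orders: every bond is single
mol = Chem.MolFromSmiles('C(O)CC(O)')

# template describing the whole molecule, not just one C=O
template = Chem.MolFromSmiles('O=CCC=O')

fixed = AllChem.AssignBondOrdersFromTemplate(template, mol)
print(Chem.MolToSmiles(fixed))
```

The "more than one matching pattern" warning may still be printed for a
symmetrical molecule like this one.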


Another thing I don't quite understand: in the code below I get a
"WARNING: More than one matching pattern found - picking one", but how can
my template match multiple times (it is not symmetrical)?

# (Using RDKit_2013_09_1)
import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem

ligand_mol  = Chem.MolFromPDBBlock("""HETATM1  C1  MRC A1993
 30.994  82.769  82.139  1.00 18.68   C
HETATM2  C2  MRC A1993  29.949  82.382  81.280  1.00 18.38
  C
HETATM3  C3  MRC A1993  28.809  83.090  80.875  1.00 16.44
  C
HETATM4  C4  MRC A1993  27.794  82.511  79.886  1.00 17.11
  C
HETATM5  C5  MRC A1993  26.268  82.360  79.965  1.00 16.74
  C
HETATM6  C6  MRC A1993  25.256  81.832  78.911  1.00 17.00
  C
HETATM7  C7  MRC A1993  23.832  81.867  79.556  1.00 17.45
  C
HETATM8  C8  MRC A1993  23.758  81.056  80.927  1.00 16.89
  C
HETATM9  C9  MRC A1993  23.820  79.467  80.419  1.00 17.84
  C
HETATM   10  C10 MRC A1993  22.833  78.610  79.550  1.00 19.48
  C
HETATM   11  C11 MRC A1993  22.999  78.593  78.193  1.00 20.56
  C
HETATM   12  C12 MRC A1993  21.733  78.839  77.305  1.00 20.86
  C
HETATM   13  C13 MRC A1993  21.779  78.052  75.821  1.00 20.74
  C
HETATM   14  C14 MRC A1993  20.323  77.662  75.537  1.00 22.44
  C
HETATM   15  C15 MRC A1993  28.456  84.523  81.348  1.00 12.97
  C
HETATM   16  C16 MRC A1993  24.899  81.634  81.814  1.00 16.07
  C
HETATM   17  C1' MRC A1993  38.561  75.401  83.188  1.00 53.39
  C
HETATM   18  O1P MRC A1993  39.367  74.705  83.841  1.00 53.58
  O
HETATM   19  O1Q MRC A1993  38.963  76.034  82.185  1.00 52.93
  O
HETATM   20  C2' MRC A1993  37.074  75.480  83.615  1.00 51.57
  C
HETATM   21  C3' MRC A1993  36.915  75.997  85.071  1.00 48.41
  C
HETATM   22  C4' MRC A1993  35.513  76.588  85.323  1.00 45.07
  C
HETATM   23  C5' MRC A1993  35.443  78.068  84.897  1.00 41.55
  C
HETATM   24  C6' MRC A1993  34.033  78.631  85.167  1.00 37.19
  C
HETATM   25  C7' MRC A1993  33.490  79.356  83.929  1.00 34.17
  C
HETATM   26  C8' MRC A1993  33.454  80.886  84.151  1.00 31.34
  C
HETATM   27  C9' MRC A1993  32.082  81.519  83.803  1.00 27.63
  C
HETATM   28  O1A MRC A1993  32.056  81.880  82.413  1.00 22.28
  O
HETATM   29  O1B MRC A1993  31.044  83.885  82.667  1.00 20.31
  O
HETATM   30  O5  MRC A1993  26.209  81.625  81.183  1.00 16.19
  O
HETATM   31  O7  MRC A1993  23.503  83.224  79.735  1.00 14.98
  O
HETATM   32  O6  MRC A1993  25.399  82.787  77.821  1.00 15.00
  O
HETATM   33  O10 MRC A1993  22.868  77.384  78.981  1.00 21.90
  O
HETATM   34  C17 MRC A1993  21.395  80.405  77.027  1.00 20.53
  C
HETATM   35  O13 MRC A1993  22.524  76.868  75.987  1.00 21.25
  O
TER
END""")

template_ligand_mol = Chem.MolFromSmiles(
    "C[C@H](O)[C@H](C)[C@@H]1O[C@H]1C[C@H]2CO[C@@H](C/C(C)=C/C(=O)OC(O)=O)[C@@H](O)[C@H]2O")

ligand_mol_with_bonds = AllChem.AssignBondOrdersFromTemplate(
    template_ligand_mol, ligand_mol)
# [12:33:39] WARNING: More than one matching pattern found - picking one

print(Chem.MolToSmiles(ligand_mol))
# CC(CC(O)OCCCCC(O)O)CC1OCC(CC2OC2C(C)C(C)O)C(O)C1O
print(Chem.MolToSmiles(ligand_mol_with_bonds))
# CC(=CC(=O)OC(=O)O)CC1OCC(CC2OC2C(C)C(C)O)C(O)C1O

Any help would be greatly appreciated.

Thanks,
JP


On 13 January 2014 21:02, JP  wrote:
>
> Thanks All - I think I am in a good place now.
>
> I can get the SMILES from Paul's mmcif links and then I can use Sereina's
> magic three lines to do what I want.  I'd cross my fingers - but with RDKit
> you don't need to.
> This

Re: [Rdkit-discuss] PDB reader and bond perception

2014-01-13 Thread JP
Thanks All - I think I am in a good place now.

I can get the SMILES from Paul's mmcif links and then I can use Sereina's
magic three lines to do what I want.  I'd cross my fingers - but with RDKit
you don't need to.
This works for all Chemical Components (or whatever other fashionable name
they go by these days) in the PDB.

For posterity: I have found a post in the mailing list started by James
which sheds some light on this:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03481.html




On 13 January 2014 19:46, sereina riniker  wrote:

> Hi JP,
>
> If you have also a SMILES of the molecule you want to read from PDB, you
> can assign the bond orders based on this template:
>
> tmp = Chem.MolFromPDBFile(yourfilename)
> template = Chem.MolFromSmiles(yoursmiles)
> mol = AllChem.AssignBondOrdersFromTemplate(template, tmp)
>
> Is this what you're looking for?
>
> Best,
> Sereina
>
>
> 2014/1/13 JP 
>
>> RDKitters!
>>
>> Finally back on the mailing list!
>>
>> I am sure we've been through this at the UGM (my mind must have wandered
>> off!), but a quick question about the PDB reader and bond perception.  Is
>> this supported with the current PDB reader?  I remember that someone
>> (PaulE, perhaps?) was saying bond perception was painful, but there was
>> some dictionary for PDB ligands which helps (any idea the name of this
>> dictionary?).
>>
>> To the technical details.
>>
>> I am reading in the following PDB file with a simple MolFromPDBFile()
>> call:
>>
>> HETATM1  O1P 84T A1862 -27.016   9.387 -72.564  1.00 20.81
>> O
>> HETATM2  P   84T A1862 -27.282   9.818 -73.968  1.00 19.65
>> P
>> HETATM3  O2P 84T A1862 -27.881  11.176 -74.182  1.00 21.49
>> O
>> HETATM4  N   84T A1862 -25.869   9.583 -74.813  1.00 19.78
>> N
>> HETATM5  C   84T A1862 -25.759  10.010 -76.075  1.00 19.97
>> C
>> HETATM6  CA  84T A1862 -24.493   9.748 -76.807  1.00 19.75
>> C
>> HETATM7  CB  84T A1862 -24.794   8.678 -77.847  1.00 19.73
>> C
>> HETATM8  CG  84T A1862 -23.571   8.324 -78.681  1.00 19.70
>> C
>> HETATM9  CD2 84T A1862 -23.309   9.519 -79.611  1.00 18.49
>> C
>> HETATM   10  CD1 84T A1862 -23.863   6.932 -79.305  1.00 18.60
>> C
>> HETATM   11  OHB 84T A1862 -25.210   7.467 -77.223  1.00 19.17
>> O
>> HETATM   12  OH  84T A1862 -23.549   9.127 -75.984  1.00 20.33
>> O
>> HETATM   13  O   84T A1862 -26.672  10.517 -76.692  1.00 20.26
>> O
>> HETATM   14  O5' 84T A1862 -28.377   8.861 -74.619  1.00 19.39
>> O
>> HETATM   15  C5' 84T A1862 -28.002   7.536 -74.954  1.00 18.47
>> C
>> HETATM   16  C4' 84T A1862 -28.909   7.000 -76.012  1.00 18.24
>> C
>> HETATM   17  C3' 84T A1862 -28.901   7.826 -77.298  1.00 18.28
>> C
>> HETATM   18  C2' 84T A1862 -30.318   7.610 -77.768  1.00 18.69
>> C
>> HETATM   19  O2' 84T A1862 -30.789   8.641 -78.581  1.00 19.64
>> O
>> HETATM   20  O4' 84T A1862 -30.262   6.951 -75.529  1.00 18.80
>> O
>> HETATM   21  C1' 84T A1862 -31.152   7.470 -76.521  1.00 19.01
>> C
>> HETATM   22  N9  84T A1862 -31.753   8.732 -76.009  1.00 20.08
>> N
>> HETATM   23  C4  84T A1862 -33.033   9.013 -76.158  1.00 21.10
>> C
>> HETATM   24  N3  84T A1862 -34.018   8.339 -76.786  1.00 21.58
>> N
>> HETATM   25  C2  84T A1862 -35.263   8.846 -76.830  1.00 21.95
>> C
>> HETATM   26  C8  84T A1862 -31.223   9.701 -75.291  1.00 20.27
>> C
>> HETATM   27  N7  84T A1862 -32.173  10.618 -75.019  1.00 21.28
>> N
>> HETATM   28  C5  84T A1862 -33.315  10.213 -75.563  1.00 21.81
>> C
>> HETATM   29  C6  84T A1862 -34.624  10.702 -75.627  1.00 22.85
>> C
>> HETATM   30  N1  84T A1862 -35.550  10.010 -76.285  1.00 22.44
>> N
>> HETATM   31  N6  84T A1862 -35.008  11.862 -75.052  1.00 23.86
>> N
>> TER
>> END
>>
>> But I am losing all the double bond (and aromatic) information:
>>
>> m = Chem.MolFromPDBFile(sys.argv[1])
>> print(Chem.MolToSmiles(m))
>>
>> Gives me:
>>
>> CC(C)C(O)C(O)C(O)NP(O)(O)OCC1CC(O)C(N2CNC3C2NCNC3N)O1
>>
>> As usual, many thanks for your time,
>>
>> -
>> Jean-Paul Ebejer
>> Early Stage Researcher
>>
>>
>> ---

[Rdkit-discuss] PDB reader and bond perception

2014-01-13 Thread JP
RDKitters!

Finally back on the mailing list!

I am sure we've been through this at the UGM (my mind must have wandered
off!), but a quick question about the PDB reader and bond perception.  Is
this supported with the current PDB reader?  I remember that someone
(PaulE, perhaps?) was saying bond perception was painful, but there was
some dictionary for PDB ligands which helps (any idea the name of this
dictionary?).

To the technical details.

I am reading in the following PDB file with a simple MolFromPDBFile() call:

HETATM1  O1P 84T A1862 -27.016   9.387 -72.564  1.00 20.81
  O
HETATM2  P   84T A1862 -27.282   9.818 -73.968  1.00 19.65
  P
HETATM3  O2P 84T A1862 -27.881  11.176 -74.182  1.00 21.49
  O
HETATM4  N   84T A1862 -25.869   9.583 -74.813  1.00 19.78
  N
HETATM5  C   84T A1862 -25.759  10.010 -76.075  1.00 19.97
  C
HETATM6  CA  84T A1862 -24.493   9.748 -76.807  1.00 19.75
  C
HETATM7  CB  84T A1862 -24.794   8.678 -77.847  1.00 19.73
  C
HETATM8  CG  84T A1862 -23.571   8.324 -78.681  1.00 19.70
  C
HETATM9  CD2 84T A1862 -23.309   9.519 -79.611  1.00 18.49
  C
HETATM   10  CD1 84T A1862 -23.863   6.932 -79.305  1.00 18.60
  C
HETATM   11  OHB 84T A1862 -25.210   7.467 -77.223  1.00 19.17
  O
HETATM   12  OH  84T A1862 -23.549   9.127 -75.984  1.00 20.33
  O
HETATM   13  O   84T A1862 -26.672  10.517 -76.692  1.00 20.26
  O
HETATM   14  O5' 84T A1862 -28.377   8.861 -74.619  1.00 19.39
  O
HETATM   15  C5' 84T A1862 -28.002   7.536 -74.954  1.00 18.47
  C
HETATM   16  C4' 84T A1862 -28.909   7.000 -76.012  1.00 18.24
  C
HETATM   17  C3' 84T A1862 -28.901   7.826 -77.298  1.00 18.28
  C
HETATM   18  C2' 84T A1862 -30.318   7.610 -77.768  1.00 18.69
  C
HETATM   19  O2' 84T A1862 -30.789   8.641 -78.581  1.00 19.64
  O
HETATM   20  O4' 84T A1862 -30.262   6.951 -75.529  1.00 18.80
  O
HETATM   21  C1' 84T A1862 -31.152   7.470 -76.521  1.00 19.01
  C
HETATM   22  N9  84T A1862 -31.753   8.732 -76.009  1.00 20.08
  N
HETATM   23  C4  84T A1862 -33.033   9.013 -76.158  1.00 21.10
  C
HETATM   24  N3  84T A1862 -34.018   8.339 -76.786  1.00 21.58
  N
HETATM   25  C2  84T A1862 -35.263   8.846 -76.830  1.00 21.95
  C
HETATM   26  C8  84T A1862 -31.223   9.701 -75.291  1.00 20.27
  C
HETATM   27  N7  84T A1862 -32.173  10.618 -75.019  1.00 21.28
  N
HETATM   28  C5  84T A1862 -33.315  10.213 -75.563  1.00 21.81
  C
HETATM   29  C6  84T A1862 -34.624  10.702 -75.627  1.00 22.85
  C
HETATM   30  N1  84T A1862 -35.550  10.010 -76.285  1.00 22.44
  N
HETATM   31  N6  84T A1862 -35.008  11.862 -75.052  1.00 23.86
  N
TER
END

But I am losing all the double bond (and aromatic) information:

m = Chem.MolFromPDBFile(sys.argv[1])
print(Chem.MolToSmiles(m))

Gives me:

CC(C)C(O)C(O)C(O)NP(O)(O)OCC1CC(O)C(N2CNC3C2NCNC3N)O1

As usual, many thanks for your time,

-
Jean-Paul Ebejer
Early Stage Researcher
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] atom coordinates

2013-11-08 Thread JP
Yes, of course - try this:

conf = mol.GetConformer()
pt = conf.GetAtomPosition(0)
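
A slightly longer sketch of the same idea, looping over all atoms. To keep it independent of any SDF file, the conformer is built by hand here; the molecule and the coordinates are made up for illustration only.

```python
from rdkit import Chem
from rdkit.Geometry import Point3D

# Build a small molecule and attach a conformer by hand so the example
# does not depend on an SDF file; the coordinates are invented.
mol = Chem.MolFromSmiles('CO')
conf = Chem.Conformer(mol.GetNumAtoms())
conf.SetAtomPosition(0, Point3D(0.0, 0.0, 0.0))
conf.SetAtomPosition(1, Point3D(1.43, 0.0, 0.0))
mol.AddConformer(conf)

# Read the coordinates back, one atom at a time.
coords = []
for atom in mol.GetAtoms():
    pt = mol.GetConformer().GetAtomPosition(atom.GetIdx())
    coords.append((atom.GetSymbol(), pt.x, pt.y, pt.z))
    print("%s %.3f %.3f %.3f" % coords[-1])
```

With a molecule read from an SDF file the loop is the same; only the first block (building the conformer) goes away.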

Cheers
JP



On 8 November 2013 12:01, Michal Krompiec  wrote:

> Hello,
> In the Python API, is it possible to read the 3D coordinates of an
> atom (from a Mol object created from an SDF file with 3D coords)?
> Thanks,
> Michal


[Rdkit-discuss] Best Practice: Git model for code submissions to the RDkit

2013-10-24 Thread JP
Hi RDkit-Devs,

** Disclaimer: Just used git as svn till now **

What are the best practices for submitting code changes to the RDKit
codebase via git?

Right now I do the following:

0. Fork the rdkit repository (upstream)
1. Make my changes on the master
2. Send a pull request to original RDKit repo

I have local commits I do not want to send in the pull request (e.g.
.gitignore file which ignores all build files).  Also I have some
"erroneous" commits in my forked repo which I would not like to send over).

The solution probably lies in using branches - but what is the best
practice to do this? Should all commits which I want to send be in the
branch and the commits I want to keep "private" be on the master (or on
another branch).  How do you do it?

Perhaps I am thinking too much in terms of SVN.

Cheers
JP

[small note:  By mistake, I sent this email from another address to the
mailing list and I got the "Waiting for moderator approval message" ...
just pointing this out perhaps there are other messages stuck in that queue]


Re: [Rdkit-discuss] rdkit mol objects from sql

2013-10-23 Thread JP
Does the following help you, George?
http://comments.gmane.org/gmane.science.chemistry.rdkit.user/860
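
For the archive, a minimal sketch of the conversion step, assuming the query ships molecules in binary form with the cartridge's mol_send() function. The table and column names in the comment are hypothetical, and the rows are simulated here with ToBinary() so the snippet runs without a database.

```python
from rdkit import Chem

def mols_from_rows(rows):
    """Turn (id, binary_mol) rows into (id, Chem.Mol) pairs."""
    # bytes(...) also copes with the memoryview objects psycopg2 returns
    # for bytea columns under Python 3.
    return [(row_id, Chem.Mol(bytes(blob))) for row_id, blob in rows]

# Stand-in for the result of something like (hypothetical table/columns):
#   cur.execute("select molregno, mol_send(m) from rdk.mols limit 2")
#   rows = cur.fetchall()
rows = [(1, Chem.MolFromSmiles('CCO').ToBinary()),
        (2, memoryview(Chem.MolFromSmiles('c1ccccc1').ToBinary()))]

mols = mols_from_rows(rows)
for row_id, mol in mols:
    print(row_id, Chem.MolToSmiles(mol))
```

The resulting list of pairs can be handed to pandas.DataFrame in the usual way if a dataframe is wanted.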



On 23 October 2013 17:11, George Papadatos  wrote:

> Hi RDKitters,
> I must have seen this in an ipython notebook but can't find it right now:
> If I have a table of rdkit mols generated by the cartridge, is there a way
> to retrieve them using a psycopg2 connection within python - ideally inside
> a pandas dataframe?
>
> I've got this snippet:
> import pandas as pd
> import psycopg2
> conn = psycopg2.connect("port=5432 user=chembl dbname=chembl_17")
> data = pd.read_sql(sql, conn)
>
> ...but I'm missing the step where I retrieve rdkit mol objects somehow
> instead of smiles.
>
> Many thanks in advance,
> George


[Rdkit-discuss] Chemistry 101 question...

2013-10-21 Thread JP
yo RDKitters,

A quick question about some chemistry found in the RDKit from a computer
scientist.

Why is the nitro ("Nitro2") group in DATA/BaseFeatures.fdef
and Contrib/M_Kossner/BaseFeatures_DIP2_NoMicrospecies.fdef specified as
LumpedHydrophobe?

# nitro groups in the RD code are always: *-[N+](=O)[O-]
DefineFeature Nitro2 [N;D3;+](=O)[O-]
  Family LumpedHydrophobe
  Weights 1.0,1.0,1.0
EndFeature

Why is this a hydrophobe (it looks polar to me)?  Or is this one of the too
many corner cases?

Also a related question, I seem to recall that at the UGM (over a delicious
sandwich) there was talk of having a more complete fdef file out-of-the-box
- is this still on?

Many Thanks Folks,
JP


Re: [Rdkit-discuss] New MMFF-enabled RDKit branch

2013-09-23 Thread JP
On 21 September 2013 13:55, Greg Landrum  wrote:

> Something JP is particularly going to like is that Paolo also added the
> out-of-plane term to the RDKit UFF implementation.
>
> This means that this bug:
> https://github.com/rdkit/rdkit/issues/62
> which was formerly this bug:
> http://sourceforge.net/p/rdkit/bugs/205/
> can finally be closed. :-)
>


With this and the MMFF implementation, when you are going to ask what
functionality is lacking in RDKit at this UGM
<https://rdkitugm2.eventbrite.co.uk/> in Cambridge, there is going
to be a quiet two minutes (until someone remembers the PDB parser.
Someone always remembers the PDB parser :) ).

Well done, Paolo - beer is on me at the pub
<http://en.wikipedia.org/wiki/The_Eagle_(pub)>.


Re: [Rdkit-discuss] New MMFF-enabled RDKit branch

2013-08-23 Thread JP
Well done Paolo!  During last year's UGM, when Greg asked for an RDKit
wishlist from the audience, this functionality was suggested by more than
one person.

Thanks!
JP


On 23 August 2013 04:28, Greg Landrum  wrote:

> Thanks Paolo!
>
> I think it will be great for the RDKit to finally have a better force
> field. UFF is certainly better than nothing, but having access to MMFF94 is
> a big step forward.
>
> For the list: once Paolo has decreased the size of the testing data (the
> original files total to >30MB) this will be available on the trunk. It
> should certainly be available in the next release.
>
> Best,
> -greg
>
>
>
>
> On Thu, Aug 22, 2013 at 10:27 PM, Paolo Tosco wrote:
>
>> Dear all,
>>
>> a new RDKit branch, labelled "ptosco-MMFF", is available for download.
>> It provides a full implementation of the MMFF force field, which can be
>> accessed through a C++ API rather similar to the one currently available
>> for the UFF force field; Python wrappers will follow very soon. The
>> implementation was validated against the official MMFF validation suite
>> (http://www.ccl.net/chemistry/resources/data/index.shtml) using a test
>> program included in the new branch
>> (Code/ForceField/MMFF/testMMFFForceField.cpp); all tests were passed for
>> both MMFF94 and MMFF94s variants.
>>
>> Kind regards,
>> Paolo
>>
>> --
>> ==
>> Paolo Tosco, Ph.D.
>> Department of Drug Science and Technology
>> Via Pietro Giuria, 9 - 10125 Torino (Italy)
>> Tel: +39 011 670 7680 | Mob: +39 348 5537206
>> Fax: +39 011 670 7687 | E-mail: paolo.to...@unito.it
>> http://open3dqsar.org | http://open3dalign.org
>> ==


Re: [Rdkit-discuss] How to specify similarity threshold for single query?

2013-07-11 Thread JP
I may have an idea for a workaround, and not a solution ...

Why not set the parameter rdkit.tanimoto_threshold in a stored
procedure and call the stored procedure from python, instead of trying to
run set xxx=yyy on the cursor directly?
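
Another option, sketched below purely as an assumption (I have not tested it against the cartridge), is PostgreSQL's SET LOCAL, which scopes a setting to the current transaction so nothing has to be restored afterwards. The table and column names (rdk.fps, mfp2) are hypothetical, and cur is expected to be a psycopg2 cursor with autocommit off.

```python
def similarity_search(cur, smiles, threshold):
    """Run one similarity query with a threshold scoped to the transaction."""
    # SET LOCAL lasts only until the current transaction ends, so the
    # session-wide value does not need to be restored by hand.
    cur.execute("SET LOCAL rdkit.tanimoto_threshold = %s", (threshold,))
    # Hypothetical table and column names (rdk.fps, mfp2); the literal %
    # similarity operator is doubled because parameters are passed.
    cur.execute(
        "SELECT molregno FROM rdk.fps "
        "WHERE mfp2 %% morganbv_fp(mol_from_smiles(%s::cstring))",
        (smiles,))
    return cur.fetchall()
```

Usage would be along the lines of similarity_search(conn.cursor(), 'c1ccccc1O', 0.7) followed by conn.commit() or conn.rollback() to end the transaction.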

Just an idea,
JP




On 11 July 2013 16:52, Michał Nowotka  wrote:

> Is it possible to specify similarity threshold for single query instead of
> relying on global "rdkit.tanimoto_threshold" which I'm having problems with?
>
> It would be much more intuitive to do so, otherwise I need to surround my
> similarity search SQL with statements changing global threshold and
> restoring it afterwards.
>
> Regards,
>
> Michał Nowotka


[Rdkit-discuss] A question of GitHub Issue tracking for RDKit

2013-07-04 Thread JP
yo folks,

I noticed some of the entries in the sourceforge bug list are not in the github
list, e.g. my pet one http://sourceforge.net/p/rdkit/bugs/205/

Is there a specific reason for this?  Is it OK if I move the bugs which I
am interested in to GitHub?
Also, is there a 'search' function in the issue list in github?

Many Thanks,
JP


Re: [Rdkit-discuss] RDkit user beginner.

2013-07-04 Thread JP
On 4 July 2013 08:16,  wrote:

> from rdkit import Chem
> from rdkit.Chem import MACCSkeys
>
>
> nat = Chem.SDMolSupplier("nat.sdf")
> fps = [MACCSkeys.GenMACCSkeys(x) for x in nat]
> fp = DataStructs.FingerprintSimilarity(fps[0], fps[89])
> print fp
>

Welcome to the RDKit mailing lists!

As usual, the devil is in the detail.  You are using a lower case 'k' for
keys instead of an upper case one (i.e. should be 'MACCSkeys.GenMACCSKeys'
and not 'MACCSkeys.GenMACCSkeys').  Python being case-sensitive it barfs on
things like these.

This works:

from rdkit import Chem
from rdkit.Chem import MACCSkeys
from rdkit import DataStructs

nat = Chem.SDMolSupplier("nat.sdf")
fps = [MACCSkeys.GenMACCSKeys(x) for x in nat]
DataStructs.FingerprintSimilarity(fps[0],fps[1])
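
If you want to try this without nat.sdf, here is a self-contained sketch of the same calls; the three SMILES are arbitrary examples picked for illustration.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys

# Arbitrary example molecules standing in for the contents of nat.sdf.
smiles = ['CCO', 'CCN', 'c1ccccc1']
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [MACCSkeys.GenMACCSKeys(m) for m in mols]

# FingerprintSimilarity uses the Tanimoto metric by default.
self_sim = DataStructs.FingerprintSimilarity(fps[0], fps[0])
cross_sim = DataStructs.FingerprintSimilarity(fps[0], fps[1])
print(self_sim, cross_sim)
```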


As a newbie there is a fantastic resource Greg put together, The RDKit
Bible - http://www.rdkit.org/RDKit_Docs.current.pdf - what you want is on
page 29

Cheers,
JP


Re: [Rdkit-discuss] aromatic nitrogens

2013-06-25 Thread JP
That is interesting, my RDKit 2012_12_1 works - which version of RDKit are
you using?

Also, it is always better to paste minimal code snippets if you need help.
 It is hard to figure out what is wrong otherwise.

What happens when you copy this line in python (if you have access to a
python interpreter)?  MolFromSmiles does sanitization - so this would fail
(and return None) if the molecule is not valid.

>>> m = Chem.MolFromSmiles('N[C@@H](Cc1c[nH]cn1)C(O)=O')
>>> print m


And a non working mol:

>>> Chem.MolFromSmiles('O=C(O)[C@@H](N)Cc1cncn1')
[17:09:48] Can't kekulize mol
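
The two cases side by side, as a small sketch you can paste into a script; it relies only on MolFromSmiles returning None when sanitization fails.

```python
from rdkit import Chem

# The two histidine SMILES from this message.
candidates = {
    'with [nH]': 'N[C@@H](Cc1c[nH]cn1)C(O)=O',
    'without [nH]': 'O=C(O)[C@@H](N)Cc1cncn1',
}

parsed = {}
for label, smi in candidates.items():
    mol = Chem.MolFromSmiles(smi)   # None if sanitization fails
    parsed[label] = mol
    print(label, 'OK' if mol is not None else 'could not be parsed')
```

Without the explicit H on one ring nitrogen there is no valid Kekule structure for the imidazole ring, hence the error.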




On 25 June 2013 17:47, Igor Filippov  wrote:

> I'm getting an exception at sanitizeMol - "can't kekulize" with this
> SMILES (and many many others) :(
>
> Thank you,
> Igor
>
>
> On Tue, Jun 25, 2013 at 12:14 PM, JP  wrote:
>
>>
>> On 25 June 2013 17:00, Igor Filippov  wrote:
>>
>>> Histidine
>>
>>
>> How about: N[C@@H](Cc1c[nH]cn1)C(O)=O
>>
>> >>> Chem.MolFromSmiles('N[C@@H](Cc1c[nH]cn1)C(O)=O')
>> 


Re: [Rdkit-discuss] aromatic nitrogens

2013-06-25 Thread JP
On 25 June 2013 17:00, Igor Filippov  wrote:

> Histidine


How about: N[C@@H](Cc1c[nH]cn1)C(O)=O

>>> Chem.MolFromSmiles('N[C@@H](Cc1c[nH]cn1)C(O)=O')



Re: [Rdkit-discuss] "Could not embed molecule." (The Anthony Conundrum)

2013-06-21 Thread JP
On 21 June 2013 10:10, Greg Landrum  wrote:

> Do you mind doing a bug report on github for this?


Not at all.  Done.  I cannot assign labels or milestones to it - I assume
this is on purpose, so you can organize the issues list yourself (mind you,
this is a good idea to have you assigning those labels).
https://github.com/rdkit/rdkit/issues/55

Thanks for your explanation Greg.

-
Jean-Paul Ebejer
Early Stage Researcher


[Rdkit-discuss] "Could not embed molecule." (The Anthony Conundrum)

2013-06-19 Thread JP
Dearest RDKitters,

I am trying to help a friend of mine, with an RDKit issue (using the latest
RDKit) and I am surprised by some output we are getting.  Perhaps someone
here has an explanation.

import rdkit
from rdkit import Chem
from rdkit.Chem import AllChem

core = Chem.MolFromSmiles('c1cncs1') # first molecule
print AllChem.EmbedMolecule(core)
AllChem.UFFOptimizeMolecule(core)

Chem.MolToMolBlock(core) # we have some coordinates
print ""
mol = Chem.MolFromSmiles('C(=O)(O)c1cncs1')

AllChem.ConstrainedEmbed(mol, core, randomseed=123)
Chem.MolToMolBlock(mol)

We get the following error:

---
ValueError                                Traceback (most recent call last)
/home/jp/ in ()
 11 mol = Chem.MolFromSmiles('C(=O)(O)c1cncs1')
 12
---> 13 AllChem.ConstrainedEmbed(mol, core, randomseed=123)
 14 Chem.MolToMolBlock(mol)

/opt/RDKit_2012_12_1/rdkit/Chem/AllChem.pyc in ConstrainedEmbed(mol, core,
useTethers, coreConfId, randomseed)
295   ci = EmbedMolecule(mol,coordMap=coordMap,randomSeed=randomseed)
296   if ci<0:
--> 297 raise ValueError,'Could not embed molecule.'
298
299   algMap=[(j,i) for i,j in enumerate(match)]

ValueError: Could not embed molecule.

0


*(0) Why does this happen?*  We have coordinates in core and the match
between core and mol is obvious.

(1) If we change the aromatic s to o, the code works and we get coordinates
for mol - but these coordinates do not match exactly core.  Why is this?

>>> Chem.MolToMolBlock(mol)
 RDKit  3D

  8  8  0  0  0  0  0  0  0  0999 V2000
   -2.4927    0.7025    0.1023 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.8250    1.8875    0.5227 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.5158   -0.2123   -0.3041 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0822    0.3130    0.0474 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6171   -0.9082   -0.3781 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7280   -0.8842   -0.2830 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.0322    0.3482    0.1959 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0608    1.1312    0.4178 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  1  3  1  0
  1  4  1  0
  4  5  2  0
  5  6  1  0
  6  7  2  0
  7  8  1  0
  8  4  1  0
M  END
>>> Chem.MolToMolBlock(core) # we have some coordinates
 RDKit  3D

  5  5  0  0  0  0  0  0  0  0999 V2000
   -1.0880    0.3145    0.0476 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6149   -0.9044   -0.3765 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7299   -0.8840   -0.2828 N   0  0  0  0  0  0  0  0  0  0  0  0
    1.0336    0.3470    0.1955 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0606    1.1269    0.4162 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  1  1  0
M  END
>>>

Can you help explain the mystery please?
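
To put a number on "do not match exactly", here is a quick sketch that computes the RMSD between the five ring atoms of core and the matching atoms of mol, using the coordinates pasted above (core atoms 1-5 against mol atoms 4-8). The atom pairing is read off the two mol blocks by eye, so treat it as an assumption.

```python
import math

# Ring-atom coordinates copied from the two mol blocks above
# (core atoms 1-5 are paired with mol atoms 4-8).
core_xyz = [(-1.0880, 0.3145, 0.0476), (-0.6149, -0.9044, -0.3765),
            (0.7299, -0.8840, -0.2828), (1.0336, 0.3470, 0.1955),
            (-0.0606, 1.1269, 0.4162)]
mol_xyz = [(-1.0822, 0.3130, 0.0474), (-0.6171, -0.9082, -0.3781),
           (0.7280, -0.8842, -0.2830), (1.0322, 0.3482, 0.1959),
           (-0.0608, 1.1312, 0.4178)]

# Squared distance per atom pair, then the root of the mean.
sq = [sum((p - q) ** 2 for p, q in zip(a, b))
      for a, b in zip(core_xyz, mol_xyz)]
rmsd = math.sqrt(sum(sq) / len(sq))
print("RMSD over the 5 ring atoms: %.4f" % rmsd)
```

The deviation is small, which would be consistent with the core atoms being pulled towards their reference positions rather than frozen in place, but that reading is a guess on my part.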

Many Thanks
JP


Re: [Rdkit-discuss] Counting neighboring atoms including H (both implicit and explicit)

2013-06-18 Thread JP
Can you post some of the molecules generating the precondition violation
error?  It is easier to see what is wrong with a practical example.  I'd
imagine this should work on *all* atom instances which come from a
sanitized molecule.

GetNeighbors() (note the spelling used in the API) will not return implicit
or explicit Hydrogen atoms, unless they have been added to the molecule as
real atoms (e.g. with Chem.AddHs) - so you cannot really use it to count
neighbours (you could if you calculated the valency of the atom, number of
non H neighbours connected to it etc.)

GetTotalDegree() will give you the number of neighbouring atoms (including
implicit and explicit Hs)

Try these simple examples:

from rdkit import Chem

m = Chem.MolFromSmiles('C')
for a in m.GetAtoms():
    print(a.GetSymbol())
    for b in a.GetNeighbors():
        print(b)   # never prints
    print(a.GetTotalDegree())

m = Chem.MolFromSmiles('[CH4]')
for a in m.GetAtoms():
    print(a.GetSymbol())
    for b in a.GetNeighbors():
        print(b)   # never prints
    print(a.GetTotalDegree())
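
A small sketch that puts the different counts next to each other for methane, before and after the hydrogens are added as real atoms with Chem.AddHs.

```python
from rdkit import Chem

# Methane with hydrogens kept implicit: no neighbour atoms in the graph.
m = Chem.MolFromSmiles('C')
a = m.GetAtomWithIdx(0)
counts_implicit = (a.GetDegree(), a.GetTotalNumHs(), a.GetTotalDegree())

# After AddHs the hydrogens are real atoms, so GetNeighbors() sees them.
mh = Chem.AddHs(m)
ah = mh.GetAtomWithIdx(0)
counts_explicit = (ah.GetDegree(), len(ah.GetNeighbors()), ah.GetTotalDegree())

print(counts_implicit)   # (degree, total Hs, total degree)
print(counts_explicit)   # (degree, number of neighbour atoms, total degree)
```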




On 18 June 2013 15:54, Syeda Sabrina  wrote:

> Thanks a lot JP. So the number of  neighbors for an atom, does not include
> only the directly connected atoms while the GetNeighbors will return
> atoms directly connected to the atom of interest, right? I also encountered
> some precondition violation while trying GetTotalDegree on some molecules.
> Is there any restriction of this function that it cannot operate on some
> certain type of molecule?
>
> *Syeda Sabrina*
> *Graduate Assistant*
> *Department of Chemical Engineering, Penn State University*
> *University Park, PA*
> *
> *
>
>
> On Tue, Jun 18, 2013 at 5:08 AM, JP  wrote:
>
>> Before Greg intervenes with a much more
>> compact/efficient/readable/portable/etc. answer... :-)
>>
>> GetTotalDegree returns the number (including Hs) of neighbours of an
>> atom.  GetNeighbors gets the actual atoms list (does not include H).
>>
>> import rdkit
>> from rdkit import Chem
>>
>> m = Chem.MolFromSmiles('CCO')
>> for a in m.GetAtoms():
>>     print("Atom %s has %d neighboUring atoms." % (a.GetSymbol(),
>>                                                   a.GetTotalDegree()))


Re: [Rdkit-discuss] Counting neighboring atoms including H (both implicit and explicit)

2013-06-18 Thread JP
Before Greg intervenes with a much more
compact/efficient/readable/portable/etc. answer... :-)

GetTotalDegree returns the number (including Hs) of neighbours of an atom.
GetNeighbors gets the actual atoms list (does not include H).

import rdkit
from rdkit import Chem

m = Chem.MolFromSmiles('CCO')
for a in m.GetAtoms():
    print("Atom %s has %d neighboUring atoms." % (a.GetSymbol(),
                                                  a.GetTotalDegree()))


Re: [Rdkit-discuss] install error

2013-06-05 Thread JP
,

Before running cmake - have you run ./External/INCHI-API/download-inchi.sh ?




On 5 June 2013 04:13, Yingfeng Wang  wrote:

> After getting the latest code by git, I install RDKit on my ubuntu 12.04.
>
> In the step of make install, I got
>
> CMake Error at External/INCHI-API/cmake_install.cmake:124 (FILE):
>
> It seems the file $RDBASE/lib/libRDInchiLib.so.1.2013.06.1pre can't be
> found. Please note that I have turned on the flag of inchi.
>
> Thanks.
>
> Yingfeng


Re: [Rdkit-discuss] sudo apt-get install python-rdkit not working

2013-05-15 Thread JP
oopppsss sorry, you are right ...


On 15 May 2013 10:33, Greg Landrum  wrote:

>
> Why does it need an update? The RDKit is available in ubuntu in revs 12.04
> and later. It's just not always the most up-to-date RDKit (debian packages
> are rarely the most up-to-date)
>
> -greg
>
>
> On Wed, May 15, 2013 at 10:58 AM, JP  wrote:
>
>> Merci Greg,
>>
>> This needs an update then:
>> https://github.com/rdkit/rdkit/blob/master/Docs/Book/Install.rst
>>
>> "Ubuntu 12.04 and later"
>>
>>
>> On 15 May 2013 04:49, Greg Landrum  wrote:
>>
>>> Hi JP,
>>>
>>>
>>> On Tue, May 14, 2013 at 3:43 PM, JP  wrote:
>>>
>>>> On:
>>>>
>>>> jpebe@ned:~/dphil/ligity_vs_es_tests$ lsb_release -a
>>>> No LSB modules are available.
>>>> Distributor ID: Ubuntu
>>>> Description: Ubuntu 12.04.2 LTS
>>>> Release: 12.04
>>>> Codename: precise
>>>>
>>>> Fresh rdkit install via:
>>>>
>>>> sudo apt-get install python-rdkit librdkit1 rdkit-data
>>>>
>>>> Gives:
>>>>
>>>> >>> import rdkit
>>>> >>> from rdkit import Chem
>>>> >>> s = Chem.ForwardSDMolSupplier('test.sdf')
>>>> Traceback (most recent call last):
>>>>   File "", line 1, in 
>>>> AttributeError: 'module' object has no attribute 'ForwardSDMolSupplier'
>>>> >>>
>>>>
>>>> Any ideas?
>>>>
>>>
>>> The version of the RDKit that is available via deb in ubuntu 12.04 is
>>> v2011.06. The ForwardSDMolSupplier was not added to the python wrappers
>>> until v2011.12.1. ubuntu 12.10 includes v2012.03, so it should have
>>> ForwardSDMolSupplier.
>>>
>>> I don't believe that there's a backport version of the v2012.03 package
>>> available, so, unfortunately, on the current LTS version of Ubuntu (12.04),
>>> you will need to install from source to get the functionality.
>>>
>>> -greg


Re: [Rdkit-discuss] sudo apt-get install python-rdkit not working

2013-05-15 Thread JP
Merci Greg,

This needs an update then:
https://github.com/rdkit/rdkit/blob/master/Docs/Book/Install.rst

"Ubuntu 12.04 and later"


On 15 May 2013 04:49, Greg Landrum  wrote:

> Hi JP,
>
>
> On Tue, May 14, 2013 at 3:43 PM, JP  wrote:
>
>> On:
>>
>> jpebe@ned:~/dphil/ligity_vs_es_tests$ lsb_release -a
>> No LSB modules are available.
>> Distributor ID: Ubuntu
>> Description: Ubuntu 12.04.2 LTS
>> Release: 12.04
>> Codename: precise
>>
>> Fresh rdkit install via:
>>
>> sudo apt-get install python-rdkit librdkit1 rdkit-data
>>
>> Gives:
>>
>> >>> import rdkit
>> >>> from rdkit import Chem
>> >>> s = Chem.ForwardSDMolSupplier('test.sdf')
>> Traceback (most recent call last):
>>   File "", line 1, in 
>> AttributeError: 'module' object has no attribute 'ForwardSDMolSupplier'
>> >>>
>>
>> Any ideas?
>>
>
> The version of the RDKit that is available via deb in ubuntu 12.04 is
> v2011.06. The ForwardSDMolSupplier was not added to the python wrappers
> until v2011.12.1. ubuntu 12.10 includes v2012.03, so it should have
> ForwardSDMolSupplier.
>
> I don't believe that there's a backport version of the v2012.03 package
> available, so, unfortunately, on the current LTS version of Ubuntu (12.04),
> you will need to install from source to get the functionality.
>
> -greg
>
>
>
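
For anyone hitting the same thing: a quick sketch for checking which RDKit version is installed and whether ForwardSDMolSupplier is available, falling back to the plain SDMolSupplier otherwise. The fallback choice is just a suggestion, not something from the thread.

```python
from rdkit import Chem, rdBase

# Version string of the installed RDKit.
print("RDKit version:", rdBase.rdkitVersion)

# Fall back to the older supplier if the forward-only one is missing.
if hasattr(Chem, 'ForwardSDMolSupplier'):
    supplier_name = 'ForwardSDMolSupplier'
else:
    supplier_name = 'SDMolSupplier'
supplier_cls = getattr(Chem, supplier_name)
print("Using", supplier_name)
```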


[Rdkit-discuss] sudo apt-get install python-rdkit not working

2013-05-14 Thread JP
On:

jpebe@ned:~/dphil/ligity_vs_es_tests$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise

Fresh rdkit install via:

sudo apt-get install python-rdkit librdkit1 rdkit-data

Gives:

>>> import rdkit
>>> from rdkit import Chem
>>> s = Chem.ForwardSDMolSupplier('test.sdf')
Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'module' object has no attribute 'ForwardSDMolSupplier'
>>>

Any ideas?


Re: [Rdkit-discuss] Building rdkit on Ubuntu 12.10

2013-04-26 Thread JP
Trying to be helpful...

A few weeks ago I wrote a blog entry on how to install RDKit on Ubuntu
(tested on 12.04, 12.10):
http://blopig.com/blog/?p=315





-
Jean-Paul Ebejer
Early Stage Researcher


On 26 April 2013 03:39, Paul Emsley  wrote:

> On 25/04/13 23:43, hari jayaram wrote:
> > Hi
> > I did a
> > export RDBASE=/home/hari/RDKit_2012_09_1
> > export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$RDBASE/lib
> >
> > Then cd into the build directory
> > Run cmake ..
> > Then run make
> >
> > At around 24% I get the following error ( see below)
> >
> > I installed the Ubuntu blessed libboost-python1.49-dev
> >
> > Any ideas how to get around this. On a related noted the Ubuntu
> > synaptic package repository did have a rdkit library but it does not
> > work and complains
> >
> > >>> from rdkit import Chem
> > Traceback (most recent call last):
> > File "", line 1, in 
> > File "/usr/local/lib/python2.7/dist-packages/rdkit/Chem/__init__.py",
> > line 18, in 
> > from rdkit import rdBase
> > ImportError: cannot import name rdBase
>
> Hmm... AFAIR, synaptic packages should not use files in /usr/local. Do
> you still get the same problem if you have not defined PYTHONPATH,
> PYTHONHOME or LD_LIBRARY_PATH?
>
> >
> >
> >
> > Linking CXX static library libCatalogs_static.a
> > [ 24%] Built target Catalogs_static
> > Scanning dependencies of target GraphMol
> > [ 25%] Building CXX object
> > Code/GraphMol/CMakeFiles/GraphMol.dir/Atom.cpp.o
> > [ 25%] Building CXX object
> > Code/GraphMol/CMakeFiles/GraphMol.dir/QueryAtom.cpp.o
> > In file included from
> > /usr/local/include/boost/thread/detail/platform.hpp:17:0,
> > from /usr/local/include/boost/thread/mutex.hpp:12,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryOps.h:20,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryAtom.h:15,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryAtom.cpp:11:
> > /usr/local/include/boost/config/requires_threads.hpp:29:4: error:
> > #error "Threading support unavaliable: it has been explicitly disabled
> > with BOOST_DISABLE_THREADS"
> > In file included from /usr/local/include/boost/thread/mutex.hpp:12:0,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryOps.h:20,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryAtom.h:15,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryAtom.cpp:11:
> > /usr/local/include/boost/thread/detail/platform.hpp:67:9: error:
> > #error "Sorry, no boost threads are available for this platform."
> > In file included from
> > /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryOps.h:20:0,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryAtom.h:15,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryAtom.cpp:11:
> > /usr/local/include/boost/thread/mutex.hpp:18:2: error: #error "Boost
> > threads unavailable on this platform"
> > In file included from
> > /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryAtom.h:15:0,
> > from /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryAtom.cpp:11:
> > /home/hari/RDKit_2012_09_1/Code/GraphMol/QueryOps.h:313:5: error:
> > ‘mutex’ in namespace ‘boost’ does not name a type
> > make[2]: *** [Code/GraphMol/CMakeFiles/GraphMol.dir/QueryAtom.cpp.o]
> > Error 1
> > make[1]: *** [Code/GraphMol/CMakeFiles/GraphMol.dir/all] Error 2
> > make: *** [all] Error 2
> >
> >
>
>
> There are 2 issues here:
>
> 1) "Threading support unavaliable: it has been explicitly disabled with
> BOOST_DISABLE_THREADS" (Someone at Boost Central can't spell :-) I doubt
> that compiling Boost without threads was a good idea.
>
> 2) and as the above is unusual, this error:
> "/home/hari/RDKit_2012_09_1/Code/GraphMol/QueryOps.h:313:5: error:
> ‘mutex’ in namespace ‘boost’ does not name a type" was not trapped
> before. Ideally QueryOps (or anything else) should not use boost mutex
> if threads have been disabled (this might be a pain to code up).
>
>
> HTH,
>
> Paul.
>
>
>
>
>
> --
> Try New Relic Now & We'll Send You this Cool Shirt
> New Relic is the only SaaS-based application performance monitoring service
> that delivers powerful full stack analytics. Optimize and monitor your
> browser, app, & servers with just a few lines of code. Try New Relic
> and get this awesome Nerd Life shirt! http://p.sf.net/sfu/newrelic_d2d_apr
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>

[Rdkit-discuss] Coding convention/style?

2013-04-24 Thread JP
A soft question, RDKitters.

Is there an official coding convention/style when contributing to RDKit?
 Just wondering.  Of course, it is easy to copy whatever is in
https://github.com/rdkit/rdkit/tree/master/Code (but one hopes to pick some
of the beautiful looking code!)

And just to confirm, doxygen is used for API documentation - and an example
of the style can be extracted from
https://github.com/rdkit/rdkit/blob/master/Code/GraphMol/ROMol.h , right?
 (How is the doxygen documentation generated?  What is the command like?)

Also, can you explain the bit of magic which synchronizes svn and git
repositories?  Is it bidirectional (e.g. will pull requests etc work)?

Many Thanks,

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Feature definition of cations and anions ... a question of semantics.

2013-04-15 Thread JP
As a followup to this question - further questions/problems :)

   - Why is AmideN (and SulfonamideN) defined in the BaseFeatures.fdef ?
   (I cannot understand how/where these two definitions are used).
   - One of the H Bond Donor definitions is AtomType NDonor
   [$([Nv3](-C)(-C)-C)] :- but if a Nv3 is connected to 3 C - then there are
   no hydrogens.  How is this a donor?  The v3, according to Daylight, means
   "atom with bond orders totaling 3 (includes implicit H's)"
   - I had a look around and thought the
   Contrib/M_Kossner/BaseFeatures_DIP2_NoMicrospecies.fdef file looked more
   complete in terms of definition.  Unfortunately this file does not load
   with BuildFeatureFactory (ValueError).  Does anyone know the history of that
   file?  Or how it came into being?

FACTORY =
ChemicalFeatures.BuildFeatureFactory("/opt/RDKit_2012_12_1/Contrib/M_Kossner/BaseFeatures_DIP2_NoMicrospecies.fdef")

ValueError:  pattern->getNumAtoms() != len(feature weight vector)

Many thanks and sorry for the repeated emails,
JP

-
Jean-Paul Ebejer
Early Stage Researcher


On 15 April 2013 17:02, JP  wrote:

> Hi there RDKitters,
>
> I was wondering if there is any reason why the feature factory detects
> NegIonizable (or PosIonizable) as a feature - but not the actual charges
> i.e. Anion (or cation).
>
> If you are doing feature extraction, to build pharmacophoric models, this
> electrostatics data is important.  The SMARTS patterns i.e. [+] and [-] and
> subsequent fdef definition are trivial, so why aren't these used?
>
> What am I missing?
>
> # some code, because I am rambling
>
> import os
>
> import rdkit
> from rdkit import RDConfig
> from rdkit import Chem
> from rdkit.Chem import ChemicalFeatures
> from rdkit.Chem import AllChem
>
> fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
> factory = ChemicalFeatures.BuildFeatureFactory(fdefName)
>
> def testMol(molTxt):
>     m = Chem.MolFromSmiles(molTxt)
>     feats = factory.GetFeaturesForMol(m)
>     print([x.GetFamily() for x in feats])
>
> testMol('C(=O)O')
> testMol('[C-]')
> testMol('[Br-]')
> testMol('[Na+]')
>
> Output (this ipy notebook is awesome):
>
> ['Donor', 'Acceptor', 'Acceptor', 'NegIonizable']
> []
> []
> []
>
>
>
> Many thanks,
>
>
> -
> Jean-Paul Ebejer
> Early Stage Researcher
>


[Rdkit-discuss] Feature definition of cations and anions ... a question of semantics.

2013-04-15 Thread JP
Hi there RDKitters,

I was wondering if there is any reason why the feature factory detects
NegIonizable (or PosIonizable) as a feature - but not the actual charges
i.e. Anion (or cation).

If you are doing feature extraction, to build pharmacophoric models, this
electrostatics data is important.  The SMARTS patterns i.e. [+] and [-] and
subsequent fdef definition are trivial, so why aren't these used?
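
To make the question concrete, a charge-only feature definition along the lines described above might look like the following sketch. The feature names (FormalCation, FormalAnion) and families (Cation, Anion) are invented for this example and are not part of BaseFeatures.fdef; it also assumes that BuildFeatureFactoryFromString is available in the installed RDKit.

```python
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures

# Hypothetical feature definitions: one feature per formally charged atom.
# Each pattern has a single atom, so each Weights line has a single entry.
CHARGE_FDEF = "\n".join([
    "DefineFeature FormalCation [+]",
    "  Family Cation",
    "  Weights 1.0",
    "EndFeature",
    "DefineFeature FormalAnion [-]",
    "  Family Anion",
    "  Weights 1.0",
    "EndFeature",
    "",
])

charge_factory = ChemicalFeatures.BuildFeatureFactoryFromString(CHARGE_FDEF)


def charge_families(smiles):
    """Return the feature families found by the charge-only factory."""
    mol = Chem.MolFromSmiles(smiles)
    return [f.GetFamily() for f in charge_factory.GetFeaturesForMol(mol)]


for smi in ("C(=O)O", "[Br-]", "[Na+]"):
    print(smi, charge_families(smi))
```

In practice these definitions would probably be appended to a copy of BaseFeatures.fdef rather than used on their own, so that the ionizable and the formally charged features come out of the same factory.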

What am I missing?

# some code, because I am rambling

import os

import rdkit
from rdkit import RDConfig
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures
from rdkit.Chem import AllChem

fdefName = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
factory = ChemicalFeatures.BuildFeatureFactory(fdefName)

def testMol(molTxt):
    m = Chem.MolFromSmiles(molTxt)
    feats = factory.GetFeaturesForMol(m)
    print([x.GetFamily() for x in feats])

testMol('C(=O)O')
testMol('[C-]')
testMol('[Br-]')
testMol('[Na+]')

Output (this ipy notebook is awesome):


['Donor', 'Acceptor', 'Acceptor', 'NegIonizable']
[]
[]
[]



Many thanks,


-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] looking for suggestions: github vs bitbucket vs google code

2013-02-02 Thread JP
I have used CVS and later SVN, and never really needed a DVCS.  Still, I
have projects in both GitHub and Bitbucket using Git (I never got around to
using Mercurial - so I don't know which is better).  CVS was quirky and
buggy, but I never had any problems with SVN, which is straightforward and
easy to work with.  Migrating to Git, on the other hand, took some effort,
as has been previously mentioned - but this is mostly because I come
from the simplicity of SVN.  The trick is thinking in terms of your local
repo.  Some time ago I did some research on GitHub vs Bitbucket and at the
time we found that Bitbucket was inferior as a product, but we ended up
going for it because of the private repositories.  After the UI rewrite
of Bitbucket in 2012 - I find the two mostly equivalent, even if the
Bitbucket wiki is pretty unsophisticated (so is GitHub's, but at least I have
never lost pages in the GitHub one).  Greg feels uneasy because of the
trend factor - but both Git (isn't it used for the Linux kernel?) and
GitHub are well proven projects with a community the size of St. Petersburg.

Either way you go (git vs mercurial, github vs bitbucket) I will be the
first one to celebrate the fact that RDKit is moving away from sourceforge
and its crappy, lousy, 2001 look and feel, unresponsive, Ad-ridden,
unusable interface.  For the talk I gave at the user meeting I had to count
the number of posts I created, and I ended up counting them manually in the
archives as there was no obvious way to do it.  To this day that
interface makes me cringe.

p.s.  I agree with Eddie, +1 for "Git changes the way you think, the same
thing learning functional programming would do to you."

-
Jean-Paul Ebejer
Early Stage Researcher


On 2 February 2013 07:00, Eddie Cao  wrote:

> Hi,
>
> As a switcher, I feel I should share my experience.
>
> I was never a power user of any VCS, but I've used RCS, CVS, Subversion,
> Mercurial and Git, and my level is always best characterized as *barely
> enough to get work done*. I chose Mercurial instead of Git during my
> first encounter with the concept of DVCS, mostly because of the belief that
> "the two are pretty much the same", and also because, as a Python
> person, choosing Mercurial seemed like the loyal thing to do. I stuck with
> Mercurial for over three years, and resisted the hype around Git and
> GitHub. My understanding of Git stayed at the level of believing "rebase is
> evil, and hg is safer", "staging area solves a problem I don't have", and
> "mercurial can do that too, with these extensions".
>
> This was until my wife started to use Git for work, and rave about it. So
> I checked it out.
>
> And I switched. Not because there are things that are inherently
> impossible in Mercurial, but there is a culture component of Git that
> emerges around an open door design (which Python does too and proudly
> labels it as *a language for adults*).
>
> For example, Git infuses into you the attitude that commit quality is as
> important as your code quality, and Git is optimized to make beautiful
> commits. With Git, you tend to *compose* and *edit* commits as carefully as
> you would write beautiful and elegant code. Have uncommitted changes in one
> file that deal with two unrelated bugs? Easily make two separate commits by
> picking lines to commit. Have uncommitted changes but there is an urgent
> bug to fix? Avoid a half-baked commit by simply stashing your changes and
> restoring them when you are done with the bugfix. Some seem to fear the power
> to rewrite history, but it is a very powerful tool. Have a commit that just
> corrects a typo? You can combine it with an earlier commit. Regret that the
> summary of a previous commit is not clear enough? You can edit that
> message.
>
> The learning curve is absolutely steeper for Git for people with prior
> knowledge of other VCS. This is mostly because Linus' vision about Git is a
> file system on top of a file system and he did not try to emulate existing
> VCS systems. However, if you want to support real distributed workflow, I
> would argue Git and Mercurial require the same amount of learning, as you
> could see in the comparison in PEP 374. But
> if you care about commit history as much as you care about code quality -
> and I believe commit history is essential in distributed collaborative
> workflow enabled by a DVCS - then you will appreciate the Git workflow and
> the Git way.
>
> And bonus: Git changes the way you think, the same thing learning
> functional programming would do to you.
>
> Regards,
> Eddie
>
>
> On Feb 1, 2013, at 10:17 AM, Patrick Fuller wrote:
>
> Seconding Markus - My biggest issue switching from svn to git was honestly
> the word "checkout". It means two different things between them, and I
> found myself doing stupid things all the time. Outside of that, and the
> weird "staging area" thing I never got a

Re: [Rdkit-discuss] Depicting R groups follow-up

2013-02-01 Thread JP
On 1 February 2013 04:04, Greg Landrum  wrote:

> Short answer: it's a bug. I will fix it, but in the meantime you can
> work around it by passing the "kekulize=False" argument to the drawing
> function (you should probably kekulize the molecule yourself first if
> you want to see single and double bonds instead of the ugly dashed
> aromatic bonds).
>


Greg,

Thanks for the UpdatePropertyCache hint, I should know better after 2.5
years of using this toolkit.

Are you sure about the workaround you suggest above?

from rdkit import Chem
from rdkit.Chem import Draw

m = Chem.MolFromSmiles('CCN')
for a in m.GetAtoms():
    a.SetProp("dummyLabel", "R" + str(a.GetIdx()))
Draw.MolToFile(m, 'test.png', kekulize=False)

Still does not show me the dummy labels ... (using RDKit 2012 12 1)

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Depicting R groups follow-up

2013-02-01 Thread JP
I tried setting the "kekulize=False" argument in this way:

print(Chem.MolToSmiles(core))
core.Debug()
# need kekulization before passing this to the drawing algorithm
Chem.Kekulize(core)
# rGroups just holds the few atom indices I want to label
for r_id in rGroups.keys():
    core.GetAtomWithIdx(int(r_id)).SetProp("dummyLabel", "R" + r_id)
Draw.MolToFile(core, core_image, size=(600, 200), kekulize=False)

But I get a ton of these when calling the debug and draw function:



N/A imp: -1 hyb: 0 arom?: 0 chi: 0
29 8 O chg: 0  deg: 1 exp: [09:59:59]


Pre-condition Violation
getExplicitValence() called without call to calcExplicitValence()
Violation occurred on line 177 in file
/opt/RDKit_2012_12_1/Code/GraphMol/Atom.cpp
Failed Expression: d_explicitValence>-1



The core molecule I get using: core =
Chem.MolFromSmarts(MCS.FindMCS(mols).smarts)

This is the SMILES for my
molecule CCC(C)C(NC(C)=O)C(=O)NC(C(=O)NC(C)C(=O)NC(C)CC(N)=O)C(C)O.

And this is the Debug() call on it (this also generates the above errors):

N/A imp: -1 hyb: 0 arom?: 0 chi: 0
Bonds:
0 0->1 order: 1 conj?: 0 aromatic?: 0
1 1->2 order: 2 conj?: 0 aromatic?: 0
2 1->3 order: 1 conj?: 0 aromatic?: 0
3 3->4 order: 1 conj?: 0 aromatic?: 0
4 5->6 order: 1 conj?: 0 aromatic?: 0
5 5->7 order: 1 conj?: 0 aromatic?: 0
6 7->8 order: 1 conj?: 0 aromatic?: 0
7 4->5 order: 1 conj?: 0 aromatic?: 0
8 4->9 order: 1 conj?: 0 aromatic?: 0
9 10->11 order: 1 conj?: 0 aromatic?: 0
10 12->13 order: 1 conj?: 0 aromatic?: 0
11 12->14 order: 1 conj?: 0 aromatic?: 0
12 11->12 order: 1 conj?: 0 aromatic?: 0
13 11->15 order: 1 conj?: 0 aromatic?: 0
14 15->16 order: 2 conj?: 0 aromatic?: 0
15 15->17 order: 1 conj?: 0 aromatic?: 0
16 17->18 order: 1 conj?: 0 aromatic?: 0
17 18->19 order: 1 conj?: 0 aromatic?: 0
18 18->20 order: 1 conj?: 0 aromatic?: 0
19 21->22 order: 1 conj?: 0 aromatic?: 0
20 22->23 order: 1 conj?: 0 aromatic?: 0
21 22->24 order: 1 conj?: 0 aromatic?: 0
22 24->25 order: 1 conj?: 0 aromatic?: 0
23 25->26 order: 1 conj?: 0 aromatic?: 0
24 25->27 order: 2 conj?: 0 aromatic?: 0
25 20->21 order: 1 conj?: 0 aromatic?: 0
26 20->28 order: 2 conj?: 0 aromatic?: 0
27 9->10 order: 1 conj?: 0 aromatic?: 0
28 9->29 order: 2 conj?: 0 aromatic?: 0

I am quite lost. Does anyone know what this actually means, or have any
idea what I am doing wrong?

Many Thanks,
JP

-
Jean-Paul Ebejer
Early Stage Researcher


On 1 February 2013 04:04, Greg Landrum  wrote:

> On Thu, Jan 31, 2013 at 4:50 PM, JP  wrote:
> > I am trying to depict R groups using labels like "R1", "R2" etc.
> >
> > From a previous discussion:
> >
> > http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01793.html
> >
> > "Here's what's going on currently:
> >
> > By default the rendering code uses atom.GetSymbol() to determine what
> > should show up in the drawing. atom.GetSymbol() uses the atomic number,
> > unless the atom has the property "dummyLabel" set. If that property is
> > set, it's used. It should also be checking for the property
> > "_MolFileRLabel"."
> >
> >
> > I assumed that setting dummyLabel should be enough and this would work:
> >
> > m = Chem.MolFromSmiles('CCN')
> > for a in m.GetAtoms():
> >     a.SetProp("dummyLabel", "R" + str(a.GetIdx()))
> > Draw.MolToFile(m, 'test.png')
> >
> > Any idea why I do not get my R labels?
>
> Short answer: it's a bug. I will fix it, but in the meantime you can
> work around it by passing the "kekulize=False" argument to the drawing
> function (you should probably kekulize the molecule yourself first if
> you want to see single and double bonds instead of the ugly dashed
> aromatic bonds).
>
> Somewhat longer answer:
> Doing the SetProp actually does work, you can convince yourself of
> this like this:
>
> In [9]: m = Chem.MolFromSmiles('*C')
>
> In [10]: m.Debug()
> Atoms:
> 0 0 * chg: 0  deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
> 1 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: 4 arom?: 0 chi: 0
> Bonds:
> 0 0->1 order: 1 conj?: 0 aromatic?: 0
>
> In [11]: m.GetAtomWithIdx(0).SetProp("dummyLabel","R")
>
> In [12]: m.Debug()
> Atoms:
> 0 0 R chg: 0  deg: 1 exp: 1 imp: 0 hyb: 0 arom?: 0 chi: 0
> 1 6 C chg: 0  deg: 1 exp: 1 imp: 3 hyb: 4 arom?: 0 chi: 0
> Bonds:
> 0 0->1 order: 1 conj?: 0 aromatic?: 0
>
>
> Unfortunately that information does not survive when the molecule is
> copied:
>
>
> In [13]: m2=  Chem.Mol(m.ToBinary())
>
> In [14]: m2.Debug()
> Atoms:

Re: [Rdkit-discuss] computing all possible molecular descriptors from a smile string rather than .sdf file

2013-01-31 Thread JP
I think this is what you need:
http://www.rdkit.org/docs/GettingStartedInPython.html#descriptor-calculation
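
For the record, a minimal sketch of what that section of the docs boils down to when starting from a SMILES string instead of an SD file. The helper name all_descriptors is made up for this example, and it assumes the installed RDKit exposes the (name, function) pairs as Descriptors._descList.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def all_descriptors(smiles):
    """Compute every descriptor listed in Descriptors._descList for one SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    # _descList holds (name, function) pairs; each function takes a molecule.
    return {name: fn(mol) for name, fn in Descriptors._descList}


values = all_descriptors("CCO")
print(len(values), "descriptors computed")
print("MolWt =", values["MolWt"])
```

The only difference from the SD file scripts is the first step: the molecule comes from Chem.MolFromSmiles rather than from a supplier.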



-
Jean-Paul Ebejer
Early Stage Researcher


On 31 January 2013 16:52, Leela Velautham
wrote:

>  Hey,
>
>
>
> I'm in Python and have a molecule in SMILES string format (rather than a
> .sdf file) and would like to compute all possible molecular descriptors in
> the rdkit for the molecule. I can only find example scripts for doing this
> with a .sdf file and am having trouble - can anyone help?
>
>
>
> Thanks,
>
> L.
>
>


[Rdkit-discuss] Depicting R groups follow-up

2013-01-31 Thread JP
I am trying to depict R groups using labels like "R1", "R2" etc.

From a previous discussion:
http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01793.html

"Here's what's going on currently:

By default the rendering code uses atom.GetSymbol() to determine what
should show up in the drawing. atom.GetSymbol() uses the atomic
number, unless the atom has the property "dummyLabel" set. If that
property is set, it's used. It should also be checking for the
property "_MolFileRLabel"."


I assumed that setting dummyLabel should be enough and this would work:

m = Chem.MolFromSmiles('CCN')
for a in m.GetAtoms():
    a.SetProp("dummyLabel", "R" + str(a.GetIdx()))
Draw.MolToFile(m, 'test.png')

Any idea why I do not get my R labels?

Many Thanks,


-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Detecting R groups using RDKit and the MCS code (R group decomposition).

2013-01-25 Thread JP
Hi Greg,

On 25 January 2013 04:36, Greg Landrum  wrote:

>
> That's pretty much what I would do. Fortunately, you don't have to
> code it, because it's already there:[1]
>

[1] Incidentally, I love being given that answer :)



>
> In [6]: [Chem.MolToSmiles(x,True) for x in pieces]
> Out[6]: ['[1*]O', '[2*]C', '[6*]CC']
>
>
Out of pedantry, why do some dummy atoms *not* carry a numeric label (using
2012_12_1)?  All atoms have a numeric id, so every dummy atom should get a
numeric label, e.g.

from rdkit import Chem
from rdkit.Chem import MCS

mols = [Chem.MolFromSmiles('CC(=O)CN(C)C'),
        Chem.MolFromSmiles('c1c1C(=O)CN(c1c1)C'),
        Chem.MolFromSmiles('COC(=O)CN')]
if MCS.FindMCS(mols).smarts:
    core = Chem.MolFromSmarts(MCS.FindMCS(mols).smarts)
    for m in mols:
        chains = Chem.ReplaceCore(m, core, labelByIndex=True)
        print("chains " + Chem.MolToSmiles(chains, True))

Gives:

chains [*]C.[2*]C.[2*]C
chains [*]c1c1.[2*]C.[2*]c1c1
chains [*]OC

Now, where is the number label on each first entry?  Not a big deal of
course, but wreaks havoc with my regex.

Also should these lists be uniquified or not?  Take a look at the first
example (e.g. [2*]C.[2*]C)?
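
In the meantime, a regex that tolerates the missing number is easy enough to sketch. One possible reading (an assumption on my part, not something confirmed in this thread) is that the label is stored as the dummy atom's isotope, so a chain attached at index 0 is written as a bare [*]. The helper name attachment_labels is made up for this example; it treats an unlabelled dummy as label 0 and shows how the fragments could be uniquified afterwards.

```python
import re

# A dummy atom with an optional numeric label: [*], [2*], [15*], ...
DUMMY = re.compile(r"\[(\d*)\*\]")


def attachment_labels(chains_smiles):
    """Split a dot-separated chains SMILES into (label, fragment) pairs.

    A bare [*] is reported as label 0 (assumption: unlabelled means index 0).
    """
    pairs = []
    for fragment in chains_smiles.split("."):
        match = DUMMY.search(fragment)
        if match is None:
            continue  # fragment without an attachment point
        pairs.append((int(match.group(1) or 0), fragment))
    return pairs


pairs = attachment_labels("[*]C.[2*]C.[2*]C")
print(pairs)               # [(0, '[*]C'), (2, '[2*]C'), (2, '[2*]C')]
print(sorted(set(pairs)))  # [(0, '[*]C'), (2, '[2*]C')]
```

Whether the duplicates should be collapsed depends on the use case: two methyls on the same atom are two substituents, even if they produce the same labelled fragment.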

Thank-you,
JP


[Rdkit-discuss] Detecting R groups using RDKit and the MCS code (R group decomposition).

2013-01-24 Thread JP
Hola RDkitters,

I have a number of analogue molecules (how lucky) - from which I can
extract a scaffold using Dalke's MCS code (great piece of work, btw).

I would like to identify each R group from each molecule.  My current idea,
which I wanted to bounce off you, was, for every molecule that I have:

0. Substructure search for the MCS scaffold, this will give me a set of
atom ids.
1. For every atom id above find if it is connected to something else (so
get neighbours and check for indices which are not in the MCS scaffold set)
2. If there is a connection to an R group break that bond
3. Somehow (how?) retrieve the fragment part and label it Rn (I need to
have distinct sets; R1 R2 R3 etc.)
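
The steps above can be sketched roughly as follows, assuming that Chem.ReplaceCore (which removes the matched core and leaves labelled dummy atoms at the attachment points) and Chem.GetMolFrags are acceptable substitutes for breaking the bonds by hand. The core SMARTS here is a hand-written stand-in for whatever the MCS code returns, and the helper name side_chains is made up for this example.

```python
from rdkit import Chem


def side_chains(mol, core):
    """Return the SMILES of each side chain left after removing the core."""
    if not mol.HasSubstructMatch(core):
        return []
    # Core atoms are deleted; each attachment point becomes a dummy atom.
    chains = Chem.ReplaceCore(mol, core, labelByIndex=True)
    # Split the remaining disconnected pieces into separate molecules.
    pieces = Chem.GetMolFrags(chains, asMols=True)
    return [Chem.MolToSmiles(piece, True) for piece in pieces]


core = Chem.MolFromSmarts("C(=O)CN")  # stand-in for an MCS result
mol = Chem.MolFromSmiles("CC(=O)CN(C)C")
print(side_chains(mol, core))
```

The dummy atom labels are what would tie each fragment back to a position on the scaffold, so they are the natural basis for naming the groups R1, R2, R3 and so on.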

Is there a better way to do this?  Am I missing something?

Many Thanks,
-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Volume Overlap using RDKit

2013-01-22 Thread JP
Thanks Niko, Taka for your prompt replies.

My molecules are not guaranteed to have some common substructure, so an
RMSD align is out of the question.  I [think I] want a 3D alignment which
maximizes volume overlap between the two molecules.
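
To be clear about the quantity I am after, here is a toy, pure-Python sketch of a grid-based shape Tanimoto distance between two sets of atom spheres that are already placed in a common frame. This is only an illustration of the measure, not RDKit's implementation; the function names, the 0.5 grid spacing and the (x, y, z, radius) input format are all assumptions made for this example. The alignment step (finding the pose that minimizes this distance) is the part I am missing.

```python
import itertools


def occupied_voxels(atoms, spacing=0.5):
    """Grid points (as integer triples) that fall inside any atom sphere.

    atoms is an iterable of (x, y, z, radius) tuples.
    """
    voxels = set()
    for x, y, z, r in atoms:
        lo = [int((c - r) // spacing) for c in (x, y, z)]
        hi = [int((c + r) // spacing) + 1 for c in (x, y, z)]
        ranges = [range(a, b + 1) for a, b in zip(lo, hi)]
        for i, j, k in itertools.product(*ranges):
            dx, dy, dz = i * spacing - x, j * spacing - y, k * spacing - z
            if dx * dx + dy * dy + dz * dz <= r * r:
                voxels.add((i, j, k))
    return voxels


def shape_tanimoto_distance(atoms_a, atoms_b, spacing=0.5):
    """1 - |A and B| / |A or B| over the occupied grid points."""
    a = occupied_voxels(atoms_a, spacing)
    b = occupied_voxels(atoms_b, spacing)
    union = len(a | b)
    if union == 0:
        raise ValueError("no occupied voxels; check radii and spacing")
    return 1.0 - len(a & b) / float(union)


sphere = [(0.0, 0.0, 0.0, 1.0)]
shifted = [(1.0, 0.0, 0.0, 1.0)]
far_away = [(10.0, 0.0, 0.0, 1.0)]
print(shape_tanimoto_distance(sphere, sphere))
print(shape_tanimoto_distance(sphere, shifted))
print(shape_tanimoto_distance(sphere, far_away))
```

Identical shapes give a distance of 0, disjoint shapes give 1, and anything in between depends entirely on the relative pose, which is why a pre-defined alignment matters so much.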



-
Jean-Paul Ebejer
Early Stage Researcher


On 22 January 2013 14:10, Stiefl, Nikolaus wrote:

>  Hi JP,
> Do you want to do a shape align or just any sort of alignment?
>
>  There is a MolAlign in AllChem which will give you an RMSD align. This
> works well if you have reasonably similar molecules (do a GetSubstructMatch
> before to get the atom list).
> Don't think there is a shape alignment for whole molecules – there is
> however the subshapeAligner module in rdkit.Chem but I never used this one.
>
>  Ciao
> Nik
>
>
>
>   From: JP 
> Date: Tue, 22 Jan 2013 12:20:15 +
> To: "rdkit-discuss@lists.sourceforge.net" <
> rdkit-discuss@lists.sourceforge.net>
> Subject: [Rdkit-discuss] Volume Overlap using RDKit
>
>
>  RDKitters,
>
>  Long time no type, I've been busy with that little chestnut of my PhD...
>
>  I would like to align two molecules and calculate the shape tanimoto
> with ShapeTanimotoDist(...).  The issue is that this method requires a
> pre-defined alignment - which I do not have.
>
>  Is there a way to do a molecular volume overlap in RDKit?  I cannot
> seem to find it and the only related discussion I can find is
> here<http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg00512.html>.
>  But the fourth slide
> here<http://www.slideshare.net/baoilleach/cinfony-bring-cheminformatics-toolkits-into-tune>,
> clearly states that RDKit is able to do this.
>
>  If this is not RDKit-doable, has anyone else come across some publicly
> available tools to do this?  A quick search led me to
> Shape-it<http://silicos-it.com/software/shape-it/1.0.1/shape-it.html>,
> from Hans (who I met at the user group meeting) - has anyone used this before?
>
>  p.s. no one ever sent/made available the group photo we took at the 1st
> RDKit meeting :(
>
> -
> Jean-Paul Ebejer
> Early Stage Researcher


[Rdkit-discuss] Volume Overlap using RDKit

2013-01-22 Thread JP
RDKitters,

Long time no type, I've been busy with that little chestnut of my PhD...

I would like to align two molecules and calculate the shape tanimoto with
ShapeTanimotoDist(...).  The issue is that this method requires a
pre-defined alignment - which I do not have.

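Is there a way to do a molecular volume overlap in RDKit?  I cannot
seem to find it and the only related discussion I can find is
here<http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg00512.html>.
 But the fourth slide
here<http://www.slideshare.net/baoilleach/cinfony-bring-cheminformatics-toolkits-into-tune>,
clearly states that RDKit is able to do this.

If this is not RDKit-doable, has anyone else come across some publicly
available tools to do this?  A quick search led me to
Shape-it<http://silicos-it.com/software/shape-it/1.0.1/shape-it.html>,
from Hans (who I met at the user group meeting) - has anyone used this before?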

p.s. no one ever sent/made available the group photo we took at the 1st
RDKit meeting :(

-
Jean-Paul Ebejer
Early Stage Researcher


[Rdkit-discuss] kekulize form not matching aromatic "equivalent"

2012-11-12 Thread JP
Hi there RDkitters,

I have a molecule defined in aromatic form ("c1n[nH]nn1") and I am
trying to do a replace substructs on it using the equivalent Kekule
form as a query ("C1=N[NH1]N=N1").  However, the substitution does not
happen, as shown by the following code:

from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles('c1n[nH]nn1')
r = Chem.MolFromSmiles('c1[n-]nnn1', sanitize=False) # some arbitrary tautomer
q = Chem.MolFromSmarts('C1=N[NH1]N=N1')
replaced = AllChem.ReplaceSubstructs(mol, q, r, replaceAll=True)[0]
print(Chem.MolToSmiles(replaced))

Any ideas why this is the case/what I am doing wrong?
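
A likely explanation, offered as a sketch and not as a confirmed answer from this thread: in SMARTS an upper-case C or N only matches aliphatic atoms, while the sanitized molecule has aromatic ring atoms, so the Kekule query never matches and there is nothing to replace. The check below compares the two query forms; the kekulized copy at the end is one possible workaround, not necessarily the recommended one.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("c1n[nH]nn1")
kekule_query = Chem.MolFromSmarts("C1=N[NH1]N=N1")
aromatic_query = Chem.MolFromSmarts("c1n[nH]nn1")

# Aliphatic query atoms against aromatic molecule atoms: no match expected.
kekule_hit = mol.HasSubstructMatch(kekule_query)
# Aromatic query atoms against the same molecule: match expected.
aromatic_hit = mol.HasSubstructMatch(aromatic_query)

# Possible workaround: query a kekulized copy with the aromatic flags cleared.
kekulized = Chem.Mol(mol)
Chem.Kekulize(kekulized, clearAromaticFlags=True)
kekulized_hit = kekulized.HasSubstructMatch(kekule_query)

print(kekule_hit, aromatic_hit, kekulized_hit)
```

If the aromatic query is the one that matches, passing it to ReplaceSubstructs in place of the Kekule form should make the substitution happen.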

Many thanks,

-
Jean-Paul Ebejer
Early Stage Researcher



Re: [Rdkit-discuss] better error messages (vol 3)

2012-11-08 Thread JP
Please disregard this "bad hair day" message.

I am, allegedly, a computer scientist, and I should know that counting
starts from 0.
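
For anyone else who trips over this, a small sketch of the lookup I should have done. It assumes the atoms keep the order of the rows in the ATOM block, so that the 0-based index in the error message corresponds to row index + 1 in the file; the function names and the three-atom sample file are made up for this example.

```python
def mol2_atom_rows(mol2_text):
    """Return (atom_id, atom_name, sybyl_type) for each row of the ATOM block."""
    rows = []
    in_atoms = False
    for line in mol2_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Only the ATOM record is of interest; any other record ends it.
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atoms and stripped:
            fields = stripped.split()
            rows.append((int(fields[0]), fields[1], fields[5]))
    return rows


def row_for_index(mol2_text, zero_based_index):
    """Map a 0-based atom index onto the corresponding Mol2 row."""
    return mol2_atom_rows(mol2_text)[zero_based_index]


SAMPLE = """@<TRIPOS>MOLECULE
demo
3 2 1
SMALL
NO_CHARGES

@<TRIPOS>ATOM
  1 O1   0.000  0.000  0.000 O.3  1 <1> 0.0
  2 C2   1.400  0.000  0.000 C.3  1 <1> 0.0
  3 C3   2.100  1.200  0.000 C.3  1 <1> 0.0
@<TRIPOS>BOND
  1 1 2 1
  2 2 3 1
"""

print(row_for_index(SAMPLE, 2))  # (3, 'C3', 'C.3')
```

So "atom 22" in the message would be the row numbered 23 in the file, under the ordering assumption above.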

Bleh, Sorry,
-
Jean-Paul Ebejer
Early Stage Researcher


On 8 November 2012 15:33, JP  wrote:
> Hi there RDkitters,
>
> Poll season: Does anyone else feel the need for more informative error 
> messages?
>
> For example the below code gives me:
>
> [15:25:17] non-ring atom 22 marked aromatic
>
> But which is atom 22 ?  Any ideas?
> Is it possible to have the equivalent atom rank/index in the file?
>
> I tried the
>
> m = Chem.MolFromMol2Block(mol_block, sanitize=False)
> m.Debug()
>
> trick -- but I am none the wiser.
>
> Thanks for your attention,
> JP
>
>
>
> #!/usr/bin/env python
>
> import rdkit
> from rdkit import Chem
>
> mol_block="""@MOLECULE
> 2oc2_RX3
> 78 82 1
> SMALL
> NO_CHARGES
>
>
> @ATOM
>   1 O1  -19.515 24.565  -8.011 O.co2 1 <1> 0.
>   2 C2  -19.776 25.440  -8.864 C.2   1 <1> 0.
>   3 O3  -20.246 26.552  -8.531 O.co2 1 <1> 0.
>   4 C4  -19.512 25.155 -10.329 C.3   1 <1> 0.
>   5 C5  -20.814 24.987 -11.129 C.3   1 <1> 0.
>   6 C6  -21.642 23.767 -10.778 C.2   1 <1> 0.
>   7 C7  -22.682 23.712  -9.857 C.2   1 <1> 0.
>   8 N8  -23.191 22.452  -9.833 N.pl3 1 <1> 0.
>   9 C10 -22.513 21.684 -10.723 C.ar  1 <1> 0.
>  10 C11 -22.614 20.330 -11.100 C.ar  1 <1> 0.
>  11 C12 -21.750 19.816 -12.075 C.ar  1 <1> 0.
>  12 C13 -20.790 20.638 -12.669 C.ar  1 <1> 0.
>  13 C14 -20.679 21.982 -12.302 C.ar  1 <1> 0.
>  14 C15 -21.544 22.491 -11.327 C.ar  1 <1> 0.
>  15 N16 -18.743 23.936 -10.541 N.am  1 <1> 0.
>  16 C17 -17.459 23.796 -10.209 C.2   1 <1> 0.
>  17 O18 -16.768 24.685  -9.703 O.2   1 <1> 0.
>  18 C19 -16.909 22.423 -10.508 C.3   1 <1> 0.
>  19 C20 -17.524 21.376  -9.584 C.3   1 <1> 0.
>  20 C21 -16.456 20.972  -8.565 C.3   1 <1> 0.
>  21 C22 -15.123 21.615  -8.959 C.3   1 <1> 0.
>  22 C23 -15.434 22.259 -10.280 C.3   1 <1> 0.
>  23 P24 -14.271 22.624 -11.492 P.3   1 <1> 0.
>  24 O25 -13.833 21.276 -11.986 O.2   1 <1> 0.
>  25 O26 -14.970 23.450 -12.557 O.2   1 <1> 0.
>  26 C27 -12.886 23.592 -11.095 C.3   1 <1> 0.
>  27 C28 -12.793 24.508  -9.881 C.3   1 <1> 0.
>  28 C29 -11.480 25.268  -9.723 C.ar  1 <1> 0.
>  29 C30 -10.703 25.075  -8.574 C.ar  1 <1> 0.
>  30 C31  -9.491 25.757  -8.403 C.ar  1 <1> 0.
>  31 C32  -9.042 26.649  -9.384 C.ar  1 <1> 0.
>  32 C33  -9.810 26.851 -10.535 C.ar  1 <1> 0.
>  33 C34 -11.022 26.171 -10.695 C.ar  1 <1> 0.
>  34 N35 -11.911 23.631 -12.008 N.am  1 <1> 0.
>  35 C36 -11.052 22.673 -12.362 C.2   1 <1> 0.
>  36 O37 -10.990 21.566 -11.861 O.2   1 <1> 0.
>  37 O38 -10.113 22.980 -13.418 O.3   1 <1> 0.
>  38 C39  -9.792 22.030 -14.432 C.3   1 <1> 0.
>  39 C40  -9.122 22.726 -15.601 C.ar  1 <1> 0.
>  40 C41  -9.888 23.248 -16.644 C.ar  1 <1> 0.
>  41 C42  -9.259 23.881 -17.718 C.ar  1 <1> 0.
>  42 C43  -7.865 24.003 -17.757 C.ar  1 <1> 0.
>  43 C44  -7.099 23.484 -16.717 C.ar  1 <1> 0.
>  44 C45  -7.727 22.849 -15.642 C.ar  1 <1> 0.
>  45 H1  -18.948 26.006 -10.739 H     1 <1> 0.

[Rdkit-discuss] better error messages (vol 3)

2012-11-08 Thread JP
Hi there RDkitters,

Poll season: Does anyone else feel the need for more informative error messages?

For example the below code gives me:

[15:25:17] non-ring atom 22 marked aromatic

But which is atom 22 ?  Any ideas?
Is it possible to have the equivalent atom rank/index in the file?

I tried the

m = Chem.MolFromMol2Block(mol_block, sanitize=False)
m.Debug()

trick -- but I am none the wiser.

Thanks for your attention,
JP



#!/usr/bin/env python

import rdkit
from rdkit import Chem

mol_block="""@MOLECULE
2oc2_RX3
78 82 1
SMALL
NO_CHARGES


@ATOM
  1 O1  -19.515 24.565  -8.011 O.co2 1 <1> 0.
  2 C2  -19.776 25.440  -8.864 C.2   1 <1> 0.
  3 O3  -20.246 26.552  -8.531 O.co2 1 <1> 0.
  4 C4  -19.512 25.155 -10.329 C.3   1 <1> 0.
  5 C5  -20.814 24.987 -11.129 C.3   1 <1> 0.
  6 C6  -21.642 23.767 -10.778 C.2   1 <1> 0.
  7 C7  -22.682 23.712  -9.857 C.2   1 <1> 0.
  8 N8  -23.191 22.452  -9.833 N.pl3 1 <1> 0.
  9 C10 -22.513 21.684 -10.723 C.ar  1 <1> 0.
 10 C11 -22.614 20.330 -11.100 C.ar  1 <1> 0.
 11 C12 -21.750 19.816 -12.075 C.ar  1 <1> 0.
 12 C13 -20.790 20.638 -12.669 C.ar  1 <1> 0.
 13 C14 -20.679 21.982 -12.302 C.ar  1 <1> 0.
 14 C15 -21.544 22.491 -11.327 C.ar  1 <1> 0.
 15 N16 -18.743 23.936 -10.541 N.am  1 <1> 0.
 16 C17 -17.459 23.796 -10.209 C.2   1 <1> 0.
 17 O18 -16.768 24.685  -9.703 O.2   1 <1> 0.
 18 C19 -16.909 22.423 -10.508 C.3   1 <1> 0.
 19 C20 -17.524 21.376  -9.584 C.3   1 <1> 0.
 20 C21 -16.456 20.972  -8.565 C.3   1 <1> 0.
 21 C22 -15.123 21.615  -8.959 C.3   1 <1> 0.
 22 C23 -15.434 22.259 -10.280 C.3   1 <1> 0.
 23 P24 -14.271 22.624 -11.492 P.3   1 <1> 0.
 24 O25 -13.833 21.276 -11.986 O.2   1 <1> 0.
 25 O26 -14.970 23.450 -12.557 O.2   1 <1> 0.
 26 C27 -12.886 23.592 -11.095 C.3   1 <1> 0.
 27 C28 -12.793 24.508  -9.881 C.3   1 <1> 0.
 28 C29 -11.480 25.268  -9.723 C.ar  1 <1> 0.
 29 C30 -10.703 25.075  -8.574 C.ar  1 <1> 0.
 30 C31  -9.491 25.757  -8.403 C.ar  1 <1> 0.
 31 C32  -9.042 26.649  -9.384 C.ar  1 <1> 0.
 32 C33  -9.810 26.851 -10.535 C.ar  1 <1> 0.
 33 C34 -11.022 26.171 -10.695 C.ar  1 <1> 0.
 34 N35 -11.911 23.631 -12.008 N.am  1 <1> 0.
 35 C36 -11.052 22.673 -12.362 C.2   1 <1> 0.
 36 O37 -10.990 21.566 -11.861 O.2   1 <1> 0.
 37 O38 -10.113 22.980 -13.418 O.3   1 <1> 0.
 38 C39  -9.792 22.030 -14.432 C.3   1 <1> 0.
 39 C40  -9.122 22.726 -15.601 C.ar  1 <1> 0.
 40 C41  -9.888 23.248 -16.644 C.ar  1 <1> 0.
 41 C42  -9.259 23.881 -17.718 C.ar  1 <1> 0.
 42 C43  -7.865 24.003 -17.757 C.ar  1 <1> 0.
 43 C44  -7.099 23.484 -16.717 C.ar  1 <1> 0.
 44 C45  -7.727 22.849 -15.642 C.ar  1 <1> 0.
 45 H1  -18.948 26.006 -10.739 H     1 <1> 0.
 46 H2  -20.552 24.923 -12.195 H     1 <1> 0.
 47 H3  -21.435 25.878 -10.957 H     1 <1> 0.
 48 H4  -23.033 24.546  -9.251 H     1 <1> 0.
 49 H5  -23.967 22.132  -9.235 H     1 <1> 0.
 50 H6  -23.356 19.690 -10.637 H     1 <1> 0.
 51 H7  -21.826 18.775 -12.369 H     1 <1> 0.
 52 H8  -20.126 20.230 -13.423 H     1 <1> 0.
 53 H9  -19.934 22.619 -12.764

Re: [Rdkit-discuss] Getting an RDKit cookbook started

2012-10-29 Thread JP
Hi there Greg,

This is a good idea - and one which is bound to save me lots of
questions later on.

I have added an index to Cookbook.rst, but I cannot commit it back
("svn: Authorization failed").  Unsurprising, as I checked out a
"read-only version" from the sf svn code page.  Also, can you add a
recipe on how to generate the documentation with Sphinx?  (Shamefully,
I have to admit I haven't tested this; bad bad boy.)  I would like to
make hyperlinks from each of the entries in the ToC to the actual code
snippets - can I just htmlize the Cookbook documentation with anchors
and hrefs?

I think an index section is important: not now, but once you have 20+
entries and want to check whether some bit of functionality is already
available (it saves you from reading the whole page).  Here is
what it looks like; it comes right after the "What is this" section.

Table of Contents
*****************

The following recipes are found on this page:

*  Cleaning up heterocycles - this fixes some [most?] of the annoying "can't
kekulize" errors
*  Parallel conformation generation - this bit of code makes use of your
multi-core machine to generate conformers for a particular molecule
*  Neutralizing Charged Molecules - removes charges from a molecule based on a
set of SMARTS patterns

NB more details can be found in every mailing-list cited post.



p.s. how is the windows 8 build coming along ? :-)

-
Jean-Paul Ebejer
Early Stage Researcher




On 27 October 2012 10:59, Greg Landrum  wrote:
>
> [Another UGM followup]
>
> Dear all,
>
> I've started a first pass at creating an RDKit Cookbook. The idea is
> to have somewhere to capture small bits of RDKit code that do
> something useful but that may not be large enough to be worth doing a
> Contrib/ entry.
>
> I decided to integrate this with the rest of the Sphinx-generated
> documentation so that it's easy to find, can be tested, and shows up
> in web searches.
>
> Here's a first pass, with three recipes that have shown up either here
> on the mailing list or on the wiki:
> http://www.rdkit.org/docs-beta/Cookbook.html
>
> The source is in SVN:
> https://sourceforge.net/p/rdkit/code/2251/tree/trunk/Docs/Book/Cookbook.rst
>
> It should show up in the github mirror in the near future.
>
> Feedback and contributions, particularly in the form of either patches
> to the docs or standalone pieces of RST I can integrate, are very
> welcome.
>
> Andrew, James, and Hans: the first set of recipes are from posts from
> you guys. If you have objections to me using them, please let me know.
>
> Best,
> -greg
>
> --
> WINDOWS 8 is here.
> Millions of people.  Your app in 30 days.
> Visit The Windows 8 Center at Sourceforge for all your go to resources.
> http://windows8center.sourceforge.net/
> join-generation-app-and-make-money-coding-fast/
> ___
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



Re: [Rdkit-discuss] Location of C++ UFFOptimizeMolecule() function?

2012-10-12 Thread JP
Finally, a question I [think I] can answer and Greg hasn't beaten me to it.
 Now what is the chance of that happening?

So, Hans, the method is in a wrapper in $RDBASE/Code

GraphMol/ForceFieldHelpers/Wrap/rdForceFields.cpp:

int UFFOptimizeMolecule(ROMol &mol, int maxIters=200,
  double vdwThresh=10.0, int confId=-1,
  bool ignoreInterfragInteractions=true )

The main instance in this method comes from:

ForceFields::ForceField *ff=UFF::constructForceField(mol,vdwThresh,
confId, ignoreInterfragInteractions);

Which is a method found
in GraphMol/ForceFieldHelpers/UFF/Builder.h.  The forcefield is defined in
ForceField/ForceField.h.

Hope this is somewhat helpful!



-
Jean-Paul Ebejer
Early Stage Researcher


On 12 October 2012 06:09, Hans De Winter  wrote:

> Hi all,
>
> according to the manual, the C++ signature of the Python function:
>
> UFFOptimizeMolecule()
>
> should be:
>
> int UFFOptimizeMolecule(RDKit::ROMol {lvalue} [,int=200 [,double=10.0
> [,int=-1 [,bool=True]]]])
>
> However, I cannot find the header file of this function anywhere in the
> 2012.03.1 distribution.
>
> Any hints?
>
> Thank you,
> Hans


Re: [Rdkit-discuss] ForwardSDMolSupplier::atEnd bug

2012-10-10 Thread JP
On 10 October 2012 14:24, Greg Landrum  wrote:

> On Tue, Oct 9, 2012 at 4:44 PM, Toby Wright 
> wrote:
> >
> > Working in C++, I am calling ForwardSDMolSupplier's method "atEnd()",
> > expecting that it returns false if there are more molecules and true
> > if there are no more molecules. As such the line:   while
> > (!molSupplier->atEnd()) ought to be the obvious looping test with
> > molSupplier->next()  providing the molecules, one per loop. Sadly it
> > seems that atEnd doesn't work. For example if I have a file with 2
> > molecules in it, the loop executes 3 times. On the first 2 "next()"
> > provides me with the correct molecule. On the third "next()" returns
> > NULL. Then when the loop test is checked for a fourth time atEnd()
> > returns true.
> >
> > I can work around this very easily so there is no urgency from me,
> > just thought I should make the bug known. Below is code in full that
> > demonstrates the problem.
>
> Thanks for raising it. This is a known characteristic of the
> ForwardSDMolSupplier. This supplier does not do a look-ahead at the
> end of the molecule, so if there's a blank line after the last
> molecule, the supplier has no way of knowing the file is actually at
> the end until you try to read the next molecule. Fixing this requires
> adding a cache to the supplier so that it can do look-ahead. This is
> doable (and would help address the outstanding feature request to have
> a getLastItemText method on ForwardSDMolSuppliers).
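
The look-ahead cache Greg describes can be illustrated in a few lines.
This is only a sketch of the idea in Python, not RDKit code: the class
name LookAheadSupplier and the helper read_all are made up for this
example, and the wrapped reader is assumed to be any Python iterator.

```python
class LookAheadSupplier:
    """Wrap any iterator and cache one item so at_end() is reliable.

    Hypothetical illustration only; this is not part of RDKit.
    """

    _EMPTY = object()  # sentinel meaning "nothing cached"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cached = self._EMPTY
        self._exhausted = False

    def _fill(self):
        # Read one item ahead of the caller, unless already done.
        if self._cached is self._EMPTY and not self._exhausted:
            try:
                self._cached = next(self._it)
            except StopIteration:
                self._exhausted = True

    def at_end(self):
        # True only when a look-ahead read found no further item.
        self._fill()
        return self._exhausted

    def next(self):
        self._fill()
        if self._exhausted:
            raise StopIteration
        item, self._cached = self._cached, self._EMPTY
        return item


def read_all(supplier):
    """The loop from the original report, now safe to write this way."""
    items = []
    while not supplier.at_end():
        items.append(supplier.next())
    return items
```

With the cache in place, the "while not at end" loop runs exactly once
per item, because at_end() has already attempted the next read before
it answers.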



https://sourceforge.net/p/rdkit/bugs/259/

>:-)

(feel free to reject it, of course)


Re: [Rdkit-discuss] parallel conformation generation

2012-10-09 Thread JP
This is great Andrew (especially the subsequent explanation)!  Many Thanks.

Considering that this is a task lots of people will want to do - is
this code CONTRIB dir material?

(perhaps max_workers should be a fourth command line argument defaulting to 1)
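
A rough sketch of what I mean, using only the standard library.  The
argument names (infile, outfile, num_confs) and the stand-in worker
are assumptions for illustration; the real script would call RDKit in
generate_conformations.  It also uses functools.partial, as George
suggests, to freeze the conformer count.

```python
import argparse
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def parse_args(argv=None):
    """Three required arguments plus an optional fourth, max_workers."""
    parser = argparse.ArgumentParser(
        description="Generate conformers in parallel (sketch)")
    parser.add_argument("infile")
    parser.add_argument("outfile")
    parser.add_argument("num_confs", type=int)
    # Optional positional argument: defaults to 1 when omitted.
    parser.add_argument("max_workers", type=int, nargs="?", default=1)
    return parser.parse_args(argv)


def generate_conformations(mol, n):
    """Stand-in worker: the real one would embed n conformers with RDKit."""
    return mol, list(range(n))


def run(mols, num_confs, max_workers):
    """Map the worker over the molecules with a process pool."""
    # partial() freezes n, so map() only needs the molecules.
    worker = partial(generate_conformations, n=num_confs)
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(worker, mols))
```

Called as "script.py in.sdf out.sdf 10" this runs with one worker;
adding a fourth argument such as 4 sets the pool size.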

A few months (years?) back Greg suggested to always .flush()
explicitly before closing the sd writer.

Cheers,

-
Jean-Paul Ebejer
Early Stage Researcher


On 5 October 2012 09:24, George Papadatos  wrote:
> Hi Andrew,
>
> Thanks for this. I didn't know about the futures and progressbar modules.
>
> You wrote:
> ---
> I have to use the "zip" because map(f, iterable, [chunksize=None]) only
> takes a single iterable. This also means I need to change the
> "generateconformations"
> function so that it takes a single element as input, which is a 2-element
> tuple of the molecule and the count.
> ---
>
> For such cases, there is a more elegant and pythonic way: functools.partial
> http://docs.python.org/library/functools.html#functools.partial
> It just freezes some of the arguments of a function, so you can use map with
> a single argument.
>
> In your case:
> newfunc = partial(generateconformations, size=n)
> map(newfunc, mols)
>
>
> Best regards,
>
> George P.
>
>
>
> On 4 October 2012 22:47, Andrew Dalke  wrote:
>>
>> Hi again,
>>
>>  Greg asked why I used the concurrent.futures module rather than
>> the multiprocessing module which is standard with Python 2.6.
>>
>>
>> There are a few differences in the API which makes the futures
>> module more interesting. First off, here's how you could write
>> the same process pool part using the existing multiprocessing module:
>>
>>
>> from multiprocessing import Pool
>> p = Pool(5)
>> for mol, ids in p.map(generateconformations, zip(suppl, [n]*len(suppl))):
>>     for id in ids:
>>         writer.write(mol, confId=id)
>>
>> I have to use the "zip" because map(f, iterable, [chunksize=None]) only
>> takes a single iterable. This also means I need to change the
>> "generateconformations"
>> function so that it takes a single element as input, which is a 2-element
>> tuple of the molecule and the count. (That is, change from
>>
>> def generateconformations(m, n):
>>   ...
>>
>> to
>>
>> def generateconformations((m, n)):
>>   ...
>>
>> ).
>>
>> That's a touch uglier, but doable.
>>
>> Now, when I posted the code yesterday, I should have posted the simplest
>> version of the code, which is:
>>
>> with futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
>>     for mol, ids in executor.map(generateconformations, suppl,
>>                                  [n]*len(suppl)):
>>         for id in ids:
>>             writer.write(mol, confId=id)
>>
>>
>> Then Greg wouldn't have asked me about how complex my code was. ;)
>>
>>
>> This is the easiest to understand. You can see that this API supports
>> multiple iterators. I used [n]*len(suppl) to make a new list containing
>> repeats of the count, so I could have the twin iterators of the molecules
>> and the count. This is a bit simpler than the multiprocessing code.
>>
>> In addition, the "with" statement knows how to work with an executor. Here
>> it means that all submitted jobs must finish before leaving the with
>> block, and the process pool will be shut down, even if there's an exception.
>> With the multiprocessing module, you need to manage that yourself, or
>> trust in the memory manager.
>>
>>
>> But yesterday I wrote something more like this:
>>
>>     # Submit a set of asynchronous jobs
>>     jobs = []
>>     for mol in suppl:
>>         if mol:
>>             job = executor.submit(generateconformations, mol, n)
>>             jobs.append(job)
>>
>>     # Process the job results (in submission order) and save the conformers.
>>     for job in jobs:
>>         mol, ids = job.result()
>>         for id in ids:
>>             writer.write(mol, confId=id)
>>
>>
>> The "submit" immediately returns a 'future' object, which is called a
>> "promise" in some other languages. You can ask for its .result() to
>> get its result. That call will block (up to a timeout) if the result
>> isn't there yet. You can also check to see if there is a result.
>>
>> The reason I did this is because I usually 1) show a progress bar
>> and 2) have enough memory to store all the results in memory.
>>
>> I've enjoyed using the 'progressbar' module, from
>>  http://pypi.python.org/pypi/progressbar/
>>
>> I have code which looks like this:
>>
>>     with futures.ProcessPoolExecutor(max_workers=4) as executor:
>>         for (collection, first_id, last_id) in blocks:
>>             jobs.append(executor.submit(process_block, tmpdir, config,
>>                                         collection, first_id, last_id))
>>
>>         widgets = ["Fingerprinting ", progressbar.Percentage(), " ",
>>                    progressbar.ETA(), " ", progressbar.Bar()]
>>         pbar = progressbar.ProgressBar(widgets=widgets, maxval=len(jobs))
>>         for job in pbar(futures.as_completed(jobs)):
>>             job.result()
>>
>>
>> This submits all of the fingerprinting jobs to the process pool.
>> The "futures.a

[Rdkit-discuss] Molecule equality (override == in python)

2012-09-25 Thread JP
So excited for next week folks!

Now to a real issue.

I must be really missing something basic...

I understand the below returns False because of the different Mol
instances, but is there an "easyish" way (without comparing InChIs or
fingerprints, converting to canonical SMILES, etc.) to override == so
that it returns True?

>>> import rdkit
>>> from rdkit import Chem
>>> m1 = Chem.MolFromSmiles('CC')
>>> m2 = Chem.MolFromSmiles('CC')
>>> m1 == m2
False
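
For the record, one way to get == to behave is a thin wrapper class
that compares a precomputed key.  This is only a sketch: KeyedMol is a
made-up name, and it does rely on some canonical key, which is the
very thing I was hoping to avoid.  The key function is passed in, so
the snippet itself needs nothing beyond plain Python.

```python
class KeyedMol:
    """Wrap a molecule so that == and hash() compare a canonical key.

    Hypothetical helper, not an RDKit class.  `key` is any function
    mapping a molecule to a hashable, canonical value.
    """

    def __init__(self, mol, key):
        self.mol = mol
        self._key = key(mol)  # computed once, e.g. a canonical SMILES string

    def __eq__(self, other):
        if not isinstance(other, KeyedMol):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        # Must agree with __eq__ so wrapped molecules work in sets and dicts.
        return hash(self._key)
```

With RDKit one would presumably write KeyedMol(m1, Chem.MolToSmiles),
since MolToSmiles returns canonical SMILES by default; two wrappers
built from 'CC' should then compare equal.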

Many Thanks and See you soon !

-
Jean-Paul Ebejer
Early Stage Researcher


Re: [Rdkit-discuss] Order of definitions in fdef file

2012-08-21 Thread JP
I am sorry to pick up on this again - but I still cannot get it to work.

I fixed the SMARTS definition to something obvious - just mark all
aliphatic nitrogens as acceptors

When the order in the definition file is

AtomType NAcceptor [N]
AtomType NAcceptor [n;+0;!X3;!$([n;H1](cc)cc)]

Debugging with your suggested method gives:

Acceptor.SingleAtomAcceptor [$([N,$([n;+0;!X3;!$([n;H1](cc)cc)])])]

which looks good.

But when I invert the order of the two I get:

Acceptor.SingleAtomAcceptor [$([n;+0;!X3;!$([n;H1](cc)cc),$([N])])]

Which is NOT the same as the above because ";" (AND) has lower precedence
than "," (OR).  So the first bit to be evaluated is the spurious clause
!$([n;H1](cc)cc),$([N]), and only then the other ANDs, I think.  Since
every alternative is still ANDed with the aromatic "n", no aliphatic
nitrogen can match, and this of course gives me no acceptor points.

The correct way to write this was *perhaps*

Acceptor.SingleAtomAcceptor [$([n;+0;!X3;!$([n;H1](cc)cc)]),$([N])]
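
The two combined patterns I get are consistent with a purely textual
merge.  The sketch below is reconstructed from the outputs quoted in
this thread, not taken from the RDKit source, and the function names
merge_atom_types and expand_feature are made up: a repeated AtomType
is spliced in as ",$(...)" just before the last "]" of the existing
definition, and {Name} in a feature is replaced by "$(...)".

```python
def merge_atom_types(existing, new):
    """Splice `new` in as an OR-alternative before the last ']' of `existing`.

    Assumed behaviour, inferred from the combined SMARTS shown in this thread.
    """
    cut = existing.rfind("]")
    return existing[:cut] + ",$(" + new + ")" + existing[cut:]


def expand_feature(pattern, name, definition):
    """Replace {name} in a feature pattern with a recursive SMARTS."""
    return pattern.replace("{" + name + "}", "$(" + definition + ")")


aliphatic = "[N]"
aromatic = "[n;+0;!X3;!$([n;H1](cc)cc)]"

# [N] listed first: the aromatic rule lands after a top-level ",".
first = expand_feature("[{NAcceptor}]", "NAcceptor",
                       merge_atom_types(aliphatic, aromatic))

# Aromatic rule listed first: ",$([N])" lands after the last ";" term,
# so it is ORed only with that term and ANDed with everything before it.
second = expand_feature("[{NAcceptor}]", "NAcceptor",
                        merge_atom_types(aromatic, aliphatic))

print(first)
print(second)
```

Because nothing re-parenthesises the existing definition, the result
depends on which definition comes first whenever it contains ";".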

-
Jean-Paul Ebejer
Early Stage Researcher


On 15 August 2012 04:49, Greg Landrum  wrote:

> On Tue, Aug 14, 2012 at 11:43 AM, JP  wrote:
> >
> > Anyway enough of the blabber.  I am using the feature definition file
> > in RDKit and was wondering why the order of the rules in the file
> > makes a difference.
> >
> > So
> >
> > AtomType NAcceptor C[N;H0]=C
> > AtomType NAcceptor [N&v3;H0;$(Nc)]
> >
> > Gives different results than
> >
> > AtomType NAcceptor [N&v3;H0;$(Nc)]
> > AtomType NAcceptor C[N;H0]=C
> >
> > These are different rules affecting different chemotypes...  why does
> > the above find the CN=C acceptor feature and the below does not?
>
> The short answer is that you're using the wrong SMARTS. An AtomType
> definition should match a single Atom. What I think you mean here is:
>
> AtomType NAcceptor [N&v3;H0;$(Nc)]
> AtomType NAcceptor [$(N(C)=C)]
>
> Here's a demonstration that using this makes the order dependence go away:
>
> In [31]: fdf="""AtomType NAcceptor3 [N&v3;H0;$(Nc)]
>    ....: AtomType NAcceptor3 [$(N(C)=C)]
>    ....: DefineFeature SingleAtomAcceptor3 [{NAcceptor3}]
>    ....:   Family Acceptor3
>    ....:   Weights 1
>    ....: EndFeature
>    ....:
>    ....: AtomType NAcceptor4 [$(N(C)=C)]
>    ....: AtomType NAcceptor4 [N&v3;H0;$(Nc)]
>    ....: DefineFeature SingleAtomAcceptor3 [{NAcceptor4}]
>    ....:   Family Acceptor4
>    ....:   Weights 1
>    ....: EndFeature
>    ....: """
>
> In [32]: m = Chem.MolFromSmiles('CN=C')
>
> In [33]: ff = AllChem.BuildFeatureFactoryFromString(fdf)
>
> In [34]: feats=ff.GetFeaturesForMol(m)
>
> In [35]: [x.GetFamily() for x in feats]
> Out[35]: ['Acceptor3', 'Acceptor4']
>
> Hopefully that gets your code working. You may want to stop reading here.
> :-)
>
>
> Here's what happens when I do the same thing with your definitions:
>
> In [36]: fdf="""AtomType NAcceptor1 C[N;H0]=C
>    ....: AtomType NAcceptor1 [N&v3;H0;$(Nc)]
>    ....: DefineFeature SingleAtomAcceptor1 [{NAcceptor1}]
>    ....:   Family Acceptor1
>    ....:   Weights 1
>    ....: EndFeature
>    ....:
>    ....: AtomType NAcceptor2 [N&v3;H0;$(Nc)]
>    ....: AtomType NAcceptor2 C[N;H0]=C
>    ....: DefineFeature SingleAtomAcceptor2 [{NAcceptor2}]
>    ....:   Family Acceptor2
>    ....:   Weights 1
>    ....: EndFeature
>    ....: """
>
> In [37]: ff = AllChem.BuildFeatureFactoryFromString(fdf)
>
> In [38]: feats=ff.GetFeaturesForMol(m)
>
> In [39]: [x.GetFamily() for x in feats]
> Out[39]: ['Acceptor1']
>
> This is the behavior you were seeing.
>
> To understand why this happens, you need to look at the SMARTS that
> ends up being produced for each of your feature definitions:
>
> In [40]: for k,v in ff.GetFeatureDefs().iteritems(): print k,v
> Acceptor1.SingleAtomAcceptor1 [$(C[N;H0,$([N&v3;H0;$(Nc)])]=C)]
> Acceptor2.SingleAtomAcceptor2 [$([N&v3;H0;$(Nc),$(C[N;H0]=C)])]
>
> The fdef parser combines the different atom type definitions with each
> other based on the assumption that each defines a single atom using
> simple string manipulations. It's really expecting your AtomType
> definition to start and end with a square bracket.  It should be
> testing for that, but it's not.
>
> -greg
>

