Re: [Rdkit-discuss] Difference between ECFP and MorganFingerprint

2015-09-30 Thread Greg Landrum
On Wed, Sep 30, 2015 at 8:05 AM, Guillaume GODIN <guillaume.go...@firmenich.com> wrote:

>
> When you say "as closely as I could", do you mean that all the parameters
> are the same in ECFP and Morgan, and that the only divergence between them
> is the way RDKit and Pipeline Pilot handle aromaticity and hashing?
>

I don't think that there are any parameters. I followed the algorithm
description in the paper, but since the fingerprints include information
about chemistry (specifically about bond types, which include aromaticity),
differences could arise. The hashing algorithm is not described in the
paper, so that will definitely be different.

The Morgan fingerprints in the RDKit will not produce the same fingerprint
as PP's ECFP implementation, but they should produce very similar
similarity values (as the presentation I referenced earlier demonstrates).
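
To illustrate why a different hash changes the fingerprint but not necessarily the similarity, here is a minimal sketch in plain Python. It is not the RDKit or Pipeline Pilot algorithm: the environment labels are made up, and the two hash functions (`sha256`, `blake2b`) are stand-ins chosen only because they ship with Python.

```python
import hashlib


def hashed_ids(environments, algorithm):
    """Hash each environment label to a 32-bit integer ID."""
    ids = set()
    for label in environments:
        digest = hashlib.new(algorithm, label.encode("utf-8")).digest()
        ids.add(int.from_bytes(digest[:4], "big"))
    return ids


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of feature IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up labels standing in for atom environments; not real ECFP features
mol_a = ["C", "O", "C-C", "C-O", "C-C-O"]
mol_b = ["C", "O", "C-C", "C-O", "C-C-O", "C-C-C"]

# Same environments, two different (hypothetical) hashing schemes
a1, b1 = hashed_ids(mol_a, "sha256"), hashed_ids(mol_b, "sha256")
a2, b2 = hashed_ids(mol_a, "blake2b"), hashed_ids(mol_b, "blake2b")

print(a1 == a2)                             # the ID sets differ between hashes
print(tanimoto(a1, b1), tanimoto(a2, b2))   # the similarities agree
```

In this toy case the similarities agree exactly because both schemes see identical environments. Real implementations can also perceive aromaticity differently, so the environments themselves may differ, which is why the similarity values are very similar rather than identical.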

Best,
-greg
--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Difference between ECFP and MorganFingerprint

2015-09-30 Thread Guillaume GODIN
Dear Greg,

When you say "as closely as I could", do you mean that all the parameters are
the same in ECFP and Morgan, and that the only divergence between them is the
way RDKit and Pipeline Pilot handle aromaticity and hashing?

Thanks

Guillaume

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: Wednesday, 30 September 2015 08:01
To: Jing Lu
Cc: RDKit Discuss
Subject: Re: [Rdkit-discuss] Difference between ECFP and MorganFingerprint


On Wed, Sep 30, 2015 at 6:47 AM, Greg Landrum <greg.land...@gmail.com> wrote:

> On Tue, Sep 29, 2015 at 8:22 PM, Jing Lu <ajin...@gmail.com> wrote:
>
>> I was treating AllChem.GetMorganFingerprint(m1,2) the same as ECFP4. I am
>> writing a paper for an open source tool, so I need to be very accurate. I
>> have seen one open source implementation for ECFP, which is from CDK. Most
>> researchers are using Pipeline Pilot to calculate ECFP. But, Pipeline Pilot
>> is not open source.
>
> To be very clear: the only implementation of ECFP is the one in Pipeline
> Pilot. The other implementations, like the ones in the CDK and the RDKit, may
> have followed the algorithm description that was published, but due to
> differences in aromaticity perception and hashing algorithms the results
> will not be exactly the same.

Sorry, should have been more explicit here: when I did the Morgan fingerprint
implementation for the RDKit, I followed the published algorithm description as
closely as I could.

-greg



Re: [Rdkit-discuss] Difference between ECFP and MorganFingerprint

2015-09-30 Thread Greg Landrum
On Wed, Sep 30, 2015 at 6:47 AM, Greg Landrum wrote:

>
> On Tue, Sep 29, 2015 at 8:22 PM, Jing Lu wrote:
>
>>
>> I was treating AllChem.GetMorganFingerprint(m1,2) the same as ECFP4. I am
>> writing a paper for an open source tool, so I need to be very accurate. I
>> have seen one open source implementation for ECFP, which is from CDK. Most
>> researchers are using Pipeline Pilot to calculate ECFP. But, Pipeline Pilot
>> is not open source.
>>
>
> To be very clear: the only implementation of ECFP is the one in Pipeline
> Pilot. The other implementations, like the ones in the CDK and the RDKit, may
> have followed the algorithm description that was published, but due to
> differences in aromaticity perception and hashing algorithms the results
> will not be exactly the same.
>

Sorry, should have been more explicit here: when I did the Morgan
fingerprint implementation for the RDKit, I followed the published
algorithm description as closely as I could.

-greg


Re: [Rdkit-discuss] Difference between ECFP and MorganFingerprint

2015-09-29 Thread Greg Landrum
On Tue, Sep 29, 2015 at 8:22 PM, Jing Lu wrote:

>
> I was treating AllChem.GetMorganFingerprint(m1,2) the same as ECFP4. I am
> writing a paper for an open source tool, so I need to be very accurate. I
> have seen one open source implementation for ECFP, which is from CDK. Most
> researchers are using Pipeline Pilot to calculate ECFP. But, Pipeline Pilot
> is not open source.
>

To be very clear: the only implementation of ECFP is the one in Pipeline
Pilot. The other implementations, like the ones in the CDK and the RDKit, may
have followed the algorithm description that was published, but due to
differences in aromaticity perception and hashing algorithms the results
will not be exactly the same.


> I calculate Tanimoto similarity based on Morgan fingerprints. The
> similarity matrix is the input for my tool. I am wondering how different
> the Morgan fingerprint and ECFP are. Will they give different answers in
> some situations? Can we use MorganFingerprint instead of ECFP most of the
> time?
>

I have, in the past, done a comparison between similarity values calculated
with the RDKit and those with PP's ECFP implementation. I've presented
those results in a couple of different places; here's one of them:
http://rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
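
For reference, the kind of Tanimoto similarity matrix described in the question can be sketched as follows. This is only an illustration, assuming a working RDKit install; the three SMILES strings are arbitrary examples, and radius 2 is used because it matches the diameter-4 convention behind the name ECFP4.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

# Example molecules chosen only for illustration
smiles = ["CCO", "CCCO", "c1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# Morgan fingerprints with radius 2, the same call used in the question
fps = [AllChem.GetMorganFingerprint(m, 2) for m in mols]

# Pairwise Tanimoto similarities as a plain nested list
n = len(fps)
sim = [
    [DataStructs.TanimotoSimilarity(fps[i], fps[j]) for j in range(n)]
    for i in range(n)
]

for s, row in zip(smiles, sim):
    print(s, ["%.2f" % v for v in row])
```

The resulting values are RDKit Morgan similarities, not Pipeline Pilot ECFP similarities, so a paper should describe them as such.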

Best,
-greg