Re: [Rdkit-discuss] Tanimoto Similarity

2018-07-04 Thread Patrick Walters
I would highly recommend this paper, in which the authors describe an
alternative to arbitrary similarity cutoffs:

https://pubs.acs.org/doi/pdf/10.1021/ci7004498

Pat

Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Tanimoto Similarity

2018-07-04 Thread Maciek Wójcikowski
Hi

As Nils has mentioned, this is fingerprint dependent. For ECFP4 the
cutoff for significant similarity is ~0.4; see https://pubs.acs.org/doi/10.1021/ci7004498


Pozdrawiam,  |  Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl



Re: [Rdkit-discuss] Tanimoto Similarity

2018-07-04 Thread Nils Weskamp
Dear Phuong,

Unfortunately, there is no generic answer to this question, since it
depends heavily on the fingerprint, the type of compounds, your
specific application and also your chemical intuition. I can only
recommend testing a range of different cutoff values and seeing how
happy you are with the results.

If you have access to a list of analogs that you definitely want to find
("known actives") and a large set of known irrelevant compounds, you
might be able to use statistical analyses to derive some kind of
"optimal" threshold.
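
One way to make the statistical idea above concrete is sketched below. It
assumes you have already computed the Tanimoto similarity of each known
active and each irrelevant compound to your query. The function name
`best_cutoff`, the example scores, and the choice of Youden's J statistic
(true-positive rate minus false-positive rate) as the criterion are all
illustrative assumptions, not something prescribed in this thread.

```python
def best_cutoff(active_scores, inactive_scores):
    """Return (cutoff, J) maximizing Youden's J = TPR - FPR.

    A compound counts as a hit when its similarity is >= cutoff.
    Candidate cutoffs are the observed similarity values themselves.
    Both score lists must be non-empty.
    """
    candidates = sorted(set(active_scores) | set(inactive_scores))
    best = (None, -1.0)
    for cutoff in candidates:
        tpr = sum(s >= cutoff for s in active_scores) / len(active_scores)
        fpr = sum(s >= cutoff for s in inactive_scores) / len(inactive_scores)
        j = tpr - fpr
        if j > best[1]:  # strict '>' keeps the lowest cutoff among ties
            best = (cutoff, j)
    return best


# Hypothetical similarity values, for illustration only.
actives = [0.82, 0.75, 0.71, 0.64, 0.58]
inactives = [0.61, 0.55, 0.47, 0.40, 0.33, 0.29, 0.25, 0.21]

cutoff, j = best_cutoff(actives, inactives)
print(f"cutoff = {cutoff:.2f}, Youden's J = {j:.2f}")
```

Other criteria (for example a fixed false-positive rate, or an F-score)
would slot into the same loop; which one is appropriate depends on how
costly false hits are in your application.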

If we are talking about path-oriented fingerprints (like the RDKit
Chemical Fingerprints) and "normal" drug-like molecules, I would
typically go down to 0.70 - 0.75 and then manually weed out false hits.
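
A minimal sketch of that workflow, assuming the fingerprints are available
as sets of on-bit indices. The compound names, the bit sets, and the
`find_neighbors` helper are hypothetical; in practice the fingerprints
would come from a toolkit such as RDKit.

```python
def tanimoto(a, b):
    """Binary Tanimoto coefficient of two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def find_neighbors(query, library, cutoff=0.70):
    """Return (name, similarity) pairs with similarity >= cutoff, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, sim) for name, sim in hits if sim >= cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints (sets of on-bit indices), for illustration only.
query = {1, 4, 7, 9, 12, 15, 20, 23}
library = {
    "cmpd_A": {1, 4, 7, 9, 12, 15, 20, 23, 31},  # one extra bit
    "cmpd_B": {1, 4, 7, 9, 12, 15, 40},          # partial overlap
    "cmpd_C": {2, 5, 8, 30},                     # unrelated
}

for name, sim in find_neighbors(query, library):
    print(f"{name}: {sim:.2f}")  # candidates to inspect by hand
```

Lowering `cutoff` widens the list that has to be reviewed manually, which
is the trade-off described above.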

Hope this helps,
Nils




[Rdkit-discuss] Tanimoto Similarity

2018-07-03 Thread Phuong Chau
To whom it may concern,

I have been working on finding a group of possible neighbors (similar chemicals)
based on Tanimoto similarity. I am not sure what the optimal cutoff is for
finding similar chemicals. I searched online and found a suggested value of 0.85,
but the same sources also mention many exceptions. Do you have any
suggestions?

Thank you so much for your help


Re: [Rdkit-discuss] Tanimoto Similarity of Sparse Int Vects

2012-08-30 Thread Greg Landrum
Dear Nick,

On Thu, Aug 30, 2012 at 3:08 PM, Nicholas Firth
nicholas.fi...@icr.ac.uk wrote:

> Hi RDKiters,
> I've been calculating the Tanimoto similarity of sparse int vectors using C++
> and I can't seem to work out whether this is the binary Tanimoto index
> or the integer version. I've managed to track the answer to the
> calcVectParams function in the SparseIntVect.h file in the source, but I get
> a bit lost in that function.


If you are working with a SparseIntVect, then it is using the integer
version of the Tanimoto index.
The function calcVectParams tries to be reasonably efficient, which
ends up making it a bit tricky to wade through. The logic that is used
is the following:

v1Sum = sum(abs(v1[i]))
v2Sum = sum(abs(v2[i]))
andSum = sum(abs(min(v1[i],v2[i])))
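
The three sums can be sketched in Python as follows, with each sparse
vector represented as a dict from index to count. The final combination
`andSum / (v1Sum + v2Sum - andSum)` is the usual Tanimoto form and is an
assumption here, since the message lists only the sums. The function name
`int_tanimoto` and the example vectors are hypothetical.

```python
def int_tanimoto(v1, v2):
    """Integer (count-based) Tanimoto for sparse vectors stored as dicts.

    Follows the three sums listed above; indices missing from a dict
    are treated as zero.
    """
    v1_sum = sum(abs(x) for x in v1.values())
    v2_sum = sum(abs(x) for x in v2.values())
    # Only indices present in both vectors can contribute a nonzero minimum
    # (assuming non-negative counts, as in typical count fingerprints).
    and_sum = sum(abs(min(v1[i], v2[i])) for i in v1.keys() & v2.keys())
    denom = v1_sum + v2_sum - and_sum
    return and_sum / denom if denom else 0.0


# Hypothetical count vectors: index -> number of occurrences.
a = {3: 2, 7: 1, 11: 4}
b = {3: 1, 11: 4, 20: 2}

print(int_tanimoto(a, b))              # counts taken into account
print(int_tanimoto({k: 1 for k in a},  # same indices, counts ignored
                   {k: 1 for k in b}))
```

The two printed values differ: once every count is set to 1 the result
reduces to the binary Tanimoto index, which is one way to check which
version a given implementation computes.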

Does that help?

-greg



Re: [Rdkit-discuss] tanimoto similarity calculation for topological fp

2012-04-14 Thread Greg Landrum
Dear Gonzalo,

On Sat, Apr 14, 2012 at 3:06 PM, Gonzalo Colmenarejo-Sanchez
gonzalo.2.colmenar...@gsk.com wrote:

> But I’m getting the following error message: “error: no matching function
> for call to ‘TanimotoSimilarity(ExplicitBitVect, ExplicitBitVect)’”

You need to #include <DataStructs/BitOps.h>

-greg
