Re: [Rdkit-discuss] Tanimoto Similarity
I would highly recommend this paper where the authors describe an alternative to arbitrary similarity cutoffs:
https://pubs.acs.org/doi/pdf/10.1021/ci7004498

Pat

On Wed, Jul 4, 2018 at 9:31 AM Maciek Wójcikowski wrote:
> Hi
>
> As Nils has mentioned this is fingerprint dependent. ECFP4 have the
> significant cutoff ~0.4, see https://pubs.acs.org/doi/10.1021/ci7004498
>
> Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] Tanimoto Similarity
Hi

As Nils has mentioned, this is fingerprint dependent. For ECFP4, the significant cutoff is ~0.4, see https://pubs.acs.org/doi/10.1021/ci7004498

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2018-07-04 8:44 GMT+02:00 Nils Weskamp:
> Dear Phuong,
>
> unfortunately, there is no generic answer to this question since it is
> highly dependent on the fingerprint, the type of compounds, your
> specific application and also your chemical intuition. I can only
> recommend to test a range of different cutoff values and to see how
> happy you are with the results.
>
> If you have access to a list of analogs that you definitely want to find
> ("known actives") and a large set of known irrelevant compounds, you
> might be able to use statistical analyses to derive some kind of
> "optimal" threshold.
>
> If we are talking about path-oriented fingerprints (like the RDKit
> Chemical Fingerprints) and "normal" drug-like molecules, I would
> typically go down to 0.70 - 0.75 and then manually weed out false hits.
>
> Hope this helps,
> Nils
Re: [Rdkit-discuss] Tanimoto Similarity
Dear Phuong,

Unfortunately, there is no generic answer to this question, since it is highly dependent on the fingerprint, the type of compounds, your specific application, and also your chemical intuition. I can only recommend testing a range of cutoff values and seeing how happy you are with the results.

If you have access to a list of analogs that you definitely want to find ("known actives") and a large set of known irrelevant compounds, you might be able to use statistical analyses to derive some kind of "optimal" threshold.

If we are talking about path-oriented fingerprints (like the RDKit Chemical Fingerprints) and "normal" drug-like molecules, I would typically go down to 0.70 - 0.75 and then manually weed out false hits.

Hope this helps,
Nils

On 04.07.2018 at 02:24, Phuong Chau wrote:
> To whom it may concern,
>
> I was working on finding a group of possible neighbors (similar)
> chemicals based on Tanimoto Similarity. I am not sure what is the
> optimal cutoff for finding similar chemicals. I searched online and they
> said it is 0.85 but there are also many exceptions they mentioned about.
> Do you have any suggestions?
>
> Thank you so much for your help
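Nils's suggestion of deriving a threshold from known actives and known irrelevant compounds could be sketched roughly as follows. This is only an illustration, assuming you have already computed Tanimoto similarities to your query for both sets. The function name `best_cutoff`, the choice of Youden's J (true-positive rate minus false-positive rate) as the criterion, the 0.05 scan step, and the example numbers are all assumptions made for this sketch and do not come from the thread.

```python
def best_cutoff(active_sims, inactive_sims, candidates=None):
    """Return (cutoff, J) for the cutoff that maximizes Youden's J = TPR - FPR.

    A compound counts as a hit when its similarity is >= cutoff.
    Ties are resolved in favor of the lowest cutoff scanned.
    """
    if candidates is None:
        # Hypothetical scan grid: 0.00, 0.05, ..., 1.00
        candidates = [i / 20 for i in range(21)]
    best = None
    for cutoff in candidates:
        # Fraction of known actives retrieved at this cutoff
        tpr = sum(s >= cutoff for s in active_sims) / len(active_sims)
        # Fraction of irrelevant compounds wrongly retrieved
        fpr = sum(s >= cutoff for s in inactive_sims) / len(inactive_sims)
        j = tpr - fpr
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Made-up similarity values, for illustration only
actives = [0.82, 0.74, 0.91, 0.68, 0.77]
inactives = [0.31, 0.45, 0.52, 0.71, 0.28, 0.39]

cutoff, j = best_cutoff(actives, inactives)
print(f"cutoff={cutoff:.2f}  J={j:.3f}")
```

Other criteria (for example a fixed false-positive rate, or enrichment at a given rank) may suit a particular application better, and the result is only as good as the active and inactive sets you feed in.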
[Rdkit-discuss] Tanimoto Similarity
To whom it may concern,

I am working on finding a group of possible neighbor (similar) chemicals based on Tanimoto similarity. I am not sure what the optimal cutoff is for finding similar chemicals. I searched online and found 0.85 suggested, but many exceptions were also mentioned. Do you have any suggestions?

Thank you so much for your help
Re: [Rdkit-discuss] Tanimoto Similarity of Sparse Int Vects
Dear Nick,

On Thu, Aug 30, 2012 at 3:08 PM, Nicholas Firth nicholas.fi...@icr.ac.uk wrote:
> Hi RDKiters,
>
> I've been calculating the Tanimoto similarity of sparse int vectors using
> C++ and I can't seem to work out whether this is the binary Tanimoto
> index or the integer version. I've managed to track the answer to the
> calcVectParams function in the SparseIntVect.h file in the source, but I
> get a bit lost in that function.

If you are working with a SparseIntVect, then it is using the integer version of the Tanimoto index. The function calcVectParams tries to be reasonably efficient, which ends up making it a bit tricky to wade through. The logic that is used is the following:

    v1Sum = sum(abs(v1[i]))
    v2Sum = sum(abs(v2[i]))
    andSum = sum(abs(min(v1[i], v2[i])))

Does that help?

-greg
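The three sums above can be illustrated with a small pure-Python sketch. It is not the RDKit implementation: the sparse vectors are represented here as plain dicts mapping index to count, the name `int_tanimoto` is hypothetical, and the counts are assumed to be non-negative. The final step, andSum / (v1Sum + v2Sum - andSum), is the usual Tanimoto combination and is an assumption in this sketch, since the message only lists the three sums. Returning 0.0 for two empty vectors is likewise a choice made here.

```python
def int_tanimoto(v1, v2):
    """Integer (count-based) Tanimoto for sparse vectors stored as {index: count}.

    Assumes non-negative counts; a missing index is treated as a count of 0.
    """
    v1_sum = sum(abs(c) for c in v1.values())
    v2_sum = sum(abs(c) for c in v2.values())
    # Follows the listed logic: abs(min(v1[i], v2[i])) summed over all indices.
    # Indices present in only one vector contribute min(count, 0) = 0.
    and_sum = sum(abs(min(v1.get(i, 0), v2.get(i, 0)))
                  for i in v1.keys() | v2.keys())
    denom = v1_sum + v2_sum - and_sum
    return and_sum / denom if denom else 0.0


# Made-up count vectors, for illustration only
a = {1: 2, 5: 1, 9: 3}
b = {1: 1, 5: 4, 7: 2}

print(int_tanimoto(a, b))
```

For these two vectors the binary Tanimoto index (which ignores the counts) would be 2 / (3 + 3 - 2) = 0.5, while the integer version gives a lower value because the shared indices have unequal counts.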
Re: [Rdkit-discuss] tanimoto similarity calculation for topological fp
Dear Gonzalo,

On Sat, Apr 14, 2012 at 3:06 PM, Gonzalo Colmenarejo-Sanchez gonzalo.2.colmenar...@gsk.com wrote:
> But I’m getting the following error message:
>
> “error: no matching function for call to
> ‘TanimotoSimilarity(ExplicitBitVect, ExplicitBitVect)’”

You need to #include <DataStructs/BitOps.h>

-greg