Hi,
I am trying to calculate the inter-class Tanimoto similarity using RDKit
fingerprints for several classes of GPCR ligands and to present the results as
a square matrix or a heatmap. I have tried the BulkTanimotoSimilarity function
for this purpose, and it works. I loop over every pair of compound lists. For
each compound in the pair[0] class, I compute its Tanimoto scores against every
ligand in the pair[1] class, average them, and store the average in a NumPy
array. Finally, to get an idea of how similar the ligands in the two lists,
pair[0] and pair[1], are to each other, I take the mean of this array.
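
For reference, here is a minimal sketch of the procedure described above. It
assumes the classes are held as lists of SMILES strings and that "RDKit
fingerprints" means the topological fingerprint from Chem.RDKFingerprint. The
function names and the toy classes are hypothetical and only illustrate the
averaging steps.

```python
import numpy as np
from rdkit import Chem, DataStructs


def fingerprints(smiles_list):
    """Return RDKit topological fingerprints for the parsable SMILES."""
    mols = (Chem.MolFromSmiles(smi) for smi in smiles_list)
    return [Chem.RDKFingerprint(mol) for mol in mols if mol is not None]


def mean_pairwise_similarity(fps_a, fps_b):
    """Mean over compounds in A of their average Tanimoto score against B."""
    per_compound = [
        np.mean(DataStructs.BulkTanimotoSimilarity(fp, fps_b)) for fp in fps_a
    ]
    return float(np.mean(per_compound))


def class_similarity_matrix(classes):
    """Square matrix of mean similarities for a dict {class name: SMILES list}."""
    names = list(classes)
    fps = {name: fingerprints(classes[name]) for name in names}
    matrix = np.zeros((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            matrix[i, j] = mean_pairwise_similarity(fps[a], fps[b])
    return names, matrix


# Hypothetical toy classes, for illustration only (not GPCR ligand data).
toy_classes = {
    "alcohols": ["CCO", "CCCO", "CCCCO"],
    "aromatics": ["c1ccccc1", "Cc1ccccc1"],
}
names, matrix = class_similarity_matrix(toy_classes)
print(names)
print(matrix.round(2))
```

In this sketch every compound in pair[0] is compared against the same number
of compounds in pair[1], so the mean of the per-compound averages equals the
plain mean over all pairs. The diagonal entries also include each compound's
comparison with itself, which scores 1.0.
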
My question is whether this is a valid approach to the comparison, particularly
when the class sizes vary widely, since the average similarity will inevitably
be affected by the size of each class in the pair. As a check, the diagonal
entries (each class compared with itself) show the highest similarity overall,
which is expected anyway.
I am also wondering whether a size-weighted normalization approach could handle
this situation.
Any suggestions would be appreciated.
Thanks,
Marawan
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
