Hi,
I am trying to calculate the inter-class Tanimoto similarity using RDKit
fingerprints for several classes of GPCR ligands and present the results as
a square matrix or a heatmap. I have tried the BulkTanimotoSimilarity function
for this, and it works. I loop over every pair of compound lists. For each
compound in the pair[0] class, I compute its Tanimoto score against every
ligand in the pair[1] class and store the average of those scores in a numpy
array. Finally, to get an idea of how similar the ligands in pair[0] and
pair[1] are, I take the mean of this array.
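
The procedure described above can be sketched roughly as follows. This is only an illustration under assumptions: the class names and SMILES strings are made up, the helper names (fingerprints, mean_cross_similarity, class_similarity_matrix) are hypothetical, and Chem.RDKFingerprint is assumed as the fingerprint type.

```python
import numpy as np
from rdkit import Chem, DataStructs


def fingerprints(smiles_list):
    """Return RDKit topological fingerprints for the parsable SMILES."""
    mols = (Chem.MolFromSmiles(smi) for smi in smiles_list)
    return [Chem.RDKFingerprint(mol) for mol in mols if mol is not None]


def mean_cross_similarity(fps_a, fps_b):
    """Mean of the per-ligand average Tanimoto scores of A against B."""
    # One average per ligand in A, each computed against all ligands in B.
    row_means = np.array(
        [np.mean(DataStructs.BulkTanimotoSimilarity(fp, fps_b)) for fp in fps_a]
    )
    return float(row_means.mean())


def class_similarity_matrix(classes):
    """Square matrix of mean similarities, ordered like the dict keys."""
    names = list(classes)
    fps = {name: fingerprints(classes[name]) for name in names}
    matrix = np.zeros((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            matrix[i, j] = mean_cross_similarity(fps[a], fps[b])
    return names, matrix


# Hypothetical toy classes; real GPCR ligand sets would replace these.
toy_classes = {
    "amines": ["CCN", "CCCN", "CCCCN"],
    "aromatics": ["c1ccccc1O", "c1ccccc1N"],
}
names, matrix = class_similarity_matrix(toy_classes)
print(names)
print(matrix.round(3))
```

Note that in this sketch the diagonal entries include each ligand compared with itself, which always scores 1.0.
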
My question is whether this is a valid approach to the comparison, particularly
when the class sizes vary widely and the average similarity will inevitably be
affected by the size of each list in a pair. As a check, the diagonal (each
class compared with itself) shows the highest similarity overall, which is
expected anyway.
I am also wondering whether a size-weighted normalization approach could handle
this situation.
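
To make the size concern concrete, here is a small numerical check using made-up similarity blocks rather than real fingerprints (the arrays and the helper names cross_class_mean and within_class_mean are hypothetical). It rests on two facts. First, when every ligand in one class is compared against the same full list from the other class, the mean of the per-ligand averages is identical to the mean over all pairs, so the two class sizes do not rescale an off-diagonal entry. Second, if the diagonal is computed the same way, the self-comparisons (always 1.0) make up a fraction 1/n of the pairs, which may inflate small classes more than large ones.

```python
import numpy as np


def cross_class_mean(sim):
    """Mean of per-ligand row means for an |A| x |B| similarity block."""
    return float(np.mean(sim.mean(axis=1)))


def within_class_mean(sim, include_self=True):
    """Mean within-class similarity for a square |A| x |A| block."""
    n = sim.shape[0]
    if include_self:
        return float(sim.mean())
    # Drop the diagonal of self-comparisons, which are always 1.0.
    off_diagonal = sim[~np.eye(n, dtype=bool)]
    return float(off_diagonal.mean())


# Hypothetical blocks: every distinct pair has similarity 0.2.
small = np.full((2, 2), 0.2)
np.fill_diagonal(small, 1.0)
large = np.full((50, 50), 0.2)
np.fill_diagonal(large, 1.0)

# With self-pairs included, the small class looks far more self-similar.
print(within_class_mean(small), within_class_mean(large))
# With self-pairs excluded, both classes give the same value.
print(within_class_mean(small, include_self=False),
      within_class_mean(large, include_self=False))

# Unequal class sizes: the average of row means equals the all-pairs mean.
rng = np.random.default_rng(0)
cross = rng.random((3, 40))
print(np.isclose(cross_class_mean(cross), cross.mean()))
```

If this reasoning holds, excluding self-pairs on the diagonal might matter more than weighting by class size, although small classes would still give noisier averages.
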
Any pointers would be appreciated.
Thanks,
Marawan