Hello everyone, I'm currently working with a dataset of chemical compounds, aiming to cluster them into different series to create a 3D-QSAR model. Up to this point, I've been using Morgan Fingerprints to generate the descriptors and cluster the compounds based on their Tanimoto Similarity:
```
from rdkit import DataStructs
from rdkit.Chem import AllChem

# Generate fingerprint descriptor database (mols is a list of RDKit Mol objects)
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]

# Calculate pairwise Tanimoto similarity between fingerprints
similarity_matrix = []
for i in range(len(fps)):
    similarities = []
    for j in range(len(fps)):
        similarities.append(DataStructs.TanimotoSimilarity(fps[i], fps[j]))
    similarity_matrix.append(similarities)
```

With the similarity matrix, I applied hierarchical clustering with a cutoff on the Tanimoto distance (1 - similarity) to group similar compounds:

```
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Cluster based on Tanimoto distance (1 - similarity)
dists = 1 - np.array(similarity_matrix)
hc = hierarchy.linkage(squareform(dists), method='single')

# Specify a distance threshold or number of clusters
# Note: this is a distance cutoff, so 0.6 corresponds to a similarity of 0.4
threshold = 0.6  # Adjust this value based on your dendrogram and similarity values
clusters = hierarchy.fcluster(hc, threshold, criterion='distance')
```

However, I'm not satisfied with the results and would like to experiment with MACCS keys to see if they yield better clustering outcomes.

Does anyone know how to cluster compounds using MACCS fingerprints? Any insights on the best approach to calculate similarities and cluster using these fingerprints would be highly appreciated.

Thank you in advance for your suggestions!

Ariadna Llop
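For reference, the distance cut used above can be sketched without any fingerprint library, which makes it easier to see what the clustering step does regardless of the fingerprint type. The block below is a minimal sketch under stated assumptions: fingerprints are represented as sets of on-bit indices, the toy data is made up, and the helper names `tanimoto` and `threshold_clusters` are hypothetical. Cutting a single-linkage tree at a distance threshold should give the same grouping as linking every pair within that distance and taking connected components, though the cluster numbering may differ from what `fcluster` returns. As far as I know, RDKit's `MACCSkeys.GenMACCSKeys(mol)` returns a bit vector that `DataStructs.TanimotoSimilarity` accepts, so in the RDKit code only the fingerprint line should need to change.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # assumption: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def threshold_clusters(fingerprints, max_distance):
    """Link every pair with Tanimoto distance <= max_distance and return
    one cluster label per fingerprint (connected components, via union-find)."""
    parent = list(range(len(fingerprints)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(fingerprints)), 2):
        if 1.0 - tanimoto(fingerprints[i], fingerprints[j]) <= max_distance:
            parent[find(i)] = find(j)

    # Relabel the roots as consecutive cluster ids starting at 1
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1)
            for i in range(len(fingerprints))]


# Made-up on-bit sets; with RDKit these could come from list(fp.GetOnBits())
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 5, 6}, {10, 11, 12}, {10, 11, 13}]
labels = threshold_clusters(toy_fps, 0.6)
print(labels)  # [1, 1, 1, 2, 2]
```

Note that the first and third toy fingerprints end up in the same cluster even though their direct distance (about 0.67) is above the cutoff, because both are within 0.6 of the second one. This chaining is a known property of single linkage and may be worth ruling out before switching fingerprints.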
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss