Hello everyone,

I'm currently working with a dataset of chemical compounds, aiming to
cluster them into different series to create a 3D-QSAR model. Up to this
point, I've been using Morgan fingerprints as the descriptors and
clustering the compounds based on their pairwise Tanimoto similarity:

```
from rdkit import DataStructs
from rdkit.Chem import AllChem

# Generate fingerprint descriptor database (Morgan, radius 2);
# `mols` is my list of RDKit Mol objects
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2) for m in mols]

# Calculate pairwise Tanimoto similarity between fingerprints
similarity_matrix = []
for i in range(len(fps)):
    similarities = []
    for j in range(len(fps)):
        similarities.append(DataStructs.TanimotoSimilarity(fps[i], fps[j]))
    similarity_matrix.append(similarities)
```
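To be explicit about the metric: by Tanimoto similarity I mean the number of on-bits two fingerprints share divided by the number of bits set in either one. Here is a tiny pure-Python sketch of that definition. The function name `tanimoto` and the toy bit sets are made up for illustration and are not RDKit API, and the value returned for two empty fingerprints is an arbitrary convention of this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Arbitrary convention for this sketch: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / union


# Toy fingerprints (hypothetical on-bit positions)
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8}

# 2 shared bits (1 and 4) out of 5 bits in the union
sim_xy = tanimoto(fp_x, fp_y)
```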


With the similarity matrix, I applied hierarchical clustering with a
threshold on the Tanimoto distance (1 - similarity) to group similar
compounds:

```
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Cluster based on Tanimoto similarity (distance = 1 - similarity)
dists = 1 - np.array(similarity_matrix)
hc = hierarchy.linkage(squareform(dists), method='single')

# Specify a distance threshold or number of clusters
# Adjust this value based on your dendrogram and similarity values
threshold = 0.6
clusters = hierarchy.fcluster(hc, threshold, criterion='distance')
```
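To make the threshold concrete: a distance cutoff of 0.6 corresponds to a Tanimoto similarity of 0.4, and with single linkage two groups merge as soon as any one pair across them is at least that similar. The sketch below runs the same SciPy calls on a hand-made 4x4 similarity matrix. The values are invented for illustration and do not come from my dataset.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Invented similarities: compounds 0/1 and 2/3 form two tight pairs,
# and every cross-pair similarity is low (0.1)
toy_sim = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

toy_dists = 1.0 - toy_sim

# linkage expects the condensed (upper-triangle) form of the distance matrix
toy_hc = hierarchy.linkage(squareform(toy_dists), method='single')

# Cut the tree at distance 0.6, i.e. similarity 0.4
toy_labels = hierarchy.fcluster(toy_hc, 0.6, criterion='distance')
```

With these numbers each tight pair ends up in its own cluster, because the cross-pair distances (0.9) are above the cutoff.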

However, I'm not satisfied with the results and would like to experiment
with MACCS Keys to see if they yield better clustering outcomes. Does
anyone know how to cluster compounds using MACCS fingerprints? Any insights
on the best approach to calculate similarities and cluster using these
fingerprints would be highly appreciated.
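My guess is that the workflow stays the same and only the fingerprint step changes, roughly as in the sketch below. It assumes `MACCSkeys.GenMACCSKeys` returns a bit vector that works with the same `DataStructs` similarity functions. The SMILES are placeholders rather than my compounds, and the 0.6 cutoff is just carried over from the Morgan run, so it would probably need re-tuning.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import MACCSkeys
from scipy.cluster import hierarchy
from scipy.spatial.distance import squareform

# Placeholder molecules (not my real dataset)
smiles = ["CCO", "CCCO", "c1ccccc1", "Cc1ccccc1"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# MACCS keys instead of Morgan fingerprints
maccs_fps = [MACCSkeys.GenMACCSKeys(m) for m in mols]

# Fill the symmetric similarity matrix one row of the lower triangle at a time
n = len(maccs_fps)
sim = np.ones((n, n))
for i in range(1, n):
    row = DataStructs.BulkTanimotoSimilarity(maccs_fps[i], maccs_fps[:i])
    sim[i, :i] = row
    sim[:i, i] = row

# Same clustering as before: distance = 1 - similarity, single linkage
maccs_dists = 1.0 - sim
maccs_hc = hierarchy.linkage(squareform(maccs_dists, checks=False), method='single')
maccs_clusters = hierarchy.fcluster(maccs_hc, 0.6, criterion='distance')
```

Is this a sensible way to do it, or is there a better-suited similarity metric or clustering method for MACCS keys?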

Thank you in advance for your suggestions!

Ariadna Llop
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
