[Rdkit-discuss] Taylor-Butina clustering: cut-off

2021-09-24 Thread Francesca Magarotto - francesca.magarot...@studio.unibo.it
Hi, I have a question about the cut-off in the Taylor-Butina algorithm. I retrieved a set of 190,792 molecules in SMILES format from ZINC15. I split this dataset in order to first perform the cluster analysis on only two small subsets (one contains 310 molecules and the other 1396 mole

Re: [Rdkit-discuss] Taylor-Butina clustering

2021-07-21 Thread David Cosgrove
Hi Francesca, The Taylor-Butina clustering is not hierarchical. It is a type of sphere exclusion algorithm. A useful image for the results would be the "centroid" of each cluster, possibly followed by the other cluster members. You will need to generate the images from the original input molecu
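To illustrate the sphere-exclusion idea described in this reply, here is a minimal pure-Python sketch. It is not RDKit's implementation (RDKit provides its own in rdkit.ML.Cluster.Butina); the function name butina_sketch, the toy 1-D points, and the cut-off values are all hypothetical and chosen only for illustration. Each cluster is returned with its "centroid" first, followed by the other members.

```python
def butina_sketch(dist, cutoff):
    """Toy sphere-exclusion clustering (hypothetical helper, not the RDKit API).

    dist   -- full symmetric distance matrix as a list of lists
    cutoff -- two points are neighbours when their distance is <= cutoff
    Returns a list of tuples; the first index in each tuple is the centroid.
    """
    n = len(dist)
    # Neighbour list of every point within the cut-off.
    neighbors = [
        [j for j in range(n) if j != i and dist[i][j] <= cutoff]
        for i in range(n)
    ]
    # Visit points with the most neighbours first (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = [False] * n
    clusters = []
    for i in order:
        if assigned[i]:
            continue
        # The point becomes a centroid and claims its still-unassigned neighbours.
        members = [i] + [j for j in neighbors[i] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(tuple(members))
    return clusters


# Toy data (an assumption for illustration): five points on a line.
points = [0, 1, 2, 9, 10]
dist = [[abs(a - b) for b in points] for a in points]

loose = butina_sketch(dist, 1.5)   # larger cut-off: fewer, bigger clusters
tight = butina_sketch(dist, 0.5)   # smaller cut-off: every point is a singleton
print(loose)
print(tight)
```

In this toy case the larger cut-off groups the points into two clusters and the smaller one leaves five singletons, which is the general effect of the cut-off: there is no hierarchy, only a flat list of clusters whose size depends on the chosen threshold.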

[Rdkit-discuss] Taylor-Butina clustering

2021-07-21 Thread Francesca Magarotto - francesca.magarot...@studio.unibo.it
Hi, I managed to perform Taylor-Butina clustering on a dataset of 193,571 fragments retrieved from ZINC20. I followed the instructions at this link: https://www.macinchem.org/reviews/clustering/clustering.php Actually, I've never used RDKit before and have never done a cluster analysis, so I'm really new t