Hi Francesca,

Technically, it should be possible to read MOL2 files with RDKit (and to convert the structures into SDF, SMILES, etc.). I found

https://chem-workflows.com/articles/2020/03/23/building-a-multi-molecule-mol2-reader-for-rdkit-v2/

as one example. Having said that, I wonder whether it would be easier to simply download your structures again from ZINC as SDF.
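
If you do stay with MOL2, here is a minimal sketch of one way to walk through a multi-molecule file. It is not the code from the linked article. The helper names `iter_mol2_blocks` and `mol2_file_to_smiles` are made up for illustration; the only RDKit calls used are `Chem.MolFromMol2Block` and `Chem.MolToSmiles`. As far as I know, RDKit's MOL2 parser was written with Corina-style atom typing in mind, so some ZINC records may fail to parse and come back as `None`.

```python
from typing import Iterable, Iterator, List

MOL2_RECORD_START = "@<TRIPOS>MOLECULE"


def iter_mol2_blocks(lines: Iterable[str]) -> Iterator[str]:
    """Yield one text block per molecule from the lines of a MOL2 file.

    Lines are expected to keep their trailing newlines, as they do when
    iterating over an open file. Anything before the first record tag
    (for example comment lines) is dropped.
    """
    block: List[str] = []
    for line in lines:
        if line.startswith(MOL2_RECORD_START):
            if block:
                yield "".join(block)
            block = [line]
        elif block:
            block.append(line)
    if block:
        yield "".join(block)


def mol2_file_to_smiles(path: str) -> List[str]:
    """Convert every parsable record of a MOL2 file into a SMILES string."""
    # RDKit is imported here so that the splitter above works without it.
    from rdkit import Chem

    smiles = []
    with open(path) as handle:
        for block in iter_mol2_blocks(handle):
            mol = Chem.MolFromMol2Block(block)
            if mol is not None:  # RDKit returns None for records it cannot parse
                smiles.append(Chem.MolToSmiles(mol))
    return smiles
```

Because the splitter is a generator, only one record is held in memory at a time, which matters for a file with 191K compounds.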

Similarity-based clustering of 191K compounds might take a while and/or require a lot of memory unless it is done in a clever way. You may want to take a look at

https://www.macinchem.org/reviews/clustering/clustering.php

for an example of how to apply Taylor-Butina clustering to larger compound sets.
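
To make the idea concrete, here is a small pure-Python sketch of Taylor-Butina clustering. It is a simplified illustration, not the code from the linked review: fingerprints are represented as plain sets of on-bit indices, and the function names and the 0.35 distance cutoff are arbitrary choices for the example. RDKit ships its own implementation in `rdkit.ML.Cluster.Butina.ClusterData`, which is what you would normally use.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - Tanimoto similarity for two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def butina_clusters(
    fps: Sequence[Set[int]], cutoff: float = 0.35
) -> List[Tuple[int, ...]]:
    """Cluster fingerprints; the first index of each tuple is the centroid."""
    n = len(fps)

    # Step 1: build neighbour lists (all pairs within the distance cutoff).
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if tanimoto_distance(fps[i], fps[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Step 2: visit compounds from most to fewest neighbours.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    # Step 3: each unassigned compound becomes a centroid and claims
    # all of its neighbours that are still unassigned.
    assigned = [False] * n
    clusters: List[Tuple[int, ...]] = []
    for idx in order:
        if assigned[idx]:
            continue
        members = [idx] + [j for j in neighbors[idx] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(tuple(members))
    return clusters


if __name__ == "__main__":
    toy = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11}, {10, 11, 12}, {20}]
    for cluster in butina_clusters(toy):
        print("representative:", cluster[0], "members:", cluster)
```

The nested loop in step 1 is where the time goes: for 191K compounds it visits roughly 1.8 x 10^10 pairs. Storing only the neighbour lists instead of a full distance matrix keeps memory down, but the pairwise comparisons themselves remain.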

I personally prefer the topological fingerprints in RDKit for these kinds of tasks; others might suggest Morgan fingerprints. If you "only" want to pick a diverse subset, both approaches should give you a decent result.

Hope this helps,
Nils    

On 29.06.2021 at 09:18, Francesca Magarotto - francesca.magarot...@studio.unibo.it wrote:



  Hi,

I'm new to RDKit. I need to do a cluster analysis of a database of compounds. I've downloaded 191K compounds from the ZINC database in 3D mol2 format and now I need to obtain fingerprints using RDKit. First, I don't understand whether it's possible to generate fingerprints from mol2 files and, above all, which kind of fingerprint is best for this type of analysis (I need to understand which chemotypes I have in the database in order to eventually find some representatives).

Does anyone have suggestions? (Practical suggestions are really appreciated, too.)

Thanks





_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



