Hi Francesca,
technically, it should be possible to read MOL2 files with RDKit (and to
convert the structures into SDF, SMILES, etc.). I found
https://chem-workflows.com/articles/2020/03/23/building-a-multi-molecule-mol2-reader-for-rdkit-v2/
as one example. Having said that, I'm wondering whether it would be
easier to just download your structures (again) as SDF from ZINC.
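If you do stick with MOL2, the usual trick is to split the file at each `@<TRIPOS>MOLECULE` record and hand every block to RDKit's `Chem.MolFromMol2Block`. The sketch below shows that idea under a few assumptions: the helper names `split_mol2_blocks` and `convert_mol2` are hypothetical (this is not the code from the linked article), lines starting with `#` are treated as comments and dropped, and records RDKit cannot parse are silently skipped. RDKit is imported inside `convert_mol2` so the pure-Python splitter can be used and tested on its own.

```python
from pathlib import Path
from typing import List

# Each molecule in a multi-molecule MOL2 file starts with this record type.
MOL2_RECORD = "@<TRIPOS>MOLECULE"


def split_mol2_blocks(text: str) -> List[str]:
    """Split a multi-molecule MOL2 string into one text block per molecule."""
    blocks: List[str] = []
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        if line.startswith("#"):
            continue  # assumption: comment lines are not needed for parsing
        if line.startswith(MOL2_RECORD) and current:
            blocks.append("".join(current))  # close the previous molecule
            current = []
        # Text before the first MOLECULE record is ignored.
        if line.startswith(MOL2_RECORD) or current:
            current.append(line)
    if current:
        blocks.append("".join(current))
    return blocks


def convert_mol2(mol2_path: str, sdf_path: str) -> List[str]:
    """Write every parsable molecule to an SD file and return its SMILES.

    Requires RDKit; records RDKit cannot parse or sanitize are skipped.
    """
    from rdkit import Chem  # imported here so split_mol2_blocks works without RDKit

    smiles: List[str] = []
    writer = Chem.SDWriter(sdf_path)
    try:
        for block in split_mol2_blocks(Path(mol2_path).read_text()):
            mol = Chem.MolFromMol2Block(block)
            if mol is None:
                continue
            writer.write(mol)
            smiles.append(Chem.MolToSmiles(mol))
    finally:
        writer.close()
    return smiles
```

Comparing `len(convert_mol2(...))` with the number of blocks is a quick way to see how many of the 191K records RDKit rejected.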
Similarity-based clustering of 191K compounds can take a
while and/or require a lot of memory if you don't do it in a clever
way (a full pairwise distance matrix would have roughly 1.8 × 10^10
entries). You may want to take a look at
https://www.macinchem.org/reviews/clustering/clustering.php
for an example of how to apply Taylor-Butina clustering to larger
compound sets.
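To make the Taylor-Butina idea concrete: every molecule gets a list of neighbors within a distance cutoff, molecules with the most neighbors become cluster centroids first, and each centroid absorbs its not-yet-assigned neighbors. The pure-Python sketch below assumes fingerprints are given as sets of on-bit indices (for example `frozenset(fp.GetOnBits())` from an RDKit bit vector) and that `cutoff` is a Tanimoto distance (1 - similarity). It is for illustration only: the all-pairs neighbor search is exactly the O(N^2) step that makes 191K compounds expensive, and RDKit's own `rdkit.ML.Cluster.Butina.ClusterData` is the optimized route (its tie-breaking may differ from this sketch).

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def butina_cluster(fps: Sequence[FrozenSet[int]], cutoff: float) -> List[List[int]]:
    """Taylor-Butina clustering; cutoff is a Tanimoto distance (1 - similarity).

    Returns clusters as lists of indices, centroid first.
    """
    n = len(fps)
    # Neighbor lists: all pairs within the distance cutoff (O(N^2) comparisons).
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if 1.0 - tanimoto(fps[i], fps[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    # Molecules with the most neighbors are considered as centroids first
    # (stable sort, so ties keep their original order).
    order = sorted(range(n), key=lambda k: len(neighbors[k]), reverse=True)
    assigned = [False] * n
    clusters: List[List[int]] = []
    for idx in order:
        if assigned[idx]:
            continue
        assigned[idx] = True
        members = [idx]
        for nb in neighbors[idx]:
            if not assigned[nb]:
                assigned[nb] = True
                members.append(nb)
        clusters.append(members)
    return clusters
```

The first index of each cluster is its centroid, which is a natural candidate for the "representative" of that chemotype.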
I personally prefer the topological fingerprints in RDKit for these
kinds of tasks; others might suggest Morgan fingerprints. If you "only"
want to pick a diverse subset, both approaches should give you a decent
result.
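A sketch of both fingerprint options plus a diversity pick, under stated assumptions: the topological fingerprint is RDKit's `Chem.RDKFingerprint`, the Morgan variant uses `AllChem.GetMorganFingerprintAsBitVect` with radius 2 and 2048 bits (common settings, not a recommendation from this thread), and the function names are hypothetical. For the diverse subset it uses greedy MaxMin picking, a technique not named above; RDKit ships an optimized implementation as `rdkit.SimDivFilters.rdSimDivPickers.MaxMinPicker`. RDKit is imported lazily so the pure-Python picker runs without it.

```python
from typing import FrozenSet, List, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fingerprints(smiles: Sequence[str], kind: str = "rdkit") -> List[FrozenSet[int]]:
    """Compute RDKit fingerprints and return them as sets of on-bit indices."""
    from rdkit import Chem  # requires RDKit
    from rdkit.Chem import AllChem

    if kind not in ("rdkit", "morgan"):
        raise ValueError(f"unknown fingerprint kind: {kind!r}")
    fps: List[FrozenSet[int]] = []
    for smi in smiles:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            raise ValueError(f"could not parse SMILES: {smi!r}")
        if kind == "rdkit":
            fp = Chem.RDKFingerprint(mol)  # topological (path-based) fingerprint
        else:
            # Assumed settings: radius 2, 2048 bits.
            fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
        fps.append(frozenset(fp.GetOnBits()))
    return fps


def maxmin_pick(fps: Sequence[FrozenSet[int]], n_pick: int, first: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly pick the molecule whose Tanimoto distance
    to its nearest already-picked molecule is largest."""
    n = len(fps)
    if n == 0 or n_pick <= 0:
        return []
    picked = [first]
    chosen = {first}
    # Distance from each molecule to its nearest already-picked molecule.
    min_dist = [1.0 - tanimoto(fp, fps[first]) for fp in fps]
    while len(picked) < min(n_pick, n):
        best = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        picked.append(best)
        chosen.add(best)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fps[best]))
    return picked
```

Because MaxMin only ever needs distances to the picked set, it avoids building the full distance matrix, which is why it scales better than clustering when a diverse subset is all you need.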
Hope this helps,
Nils
On 29.06.2021 at 09:18, Francesca Magarotto -
francesca.magarot...@studio.unibo.it wrote:
Hi,
I'm new to RDKit. I need to do a cluster analysis of a database of
compounds. I've downloaded 191K compounds from the ZINC database in 3D
MOL2 format, and now I need to obtain fingerprints using RDKit. First, I
don't understand whether it's possible to compute fingerprints from MOL2
files and, above all, which kind of fingerprint is best for
this type of analysis (I need to understand which chemotypes I have in
the database in order to eventually find some representatives).
Does anyone have suggestions? (Practical suggestions are really
appreciated, too.)
Thanks
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss