Hi, how about scaffold-based clustering? You extract the scaffolds, cluster 
them, and then assign each compound to the cluster that contains its scaffold.
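
A minimal sketch of the first step, assuming the input is a list of SMILES strings and that Bemis-Murcko scaffolds (via RDKit's MurckoScaffold module) are an acceptable scaffold definition. The helper names murcko_scaffold and group_by_scaffold are made up for illustration. It makes a single pass over the library and never builds a pairwise distance matrix:

```python
from collections import defaultdict


def murcko_scaffold(smiles):
    """Return the Bemis-Murcko scaffold SMILES, or None if parsing fails."""
    # RDKit is imported here so the grouping helper below has no hard
    # dependency on it (any scaffold function can be swapped in).
    from rdkit import Chem
    from rdkit.Chem.Scaffolds import MurckoScaffold

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return MurckoScaffold.MurckoScaffoldSmiles(mol=mol)


def group_by_scaffold(smiles_iter, scaffold_of=murcko_scaffold):
    """Map each scaffold key to the list of input SMILES that share it."""
    groups = defaultdict(list)
    for smi in smiles_iter:
        key = scaffold_of(smi)
        if key is not None:  # skip molecules that could not be parsed
            groups[key].append(smi)
    return dict(groups)
```

The set of unique scaffolds is usually much smaller than the library, so it may be small enough to cluster with a conventional method (for example ECFP4 fingerprints plus rdkit.ML.Cluster.Butina.ClusterData), after which each compound inherits the cluster of its scaffold. Acyclic molecules typically end up with an empty scaffold string and would need separate handling.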

> On Aug 22, 2015, at 8:43 PM, Jing Lu <ajin...@gmail.com> wrote:
> 
> Dear RDKit users,
> 
> How can I cluster more than 1M molecules by ECFP4? If I calculate the 
> distance between every pair of molecules, the distance matrix will be too 
> big. Does RDKit support any heuristic clustering algorithm that avoids 
> calculating the distance matrix for the whole library?
> 
> 
> 
> Thanks,
> Jing
> ------------------------------------------------------------------------------
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
